skychwang2 committed on
Commit d0fe437 · verified · 1 Parent(s): 6ca8570

Update index.html

Files changed (1)
  1. index.html +275 -17
index.html CHANGED
@@ -1,19 +1,277 @@
  <!doctype html>
- <html>
- <head>
- <meta charset="utf-8" />
- <meta name="viewport" content="width=device-width" />
- <title>My static Space</title>
- <link rel="stylesheet" href="style.css" />
- </head>
- <body>
- <div class="card">
- <h1>Welcome to your static Space!</h1>
- <p>You can modify this app directly by editing <i>index.html</i> in the Files and versions tab.</p>
- <p>
- Also don't forget to check the
- <a href="https://huggingface.co/docs/hub/spaces" target="_blank">Spaces documentation</a>.
- </p>
- </div>
- </body>
  </html>
 
  <!doctype html>
+ <html lang="en">
+ <head>
+ <meta charset="utf-8" />
+ <meta name="viewport" content="width=device-width, initial-scale=1.0" />
+ <title>Browsing Lost Unformed Recollections: A Benchmark for Tip-of-the-Tongue Search and Reasoning</title>
+ <style>
+ body {
+ font-family: 'Segoe UI', Tahoma, Geneva, Verdana, sans-serif;
+ line-height: 1.6;
+ color: #333;
+ max-width: 1200px;
+ margin: 0 auto;
+ padding: 20px;
+ background-color: #f5f7fa;
+ }
+
+ .container {
+ background-color: white;
+ border-radius: 8px;
+ box-shadow: 0 4px 12px rgba(0, 0, 0, 0.1);
+ padding: 30px;
+ margin-bottom: 20px;
+ }
+
+ h1 {
+ color: #2c3e50;
+ text-align: center;
+ margin-bottom: 30px;
+ font-weight: 600;
+ }
+
+ .description {
+ margin-bottom: 30px;
+ font-size: 16px;
+ color: #555;
+ text-align: center;
+ }
+
+ table {
+ width: 100%;
+ border-collapse: collapse;
+ margin-top: 20px;
+ font-size: 15px;
+ }
+
+ thead {
+ background-color: #f8f9fa;
+ font-weight: bold;
+ }
+
+ th, td {
+ padding: 12px 15px;
+ text-align: center;
+ border-bottom: 1px solid #e0e0e0;
+ }
+
+ th {
+ position: sticky;
+ top: 0;
+ background-color: #f8f9fa;
+ box-shadow: 0 2px 2px -1px rgba(0, 0, 0, 0.1);
+ }
+
+ tbody tr:hover {
+ background-color: #f1f5f9;
+ }
+
+ .model-name {
+ text-align: left;
+ font-weight: 500;
+ }
+
+ .human-row {
+ font-weight: bold;
+ background-color: #e3f2fd;
+ }
+
+ .top-model {
+ background-color: #fff8e1;
+ }
+
+ .category-header {
+ background-color: #f5f5f5;
+ font-weight: bold;
+ }
+
+ .file-support {
+ font-size: 12px;
+ color: #666;
+ }
+
+ .footnote {
+ font-size: 14px;
+ color: #666;
+ margin-top: 30px;
+ border-top: 1px solid #eee;
+ padding-top: 20px;
+ }
+ </style>
+ </head>
+ <body>
+ <div class="container">
+ <h1>The <i>BLUR</i> Leaderboard</h1>
+
+ <div class="description">
+ <p>Browsing Lost Unformed Recollections: A Benchmark for Tip-of-the-Tongue Search and Reasoning</p>
+ <p>Dataset: <a href="https://huggingface.co/datasets/PatronusAI/BLUR">Link</a>; Paper: <a href="https://arxiv.org/abs/2503.19193">Link</a></p>
+ </div>
+
+ <table>
+ <thead>
+ <tr>
+ <th>Model / System</th>
+ <th>Q<sub>T</sub></th>
+ <th>Q<sub>F</sub></th>
+ <th>H<sub>E</sub></th>
+ <th>H<sub>M</sub></th>
+ <th>H<sub>H</sub></th>
+ <th>Overall</th>
+ </tr>
+ </thead>
+ <tbody>
+ <!-- Base Models -->
+ <tr class="category-header">
+ <td colspan="7">Foundation Models</td>
+ </tr>
+ <tr>
+ <td class="model-name">Llama-3.1-405B</td>
+ <td>0.34</td>
+ <td>0.17<span class="file-support">°</span></td>
+ <td>0.35</td>
+ <td>0.32</td>
+ <td>0.25</td>
+ <td>0.30</td>
+ </tr>
+ <tr>
+ <td class="model-name">claude-3-5-sonnet-20241022</td>
+ <td>0.44</td>
+ <td>0.28<span class="file-support">•</span></td>
+ <td>0.42</td>
+ <td>0.42</td>
+ <td>0.36</td>
+ <td>0.40</td>
+ </tr>
+ <tr>
+ <td class="model-name">gpt-4o-2024-11-20</td>
+ <td>0.42</td>
+ <td>0.28<span class="file-support">•</span></td>
+ <td>0.39</td>
+ <td>0.43</td>
+ <td>0.35</td>
+ <td>0.38</td>
+ </tr>
+ <tr>
+ <td class="model-name">o1-2024-12-17</td>
+ <td>0.54</td>
+ <td>0.36<span class="file-support">•</span></td>
+ <td>0.56</td>
+ <td>0.52</td>
+ <td>0.44</td>
+ <td>0.49</td>
+ </tr>
+ <tr>
+ <td class="model-name">DeepSeek-R1</td>
+ <td>0.45</td>
+ <td>0.27<span class="file-support">°</span></td>
+ <td>0.46</td>
+ <td>0.44</td>
+ <td>0.35</td>
+ <td>0.41</td>
+ </tr>
+
+ <!-- Chat Products -->
+ <tr class="category-header">
+ <td colspan="7">AI Assistants</td>
+ </tr>
+ <tr>
+ <td class="model-name">Microsoft Copilot</td>
+ <td>0.29</td>
+ <td>0.23<span class="file-support">•</span></td>
+ <td>0.29</td>
+ <td>0.32</td>
+ <td>0.22</td>
+ <td>0.27</td>
+ </tr>
+ <tr>
+ <td class="model-name">Mistral Le Chat</td>
+ <td>0.40</td>
+ <td>0.27<span class="file-support">•</span></td>
+ <td>0.47</td>
+ <td>0.38</td>
+ <td>0.32</td>
+ <td>0.37</td>
+ </tr>
+ <tr>
+ <td class="model-name">Perplexity Pro Search</td>
+ <td>0.31</td>
+ <td>0.15<span class="file-support">•</span></td>
+ <td>0.29</td>
+ <td>0.29</td>
+ <td>0.24</td>
+ <td>0.27</td>
+ </tr>
+ <tr>
+ <td class="model-name">ChatGPT-4o</td>
+ <td>0.53</td>
+ <td>0.36</td>
+ <td>0.60</td>
+ <td>0.52</td>
+ <td>0.41</td>
+ <td>0.49</td>
+ </tr>
+
+ <!-- Agent Systems -->
+ <tr class="category-header">
+ <td colspan="7">Agentic Systems</td>
+ </tr>
+ <tr class="top-model">
+ <td class="model-name">HuggingFace Agents + Claude 3.5 Sonnet</td>
+ <td>0.61</td>
+ <td>0.41<span class="file-support">•</span></td>
+ <td>0.60</td>
+ <td>0.56</td>
+ <td>0.54</td>
+ <td>0.56</td>
+ </tr>
+ <tr>
+ <td class="model-name">DynaSaur + GPT-4o</td>
+ <td>0.58</td>
+ <td>0.27</td>
+ <td>0.61</td>
+ <td>0.52</td>
+ <td>0.44</td>
+ <td>0.50</td>
+ </tr>
+ <tr>
+ <td class="model-name">Operator</td>
+ <td>0.57</td>
+ <td>0.46<span class="file-support">•</span></td>
+ <td>0.56</td>
+ <td>0.56</td>
+ <td>0.52</td>
+ <td>0.54</td>
+ </tr>
+
+ <!-- Baselines -->
+ <tr class="category-header">
+ <td colspan="7">Baselines</td>
+ </tr>
+ <tr>
+ <td class="model-name">Search Engine</td>
+ <td>0.05</td>
+ <td>0.03<span class="file-support">•</span></td>
+ <td>0.08</td>
+ <td>0.05</td>
+ <td>0.02</td>
+ <td>0.04</td>
+ </tr>
+ <tr class="human-row">
+ <td class="model-name">Human</td>
+ <td>0.98</td>
+ <td>1.00</td>
+ <td>0.98</td>
+ <td>0.98</td>
+ <td>0.99</td>
+ <td>0.98</td>
+ </tr>
+ </tbody>
+ </table>
+
+ <div class="footnote">
+ <p><strong>Table 1:</strong> System and model performance on the BLUR benchmark. Q<sub>T</sub> and Q<sub>F</sub> denote performance on text-only queries and on queries with file inputs, respectively. Markers next to Q<sub>F</sub> scores indicate file-input support: ° means the system does not support file uploads, • means it supports only certain file types, and no marker means all file types are supported. H<sub>E</sub>, H<sub>M</sub>, and H<sub>H</sub> denote performance on the <em>easy</em>, <em>medium</em>, and <em>hard</em> query difficulty subsets, respectively.</p>
+ </div>
+ </div>
+ </body>
  </html>
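
A quick sanity check on the table added in this commit: the sketch below ranks the non-baseline systems by their Overall score and confirms that the row carrying the `top-model` class is the highest-scoring one. The scores are copied by hand from the diff above; the `overall` dictionary and the `rank_by_overall` helper are illustrative names and are not part of the Space itself.

```python
# Overall scores copied from the leaderboard rows in index.html.
# The "Search Engine" and "Human" baselines are deliberately left out.
overall = {
    "Llama-3.1-405B": 0.30,
    "claude-3-5-sonnet-20241022": 0.40,
    "gpt-4o-2024-11-20": 0.38,
    "o1-2024-12-17": 0.49,
    "DeepSeek-R1": 0.41,
    "Microsoft Copilot": 0.27,
    "Mistral Le Chat": 0.37,
    "Perplexity Pro Search": 0.27,
    "ChatGPT-4o": 0.49,
    "HuggingFace Agents + Claude 3.5 Sonnet": 0.56,
    "DynaSaur + GPT-4o": 0.50,
    "Operator": 0.54,
}


def rank_by_overall(scores):
    """Return (name, score) pairs sorted from highest to lowest score.

    sorted() is stable, so tied systems keep their order from the table.
    """
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


ranking = rank_by_overall(overall)
for position, (name, score) in enumerate(ranking, start=1):
    print(f"{position:2d}. {name:<40} {score:.2f}")
```

The first entry of `ranking` should match the row highlighted with `class="top-model"` in the table.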