ontoligent committed
Commit 967e756 · verified · 1 Parent(s): 0ccdb1e

Add 3 files

Files changed (3)
  1. README.md +7 -5
  2. index.html +395 -19
  3. prompts.txt +1 -0
README.md CHANGED
@@ -1,10 +1,12 @@
1
  ---
2
- title: Ds 5001 Text As Data
3
- emoji: 📚
4
- colorFrom: gray
5
- colorTo: green
6
  sdk: static
7
  pinned: false
 
 
8
  ---
9
 
10
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
 
1
  ---
2
+ title: ds-5001-text-as-data
3
+ emoji: 🐳
4
+ colorFrom: green
5
+ colorTo: yellow
6
  sdk: static
7
  pinned: false
8
+ tags:
9
+ - deepsite
10
  ---
11
 
12
+ Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
index.html CHANGED
@@ -1,19 +1,395 @@
1
- <!doctype html>
2
- <html>
3
- <head>
4
- <meta charset="utf-8" />
5
- <meta name="viewport" content="width=device-width" />
6
- <title>My static Space</title>
7
- <link rel="stylesheet" href="style.css" />
8
- </head>
9
- <body>
10
- <div class="card">
11
- <h1>Welcome to your static Space!</h1>
12
- <p>You can modify this app directly by editing <i>index.html</i> in the Files and versions tab.</p>
13
- <p>
14
- Also don't forget to check the
15
- <a href="https://huggingface.co/docs/hub/spaces" target="_blank">Spaces documentation</a>.
16
- </p>
17
- </div>
18
- </body>
19
- </html>
1
+ <!DOCTYPE html>
2
+ <html lang="en">
3
+ <head>
4
+ <meta charset="UTF-8">
5
+ <meta name="viewport" content="width=device-width, initial-scale=1.0">
6
+ <title>Understanding Attention Mechanisms in LLMs</title>
7
+ <script src="https://cdn.tailwindcss.com"></script>
8
+ <link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/6.4.0/css/all.min.css">
9
+ <style>
10
+ .code-block {
11
+ background-color: #2d2d2d;
12
+ color: #f8f8f2;
13
+ padding: 1rem;
14
+ border-radius: 0.5rem;
15
+ font-family: 'Courier New', Courier, monospace;
16
+ overflow-x: auto;
17
+ }
18
+ .attention-visual {
19
+ display: flex;
20
+ justify-content: center;
21
+ margin: 2rem 0;
22
+ }
23
+ .attention-node {
24
+ width: 60px;
25
+ height: 60px;
26
+ border-radius: 50%;
27
+ display: flex;
28
+ align-items: center;
29
+ justify-content: center;
30
+ font-weight: bold;
31
+ position: relative;
32
+ }
33
+ .attention-line {
34
+ position: absolute;
35
+ background-color: rgba(59, 130, 246, 0.5);
36
+ transform-origin: left center;
37
+ }
38
+ .explanation-box {
39
+ background-color: #f0f9ff;
40
+ border-left: 4px solid #3b82f6;
41
+ padding: 1rem;
42
+ margin: 1rem 0;
43
+ border-radius: 0 0.5rem 0.5rem 0;
44
+ }
45
+ .citation {
46
+ background-color: #f8fafc;
47
+ padding: 0.5rem;
48
+ margin: 0.5rem 0;
49
+ border-left: 3px solid #94a3b8;
50
+ }
51
+ </style>
52
+ </head>
53
+ <body class="bg-gray-50">
54
+ <div class="max-w-4xl mx-auto px-4 py-8">
55
+ <header class="text-center mb-12">
56
+ <h1 class="text-4xl font-bold text-blue-800 mb-4">Attention Mechanisms in Large Language Models</h1>
57
+ <p class="text-xl text-gray-600">Understanding the core innovation behind modern AI language models</p>
58
+ <div class="mt-6">
59
+ <span class="inline-block bg-blue-100 text-blue-800 px-3 py-1 rounded-full text-sm font-medium">Machine Learning</span>
60
+ <span class="inline-block bg-purple-100 text-purple-800 px-3 py-1 rounded-full text-sm font-medium ml-2">Natural Language Processing</span>
61
+ <span class="inline-block bg-green-100 text-green-800 px-3 py-1 rounded-full text-sm font-medium ml-2">Deep Learning</span>
62
+ </div>
63
+ </header>
64
+
65
+ <div class="bg-white rounded-xl shadow-md overflow-hidden mb-8">
66
+ <div class="p-8">
67
+ <h2 class="text-2xl font-bold text-gray-800 mb-6">Introduction to Attention</h2>
68
+ <p class="text-gray-700 mb-4">
69
+ The attention mechanism is a fundamental component of modern transformer-based language models like GPT, BERT, and others.
70
+ It allows models to dynamically focus on different parts of the input sequence when producing each part of the output sequence.
71
+ </p>
72
+ <p class="text-gray-700 mb-6">
73
+ Unlike traditional recurrent models, which must compress everything read so far into a fixed-size hidden state, attention mechanisms enable models to learn which parts of the input are most relevant at each step of processing.
74
+ </p>
75
+
76
+ <div class="attention-visual">
77
+ <div class="flex flex-col items-center">
78
+ <div class="flex space-x-8 mb-8">
79
+ <div class="attention-node bg-blue-100 text-blue-800">Input</div>
80
+ <div class="attention-node bg-purple-100 text-purple-800">Q</div>
81
+ <div class="attention-node bg-green-100 text-green-800">K</div>
82
+ <div class="attention-node bg-yellow-100 text-yellow-800">V</div>
83
+ </div>
84
+ <div class="attention-node bg-red-100 text-red-800">Output</div>
85
+ </div>
86
+ </div>
87
+
88
+ <div class="explanation-box">
89
+ <h3 class="font-semibold text-lg text-blue-800 mb-2">Key Insight</h3>
90
+ <p>
91
+ Attention mechanisms compute a weighted sum of values (V), where the weights are determined by the compatibility between queries (Q) and keys (K).
92
+ This allows the model to focus on different parts of the input sequence dynamically.
93
+ </p>
94
+ </div>
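+ <p class="text-gray-700 mb-4">
+ The following minimal sketch makes this concrete. The numbers are invented purely for illustration (they are not taken from any real model), and PyTorch is assumed only because the later examples on this page use it: one query is scored against three keys, the scores become weights via a softmax, and the output is the corresponding weighted sum of the values.
+ </p>
+ <div class="code-block mb-6">
+ <pre># Minimal sketch: attention as a weighted sum of values (toy numbers for illustration only)
+ import torch
+
+ d_k = 4
+ q = torch.tensor([[1.0, 0.0, 1.0, 0.0]])           # one query, shape (1, d_k)
+ K = torch.tensor([[1.0, 0.0, 1.0, 0.0],            # three keys, shape (3, d_k)
+                   [0.0, 1.0, 0.0, 1.0],
+                   [1.0, 1.0, 0.0, 0.0]])
+ V = torch.tensor([[10.0, 0.0],                     # three values, shape (3, d_v)
+                   [0.0, 10.0],
+                   [5.0, 5.0]])
+
+ scores = q @ K.T / d_k ** 0.5            # compatibility of the query with each key
+ weights = torch.softmax(scores, dim=-1)  # attention weights, non-negative and summing to 1
+ output = weights @ V                     # weighted sum of the values
+ print(weights)  # the first key matches the query best, so it receives the largest weight
+ print(output)</pre>
+ </div>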
95
+ </div>
96
+ </div>
97
+
98
+ <div class="bg-white rounded-xl shadow-md overflow-hidden mb-8">
99
+ <div class="p-8">
100
+ <h2 class="text-2xl font-bold text-gray-800 mb-6">The Q, K, V Triad</h2>
101
+
102
+ <div class="grid md:grid-cols-3 gap-6 mb-8">
103
+ <div class="bg-blue-50 p-4 rounded-lg">
104
+ <h3 class="font-bold text-blue-800 mb-2"><i class="fas fa-question-circle mr-2"></i>Queries (Q)</h3>
105
+ <p class="text-gray-700">
106
+ Represent what the model is "looking for" at the current position. They are learned representations that help determine which parts of the input to focus on.
107
+ </p>
108
+ </div>
109
+ <div class="bg-green-50 p-4 rounded-lg">
110
+ <h3 class="font-bold text-green-800 mb-2"><i class="fas fa-key mr-2"></i>Keys (K)</h3>
111
+ <p class="text-gray-700">
112
+ Represent what each input element "contains" or "offers". They are compared against queries to determine attention weights.
113
+ </p>
114
+ </div>
115
+ <div class="bg-yellow-50 p-4 rounded-lg">
116
+ <h3 class="font-bold text-yellow-800 mb-2"><i class="fas fa-database mr-2"></i>Values (V)</h3>
117
+ <p class="text-gray-700">
118
+ Contain the actual information that will be aggregated based on the attention weights. They represent what gets passed forward.
119
+ </p>
120
+ </div>
121
+ </div>
122
+
123
+ <h3 class="text-xl font-semibold text-gray-800 mb-4">Why We Need All Three</h3>
124
+ <p class="text-gray-700 mb-4">
125
+ The separation of Q, K, and V provides flexibility and expressive power to the attention mechanism:
126
+ </p>
127
+ <ul class="list-disc pl-6 text-gray-700 space-y-2 mb-6">
128
+ <li><strong>Decoupling:</strong> Allows different representations for what to look for (Q) versus what to retrieve (V)</li>
129
+ <li><strong>Flexibility:</strong> Enables different types of attention patterns (e.g., looking ahead vs. looking back)</li>
130
+ <li><strong>Efficiency:</strong> Permits caching of K and V for autoregressive generation (see the sketch after this list)</li>
131
+ <li><strong>Interpretability:</strong> Makes attention patterns more meaningful and analyzable</li>
132
+ </ul>
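+ <p class="text-gray-700 mb-4">
+ The efficiency point deserves a quick illustration. The sketch below is deliberately simplified (single head, no batching, invented dimensions) and is not how production inference engines are implemented, but it shows why keeping K and V separate pays off during autoregressive generation: the keys and values of earlier tokens can be cached, and each new step only computes a query for the newest token.
+ </p>
+ <div class="code-block mb-6">
+ <pre># Simplified sketch of K/V caching during autoregressive decoding (illustrative only)
+ import torch
+ import torch.nn as nn
+
+ d_model = 8
+ q_proj = nn.Linear(d_model, d_model)
+ k_proj = nn.Linear(d_model, d_model)
+ v_proj = nn.Linear(d_model, d_model)
+
+ k_cache, v_cache = [], []                        # grows by one entry per generated token
+
+ def decode_step(new_token_embedding):
+     """Attend from the newest token to every token seen so far, reusing cached K and V."""
+     q = q_proj(new_token_embedding)              # only the new token needs a query
+     k_cache.append(k_proj(new_token_embedding))  # K and V of earlier tokens are never recomputed
+     v_cache.append(v_proj(new_token_embedding))
+     K = torch.stack(k_cache)                     # (tokens_so_far, d_model)
+     V = torch.stack(v_cache)
+     weights = torch.softmax(q @ K.T / d_model ** 0.5, dim=-1)
+     return weights @ V                           # context vector for the new token
+
+ for _ in range(3):                               # simulate generating three tokens
+     out = decode_step(torch.randn(d_model))
+ print(out.shape)                                 # torch.Size([8])</pre>
+ </div>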
133
+
134
+ <h3 class="text-xl font-semibold text-gray-800 mb-4">How Q, K, V Are Created</h3>
135
+ <p class="text-gray-700 mb-4">
136
+ In transformer models, Q, K, and V are all derived from the same input sequence through learned linear transformations:
137
+ </p>
138
+
139
+ <div class="code-block mb-6">
140
+ <pre># Python example of creating Q, K, V
141
+ import torch
142
+ import torch.nn as nn
143
+
144
+ # Suppose we have input embeddings of shape (batch_size, seq_len, d_model)
145
+ batch_size = 32
146
+ seq_len = 10
147
+ d_model = 512
148
+ input_embeddings = torch.randn(batch_size, seq_len, d_model)
149
+
150
+ # Create linear projection layers
151
+ q_proj = nn.Linear(d_model, d_model) # Query projection
152
+ k_proj = nn.Linear(d_model, d_model) # Key projection
153
+ v_proj = nn.Linear(d_model, d_model) # Value projection
154
+
155
+ # Project inputs to get Q, K, V
156
+ Q = q_proj(input_embeddings) # Shape: (batch_size, seq_len, d_model)
157
+ K = k_proj(input_embeddings) # Shape: (batch_size, seq_len, d_model)
158
+ V = v_proj(input_embeddings) # Shape: (batch_size, seq_len, d_model)</pre>
159
+ </div>
160
+
161
+ <div class="explanation-box">
162
+ <h3 class="font-semibold text-lg text-blue-800 mb-2">Important Note</h3>
163
+ <p>
164
+ In practice, the dimensions are often split into multiple "heads" (multi-head attention), where each head learns different attention patterns.
165
+ This allows the model to attend to different aspects of the input simultaneously.
166
+ </p>
167
+ </div>
168
+ </div>
169
+ </div>
170
+
171
+ <div class="bg-white rounded-xl shadow-md overflow-hidden mb-8">
172
+ <div class="p-8">
173
+ <h2 class="text-2xl font-bold text-gray-800 mb-6">Scaled Dot-Product Attention</h2>
174
+
175
+ <p class="text-gray-700 mb-4">
176
+ The core computation in attention mechanisms is the scaled dot-product attention, which can be implemented as follows:
177
+ </p>
178
+
179
+ <div class="code-block mb-6">
180
+ <pre>def scaled_dot_product_attention(Q, K, V, mask=None):
+     """
+     Q: Query tensor (batch_size, ..., seq_len_q, d_k)
+     K: Key tensor (batch_size, ..., seq_len_k, d_k)
+     V: Value tensor (batch_size, ..., seq_len_k, d_v)
+     mask: Optional mask tensor (1 marks positions to hide, e.g. padding or future tokens)
+     """
+     # Compute dot products between Q and K
+     matmul_qk = torch.matmul(Q, K.transpose(-2, -1))  # (..., seq_len_q, seq_len_k)
+
+     # Scale by square root of dimension
+     d_k = Q.size(-1)
+     scaled_attention_logits = matmul_qk / torch.sqrt(torch.tensor(d_k, dtype=torch.float32))
+
+     # Apply mask if provided (for decoder self-attention or padding)
+     if mask is not None:
+         scaled_attention_logits += (mask * -1e9)
+
+     # Softmax to get attention weights
+     attention_weights = torch.softmax(scaled_attention_logits, dim=-1)
+
+     # Multiply weights by values
+     output = torch.matmul(attention_weights, V)  # (..., seq_len_q, d_v)
+
+     return output, attention_weights</pre>
205
+ </div>
206
+
207
+ <div class="explanation-box">
208
+ <h3 class="font-semibold text-lg text-blue-800 mb-2">Scaling Explanation</h3>
209
+ <p>
210
+ The scaling factor (√dₖ) is crucial because dot products grow large in magnitude as the dimension increases: for query and key components with roughly unit variance, a dₖ-dimensional dot product has variance of about dₖ, so its typical size grows like √dₖ.
211
+ This can push the softmax function into regions where it has extremely small gradients, making learning difficult.
212
+ Scaling by √dₖ counteracts this effect.
213
+ </p>
214
+ </div>
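+ <p class="text-gray-700 mb-4">
+ A quick numerical check makes the effect visible. The snippet below uses random vectors (nothing here comes from a trained model) to show that unscaled dot-product logits spread roughly like √dₖ, which drives the softmax toward a near one-hot distribution, while the scaled logits keep the weights moderate:
+ </p>
+ <div class="code-block mb-6">
+ <pre># Illustrative check with random data: why unscaled logits saturate the softmax
+ import torch
+
+ torch.manual_seed(0)
+ for d_k in (16, 256, 4096):
+     q = torch.randn(d_k)
+     K = torch.randn(10, d_k)          # 10 keys with unit-variance components
+     raw = K @ q                       # unscaled logits: spread grows roughly like sqrt(d_k)
+     scaled = raw / d_k ** 0.5         # scaled logits: spread stays roughly constant
+     print(d_k,
+           round(raw.std().item(), 1),
+           round(torch.softmax(raw, dim=-1).max().item(), 3),     # tends toward 1.0 as d_k grows
+           round(torch.softmax(scaled, dim=-1).max().item(), 3))  # stays moderate</pre>
+ </div>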
215
+
216
+ <h3 class="text-xl font-semibold text-gray-800 mb-4">Complete Multi-Head Attention Example</h3>
217
+
218
+ <div class="code-block mb-6">
219
+ <pre>class MultiHeadAttention(nn.Module):
+     def __init__(self, d_model, num_heads):
+         super(MultiHeadAttention, self).__init__()
+         self.num_heads = num_heads
+         self.d_model = d_model
+         assert d_model % num_heads == 0, "d_model must be divisible by num_heads"
+
+         self.depth = d_model // num_heads
+
+         # Linear layers for Q, K, V projections
+         self.wq = nn.Linear(d_model, d_model)
+         self.wk = nn.Linear(d_model, d_model)
+         self.wv = nn.Linear(d_model, d_model)
+
+         self.dense = nn.Linear(d_model, d_model)
+
+     def split_heads(self, x, batch_size):
+         """Split the last dimension into (num_heads, depth)"""
+         x = x.view(batch_size, -1, self.num_heads, self.depth)
+         return x.transpose(1, 2)  # (batch_size, num_heads, seq_len, depth)
+
+     def forward(self, q, k, v, mask=None):
+         batch_size = q.size(0)
+
+         # Linear projections
+         q = self.wq(q)  # (batch_size, seq_len, d_model)
+         k = self.wk(k)
+         v = self.wv(v)
+
+         # Split into multiple heads
+         q = self.split_heads(q, batch_size)
+         k = self.split_heads(k, batch_size)
+         v = self.split_heads(v, batch_size)
+
+         # Scaled dot-product attention
+         scaled_attention, attention_weights = scaled_dot_product_attention(q, k, v, mask)
+
+         # Concatenate heads
+         scaled_attention = scaled_attention.transpose(1, 2)  # (batch_size, seq_len, num_heads, depth)
+         concat_attention = scaled_attention.contiguous().view(batch_size, -1, self.d_model)
+
+         # Final linear layer
+         output = self.dense(concat_attention)
+
+         return output, attention_weights</pre>
264
+ </div>
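+ <p class="text-gray-700 mb-4">
+ As a quick sanity check, the module above can be exercised like this (assuming the imports and the <code>scaled_dot_product_attention</code> function from the earlier examples are in scope; the batch size and sequence length are arbitrary):
+ </p>
+ <div class="code-block mb-6">
+ <pre># Usage sketch for the class above (toy sizes chosen arbitrarily)
+ mha = MultiHeadAttention(d_model=512, num_heads=8)
+ x = torch.randn(2, 10, 512)     # (batch_size, seq_len, d_model)
+ out, attn = mha(x, x, x)        # self-attention: Q, K and V all come from the same sequence
+ print(out.shape)                # torch.Size([2, 10, 512])
+ print(attn.shape)               # torch.Size([2, 8, 10, 10]), one weight matrix per head</pre>
+ </div>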
265
+ </div>
266
+ </div>
267
+
268
+ <div class="bg-white rounded-xl shadow-md overflow-hidden mb-8">
269
+ <div class="p-8">
270
+ <h2 class="text-2xl font-bold text-gray-800 mb-6">Types of Attention Patterns</h2>
271
+
272
+ <div class="grid md:grid-cols-2 gap-6 mb-6">
273
+ <div class="bg-indigo-50 p-4 rounded-lg">
274
+ <h3 class="font-bold text-indigo-800 mb-2"><i class="fas fa-arrows-alt-h mr-2"></i>Self-Attention</h3>
275
+ <p class="text-gray-700">
276
+ Q, K, and V all come from the same sequence. Allows each position to attend to all positions in the same sequence.
277
+ </p>
278
+ <div class="mt-3">
279
+ <span class="inline-block bg-indigo-100 text-indigo-800 px-2 py-1 rounded-full text-xs font-medium">Encoder</span>
280
+ <span class="inline-block bg-indigo-100 text-indigo-800 px-2 py-1 rounded-full text-xs font-medium ml-1">BERT</span>
281
+ </div>
282
+ </div>
283
+ <div class="bg-pink-50 p-4 rounded-lg">
284
+ <h3 class="font-bold text-pink-800 mb-2"><i class="fas fa-arrow-right mr-2"></i>Masked Self-Attention</h3>
285
+ <p class="text-gray-700">
286
+ Used in the decoder to prevent positions from attending to subsequent positions (the autoregressive property); a mask-construction sketch follows after this grid.
287
+ </p>
288
+ <div class="mt-3">
289
+ <span class="inline-block bg-pink-100 text-pink-800 px-2 py-1 rounded-full text-xs font-medium">Decoder</span>
290
+ <span class="inline-block bg-pink-100 text-pink-800 px-2 py-1 rounded-full text-xs font-medium ml-1">GPT</span>
291
+ </div>
292
+ </div>
293
+ <div class="bg-teal-50 p-4 rounded-lg">
294
+ <h3 class="font-bold text-teal-800 mb-2"><i class="fas fa-exchange-alt mr-2"></i>Cross-Attention</h3>
295
+ <p class="text-gray-700">
296
+ Q comes from one sequence, while K and V come from another sequence (e.g., encoder-decoder attention).
297
+ </p>
298
+ <div class="mt-3">
299
+ <span class="inline-block bg-teal-100 text-teal-800 px-2 py-1 rounded-full text-xs font-medium">Seq2Seq</span>
300
+ <span class="inline-block bg-teal-100 text-teal-800 px-2 py-1 rounded-full text-xs font-medium ml-1">Translation</span>
301
+ </div>
302
+ </div>
303
+ <div class="bg-orange-50 p-4 rounded-lg">
304
+ <h3 class="font-bold text-orange-800 mb-2"><i class="fas fa-sliders-h mr-2"></i>Sparse Attention</h3>
305
+ <p class="text-gray-700">
306
+ Only attends to a subset of positions to reduce computational complexity (e.g., local, strided, or global attention).
307
+ </p>
308
+ <div class="mt-3">
309
+ <span class="inline-block bg-orange-100 text-orange-800 px-2 py-1 rounded-full text-xs font-medium">Longformer</span>
310
+ <span class="inline-block bg-orange-100 text-orange-800 px-2 py-1 rounded-full text-xs font-medium ml-1">BigBird</span>
311
+ </div>
312
+ </div>
313
+ </div>
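+ <p class="text-gray-700 mb-4">
+ For masked self-attention, the "look-ahead" mask is just an upper-triangular matrix. The sketch below builds one in the convention used by the <code>scaled_dot_product_attention</code> function earlier on this page, where a 1 marks a position to hide; note that some libraries use the opposite convention (1 or True meaning "keep"), so the convention should always be checked before reuse:
+ </p>
+ <div class="code-block mb-6">
+ <pre># Sketch: causal (look-ahead) mask, with 1 marking positions that must be hidden
+ import torch
+
+ seq_len = 5
+ causal_mask = torch.triu(torch.ones(seq_len, seq_len), diagonal=1)
+ print(causal_mask)
+ # tensor([[0., 1., 1., 1., 1.],
+ #         [0., 0., 1., 1., 1.],
+ #         [0., 0., 0., 1., 1.],
+ #         [0., 0., 0., 0., 1.],
+ #         [0., 0., 0., 0., 0.]])
+ # Row i may attend only to positions 0..i; masked entries get -1e9 added before the softmax,
+ # e.g. scaled_dot_product_attention(Q, K, V, mask=causal_mask)</pre>
+ </div>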
314
+ </div>
315
+ </div>
316
+
317
+ <div class="bg-white rounded-xl shadow-md overflow-hidden mb-8">
318
+ <div class="p-8">
319
+ <h2 class="text-2xl font-bold text-gray-800 mb-6">Key Citations and Resources</h2>
320
+
321
+ <div class="space-y-4">
322
+ <div class="citation">
323
+ <h3 class="font-semibold text-gray-800">1. Vaswani et al. (2017) - Original Transformer Paper</h3>
324
+ <p class="text-gray-600">"Attention Is All You Need" - Introduced the transformer architecture with scaled dot-product attention.</p>
325
+ <a href="https://arxiv.org/abs/1706.03762" class="text-blue-600 hover:underline inline-block mt-1">arXiv:1706.03762</a>
326
+ </div>
327
+
328
+ <div class="citation">
329
+ <h3 class="font-semibold text-gray-800">2. Jurafsky & Martin (2023) - NLP Textbook</h3>
330
+ <p class="text-gray-600">"Speech and Language Processing" - Comprehensive chapter on attention and transformer models.</p>
331
+ <a href="https://web.stanford.edu/~jurafsky/slp3/" class="text-blue-600 hover:underline inline-block mt-1">Stanford NLP Textbook</a>
332
+ </div>
333
+
334
+ <div class="citation">
335
+ <h3 class="font-semibold text-gray-800">3. Illustrated Transformer (Blog Post)</h3>
336
+ <p class="text-gray-600">Jay Alammar's visual explanation of transformer attention mechanisms.</p>
337
+ <a href="https://jalammar.github.io/illustrated-transformer/" class="text-blue-600 hover:underline inline-block mt-1">jalammar.github.io</a>
338
+ </div>
339
+
340
+ <div class="citation">
341
+ <h3 class="font-semibold text-gray-800">4. Harvard NLP (2018) - The Annotated Transformer</h3>
342
+ <p class="text-gray-600">Line-by-line implementation guide with PyTorch.</p>
343
+ <a href="http://nlp.seas.harvard.edu/2018/04/03/attention.html" class="text-blue-600 hover:underline inline-block mt-1">Harvard NLP Tutorial</a>
344
+ </div>
345
+
346
+ <div class="citation">
347
+ <h3 class="font-semibold text-gray-800">5. Efficient Transformers Survey (2020)</h3>
348
+ <p class="text-gray-600">Tay et al. review various attention variants for efficiency.</p>
349
+ <a href="https://arxiv.org/abs/2009.06732" class="text-blue-600 hover:underline inline-block mt-1">arXiv:2009.06732</a>
350
+ </div>
351
+ </div>
352
+ </div>
353
+ </div>
354
+
355
+ <div class="bg-white rounded-xl shadow-md overflow-hidden mb-8">
356
+ <div class="p-8">
357
+ <h2 class="text-2xl font-bold text-gray-800 mb-6">Practical Considerations</h2>
358
+
359
+ <div class="grid md:grid-cols-2 gap-6">
360
+ <div>
361
+ <h3 class="text-xl font-semibold text-gray-800 mb-3"><i class="fas fa-lightbulb text-yellow-500 mr-2"></i>Tips for Implementation</h3>
362
+ <ul class="list-disc pl-6 text-gray-700 space-y-2">
363
+ <li>Prefer pre-layer normalization (normalizing before the attention sub-layer rather than after) for more stable training of deep transformers; the original transformer used post-layer normalization</li>
364
+ <li>Initialize attention projections with a small-variance scheme (e.g., Xavier/Glorot) so attention logits start small</li>
365
+ <li>Monitor attention patterns during training for debugging</li>
366
+ <li>Consider using flash attention for efficiency in production</li>
367
+ <li>Use attention masking carefully for padding and autoregressive generation (see the padding-mask sketch below)</li>
368
+ </ul>
369
+ </div>
370
+ <div>
371
+ <h3 class="text-xl font-semibold text-gray-800 mb-3"><i class="fas fa-exclamation-triangle text-red-500 mr-2"></i>Common Pitfalls</h3>
372
+ <ul class="list-disc pl-6 text-gray-700 space-y-2">
373
+ <li>Forgetting to scale attention scores by √dₖ</li>
374
+ <li>Improper handling of attention masks</li>
375
+ <li>Not using residual connections around attention</li>
376
+ <li>Oversized attention heads that don't learn meaningful patterns</li>
377
+ <li>Ignoring attention patterns when debugging model behavior</li>
378
+ </ul>
379
+ </div>
380
+ </div>
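+ <p class="text-gray-700 mb-4">
+ To make the padding tip concrete, here is a minimal sketch of a padding mask in the same 1-means-hide convention used earlier on this page. The token ids and <code>pad_id</code> are invented placeholders; a real tokenizer exposes its own padding id:
+ </p>
+ <div class="code-block mb-6">
+ <pre># Sketch: padding mask (1 = hide) built from token ids; pad_id is a placeholder value
+ import torch
+
+ pad_id = 0
+ token_ids = torch.tensor([[7, 3, 9, pad_id, pad_id],   # two sequences padded to length 5
+                           [4, 8, pad_id, pad_id, pad_id]])
+ padding_mask = (token_ids == pad_id).float()            # (batch_size, seq_len)
+ # Broadcast to (batch_size, 1, 1, seq_len) so every query position ignores padded keys:
+ attention_mask = padding_mask[:, None, None, :]
+ print(attention_mask.shape)                             # torch.Size([2, 1, 1, 5])</pre>
+ </div>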
381
+ </div>
382
+ </div>
383
+
384
+ <footer class="text-center py-8 text-gray-600">
385
+ <p>© 2023 Understanding Attention Mechanisms in LLMs</p>
386
+ <p class="mt-2">Educational resource for machine learning students</p>
387
+ <div class="mt-4 flex justify-center space-x-4">
388
+ <a href="#" class="text-blue-600 hover:text-blue-800"><i class="fab fa-github fa-lg"></i></a>
389
+ <a href="#" class="text-blue-600 hover:text-blue-800"><i class="fab fa-twitter fa-lg"></i></a>
390
+ <a href="#" class="text-blue-600 hover:text-blue-800"><i class="fab fa-linkedin fa-lg"></i></a>
391
+ </div>
392
+ </footer>
393
+ </div>
394
+ <p style="border-radius: 8px; text-align: center; font-size: 12px; color: #fff; margin-top: 16px;position: fixed; left: 8px; bottom: 8px; z-index: 10; background: rgba(0, 0, 0, 0.8); padding: 4px 8px;">Made with <img src="https://enzostvs-deepsite.hf.space/logo.svg" alt="DeepSite Logo" style="width: 16px; height: 16px; vertical-align: middle;display:inline-block;margin-right:3px;filter:brightness(0) invert(1);"><a href="https://enzostvs-deepsite.hf.space" style="color: #fff;text-decoration: underline;" target="_blank" >DeepSite</a> - 🧬 <a href="https://enzostvs-deepsite.hf.space?remix=ontoligent/ds-5001-text-as-data" style="color: #fff;text-decoration: underline;" target="_blank" >Remix</a></p></body>
395
+ </html>
prompts.txt ADDED
@@ -0,0 +1 @@
 
 
1
+ Create a website that explains the attention mechanism used by LLMs. Include examples in Python suitable for teaching undergraduates and a list of authoritative and helpful citations. Go into some detail about why Q, K, and V are needed and how they are created.