Add license and tag metadata
README.md
CHANGED
@@ -1,3 +1,9 @@
+---
+license: bsd-3-clause
+tags:
+- kernel
+---
+
 # Flash Attention
 
 Flash Attention is a fast and memory-efficient implementation of the attention mechanism, designed to work with large models and long sequences. This is a Hugging Face compliant kernel build of Flash Attention.
@@ -65,6 +71,7 @@ print(f"Output: {out_kv.shape}")
 ```
 
 expected output
+
 ```txt
 Fetching 3 files: 100%|█████████████████████████████████████████████████████| 3/3 [00:00<00:00, 16384.00it/s]
 Flash Attention functions: ['mha_bwd', 'mha_fwd', 'mha_fwd_kvcache', 'mha_varlen_bwd', 'mha_varlen_fwd']
@@ -77,4 +84,5 @@ Output: torch.Size([10, 4, 8])
 
 3. KV-cache:
 Output: torch.Size([2, 2, 4, 8])
-```
+```
+
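For orientation, the "expected output" shown in the hunks above is produced by the README's own usage example, which is not part of this change. A minimal sketch of the kind of call that yields it, assuming the Hugging Face `kernels` library and the repo id `kernels-community/flash-attn` (both assumptions, not shown in this diff):

```python
# Sketch only, not part of this commit: load the prebuilt kernel from the Hub
# and list its exported attention entry points. The repo id below is assumed.
from kernels import get_kernel

flash_attn = get_kernel("kernels-community/flash-attn")  # fetches the kernel build files

# Collect the public MHA functions, e.g. mha_fwd, mha_bwd, mha_fwd_kvcache, ...
funcs = sorted(name for name in dir(flash_attn) if name.startswith("mha"))
print(f"Flash Attention functions: {funcs}")
```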