Update README.md
README.md
CHANGED
@@ -32,4 +32,19 @@ A Gradio web interface for encoding and decoding Telugu text using a trained BPE
 The tokenizer is trained on a diverse corpus of Telugu text with:
 - Maximum vocabulary size: 5000 tokens
 - Target compression ratio: ≥ 3.2x
-- Perfect reconstruction guarantee
+- Perfect reconstruction guarantee
+
+---
+title: Bpe Tokenizer
+emoji: 🔥
+colorFrom: blue
+colorTo: yellow
+sdk: gradio
+sdk_version: 5.12.0
+app_file: app.py
+pinned: false
+license: apache-2.0
+short_description: Telugu BPE tokenizer with vocabulary of 4800 words.
+---
+
+Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
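
The README lines in this diff state two measurable targets: a compression ratio of at least 3.2x and perfect reconstruction. The sketch below shows one way such targets could be checked. It is not the Space's actual code. It assumes that "compression ratio" means UTF-8 bytes per token (the README does not define it), and the names `compression_ratio`, `round_trips`, `byte_encode`, and `byte_decode` are hypothetical. The stand-in tokenizer emits one token per byte, so its ratio is 1.0 by construction; a trained BPE tokenizer would be passed in its place.

```python
from typing import Callable, List


def compression_ratio(text: str, encode: Callable[[str], List[int]]) -> float:
    """Return UTF-8 bytes per token (assumed definition of the ratio)."""
    tokens = encode(text)
    if not tokens:
        raise ValueError("encode() returned no tokens")
    return len(text.encode("utf-8")) / len(tokens)


def round_trips(
    text: str,
    encode: Callable[[str], List[int]],
    decode: Callable[[List[int]], str],
) -> bool:
    """Return True if decoding the encoded text reproduces it exactly."""
    return decode(encode(text)) == text


# Hypothetical stand-in tokenizer: one token per UTF-8 byte, no merges.
def byte_encode(text: str) -> List[int]:
    return list(text.encode("utf-8"))


def byte_decode(ids: List[int]) -> str:
    return bytes(ids).decode("utf-8")


sample = "తెలుగు"
ratio = compression_ratio(sample, byte_encode)
ok = round_trips(sample, byte_encode, byte_decode)
print(f"ratio={ratio:.2f} round_trip={ok}")
```

Swapping the real tokenizer's encode and decode functions in for the byte-level stand-ins would let the same two checks be compared against the 3.2x target on a held-out Telugu sample.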