Constantin Orasan committed
Commit a514775 · 1 parent: 538dc0d
Updated the app and the models
- app.py +16 -3
- bpe-ECB.model +0 -0
- bpe-EMEA.model +0 -0
app.py
CHANGED
@@ -5,7 +5,11 @@ examples = [
     "Hello, world!",
     "European Central bank has announced cuts.",
     "This document is a summary of the European Public Assessment Report (EPAR).",
-    "En el presente documento se resume el Informe Público Europeo de Evaluación (EPAR)."
+    "En el presente documento se resume el Informe Público Europeo de Evaluación (EPAR).",
+    "Solution for injection",
+    "How is Abilify used?",
+    "¿Para qué se utiliza Abilify?",
+    "Tratado de la Unión Europea y Tratado de Funcionamiento de la Unión Europea"]
 
 
 def greet(sentence):
@@ -25,9 +29,18 @@ def greet(sentence):
         "</div>")
 
 
+description = """
+Demo for SentencePiece. The model is trained on ECB and EMEA datasets in order to see the differences in tokenization.
+The ECB dataset contains financial news articles, while the EMEA dataset contains medical articles.
+The texts included in the training are in English and Spanish, for this reason the tokenisation will work best for these languages.
+You can try some other languages and see how the tokenisation works. However, make sure you use only Latin characters.
+The model did not see any non-Latin characters during training, so the results for languages that do not use Latin characters will be unpredictable.
+Both variants are trained with 5000 vocab size.
+"""
+
 demo = gr.Interface(fn=greet, inputs="text", outputs="html",
-                    examples=examples, title="SentencePiece
-                    description=
+                    examples=examples, title="SentencePiece",
+                    description=description,
                     cache_examples="lazy",
                     concurrency_limit=30,
                     css=".output {font-size: 150%;}")
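The new description warns that the models saw no non-Latin characters during training. As a rough illustration of how an input could be screened before tokenization, here is a minimal sketch. It assumes a hypothetical helper, `uses_only_latin_letters`, which is not part of this commit, and it relies on Unicode character names as a heuristic, so a few unusual symbols may be rejected.

```python
import unicodedata


def uses_only_latin_letters(text: str) -> bool:
    """Return True if every alphabetic character in `text` is a Latin-script letter.

    Hypothetical helper, not part of app.py. Digits, punctuation, and
    whitespace are ignored; only alphabetic characters are checked.
    """
    for ch in text:
        if ch.isalpha():
            # Unicode names of Latin-script letters begin with "LATIN",
            # e.g. "LATIN SMALL LETTER E WITH ACUTE" for "é".
            if not unicodedata.name(ch, "").startswith("LATIN"):
                return False
    return True
```

With this check, the Spanish examples added in this commit pass, while Cyrillic or Greek input would be flagged before it reaches the tokenizer.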
bpe-ECB.model
CHANGED
Binary files a/bpe-ECB.model and b/bpe-ECB.model differ
bpe-EMEA.model
CHANGED
Binary files a/bpe-EMEA.model and b/bpe-EMEA.model differ
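The body of `greet` is not shown in this diff, so the following is only a sketch of how the two updated models could be compared on one sentence, not the app's actual code. The helper names (`load_tokenizer`, `compare_tokenizations`, `compare_commit_models`) are hypothetical. The sketch assumes the `sentencepiece` Python package is installed and both model files are in the working directory.

```python
from typing import Callable, Dict, List

Tokenizer = Callable[[str], List[str]]


def load_tokenizer(model_path: str) -> Tokenizer:
    """Wrap a SentencePiece model file as a function from text to pieces."""
    # Imported here so the comparison helper below can be used without the
    # sentencepiece package or the model files being present.
    import sentencepiece as spm

    processor = spm.SentencePieceProcessor(model_file=model_path)
    return lambda text: processor.encode(text, out_type=str)


def compare_tokenizations(
    sentence: str, tokenizers: Dict[str, Tokenizer]
) -> Dict[str, List[str]]:
    """Tokenize `sentence` with each named tokenizer and collect the pieces."""
    return {name: tokenize(sentence) for name, tokenize in tokenizers.items()}


def compare_commit_models(sentence: str) -> Dict[str, List[str]]:
    """Compare the two model files from this commit (paths are assumptions)."""
    tokenizers = {
        "ECB": load_tokenizer("bpe-ECB.model"),
        "EMEA": load_tokenizer("bpe-EMEA.model"),
    }
    return compare_tokenizations(sentence, tokenizers)
```

Calling `compare_commit_models("Solution for injection")` would return the pieces produced by each model, making it easy to see where the financial and medical vocabularies split the same text differently.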