Commit e25ad1d (verified) by mihaimasala · 1 Parent(s): 5fde02d

Add data collections info

Files changed (1)
  1. README.md +4 -1
README.md CHANGED
@@ -13,12 +13,15 @@ We value:
  - using public and open corpora
  - open-source training and evaluation code.
 
- In this organization, you can find RoLLM models, based on different underlying models and in different flavours (i.e., foundational, instruct, or chat variants). There are currently two collections:
+ In this organization, you can find RoLLM models, based on different underlying models and in different flavours (i.e., foundational, instruct, or chat variants). There are currently four model collections:
  - RoLlama2: Romanian models based on Llama2
  - RoMistral: Romanian models based on Mistral
  - RoGemma: Romanian models based on Gemma
  - RoLlama3: Romanian models based on Llama3
 
+ Furthermore, here you can find data that was used for training and evaluating LLMs in Romanian. Currently, there are two data collections:
+ - SFT datasets: data used for supervised (instruction) finetuning
+ - Evaluation datasets: data used for evaluating LLMs in Romanian
 
  See details in [https://arxiv.org/abs/2406.18266](https://arxiv.org/abs/2406.18266) and [https://arxiv.org/abs/2405.07703](https://arxiv.org/abs/2405.07703).
 