---
title: README
emoji: πŸŒ–
colorFrom: gray
colorTo: pink
sdk: static
pinned: false
---

The goal of OpenLLM-Ro is to bring together the Romanian community that builds open Romanian models and to collect these models in a single place.

We value:
- the use of public and open corpora
- open-source training and evaluation code

In this organization, you can find RoLLM models based on different underlying models and in different flavours (i.e., foundational, instruct, or chat variants). There are currently six model collections (see the loading sketch after the list):
- RoLlama2: Romanian models based on Llama2
- RoMistral: Romanian models based on Mistral
- RoLlama3: Romanian models based on Llama3
- RoLlama3.1: Romanian models based on Llama3.1
- RoGemma: Romanian models based on Gemma
- RoGemma2: Romanian models based on Gemma2
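
As a starting point, the sketch below shows how a model from one of these collections could be loaded with the Hugging Face `transformers` library. The model ID and the generation settings are illustrative assumptions; check the individual model cards on the organization page for the exact repository names and recommended usage.

```python
# Minimal sketch: loading an OpenLLM-Ro instruct model with transformers.
# The model ID below is an assumption for illustration only.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "OpenLLM-Ro/RoLlama3-8b-Instruct"  # illustrative; verify on the org page

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Instruct/chat variants expect a chat template; foundational variants take plain text.
messages = [{"role": "user", "content": "Ce oraș este capitala României?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
output_ids = model.generate(input_ids, max_new_tokens=64)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```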


Furthermore, here you can find the data used to train and evaluate LLMs in Romanian. Currently, there are three data collections (see the loading sketch after the list):
- SFT datasets: data used for supervised (instruction) finetuning
- Alignment datasets: data used mainly for Direct Preference Optimization (DPO)
- Evaluation datasets: data used for evaluating LLMs in Romanian
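
As a rough illustration, the sketch below loads one of the SFT datasets with the `datasets` library. The repository ID is taken from the news note below; the split name and the field layout are assumptions that may differ per dataset, so check each dataset card.

```python
# Minimal sketch: loading an OpenLLM-Ro SFT dataset with the datasets library.
from datasets import load_dataset

# Repository ID from the RoMagpie-Reasoning link below; the "train" split is an assumption.
sft = load_dataset("OpenLLM-Ro/ro_sft_magpie_reasoning", split="train")
print(sft[0])  # inspect one supervised finetuning example
```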

See details in [https://arxiv.org/abs/2406.18266](https://arxiv.org/abs/2406.18266) and [https://arxiv.org/abs/2405.07703](https://arxiv.org/abs/2405.07703).

- 2025-04-23: we expanded the supervised finetuning datasets with high-quality data generated using Magpie ([RoMagpie-Reasoning](https://huggingface.co/datasets/OpenLLM-Ro/ro_sft_magpie_reasoning) and [RoMagpie-Pro-MT](https://huggingface.co/datasets/OpenLLM-Ro/ro_sft_magpie_mt)), and greatly increased the size of the alignment data by adding high-quality datasets ([RoUltraFeedback](https://huggingface.co/datasets/OpenLLM-Ro/ro_dpo_ultrafeedback), [RoMagpie-DPO](https://huggingface.co/datasets/OpenLLM-Ro/ro_dpo_magpie), [RoArgillaMagpieUltra](https://huggingface.co/datasets/OpenLLM-Ro/ro_dpo_argilla_magpie) and [RoHelpSteer2](https://huggingface.co/datasets/OpenLLM-Ro/ro_dpo_helpsteer2)).

We encourage the community to engage in discussions (to provide feedback, ask questions, or suggest improvements) on Hugging Face or GitHub.

We will also organize in-person meetings (announced in advance) to brainstorm ideas, the roadmap, and other technical aspects.

Extra info: also check the work by the [Faur AI team](https://huggingface.co/faur-ai).