---
title: README
emoji: π
colorFrom: gray
colorTo: pink
sdk: static
pinned: false
---
The goal of OpenLLM-Ro is to bring together the Romanian community that builds open Romanian models and to collect these models in a single place.

We value:
- using public and open corpora
- using open-source training and evaluation code
In this organization, you can find RoLLM models, based on different underlying models and in different flavours (i.e., foundational, instruct, or chat variants). There are currently six model collections:
- RoLlama2: Romanian models based on Llama2
- RoMistral: Romanian models based on Mistral
- RoLlama3: Romanian models based on Llama3
- RoLlama3.1: Romanian models based on Llama3.1
- RoGemma: Romanian models based on Gemma
- RoGemma2: Romanian models based on Gemma2
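A minimal sketch of loading one of these models with the `transformers` library. The model ID used below is an assumption based on the collection names above; check the individual model cards on the hub for the exact IDs and recommended generation settings.

```python
# Hypothetical usage sketch: load a RoLLM instruct model and generate a reply.
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed model ID; verify the exact name on the OpenLLM-Ro hub page.
MODEL_ID = "OpenLLM-Ro/RoLlama3-8b-Instruct"


def build_prompt(tokenizer, user_message: str) -> str:
    """Format a single-turn chat prompt using the model's own chat template."""
    messages = [{"role": "user", "content": user_message}]
    return tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )


if __name__ == "__main__":
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")

    prompt = build_prompt(tokenizer, "Ce este un model de limbaj?")
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=128)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The chat template is read from the model repository itself, so the same sketch should work across the different collections as long as the chosen variant is an instruct or chat model.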
Furthermore, here you can find data used to train and evaluate LLMs in Romanian. Currently, there are three data collections:
- SFT datasets: data used for supervised (instruction) finetuning
- Alignment datasets: data used mainly for Direct Preference Optimization (DPO)
- Evaluation datasets: data used for evaluating LLMs in Romanian
See details in [https://arxiv.org/abs/2406.18266](https://arxiv.org/abs/2406.18266) and [https://arxiv.org/abs/2405.07703](https://arxiv.org/abs/2405.07703).
- 2025-04-23: we extended the datasets used for supervised finetuning with high-quality data generated using Magpie ([RoMagpie-Reasoning](https://huggingface.co/datasets/OpenLLM-Ro/ro_sft_magpie_reasoning) and [RoMagpie-Pro-MT](https://huggingface.co/datasets/OpenLLM-Ro/ro_sft_magpie_mt)), and greatly increased the size of the alignment dataset by adding high-quality datasets ([RoUltraFeedback](https://huggingface.co/datasets/OpenLLM-Ro/ro_dpo_ultrafeedback), [RoMagpie-DPO](https://huggingface.co/datasets/OpenLLM-Ro/ro_dpo_magpie), [RoArgillaMagpieUltra](https://huggingface.co/datasets/OpenLLM-Ro/ro_dpo_argilla_magpie), and [RoHelpSteer2](https://huggingface.co/datasets/OpenLLM-Ro/ro_dpo_helpsteer2)).
We encourage the community to engage in discussions on Hugging Face or GitHub (to provide feedback, ask questions, or suggest improvements).

We will also organize in-person meetings (announced in advance) to brainstorm ideas, the roadmap, and other technical aspects.
Extra info: also check the work by the [Faur AI team](https://huggingface.co/faur-ai).