Optimizing LLMs for Italian: Reducing Token Fertility and Enhancing Efficiency Through Vocabulary Adaptation Paper • 2504.17025 • Published 15 days ago • 16
FrenchBench Evaluation datasets Collection These datasets are used to evaluate models on French performance using: https://github.com/EleutherAI/lm-evaluation-harness (from CroissantLLM paper) • 11 items • Updated Jun 7, 2024 • 7
Word Sense Linking: Disambiguating Outside the Sandbox Paper • 2412.09370 • Published Dec 12, 2024 • 9
ReLiK: Retrieve and LinK, Fast and Accurate Entity Linking and Relation Extraction on an Academic Budget Paper • 2408.00103 • Published Jul 31, 2024 • 23
Awesome SFT datasets Collection A curated list of interesting datasets to fine-tune language models with. • 43 items • Updated Apr 12, 2024 • 131
ZEBRA Collection Zero-Shot Example-Based Retrieval Augmentation for Commonsense Question Answering • 12 items • Updated Dec 4, 2024 • 9
ITA-Bench: Italian Benchmarks for LLMs Collection A collection of Italian benchmarks for Large Language Models. See also our Github repo: https://github.com/SapienzaNLP/ita-bench • 19 items • Updated Dec 4, 2024 • 6
🇮🇹 Italian NLP Resources Collection Collection of models, datasets and demos relevant to Italian NLP 🇮🇹 • 288 items • Updated 4 days ago • 26