ReasonIR: Training Retrievers for Reasoning Tasks Paper • 2504.20595 • Published 10 days ago • 50
ScratchEval: Are GPT-4o Smarter than My Child? Evaluating Large Multimodal Models with Visual Programming Challenges Paper • 2411.18932 • Published Nov 28, 2024 • 1
Optimizing LLMs for Italian: Reducing Token Fertility and Enhancing Efficiency Through Vocabulary Adaptation Paper • 2504.17025 • Published 15 days ago • 16
view post Post 3977 Hi All, I recently released two Audio datasets which are generated using my earlier released dataset: ajibawa-2023/Children-Stories-CollectionFirst Audio Dataset:https://huggingface.co/datasets/ajibawa-2023/Audio-Children-Stories-Collection-Large has 5600++ stories in .mp3 format.Second Audio Dataset:https://huggingface.co/datasets/ajibawa-2023/Audio-Children-Stories-Collection has 600 stories in .mp3 format. See translation 3 replies · 👍 8 8 🔥 5 5 🚀 1 1 ❤️ 1 1 + Reply
Building Instruction-Tuning Datasets from Human-Written Instructions with Open-Weight Large Language Models Paper • 2503.23714 • Published Mar 31
Pensez: Less Data, Better Reasoning -- Rethinking French LLM Paper • 2503.13661 • Published Mar 17 • 5
Balancing Speed and Stability: The Trade-offs of FP8 vs. BF16 Training in LLMs Paper • 2411.08719 • Published Nov 10, 2024
Why We Build Local Large Language Models: An Observational Analysis from 35 Japanese and Multilingual LLMs Paper • 2412.14471 • Published Dec 19, 2024
Granite Vision: a lightweight, open-source multimodal model for enterprise Intelligence Paper • 2502.09927 • Published Feb 14
Wider or Deeper? Scaling LLM Inference-Time Compute with Adaptive Branching Tree Search Paper • 2503.04412 • Published Mar 6 • 1
CodeArena: A Collective Evaluation Platform for LLM Code Generation Paper • 2503.01295 • Published Mar 3 • 8
RedPajama: an Open Dataset for Training Large Language Models Paper • 2411.12372 • Published Nov 19, 2024 • 56
LLMs Lost in Translation: M-ALERT uncovers Cross-Linguistic Safety Gaps Paper • 2412.15035 • Published Dec 19, 2024 • 4
Project Alexandria: Towards Freeing Scientific Knowledge from Copyright Burdens via LLMs Paper • 2502.19413 • Published Feb 26 • 19