Aurora-M

community

https://aurora-lm.github.io/posts/about-us/

AI & ML interests

Omni Lingual Models

aurora-m's activity

huu-ontocord

updated a Space 7 days ago

README

Muennighoff

authored a paper 9 days ago

ReasonIR: Training Retrievers for Reasoning Tasks

Paper • 2504.20595 • Published 10 days ago • 50

Ziyang

authored a paper 9 days ago

ScratchEval: Are GPT-4o Smarter than My Child? Evaluating Large Multimodal Models with Visual Programming Challenges

Paper • 2411.18932 • Published Nov 28, 2024 • 1

gpucce

authored a paper 10 days ago

Optimizing LLMs for Italian: Reducing Token Fertility and Enhancing Efficiency Through Vocabulary Adaptation

Paper • 2504.17025 • Published 15 days ago • 16

huu-ontocord

updated a dataset 26 days ago

aurora-m/redteam

Viewer • Updated 26 days ago • 6.07k • 57 • 7

huu-ontocord

updated a model 26 days ago

aurora-m/aurora-m-biden-harris-redteamed

Text Generation • Updated 26 days ago • 17 • 20

huu-ontocord

updated 2 datasets 26 days ago

aurora-m/aurora-m-dataset-part-2

Viewer • Updated 26 days ago • 116M • 806

aurora-m/aurora-m-dataset-part-1

Viewer • Updated 26 days ago • 122M • 122

ajibawa-2023

posted an update 29 days ago

Hi all, I recently released two audio datasets generated from my earlier dataset: ajibawa-2023/Children-Stories-Collection

First audio dataset: https://huggingface.co/datasets/ajibawa-2023/Audio-Children-Stories-Collection-Large has more than 5,600 stories in .mp3 format.

Second audio dataset: https://huggingface.co/datasets/ajibawa-2023/Audio-Children-Stories-Collection has 600 stories in .mp3 format.

Taishi-N324

authored a paper about 1 month ago

Building Instruction-Tuning Datasets from Human-Written Instructions with Open-Weight Large Language Models

Paper • 2503.23714 • Published Mar 31

HoangHa

authored a paper about 2 months ago

Pensez: Less Data, Better Reasoning -- Rethinking French LLM

Paper • 2503.13661 • Published Mar 17 • 5

Taishi-N324

authored 2 papers 2 months ago

Balancing Speed and Stability: The Trade-offs of FP8 vs. BF16 Training in LLMs

Paper • 2411.08719 • Published Nov 10, 2024

Why We Build Local Large Language Models: An Observational Analysis from 35 Japanese and Multilingual LLMs

Paper • 2412.14471 • Published Dec 19, 2024

mayank-mishra

authored a paper 2 months ago

Granite Vision: a lightweight, open-source multimodal model for enterprise Intelligence

Paper • 2502.09927 • Published Feb 14

Taishi-N324

authored a paper 2 months ago

Wider or Deeper? Scaling LLM Inference-Time Compute with Adaptive Branching Tree Search

Paper • 2503.04412 • Published Mar 6 • 1

terryyz

authored a paper 2 months ago

CodeArena: A Collective Evaluation Platform for LLM Code Generation

Paper • 2503.01295 • Published Mar 3 • 8

vumichien

updated a dataset 2 months ago

ontocord/MixtureVitae-curiosite-subset

Viewer • Updated 24 days ago • 713k • 363 • 2

huu-ontocord

authored 3 papers 2 months ago

RedPajama: an Open Dataset for Training Large Language Models

Paper • 2411.12372 • Published Nov 19, 2024 • 56

LLMs Lost in Translation: M-ALERT uncovers Cross-Linguistic Safety Gaps

Paper • 2412.15035 • Published Dec 19, 2024 • 4

Project Alexandria: Towards Freeing Scientific Knowledge from Copyright Burdens via LLMs

Paper • 2502.19413 • Published Feb 26 • 19