Omkar Pangarkar

omkarenator

AI & ML interests

None yet

Recent Activity

upvoted a paper 19 days ago

CLIMB: CLustering-based Iterative Data Mixture Bootstrapping for Language Model Pre-training

liked a Space 27 days ago

LLM360/TxT360

omkarenator's activity

liked a dataset 24 days ago

WebOrganizer/Corpus-200B

Preview • Updated Feb 19 • 143k • 8

liked a Space 27 days ago

110

TxT360: Trillion Extracted Text

📖

Create a large, deduplicated dataset for LLM pre-training

liked a model 2 months ago

mlfoundations/fasttext-oh-eli5

Updated Aug 1, 2024 • 22

liked 2 Spaces 3 months ago

2.56k

The Ultra-Scale Playbook

🌌

The ultimate guide to training LLM on large GPU Clusters

Scaling FineWeb to 1000+ languages: Step 1: finding signal in 100s of evaluation tasks

📝

Evaluate multilingual models using FineTasks

liked a dataset 6 months ago

LLM360/TxT360

Updated about 2 hours ago • 81.7k • 230

liked a Space 8 months ago

936

FineWeb: decanting the web for the finest text data at scale

🍷

Generate high-quality web text data for LLM training

liked a dataset 9 months ago

Trelis/touch-rugby-rules-memorisation

Viewer • Updated Feb 28, 2024 • 363 • 10 • 2

liked a dataset 10 months ago

commoncrawl/statistics

Viewer • Updated 5 days ago • 563k • 481 • 22

liked 6 models over 1 year ago

liked 2 models almost 2 years ago

mosaicml/mpt-7b-chat

Text Generation • Updated Mar 5, 2024 • 84.8k • 514

stanfordnlp/backpack-gpt2

Text Generation • Updated Aug 14, 2023 • 81 • 16