BigLAM: BigScience Libraries, Archives and Museums

non-profit

https://github.com/bigscience-workshop/lam


AI & ML interests

🤗 Hugging Face x 🌸 BigScience initiative to create open source community resources for LAMs.

Recent Activity

christopher new activity about 10 hours ago

biglam/loc_beyond_words:[bot] Conversion to Parquet

davanstrien published a model about 13 hours ago

biglam/historic-newspaper-illustrations-yolov11

davanstrien new activity about 13 hours ago

biglam/historic-newspaper-illustrations-yolov11:Update README.md


biglam's activity

christopher

in biglam/loc_beyond_words about 10 hours ago

[bot] Conversion to Parquet

#4 opened about 22 hours ago by

parquet-converter

davanstrien

published a model about 13 hours ago

biglam/historic-newspaper-illustrations-yolov11

Object Detection • Updated about 13 hours ago • 5

davanstrien

in biglam/historic-newspaper-illustrations-yolov11 about 13 hours ago

Update README.md

#2 opened about 13 hours ago by

davanstrien

updated a collection about 14 hours ago

Historic Newspaper Datasets

Collection

Historic Newspaper Datasets on the Hub • 16 items • Updated about 14 hours ago • 5

davanstrien

updated a Space about 15 hours ago

README

📚

davanstrien

in biglam/README about 15 hours ago

make shorter

#2 opened about 15 hours ago by

davanstrien

Update README.md

#1 opened about 15 hours ago by

davanstrien

in biglam/loc_beyond_words 1 day ago

switch to parquet version of dataset

#3 opened 1 day ago by

davanstrien

Upload dataset

#2 opened 1 day ago by

davanstrien

Update README.md

#1 opened 1 day ago by

davanstrien

updated a dataset 6 days ago

biglam/europeana_newspapers

Viewer • Updated 6 days ago • 11.9M • 1.03k • 44

davanstrien

in biglam/europeana_newspapers 6 days ago

better README

#2 opened 6 days ago by

davanstrien

alielfilali01

posted an update 8 days ago

Post

449

Great efforts from the @AtlasIA folks to adapt text2image models (Ghibli style) to the Moroccan context.

Read the blog here: https://huggingface.co/blog/atlasia/creating-your-custom-ghibli-text-to-image-model

davanstrien

posted an update 16 days ago

Post

1959

Came across a very nice submission from @marcodsn for the reasoning datasets competition (https://huggingface.co/blog/bespokelabs/reasoning-datasets-competition).

The dataset distils reasoning chains from arXiv research papers in biology and economics. Some nice features of the dataset:

- Extracts both the logical structure AND researcher intuition from academic papers
- Adopts the persona of researchers "before experiments" to capture exploratory thinking
- Provides multi-short and single-long reasoning formats with token budgets
- Shows 7.2% improvement on MMLU-Pro Economics when fine-tuning a 3B model

It's created using the Curator framework with plans to scale across more scientific domains and incorporate multi-modal reasoning with charts and mathematics.

I personally am very excited about datasets like this, which involve creativity in their creation and don't just rely on $$$ to produce a big dataset with little novelty.

Dataset can be found here: marcodsn/academic-chains (give it a like!)

albertvillanova

posted an update 17 days ago

Post

2497

smolagents v1.14.0 is out! 🚀
🔌 MCPClient: A sleek new client for connecting to remote MCP servers, making integrations more flexible and scalable.
🪨 Amazon Bedrock: Native support for Bedrock-hosted models.
SmolAgents is now more powerful, flexible, and enterprise-ready. 💼

Full release 👉 https://github.com/huggingface/smolagents/releases/tag/v1.14.0
#smolagents #LLM #AgenticAI

gigant

authored 2 papers 23 days ago

BigBIO: A Framework for Data-Centric Biomedical Natural Language Processing

Paper • 2206.15076 • Published Jun 30, 2022 • 4

Summarization of Multimodal Presentations with Vision-Language Models: Study of the Effect of Modalities and Structure

Paper • 2504.10049 • Published 25 days ago • 3

davanstrien

posted an update 29 days ago

Post

1662

I've created a v1 dataset (davanstrien/reasoning-required) and model (davanstrien/ModernBERT-based-Reasoning-Required) to help curate "wild text" data for generating reasoning examples beyond the usual code/math/science domains.

- I developed a "Reasoning Required" dataset with a 0-4 scoring system for reasoning complexity
- I used educational content from HuggingFaceFW/fineweb-edu, adding annotations for domains, reasoning types, and example questions

My approach enables a more efficient workflow: filter text with small models first, then use LLMs only on high-value content.

This significantly reduces computation costs while expanding reasoning dataset domain coverage.
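
The filter-first workflow described in this post can be sketched roughly as follows. This is a minimal illustration, not the author's implementation: `cheap_reasoning_score` is a hypothetical keyword-based stand-in for the small classifier, `expensive_llm_step` is a placeholder for a real LLM call, and the threshold of 2 on the 0-4 scale is an assumed value.

```python
from typing import Callable, Iterable, List, Tuple


def cheap_reasoning_score(text: str) -> int:
    """Hypothetical stand-in for a small classifier; returns a 0-4 score."""
    cues = ("because", "therefore", "however", "implies")
    hits = sum(text.lower().count(cue) for cue in cues)
    return min(hits, 4)  # clamp to the 0-4 range used in the post


def expensive_llm_step(text: str) -> str:
    """Placeholder for an LLM call; a real pipeline would query a model here."""
    return f"[LLM output for: {text[:30]}]"


def filter_then_generate(
    texts: Iterable[str],
    scorer: Callable[[str], int],
    generator: Callable[[str], str],
    threshold: int = 2,  # assumed cutoff, not taken from the post
) -> List[Tuple[str, str]]:
    """Score every text cheaply, then run the costly step only on high scorers."""
    selected = [t for t in texts if scorer(t) >= threshold]
    return [(t, generator(t)) for t in selected]


docs = [
    "The shop opens at nine.",
    "Prices rose because supply fell; therefore demand shifted, however slowly.",
]
results = filter_then_generate(docs, cheap_reasoning_score, expensive_llm_step)
print(len(results))
```

Only the second document passes the filter here, so the expensive step runs once instead of twice; on a large corpus the same pattern is what keeps LLM usage limited to high-value content.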


Team members 55
