PleIAs

company

AI & ML interests

Open Science LLMs

Recent Activity

Pclanglais  updated a model about 8 hours ago
PleIAs/Pleias-RAG-350M
Pclanglais  updated a dataset 1 day ago
PleIAs/New-RAG-Evals
Pclanglais  published a dataset 2 days ago
PleIAs/New-RAG-Evals

PleIAs's activity

davanstrien 
posted an update 16 days ago
Came across a very nice submission from @marcodsn for the reasoning datasets competition (https://huggingface.co/blog/bespokelabs/reasoning-datasets-competition).

The dataset distils reasoning chains from arXiv research papers in biology and economics. Some nice features of the dataset:

- Extracts both the logical structure AND researcher intuition from academic papers
- Adopts the persona of researchers "before experiments" to capture exploratory thinking
- Provides multi-short and single-long reasoning formats with token budgets
- Shows 7.2% improvement on MMLU-Pro Economics when fine-tuning a 3B model

It was created using the Curator framework, and there are plans to scale it across more scientific domains and to incorporate multi-modal reasoning with charts and mathematics.

I personally am very excited about datasets like this, which involve creativity in their creation and don't just rely on $$$ to produce a big dataset with little novelty.

Dataset can be found here: marcodsn/academic-chains (give it a like!)