Reasoning datasets competition

reasoning-datasets-competition's activity

ZennyKenny

posted an update about 14 hours ago

Community! 💡💡💡

It's the last day to submit your datasets for the Reasoning Datasets Competition: https://www.bespokelabs.ai/blog/reasoning-datasets-competition

Here are my submissions:
- ZennyKenny/synthetic_vc_financial_decisions_reasoning_dataset
- ZennyKenny/cosa-benchmark-dataset
- ZennyKenny/tactical-military-reasoning-v.1.0
- ZennyKenny/tron-dataset-v.1.0

Have a look and drop a ❤️ or comment! Check out the entire collection of submissions here: https://huggingface.co/datasets?other=reasoning-datasets-competition

davidberenstein1957

posted an update 1 day ago

Good answers are not necessarily factual answers: an analysis of hallucination in leading LLMs

https://huggingface.co/blog/davidberenstein1957/phare-analysis-of-hallucination-in-leading-llms

ZennyKenny

posted an update 3 days ago

After hearing the news that Marc Andreessen thinks that the only job that is safe from AI replacement is venture capital: https://gizmodo.com/marc-andreessen-says-one-job-is-mostly-safe-from-ai-venture-capitalist-2000596506 🧠🧠🧠

The Reasoned Capital synthetic dataset suddenly feels much more topical: ZennyKenny/synthetic_vc_financial_decisions_reasoning_dataset 🔥🔥🔥

Really looking forward to potentially expanding this architecture and seeing just how algorithmic clever investing truly is! 💰💰💰

ZennyKenny

posted an update 4 days ago

When I heard the Reasoning Datasets Competition deadline was extended to 9 May, I knew I had time to get in one more entry. 🔥🔥🔥

With the rise of Vibe Coding, and the potential risks that are introduced by humans letting LLMs build their apps for them, lots of people are (rightfully) concerned about the safety of the code that is hitting prod.

In response to that, I'm happy to present my final submission to the Reasoning Datasets Competition: the CoSa (Code Safety) benchmark, an attempt to start benchmarking the ability of LLMs to identify unsafe and/or exploitable code: ZennyKenny/cosa-benchmark-dataset

It is currently a curated set of 200 examples, calibrated on OpenAI's standard-issue models (GPT-4.1, o4-mini, and GPT-3.5 Turbo) as "baseline performance" (70% decile). Check it out and drop a ❤️ if you think it could be useful, or hit the Community section with suggestions / critiques.

daqc

in reasoning-datasets-competition/README 6 days ago

Competition Lobby

#1 opened 29 days ago by

zarmalhotra

in reasoning-datasets-competition/README 7 days ago

Competition Lobby

#1 opened 29 days ago by

zarmalhotra

updated a Space 7 days ago

README

ZennyKenny

posted an update 9 days ago

The same way the advent of Adobe Illustrator has led to innovation in the way that creative professionals work, I earnestly believe that AI will do the same (contrary to the popular opinion that it represents some regression in the world of creatives).

@natalika and I were speaking about this topic and, like most illustrators, she has some understandable concerns about the spread of AI in her field. She also told me how much time she spends generating concept art that, in >98% of cases, will never see the light of day. 💡

To me, that sounded like a perfect opportunity to leverage image diffusion in a way that helps artists spend more time creating cool stuff rather than just malevolently mining their work and using it without credit. Using the Black Forest Labs base model FLUX, Replicate, and about $5 of H100 compute, I post-trained a LoRA adapter on a set of her images associated with one project she's working on and spun up an app with Hugging Face Spaces (and ZeroGPU for the win).

I give you Natalie Diffusion: ZennyKenny/natalie-diffusion

Now, generating concept art in her particular style takes seconds instead of hours, and when it's time to put the work into production, a human designer is still invaluable. Building it in the open will hopefully inspire other use cases among other designers. 🖖

ZennyKenny

posted an update 10 days ago

I've created a new dataset using the Algorithm of Thoughts architecture proposed by Sel et al. (2023) in a reasoning context. (paper: https://arxiv.org/pdf/2308.10379)

The dataset simulates the discovery phase of a fictitious VC firm called Reasoned Capital and, once expanded, can be used to create models which are able to make complex, subjective financial decisions based on different criteria.

The generation process uses increasingly complex prompts to encourage recursive problem-solving, so that models assess and reevaluate the conclusions and opinions generated by upstream models. Pretty neat stuff, and I'm not aware of this architecture being used in a reasoning context anywhere else.

Check it out: ZennyKenny/synthetic_vc_financial_decisions_reasoning_dataset

geeknik

in reasoning-datasets-competition/README 11 days ago

Competition Lobby

#1 opened 29 days ago by

marcodsn

in reasoning-datasets-competition/README 11 days ago

Competition Lobby

#1 opened 29 days ago by

ZennyKenny

posted an update 13 days ago

Phew, maybe a little dark, but I've submitted my second dataset to the Reasoning Datasets Competition: ZennyKenny/tactical-military-reasoning-v.1.0

I'd be interested to hear the community's thoughts on the applications of AI in the military, especially in the wargaming space.

This is something that feels inevitable (and realistically, probably already in progress). Doesn't it make sense for us to have an understanding of the mechanics of such processes? Surely they will never be open source.

davidberenstein1957

posted an update 15 days ago

🔥 Announcing FLUX-Juiced: The Fastest Image Generation Endpoint (2.6x faster)!

Optimisations are widely applied and can reduce inference time, but their impact on quality often remains unclear, so we decided to challenge the status quo and create our own optimised version of FLUX.1[dev] called FLUX-Juiced.

Blog: https://huggingface.co/blog/PrunaAI/flux-fastest-image-generation-endpoint

davanstrien

posted an update 16 days ago

Came across a very nice submission from @marcodsn for the reasoning datasets competition (https://huggingface.co/blog/bespokelabs/reasoning-datasets-competition).

The dataset distils reasoning chains from arXiv research papers in biology and economics. Some nice features of the dataset:

- Extracts both the logical structure AND researcher intuition from academic papers
- Adopts the persona of researchers "before experiments" to capture exploratory thinking
- Provides multi-short and single-long reasoning formats with token budgets
- Shows 7.2% improvement on MMLU-Pro Economics when fine-tuning a 3B model

It's created using the Curator framework with plans to scale across more scientific domains and incorporate multi-modal reasoning with charts and mathematics.

I personally am very excited about datasets like this, which involve creativity in their creation and don't just rely on $$$ to produce a big dataset with little novelty.

Dataset can be found here: marcodsn/academic-chains (give it a like!)

davidberenstein1957

posted an update 21 days ago

🧑‍🏫 I wrote a brief blogpost to give An Introduction to AI Model Optimization Techniques!

URL: https://huggingface.co/blog/PrunaAI/introduction-to-ai-model-optimization-techniques

ZennyKenny

posted an update 22 days ago

Submitted my first dataset for the Reasoning Datasets Competition! https://huggingface.co/datasets/ZennyKenny/TRON-dataset-v.1.0

This dataset is designed to post-train Metareasoning agents: agents whose job is to quickly (and, importantly, cheaply) reason through whether it makes sense to launch a full reasoning job or just run a simple completions job.

There's still plenty of time to join the competition! https://www.bespokelabs.ai/blog/reasoning-datasets-competition

The generation notebook (linked in the dataset) is open source and pretty well generalized, if I do say so myself, so you can use it to make your own Metareasoning datasets.

Shoutout to @onekq for his inspiring comment on this topic.

davanstrien

updated a Space 22 days ago

README

davidberenstein1957

posted an update 22 days ago

RealHarm: A Collection of Real-World Language Model Application Failures

I'm David from Giskard, and we work on securing your Agents.
Today, we are launching RealHarm: a dataset of real-world problematic interactions with AI agents, drawn from publicly reported incidents.

Check out the dataset and paper: https://realharm.giskard.ai/

Akhil-Theerthala

in reasoning-datasets-competition/README 23 days ago

Competition Lobby

#1 opened 29 days ago by

davanstrien

in reasoning-datasets-competition/README 23 days ago

Competition Lobby

#1 opened 29 days ago by