ZeroGPU Explorers

Recent Activity

ZennyKenny 
posted an update 2 days ago
After hearing the news that Marc Andreessen thinks the only job safe from AI replacement is venture capital: https://gizmodo.com/marc-andreessen-says-one-job-is-mostly-safe-from-ai-venture-capitalist-2000596506 🧠🧠🧠

The Reasoned Capital synthetic dataset suddenly feels much more topical: ZennyKenny/synthetic_vc_financial_decisions_reasoning_dataset 🔥🔥🔥

Really looking forward to potentially expanding this architecture and seeing just how clever algorithmic investing truly is! 💰💰💰
ZennyKenny 
posted an update 4 days ago
When I heard the Reasoning Dataset Competition deadline was extended to 9 May, I knew I had time to get in one more entry. 🔥🔥🔥

With the rise of Vibe Coding, and the potential risks introduced by humans letting LLMs build their apps for them, lots of people are (rightfully) concerned about the safety of the code hitting prod.

In response, I'm happy to present my final submission to the Reasoning Dataset Competition: an attempt to start benchmarking the ability of LLMs to identify unsafe and/or exploitable code by way of the CoSa (Code Safety) benchmark: ZennyKenny/cosa-benchmark-dataset

Currently a curated set of 200 examples, calibrated on OpenAI's standard-issue models (GPT-4.1, o4-mini, and GPT-3.5 Turbo) as "baseline performance" (around 70%). Check it out and drop a ❤️ if you think it could be useful, or hit the Community section with suggestions / critiques.
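To make the evaluation idea concrete, here is a minimal sketch of how a CoSa-style run could be scored. The field names (`code`, `label`) and the "safe" / "unsafe" verdict format are assumptions for illustration, not the dataset's documented schema:

```python
# Hypothetical sketch: score a model's safe/unsafe verdicts against
# ground-truth labels with plain accuracy. Field names are assumed,
# not taken from the actual CoSa dataset schema.

def score(examples: list[dict], verdicts: list[str]) -> float:
    """Fraction of examples where the model's verdict matches the label."""
    correct = sum(ex["label"] == v for ex, v in zip(examples, verdicts))
    return correct / len(examples)

examples = [
    {"code": "eval(user_input)", "label": "unsafe"},
    {"code": "print('hello')", "label": "safe"},
]
print(score(examples, ["unsafe", "safe"]))  # → 1.0
```

A real harness would add per-category breakdowns (e.g. injection vs. unsafe deserialization), but the headline number reduces to this comparison.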
ZennyKenny 
posted an update 8 days ago
Just as the advent of Adobe Illustrator led to innovation in the way that creative professionals work, I earnestly believe that AI will do the same (contrary to the popular opinion that it represents some regression in the world of creatives).

@natalika and I were speaking about this topic and, like most illustrators, she has some understandable concerns about the spread of AI in her field. She also told me how much time she spends generating concept art that, in >98% of cases, will never see the light of day. 💡

To me, that sounded like a perfect opportunity to leverage image diffusion in a way that helps artists spend more time creating cool stuff, rather than malevolently mining their work and using it without credit. Using the Black Forest Labs base model FLUX, Replicate, and about $5 of H100 compute, I post-trained a LoRA adapter on a set of her images associated with one project she's working on and spun up an app with Hugging Face Spaces (and ZeroGPU for the win).

I give you, Natalie Diffusion: ZennyKenny/natalie-diffusion

Now, generating concept art in her particular style takes seconds instead of hours and when it's time to put the work into production, a human designer is still invaluable. And building it in the open hopefully inspires other use cases amongst other designers. 🖖
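For anyone wondering what the LoRA adapter above actually is under the hood: it's a low-rank update to the base model's frozen weight matrices. Training learns two small matrices A (r × d_in) and B (d_out × r), and the effective weight becomes W + (alpha / r) · B·A. The dimensions and values below are toy numbers purely for illustration:

```python
# Toy illustration of a LoRA update: the base weight W stays frozen and
# only the small low-rank factors A and B are trained. All numbers here
# are illustrative, not from any real FLUX checkpoint.

def matmul(X, Y):
    """Plain-Python matrix multiply."""
    return [
        [sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
        for row in X
    ]

def lora_weight(W, A, B, alpha=1.0, r=1):
    """Effective weight after applying the rank-r update: W + (alpha/r) * B @ A."""
    delta = matmul(B, A)
    scale = alpha / r
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(W, delta)
    ]

W = [[1.0, 0.0], [0.0, 1.0]]  # frozen 2x2 base weight
A = [[1.0, 2.0]]              # r=1, d_in=2
B = [[0.5], [0.25]]           # d_out=2, r=1
print(lora_weight(W, A, B))   # → [[1.5, 1.0], [0.25, 1.5]]
```

Because only A and B are trained, the adapter is tiny compared to the base model, which is why ~$5 of H100 time is enough to capture a style.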
cbensimon 
updated a Space 9 days ago
ZennyKenny 
posted an update 9 days ago
I've created a new dataset using the Algorithm of Thoughts architecture proposed by Sel et al. (2023) in a reasoning context. (paper: https://arxiv.org/pdf/2308.10379)

The dataset simulates the discovery phase of a fictitious VC firm called Reasoned Capital and, once expanded, can be used to create models which are able to make complex, subjective financial decisions based on different criteria.

The generation process uses increasingly complex prompts to drive recursive problem-solving, encouraging models to assess and reevaluate the conclusions and generated opinions of upstream models. Pretty neat stuff, and I'm not aware of this architecture being used in a reasoning context anywhere else.

Check it out: ZennyKenny/synthetic_vc_financial_decisions_reasoning_dataset
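The recursive loop described above can be sketched as follows. This is my own illustrative reading of the process, not the dataset's actual generation code, and `ask_model` is a stub standing in for a real LLM call:

```python
# Illustrative sketch of recursive opinion generation: each round feeds
# the previous round's opinion back into a more demanding prompt so the
# model must reassess upstream conclusions. `ask_model` is a stub.

def ask_model(prompt: str) -> str:
    """Stand-in for a real LLM call; returns a placeholder opinion."""
    return f"opinion({len(prompt)} chars of context)"

def recursive_opinions(pitch: str, rounds: int = 3) -> list[str]:
    """Generate a chain of opinions, each conditioned on the one before."""
    opinions = []
    context = pitch
    for i in range(rounds):
        prompt = (
            f"Round {i + 1}: given the pitch and prior analysis below, "
            f"reassess and refine the investment opinion.\n{context}"
        )
        opinion = ask_model(prompt)
        opinions.append(opinion)
        context = f"{pitch}\nPrior opinion: {opinion}"
    return opinions

chain = recursive_opinions("Seed-stage robotics startup")
print(len(chain))  # → 3
```

The key property is that each downstream prompt contains the upstream opinion, so agreement is never free: the model has to re-derive or revise it.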
ZennyKenny 
posted an update 13 days ago
Phew, maybe a little dark, but I've submitted my second dataset to the Reasoning Datasets Competition: ZennyKenny/tactical-military-reasoning-v.1.0

I'd be interested to hear the community's thoughts on the applications of AI in the military. Especially in the wargaming space.

This is something that feels inevitable (and realistically, probably already in progress). Doesn't it make sense for us to have an understanding of the mechanics of such processes? Surely they will never be open source.
victor 
posted an update 14 days ago
DIA TTS is just amazing - please share your funniest gens (here is mine) 😂
nari-labs/Dia-1.6B
meg 
posted an update 15 days ago
ZennyKenny 
posted an update 21 days ago
Submitted my first dataset for the Reasoning Datasets Competition! https://huggingface.co/datasets/ZennyKenny/TRON-dataset-v.1.0

This dataset is designed to post-train Metareasoning agents: agents whose job is to quickly (and, importantly, cheaply) reason through whether it makes sense to launch a full reasoning job or fall back to a simple completions job.

There's still plenty of time to join the competition! https://www.bespokelabs.ai/blog/reasoning-datasets-competition

The generation notebook (linked in the dataset) is open source and pretty well generalized, if I do say so myself, so you can use it to make your own Metareasoning datasets.
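The routing decision a metareasoning agent makes can be sketched like this. The cue list and word-count threshold are made-up heuristics for illustration; the real point of the dataset is to train a model to make this call far better than hand-written rules:

```python
# Hypothetical sketch of a metareasoning router: decide cheaply whether
# a prompt warrants a full reasoning run or a simple completion.
# Cues and thresholds are illustrative assumptions only.

REASONING_CUES = ("prove", "step by step", "derive", "compare", "plan")

def needs_full_reasoning(prompt: str, max_simple_tokens: int = 40) -> bool:
    """True if the prompt likely warrants an expensive reasoning job."""
    text = prompt.lower()
    cue_hits = sum(cue in text for cue in REASONING_CUES)
    long_prompt = len(text.split()) > max_simple_tokens
    return cue_hits >= 1 or long_prompt

def route(prompt: str) -> str:
    """Return which backend to dispatch the prompt to."""
    return "reasoning" if needs_full_reasoning(prompt) else "completion"

print(route("What is the capital of France?"))         # → completion
print(route("Prove step by step that 2 + 2 = 4."))     # → reasoning
```

A trained metareasoner replaces the heuristic body of `needs_full_reasoning` with a cheap model call, which is exactly the behavior the dataset is meant to post-train.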

Shoutout to @onekq for his inspiring comment on this topic.
ZennyKenny 
posted an update 28 days ago
BestWishYsh 
posted an update about 1 month ago
ZennyKenny 
posted an update about 1 month ago
A few new Russian-language synthetic datasets. The labelling is good, but some of the syntax and grammar is not great.

Great for Russian-language classification models, probably not great for fine-tuning Russian-language text generation.

- Virtual Assistant Query / Responses: ZennyKenny/ru_virtual_assistant_chatgpt_distill
- LLM Query / Responses: ZennyKenny/russian_llm_response_chatgpt_distill

Crazy how much language drift is still an issue, especially given that Russian constitutes nearly 5% of the content on the internet.
ZennyKenny 
posted an update about 1 month ago
Besides being the coolest-named benchmark in the game, HellaSwag is an important measurement of здравый смысл (common sense) in LLMs.

- More on HellaSwag: https://github.com/rowanz/hellaswag

I spent the afternoon benchmarking YandexGPT Pro 4th Gen, one of the Russian tech giant's premier models.

- Yandex HF Org: yandex
- More on Yandex models: https://yandex.cloud/ru/docs/foundation-models/concepts/yandexgpt/models

The eval notebook is available on GitHub and the resulting dataset is already on the HF Hub!

- Eval Notebook: https://github.com/kghamilton89/ai-explorer/blob/main/yandex-hellaswag/hellaswag-assess.ipynb
- Eval Dataset: ZennyKenny/yandexgptpro_4th_gen-hellaswag

And of course, everyone wants to see the results, so have a look at them in the context of other zero-shot experiments that I was able to find!
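For readers unfamiliar with how HellaSwag is scored zero-shot: the model assigns a loss (negative log-likelihood) to each of the four candidate endings, and the prediction is the lowest-loss ending. A minimal sketch of that scoring step, with made-up loss numbers:

```python
# Minimal sketch of HellaSwag-style zero-shot scoring: per-ending
# losses come from the model; the prediction is the argmin.
# The loss values below are invented for illustration.

def pick_ending(losses: list[float]) -> int:
    """Index of the most likely ending (lowest loss)."""
    return min(range(len(losses)), key=losses.__getitem__)

def accuracy(all_losses: list[list[float]], labels: list[int]) -> float:
    """Fraction of items where the argmin ending matches the gold label."""
    correct = sum(pick_ending(l) == y for l, y in zip(all_losses, labels))
    return correct / len(labels)

# Two items, four candidate endings each
losses = [[2.1, 0.9, 3.0, 2.5], [1.2, 1.1, 0.7, 2.2]]
labels = [1, 2]
print(accuracy(losses, labels))  # → 1.0
```

Everything upstream of this (tokenizing context + ending, summing token log-probs) is model-specific, but the final comparison is just this argmin.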