Hugging Face
Sarthak Malhotra
zarmalhotra
6 followers · 23 following
sarthakwer
sarthak-malhotra
AI & ML interests
None yet
Recent Activity
liked a dataset
2 days ago
patrickfleith/instruction-freak-reasoning
reacted to ZennyKenny's post with 🔥
3 days ago
When I heard the Reasoning Dataset Competition deadline was extended to 9 May, I knew I had time to get in one more entry. 🔥🔥🔥 With the rise of Vibe Coding, and the potential risks that are introduced by humans letting LLMs build their apps for them, lots of people are (rightfully) concerned about the safety of the code that is hitting prod. In response to that, I'm happy to present my final submission to the Reasoning Dataset Competition and attempt to start benchmarking the ability of LLMs to identify unsafe and/or exploitable code by way of the CoSa (Code Safety) benchmark: https://huggingface.co/datasets/ZennyKenny/cosa-benchmark-dataset Currently a curated set of 200 examples, calibrated on OpenAI's standard issue models (GPT-4.1, o4 mini, and GPT-3.5 Turbo) as "baseline performance" (70% decile). Check it out and drop a ❤️ if you think it could be useful or hit the Community section with suggestions / critiques.
new activity
7 days ago
reasoning-datasets-competition/README: Competition Lobby
Organizations
zarmalhotra's activity
upvoted a collection
28 days ago
Reasoning Required?
Collection • 4 items • Updated 23 days ago • 4
upvoted a collection
about 1 month ago
OpenThinker2
Collection • 3 items • Updated Apr 7 • 2