
Bhadresh Savani

bhadresh-savani

AI & ML interests

NLP, Deep Learning, ML

Organizations

Flax Community · ONNXConfig for all · HugGAN Community · Keras Dreambooth Event · Lambda Go Labs

bhadresh-savani's activity

upvoted 3 articles 7 days ago
- Welcoming Llama Guard 4 on Hugging Face Hub (31 upvotes)
- Tiny Agents: a MCP-powered agent in 50 lines of code (221 upvotes)
- How to Build an MCP Server with Gradio (84 upvotes)

upvoted 2 articles 17 days ago
- LLM Inference on Edge: A Fun and Easy Guide to run LLMs via React Native on your Phone! (53 upvotes)
- The NLP Course is becoming the LLM Course! (90 upvotes)

upvoted 2 articles 2 months ago
- Hugging Face and JFrog partner to make AI Security more transparent (21 upvotes)
- Trace & Evaluate your Agent with Arize Phoenix (38 upvotes)

upvoted an article 3 months ago
- How to deploy and fine-tune DeepSeek models on AWS (52 upvotes)

reacted to lin-tan's post with 🔥 6 months ago
Can language models replace developers? #RepoCod says “Not Yet”, because GPT-4o and other LLMs have <30% accuracy/pass@1 on real-world code generation tasks.
- Leaderboard: https://lt-asset.github.io/REPOCOD/
- Dataset: lt-asset/REPOCOD
@jiang719 @shanchao @Yiran-Hu1007
Compared to #SWEBench, RepoCod tasks
- are general code generation tasks, while SWE-Bench tasks resolve pull requests from GitHub issues, and
- come with 2.6X more tests per task (313.5 compared to SWE-Bench’s 120.8).

Compared to #HumanEval, #MBPP, #CoderEval, and #ClassEval, RepoCod has 980 instances from 11 Python projects, with
- Whole function generation
- Repository-level context
- Validation with test cases, and
- Real-world complex tasks: the longest average canonical solution length (331.6 tokens) and the highest average cyclomatic complexity (9.00)

Introducing #RepoCod-Lite 🐟 for faster evaluations: 200 of the toughest tasks from RepoCod with:
- 67 repository-level, 67 file-level, and 66 self-contained tasks
- Detailed problem descriptions (967 tokens) and long canonical solutions (918 tokens)
- GPT-4o and other LLMs have < 10% accuracy/pass@1 on RepoCod-Lite tasks.
- Dataset: lt-asset/REPOCOD_Lite

#LLM4code #LLM #CodeGeneration #Security
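
For readers who want to try the benchmark themselves, below is a minimal sketch (not part of the original post) that pulls the two datasets named above with the Hugging Face `datasets` library. Only the dataset ids lt-asset/REPOCOD and lt-asset/REPOCOD_Lite come from the post; split names and field names are not stated there, so the sketch simply prints the dataset structure rather than assuming them.

```python
# Minimal sketch: load the RepoCod datasets referenced in the post and inspect
# their structure before writing an evaluation loop. Only the dataset ids come
# from the post; splits and fields are whatever the datasets actually ship.
from datasets import load_dataset

repocod = load_dataset("lt-asset/REPOCOD")            # full benchmark (980 instances per the post)
repocod_lite = load_dataset("lt-asset/REPOCOD_Lite")  # 200 hardest tasks per the post

# Printing a DatasetDict shows its splits, features, and row counts.
print(repocod)
print(repocod_lite)
```

Printing the DatasetDicts is enough to confirm the splits, available fields, and the 980 / 200 instance counts cited in the post before wiring up a pass@1 evaluation.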