
Zhimin Zhao PRO

zhiminy

https://zhimin-z.github.io

AI & ML interests

SE4AI, AI4SE, LLMOps, LLM4Code



Posts


# 🚀 SE Arena: Evaluating Foundation Models for Software Engineering

SE Arena is the first open-source platform for evaluating foundation models in real-world software engineering workflows.

## What makes it unique?

- RepoChat: Automatically injects repository context (issues, commits, PRs) into conversations for more realistic evaluations
- Multi-round interactions: Tests models through iterative workflows, not just single prompts
- Novel metrics: Includes a "model consistency score" that measures model determinism through self-play matches, and a "conversation efficiency index" that evaluates model performance while accounting for the number of interaction rounds needed to reach a conclusion
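
To make the two metrics more concrete, here is a minimal Python sketch of one plausible reading of them. It is not SE Arena's actual implementation. The `Match` record, the function names, and both formulas are assumptions made for illustration: the consistency score is taken as the share of self-play matches (a model paired against itself) that end in a tie, and the efficiency index as the average win/loss outcome divided by the number of rounds the conversation took.

```python
from dataclasses import dataclass


@dataclass
class Match:
    """Hypothetical record of one arena match (not SE Arena's real schema)."""
    model_a: str
    model_b: str
    winner: str  # "a", "b", or "tie"
    rounds: int  # interaction rounds before the vote, assumed >= 1


def consistency_score(matches, model):
    """Share of self-play matches (model vs. itself) that ended in a tie.

    Assumption: a deterministic model should produce equivalent answers
    against itself, so voters would call the match a tie.
    """
    self_play = [m for m in matches if m.model_a == model and m.model_b == model]
    if not self_play:
        return None  # no self-play data for this model
    return sum(m.winner == "tie" for m in self_play) / len(self_play)


def efficiency_index(matches, model):
    """Average outcome (+1 win, 0 tie, -1 loss) discounted by rounds used.

    Assumption: reaching a result in fewer rounds should count for more,
    so each outcome is divided by the match's round count.
    """
    total, count = 0.0, 0
    for m in matches:
        # Skip self-play and matches the model did not take part in.
        if m.model_a == m.model_b or model not in (m.model_a, m.model_b):
            continue
        if m.winner == "tie":
            outcome = 0.0
        else:
            won = (m.winner == "a") == (m.model_a == model)
            outcome = 1.0 if won else -1.0
        total += outcome / m.rounds
        count += 1
    return total / count if count else None
```

Under these assumptions, a model that wins in two rounds contributes 0.5 to its index, while a loss after four rounds contributes -0.25, so quick wins weigh more than slow losses.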

Try it now: SE-Arena/Software-Engineering-Arena

## Why it matters

Traditional evaluation frameworks don't capture how developers actually use models in their daily work. SE Arena creates a testing environment that mirrors real engineering workflows, helping you choose the right model for your specific software development needs.

From debugging to requirement refinement, see which models truly excel at software engineering tasks!


Hey everyone,

We're thrilled to introduce our latest project: a hand-curated list of the best production-level machine-learning open-source toolkits! 🚀

Check it out here: zhiminy/Awesome-Production-Machine-Learning-Search
