SE-Arena/Software-Engineering-Arena
zhiminy committed 20 days ago

Commit 49cb056 (1 parent: c1d84c5)

add grok-3 and llama-4

Files changed (2)

app.py

@@ -510,7 +510,7 @@ with gr.Blocks() as app:
             # ⚔️ Software Engineering (SE) Arena: Explore and Test the Best SE Chatbots with Long-Context Interactions
             ## 📜How It Works
-            - **Blind Comparison**: Submit a SE-related query to two anonymous chatbots randomly selected from up to {len(available_models)} top models, including OpenAI-o3, Grok-2, Gemini-2.0, Claude-3.7, Deepseek-r1, Mistral-large, Llama-3.3, Qwen-2.5, and others.
+            - **Blind Comparison**: Submit a SE-related query to two anonymous chatbots randomly selected from up to {len(available_models)} top models from OpenAI, Gemini, Grok, Claude, Deepseek, Qwen, Llama, Mistral, and others.
             - **Interactive Voting**: Engage in multi-turn dialogues with both chatbots and compare their responses. You can continue the conversation until you confidently choose the better model.
             - **Fair Play Rules**: Votes are counted only if chatbot identities remain anonymous. Revealing a chatbot's identity disqualifies the session.
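
The Blind Comparison text describes picking two anonymous chatbots at random from `available_models`. This commit does not show how app.py performs that selection, so the following is only a minimal sketch of one way it could work. The helper name `pick_anonymous_pair` and the example model list are hypothetical.

```python
import random


def pick_anonymous_pair(available_models, rng=random):
    """Return two distinct model names chosen uniformly at random.

    Hypothetical helper: the real selection logic in app.py is not
    part of this diff.
    """
    candidates = list(available_models)
    if len(candidates) < 2:
        raise ValueError("need at least two models for a blind comparison")
    # random.sample draws without replacement, so the two names differ.
    left, right = rng.sample(candidates, 2)
    return left, right


# Example with a seeded generator so repeated runs pick the same pair.
models = ["grok-3-beta", "llama4-scout-instruct-basic", "o1"]
left, right = pick_anonymous_pair(models, random.Random(0))
```

Sampling without replacement keeps a model from being compared against itself, which would make the vote meaningless.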

context_window.json

@@ -16,10 +16,15 @@
     "gemini-2.0-flash-lite-preview": 1048576,
     "gemini-2.0-pro-exp": 2097152,
     "gemma-3-27b-it": 128000,
-    "grok-2-1212": 131072,
+    "grok-3-fast-beta": 1000000,
+    "grok-3-beta": 1000000,
+    "grok-3-mini-fast-beta": 1000000,
+    "grok-3-mini-beta": 1000000,
     "llama-3.1-8b": 128000,
     "llama-3.1-405b": 128000,
     "llama-3.3-70b": 128000,
+    "llama4-scout-instruct-basic": 10000000,
+    "llama4-maverick-instruct-basic": 10000000,
     "mistral-large-latest": 131000,
     "mistral-small-latest": 32000,
     "o1": 128000,
@@ -28,6 +33,5 @@
     "Qwen2.5-32B-Instruct": 131072,
     "Qwen2.5-72B-Instruct": 131072,
     "Qwen2.5-72B-Instruct-128k": 131072,
-    "Qwen2.5-Coder-32B-Instruct": 131072,
-    "yi-large": 32000
+    "Qwen2.5-Coder-32B-Instruct": 131072
 }
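
context_window.json maps each model name to a token limit. How app.py consumes the file is not shown in this commit, so the sketch below is only an illustration of one plausible use: checking whether a conversation still fits a model's window. The helpers `fits_context` and `models_that_fit` are hypothetical, token counts are assumed to be computed elsewhere, and only a few entries from the new side of the diff are reproduced.

```python
import json

# A few entries copied from the new side of the diff; the real file has more.
CONTEXT_WINDOWS = json.loads("""
{
    "grok-3-beta": 1000000,
    "llama4-scout-instruct-basic": 10000000,
    "mistral-small-latest": 32000,
    "Qwen2.5-Coder-32B-Instruct": 131072
}
""")


def fits_context(model, prompt_tokens, reply_budget=0):
    """Hypothetical helper: True if prompt plus reply budget fits the window."""
    return prompt_tokens + reply_budget <= CONTEXT_WINDOWS[model]


def models_that_fit(prompt_tokens, reply_budget=0):
    """Hypothetical helper: sorted names of models whose window is large enough."""
    return sorted(
        name
        for name in CONTEXT_WINDOWS
        if fits_context(name, prompt_tokens, reply_budget)
    )
```

The last hunk drops the trailing comma after the `Qwen2.5-Coder-32B-Instruct` entry because it is now the final key, and JSON does not allow a trailing comma there. `json.loads` would raise `json.JSONDecodeError` otherwise.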