|
--- |
|
license: cc-by-nc-4.0 |
|
base_model: |
|
- Qwen/Qwen2.5-Coder-32B |
|
--- |
|
|
|
# Arctic Text2SQL: ExCoT |
|
|
|
Snowflake’s AI research team introduces ExCoT, the first model in the Arctic Text2SQL family. ExCoT is a novel framework that combines Chain-of-Thought (CoT) prompting with SQL execution-based Direct Preference Optimization (DPO), using execution results, rather than human preferences, as the feedback signal. This enables scalable, high-quality model optimization without expensive human annotations.
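The execution-feedback idea can be illustrated with a toy sketch (the schema, queries, and pairing logic below are illustrative assumptions, not the actual ExCoT pipeline): candidate SQL queries are executed against the database, and a candidate whose result set matches the gold query’s is preferred over one whose result does not, yielding DPO preference pairs without any human labeling:

```python
# Toy sketch of execution-based preference labeling (illustrative only,
# not the actual ExCoT pipeline): a candidate query is "chosen" over
# another when its execution result matches the gold query's result.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (id INTEGER, name TEXT, salary REAL);
    INSERT INTO employees VALUES (1, 'Ada', 100.0), (2, 'Bob', 50.0);
""")

def execute(sql):
    """Run SQL and return a canonicalized result set; None on error."""
    try:
        return sorted(conn.execute(sql).fetchall())
    except sqlite3.Error:
        return None  # invalid SQL never matches the gold result

gold = "SELECT AVG(salary) FROM employees"
candidates = [
    "SELECT AVG(salary) FROM employees;",  # executes to the gold result
    "SELECT SUM(salary) FROM employees;",  # wrong aggregate
]

gold_result = execute(gold)
pairs = [
    {"chosen": a, "rejected": b}
    for a in candidates
    for b in candidates
    if execute(a) == gold_result and execute(b) != gold_result
]

print(pairs)  # one pair: the AVG query chosen over the SUM query
```

In the real pipeline the candidates would be model-generated CoT reasoning traces ending in SQL, but the preference signal is the same: execution agreement with the gold query.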
|
|
|
Based on our internal testing, ExCoT delivered state-of-the-art results on the [BIRD-test benchmark](https://bird-bench.github.io/), achieving best-in-class performance in the single-model, single-inference category using only public datasets (BIRD and Spider) and no additional Text2SQL data: |
|
|
|
* [Llama-3.1-Arctic-ExCoT-70B](https://huggingface.co/Snowflake/Llama-3.1-Arctic-ExCoT-70B) improved execution accuracy on the BIRD-dev set from the base model’s 57.37% to 68.51%. [Qwen-2.5-coder-Arctic-ExCoT-32B](https://huggingface.co/Snowflake/Qwen-2.5-coder-Arctic-ExCoT-32B) achieved similarly strong gains. |
|
|
|
* Both models significantly outperformed well-known frontier general-purpose models, beating them by more than 10 points of execution accuracy.
|
|
|
For more details about ExCoT and how to use it: |
|
|
|
* ❄️ [Arctic Text2SQL: Introducing ExCoT for Execution-Guided Chain-of-Thought Optimization (blog)]() |
|
* 📝 [ExCoT: Optimizing Reasoning for Text-to-SQL with Execution Feedback (arxiv)](https://arxiv.org/pdf/2503.19988) |
|
* 🚀 [Getting started guide using ArcticTraining](https://github.com/snowflakedb/ArcticTraining/tree/main/projects/excot_dpo) |
|
|
|
## Evaluation results |
|
|
|
| Model | BIRD Ex% Dev | BIRD Ex% Test |
|---------------------------------------|--------------|---------------|
| Arctic-ExCoT-70B (LLaMA 3.1 70B)      | **68.51**    | 68.53         |
| Arctic-ExCoT-32B (Qwen-2.5-Coder 32B) | 68.25        | 68.19         |
| XiYanSQL-QwenCoder*                   | 67.01        | **69.03**     |
| OpenAI GPT-4o                         | 54.04        | –             |
| OpenAI GPT-4                          | 46.35        | 54.89         |
| Anthropic Claude 3.5-Sonnet           | 50.13        | –             |
| Claude-2                              | 42.70        | 49.02         |
| OpenAI o1-mini                        | 52.41        | –             |
| OpenAI o3-mini                        | 53.72        | –             |
| Mistral-large-2407 (123B)             | 53.52        | 55.84         |
| DeepSeek-V2 (236B)                    | 56.13        | 56.68         |
|
|
|
Top single-model, single-inference results on the BIRD leaderboard (as of March 25, 2025). *XiYanSQL-QwenCoder: its reported numbers have proved difficult to reproduce [[1]](https://github.com/XGenerationLab/XiYanSQL-QwenCoder/issues/4)[[2]](https://modelscope.cn/models/XGenerationLab/XiYanSQL-QwenCoder-32B-2412/feedback/issueDetail/22708).