CRUST-Bench: A Comprehensive Benchmark for C-to-safe-Rust Transpilation Paper • 2504.15254 • Published 17 days ago • 6
TAUR-Lab/Taur_CoT_Analysis_Project___deepseek-ai__DeepSeek-R1-Distill-Llama-70B Viewer • Updated Feb 17 • 300 • 9
TAUR-Lab/Taur_CoT_Analysis_Project___meta-llama__Llama-3.3-70B-Instruct Viewer • Updated Feb 17 • 2.5k • 9
TAUR-Lab/Taur_CoT_Analysis_Project___internlm__internlm2_5-7b-chat Viewer • Updated Dec 21, 2024 • 63.7k • 8
TAUR-Lab/Taur_CoT_Analysis_Project___OpenGVLab__InternVL2_5-8B Viewer • Updated Dec 21, 2024 • 63.7k • 8
Learning to Refine with Fine-Grained Natural Language Feedback Paper • 2407.02397 • Published Jul 2, 2024
To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning Paper • 2409.12183 • Published Sep 18, 2024 • 39
The GEM Benchmark: Natural Language Generation, its Evaluation and Metrics Paper • 2102.01672 • Published Feb 2, 2021
NL-Augmenter: A Framework for Task-Sensitive Natural Language Augmentation Paper • 2112.02721 • Published Dec 6, 2021
X-PARADE: Cross-Lingual Textual Entailment and Information Divergence across Paragraphs Paper • 2309.08873 • Published Sep 16, 2023
MiniCheck: Efficient Fact-Checking of LLMs on Grounding Documents Paper • 2404.10774 • Published Apr 16, 2024 • 3