Model Benchmark: First Grade Math

Performance comparison across 1,000 questions per model

Performance Overview

[Chart: Model Benchmark Chart]

Models Benchmarked

Model                      Correct  Incorrect
FlameF0X/MathGPT2              763        237
FlameF0X/Muffin-2.9b-1C25        9        991
FlameF0X/MuffinFace-2            8        992
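
The raw counts are easier to compare as accuracy rates. The following is a minimal sketch, not part of the original benchmark tooling: the counts are copied from the results listed above, and the names `results`, `accuracy`, and `accuracies` are hypothetical, chosen only for illustration.

```python
# Correct/incorrect counts copied from the results listed above.
# The variable and function names are illustrative assumptions.
results = {
    "FlameF0X/MathGPT2": (763, 237),
    "FlameF0X/Muffin-2.9b-1C25": (9, 991),
    "FlameF0X/MuffinFace-2": (8, 992),
}


def accuracy(correct, incorrect):
    """Return the fraction of questions answered correctly."""
    total = correct + incorrect
    return correct / total if total else 0.0


accuracies = {model: accuracy(c, i) for model, (c, i) in results.items()}

# Print models from most to least accurate.
for model, acc in sorted(accuracies.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{model}: {acc:.1%}")
# FlameF0X/MathGPT2: 76.3%
# FlameF0X/Muffin-2.9b-1C25: 0.9%
# FlameF0X/MuffinFace-2: 0.8%
```

Each model's counts sum to 1,000, matching the stated number of questions per model, so the accuracy is simply the correct count divided by 1,000.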