Grade School Math 8K
A benchmark of 8,500 grade-school math problems, used to evaluate a model's mathematical reasoning ability.
Model | Score | Publisher | Release date | Parameters (×100M) |
---|---|---|---|---|
Claude3-Opus | 95.0 | Anthropic | 2024-03-04 | 0.0 |
Qwen2.5-Max | 94.5 | Alibaba | 2025-01-28 | Unknown |
Qwen2.5-72B | 91.5 | Alibaba | 2024-09-18 | 727.0 |
GPT-4o mini | 91.3 | OpenAI | 2024-07-18 | 0.0 |
Phi-4-mini-instruct (3.8B) | 88.6 | Microsoft | 2025-02-27 | 38.0 |
Qwen2.5-7B | 85.4 | Alibaba | 2024-09-18 | 70.0 |
Llama3.1-8B-Instruct | 82.4 | Meta | 2024-07-23 | 80.0 |
Qwen2.5-3B | 79.1 | Alibaba | 2024-09-18 | 30.0 |
Moonlight-16B-A3B-Instruct | 77.4 | Moonshot AI | 2025-02-23 | 160.0 |
Gemma 2 - 9B | 70.7 | Google | 2024-06-27 | 90.0 |
Llama3.1-8B | 55.3 | Meta | 2024-07-23 | 80.0 |
Mistral-7B-Instruct-v0.3 | 36.2 | Mistral AI | 2024-05-22 | 70.0 |
Llama-3.2-3B | 34.0 | Meta | 2024-09-18 | 32.0 |
Amazon Nova Pro | 0.0 | Amazon | 2024-12-03 | Unknown |
Gemini 1.5 Pro | 0.0 | Google | 2024-02-15 | 0.0 |
Llama3.1-405B Instruct | 0.0 | Meta | 2024-07-23 | 4050.0 |
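
The parameter-size column is expressed in units of 100 million (亿), so 727.0 corresponds to 72.7 billion parameters. As a minimal sketch of that conversion, the helper below divides a table value by 10 to get billions. The function name `params_in_billions` is hypothetical and chosen only for illustration; the sample values are copied from the table.

```python
def params_in_billions(value_in_yi: float) -> float:
    """Convert a parameter count in units of 100 million (亿) to billions."""
    # 1 billion = 10 x 100 million, so dividing by 10 changes the unit.
    return value_in_yi / 10.0


# Sample rows taken from the leaderboard table (model name, size in 亿).
rows = [
    ("Qwen2.5-72B", 727.0),
    ("Phi-4-mini-instruct (3.8B)", 38.0),
    ("Llama3.1-405B Instruct", 4050.0),
]

for name, size_yi in rows:
    print(f"{name}: {params_in_billions(size_yi):.1f}B")
```

A value of 0.0 or "Unknown" in that column is best left unconverted, since it appears to mark a size that was not recorded rather than a real parameter count.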