# Massive Multitask Language Understanding (MMLU)

A multiple-choice benchmark covering 57 subjects, used to evaluate the knowledge and reasoning abilities of large language models.

- Questions: 15,000
- Publisher: University of California, Berkeley
- Task type: Knowledge QA
- Metric: Accuracy
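The scores in the table below are plain accuracy over multiple-choice answers. A minimal illustrative sketch of how such a score could be computed (the answer letters here are made up; this is not the leaderboard's actual scoring code):

```python
def accuracy(predictions, gold):
    """Fraction of multiple-choice questions answered correctly."""
    if len(predictions) != len(gold):
        raise ValueError("prediction/gold length mismatch")
    correct = sum(p == g for p, g in zip(predictions, gold))
    return correct / len(gold)

# Hypothetical answer keys (A-D) for a handful of questions.
gold = ["A", "C", "B", "D", "C"]
preds = ["A", "C", "B", "A", "C"]
print(f"{accuracy(preds, gold):.1%}")  # 4 of 5 correct -> 80.0%
```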
Model | Score | Publisher | Release date | Parameters (B)
---|---|---|---|---
OpenAI o1 | 91.8 | OpenAI | 2024-12-05 | Unknown
DeepSeek-R1 | 90.8 | DeepSeek | 2025-01-20 | 671
Hunyuan-TurboS | 89.5 | Tencent | 2025-03-10 | Unknown
GPT-4o | 88.7 | OpenAI | 2024-05-13 | Unknown
Llama3.1-405B Instruct | 88.6 | Meta | 2024-07-23 | 405
DeepSeek-V3 | 88.5 | DeepSeek | 2024-12-26 | 671
Claude 3.5 Sonnet | 88.3 | Anthropic | 2024-06-21 | Unknown
Claude 3.5 Sonnet New | 88.3 | Anthropic | 2024-10-22 | Unknown
Qwen2.5-Max | 87.9 | Alibaba | 2025-01-28 | Unknown
GPT-4.1 mini | 87.5 | OpenAI | 2025-04-14 | Unknown
Grok 2 | 87.5 | xAI | 2024-08-13 | Unknown
Kimi k1.5 (Short-CoT) | 87.4 | Moonshot AI | 2025-01-22 | Unknown
Gemini 1.5 Pro | 87.1 | Google | 2024-02-15 | Unknown
OpenAI o3-mini (high) | 86.9 | OpenAI | 2025-01-31 | Unknown
Claude3-Opus | 86.8 | Anthropic | 2024-03-04 | Unknown
Gemini 2.0 Pro Experimental | 86.5 | Google | 2025-02-05 | Unknown
Qwen2.5-72B | 86.1 | Alibaba | 2024-09-18 | 72.7
Llama3.1-70B-Instruct | 86.0 | Meta | 2024-07-23 | 70
Llama3.3-70B-Instruct | 86.0 | Meta | 2024-12-06 | 70
Amazon Nova Pro | 85.9 | Amazon | 2024-12-03 | Unknown
GPT-4o (2024-11-20) | 85.7 | OpenAI | 2024-11-20 | Unknown
Llama 4 Maverick | 85.5 | Meta | 2025-04-05 | 400
OpenAI o1-mini | 85.2 | OpenAI | 2024-09-12 | Unknown
GPT-4o mini | 82.0 | OpenAI | 2024-07-18 | Unknown
Grok-1.5 | 81.3 | xAI | 2024-03-29 | Unknown
Mistral-Small-3.1-24B-Instruct-2503 | 80.62 | Mistral AI | 2025-03-17 | 24
GPT-4.1 nano | 80.1 | OpenAI | 2025-04-14 | Unknown
Llama 4 Scout | 79.6 | Meta | 2025-04-05 | 109
Claude 3.5 Haiku | 77.6 | Anthropic | 2024-10-22 | Unknown
Gemma 3 - 27B (IT) | 76.9 | Google | 2025-03-12 | 27
Qwen2.5-7B | 74.2 | Alibaba | 2024-09-18 | 7
C4AI Aya Vision 32B | 72.14 | Cohere | 2025-03-04 | 32
Gemma 2 - 9B | 71.3 | Google | 2024-06-27 | 9
Moonlight-16B-A3B-Instruct | 70.0 | Moonshot AI | 2025-02-23 | 16
Llama3.1-8B-Instruct | 68.1 | Meta | 2024-07-23 | 8
Phi-4-mini-instruct (3.8B) | 67.3 | Microsoft | 2025-02-27 | 3.8
Llama3.1-8B | 66.6 | Meta | 2024-07-23 | 8
Qwen2.5-3B | 65.6 | Alibaba | 2024-09-18 | 3
Mistral-7B-Instruct-v0.3 | 64.2 | Mistral AI | 2024-05-22 | 7
Llama-3.2-3B | 54.75 | Meta | 2024-09-18 | 3.2
GPT-4.5 | N/A | OpenAI | 2025-02-28 | Unknown
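Because MMLU spans 57 subjects of very different sizes, an overall score can be aggregated either per question (micro average) or per subject (macro average), and the two diverge when large subjects score differently from small ones. A sketch of the distinction, with made-up subject names and counts (the leaderboard does not state which aggregation it uses):

```python
def micro_macro(per_subject):
    """per_subject maps subject -> (num_correct, num_questions).

    Returns (micro, macro): micro accuracy weights every question
    equally; macro accuracy averages the per-subject accuracies,
    weighting every subject equally.
    """
    total_correct = sum(c for c, n in per_subject.values())
    total = sum(n for c, n in per_subject.values())
    micro = total_correct / total
    macro = sum(c / n for c, n in per_subject.values()) / len(per_subject)
    return micro, macro

# Hypothetical counts for two subjects of different size.
stats = {"anatomy": (90, 100), "law": (300, 500)}
micro, macro = micro_macro(stats)
# micro = 390/600 = 0.65; macro = (0.9 + 0.6)/2 = 0.75
```

The gap between 0.65 and 0.75 in this toy example shows why two reported MMLU numbers for the same model can differ slightly even with identical predictions.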