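MMLU - A benchmark of large-model language understanding and one of the best-known evaluations of its kind. Its tasks span a broad range of subjects, the questions are in English, and it is used to gauge a model's basic knowledge coverage and comprehension.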
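C-Eval - A comprehensive Chinese evaluation suite for foundation models. It contains 13,948 multiple-choice questions covering 52 disciplines and four difficulty levels, and is used to evaluate a model's Chinese-language understanding.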
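AGIEval - A benchmark released by Microsoft for evaluating foundation-model capabilities, focused on general abilities in human cognition and problem solving. It draws on 20 official, public, high-standard admission and qualification exams designed for general human test-takers, with data in both Chinese and English.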
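GSM8K - A benchmark released by OpenAI for evaluating mathematical reasoning, consisting of about 8,500 high-quality grade-school math word problems. Compared with earlier math word-problem datasets, it is larger, more linguistically diverse, and more challenging.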
- Free for commercial use
- Paid commercial license
- Open source, non-commercial use only
- Not open source
Model | Parameters (×100M) | MMLU | C-Eval | AGIEval | GSM8K | MATH | BBH | MT-Bench |
---|---|---|---|---|---|---|---|---|
Claude 3.5 Sonnet New | 0.0 | 90.5 | / | / | 92.5 | 78.3 | / | / |
GPT-4o | | 88.7 | / | / | 90.5 | 76.6 | / | / |
Claude 3.5 Sonnet | | 88.7 | / | / | 96.4 | 71.1 | / | / |
Llama3.1-405B Instruct | 4050.0 | 87.3 | / | / | 96.8 | 73.8 | / | / |
Claude3-Opus | 0.0 | 86.8 | / | / | 95.0 | 60.1 | / | 9.43 |
GPT-4 | 1750.0 | 86.4 | 68.7 | / | 87.1 | 42.5 | / | 9.32 |
Llama3-400B-Instruct-InTraining | 4000.0 | 86.1 | / | / | 94.1 | 57.8 | / | / |
Qwen2.5-72B | 727.0 | 86.1 | / | / | 91.5 | 62.1 | 86.3 | / |
Llama3.1-405B | 4050.0 | 85.2 | / | / | / | / | / | / |
Llama3-400B-InTraining | 4000.0 | 84.8 | / | / | / | / | / | / |
Qwen2-72B | 727.0 | 84.2 | 91.0 | / | 89.5 | 51.1 | 82.4 | / |
Gemini-ultra | 0.0 | 83.7 | / | / | 88.9 | 53.2 | / | / |
Llama3.1-70B-Instruct | 700.0 | 83.6 | / | / | 95.1 | 68.0 | / | / |
Qwen2.5-32B | 320.0 | 83.3 | / | / | 92.9 | 57.7 | 84.5 | / |
Qwen2-72B-Instruct | 720.0 | 82.3 | 83.8 | / | 91.1 | 59.7 | / | 9.12 |
Llama3-70B-Instruct | 700.0 | 82.0 | / | / | 93.0 | 50.4 | / | / |
GPT-4o mini | 0.0 | 82.0 | / | / | 87.0 | 70.2 | / | / |
Gemini 1.5 Pro | 0.0 | 81.9 | / | / | 91.7 | 58.5 | / | / |
GLM4 | 0.0 | 81.5 | / | / | 87.6 | 47.9 | 82.3 | / |
Grok-1.5 | | 81.3 | / | / | 90.0 | 50.6 | / | / |
Mistral Large | 0.0 | 81.2 | / | / | 81.0 | 45.0 | / | 8.66 |
Claude 3.5 Haiku | 0.0 | 80.9 | / | / | 85.6 | 69.2 | / | / |
Qwen2.5-Math-72B | 727.0 | 80.8 | / | / | 95.9 | 85.9 | / | / |
YAYI2-30B | 300.0 | 80.5 | 80.9 | 62.0 | 71.2 | / | / | / |
Qwen1.5-110B | 1100.0 | 80.4 | / | / | 85.4 | 49.6 | 74.8 | 8.88 |
DeepSeek V2.5 | 2360.0 | 80.4 | / | / | 95.1 | 74.7 | / | / |
Qwen2.5-14B | 140.0 | 79.7 | / | / | 92.9 | 57.7 | 78.2 | / |
Llama3-70B | 700.0 | 79.5 | / | / | / | / | / | / |
Llama3.1-70B | 700.0 | 79.3 | / | / | / | / | / | / |
Gemini-pro | 1000.0 | 79.13 | / | / | 86.5 | / | / | / |
Claude3-Sonnet | 0.0 | 79.0 | / | / | 92.3 | 43.1 | / | 9.18 |
DeepSeek-V2-236B | 2360.0 | 78.5 | 81.7 | / | 79.2 | 43.6 | 78.9 | / |
PaLM 2 | 3400.0 | 78.3 | / | / | 80.7 | / | / | / |
Phi-3-medium 14B-preview | 140.0 | 78.2 | / | 48.4 | 90.3 | / | / | 8.91 |
Mixtral-8×22B-MoE | 1410.0 | 77.75 | / | / | 78.6 | 41.8 | / | / |
Qwen1.5-72B-Chat | 720.0 | 77.5 | 84.1 | / | 79.5 | 34.1 | 65.5 | 8.67 |
Qwen-72B | 720.0 | 77.4 | 83.3 | 62.5 | 78.9 | / | / | / |
Yi-1.5-34B | 340.0 | 77.1 | / | 71.1 | 82.7 | 41.0 | 76.4 | / |
Qwen2-57B-A14B | 570.0 | 76.5 | 87.7 | / | 80.7 | 43.0 | 67.0 | / |
Yi-34B | 340.0 | 76.3 | 81.4 | / | / | / | / | / |
Yi-34B-200K | 340.0 | 76.1 | 81.9 | / | / | / | / | / |
Phi-3-small 7B | 70.0 | 75.3 | / | 45.0 | 88.9 | / | / | 8.7 |
Claude3-Haiku | 0.0 | 75.2 | / | / | 88.9 | 38.9 | / | / |
Gemma2-27B | 270.0 | 75.0 | / | / | 75.0 | / | / | / |
GLM-4-9B | 90.0 | 74.7 | / | / | 84.0 | 30.4 | / | / |
Qwen2.5-7B | 70.0 | 74.2 | / | / | 85.4 | 49.8 | 70.4 | / |
DBRX Instruct | 1320.0 | 73.7 | / | / | 72.8 | / | / | 8.39 |
Qwen1.5-32B | 320.0 | 73.4 | 83.5 | / | 77.4 | 36.1 | / | 8.3 |
Grok-1 | 3140.0 | 73.0 | / | / | 62.9 | / | / | / |
GLM-4-9B-Chat | 90.0 | 72.4 | 75.6 | / | 79.6 | 50.6 | / | 8.35 |
Apollo-7B | 70.0 | 71.86 | / | / | / | / | / | / |
Gemma 2 - 9B | 90.0 | 71.3 | / | 52.8 | 68.6 | 36.6 | 68.2 | / |
DeepSeek-V2-236B-Chat | 2360.0 | 71.1 | 65.2 | / | 84.4 | 32.6 | 71.7 | / |
XVERSE-65B | 650.0 | 70.8 | / | 61.8 | 60.3 | / | / | / |
Mixtral-8×7B-MoE | 450.0 | 70.6 | / | / | 74.4 | 28.4 | / | 8.3 |
Qwen2-7B | 70.0 | 70.3 | 83.2 | / | 79.9 | 44.2 | 62.6 | / |
GPT-3.5 | 1750.0 | 70.0 | 54.4 | / | 57.1 | / | / | 8.39 |
Yi-1.5-9B | 90.0 | 69.5 | / | 62.7 | 73.7 | 32.6 | 72.4 | / |
Llama3.1-8B-Instruct | 80.0 | 69.4 | / | / | 84.5 | 51.9 | / | / |
PaLM | 5400.0 | 69.3 | / | / | 56.5 | / | / | / |
LLaMA2 70B | 700.0 | 68.9 | / | 54.2 | 56.8 | / | / | / |
Phi-3-mini 3.8B | 38.0 | 68.8 | / | 37.5 | 82.5 | / | / | 8.38 |
Yi-9B | 90.0 | 68.4 | / | / | 52.3 | 15.9 | / | / |
Llama3-8B-Instruct | 80.0 | 68.4 | / | / | 79.6 | 30.0 | / | / |
Mistral NeMo-Base-12B | 120.0 | 68.0 | / | / | / | / | / | 7.84 |
Mistral NeMo-Instruct-12B | 120.0 | 68.0 | / | / | / | / | / | 7.84 |
Aquila2-34B | 340.0 | 67.79 | 63.07 | / | 58.4 | / | / | / |
Jamba-v0.1 | 520.0 | 67.4 | / | / | 59.9 | / | 45.4 | / |
Llama3.1-8B | 80.0 | 66.7 | / | / | / | / | / | / |
Llama3-8B | 80.0 | 66.6 | / | / | / | / | / | / |
Qwen-14B | 140.0 | 66.3 | 72.1 | / | 61.3 | / | / | / |
Grok-0 | 330.0 | 65.7 | / | / | 56.8 | / | / | / |
Qwen2.5-3B | 30.0 | 65.6 | / | / | 79.1 | 42.6 | 56.3 | / |
Gemma 7B | 70.0 | 64.3 | / | 41.7 | 46.4 | 24.3 | 55.1 | / |
Yi-6B-200K | 60.0 | 64.0 | 73.5 | / | / | / | / | / |
Starling-7B-LM-Beta | 70.0 | 63.9 | / | / | / | / | / | 8.09 |
LLaMA 65B | 650.0 | 63.4 | 38.8 | 47.6 | 50.9 | / | / | / |
Yi-6B | 60.0 | 63.2 | 72.0 | / | / | / | / | / |
LLaMA2 34B | 340.0 | 62.6 | / | 43.4 | 42.2 | / | / | / |
Qwen1.5-MoE-A2.7B | 143.0 | 62.5 | / | / | 61.5 | / | / | 7.17 |
StableLM2-12B | 120.0 | 62.09 | / | / | 56.03 | / | / | 8.15 |
ChatGLM3-6B-Base | 60.0 | 61.4 | 69.0 | 53.7 | 72.3 | / | / | / |
StableLM2-12B-Chat | 120.0 | 61.14 | / | / | 57.7 | / | / | 8.15 |
Qwen2.5-1.5B | 15.0 | 60.9 | / | / | 68.5 | 35.0 | 45.1 | / |
XVERSE-13B-Chat | 130.0 | 60.2 | 53.1 | 48.3 | / | / | / | / |
XVERSE-MoE-A4.2B | 258.0 | 60.2 | 60.5 | 48.0 | 51.2 | / | / | / |
Mistral 7B | 73.0 | 60.1 | / | 43.0 | 52.1 | / | / | / |
DeciLM-7B | 70.4 | 59.76 | / | / | 47.38 | / | / | / |
Baichuan2-13B-Base | 130.0 | 59.17 | 58.1 | 48.17 | 52.77 | / | / | / |
MiniCPM-MoE-8x2B | 136.0 | 58.9 | 58.11 | / | 61.5 | 10.52 | 39.22 | / |
LLaMA 33B | 330.0 | 57.8 | / | 41.7 | 35.6 | / | / | / |
Qwen-7B | 70.0 | 56.7 | 59.6 | / | 51.6 | / | / | / |
Phi-2 | 27.0 | 56.7 | / | / | 61.1 | / | / | / |
Qwen2-1.5B | 15.0 | 56.5 | 70.6 | / | 58.5 | 21.7 | 37.2 | / |
ChatGLM2 12B | 120.0 | 56.18 | 61.6 | / | 40.94 | / | / | / |
XVERSE-13B | 130.0 | 55.1 | 54.7 | 41.4 | / | / | / | / |
LLaMA2 13B | 130.0 | 54.84 | / | 39.1 | 28.7 | / | / | / |
Baichuan2-7B-Base | 70.0 | 54.16 | 54.0 | 42.73 | 24.49 | / | / | / |
GPT-3 | 1750.0 | 53.9 | / | / | / | / | / | / |
MiniCPM-2B-DPO | 24.0 | 53.46 | 51.13 | / | 53.83 | 10.24 | 36.87 | 7.25 |
Baichuan 13B - Chat | 130.0 | 52.1 | 51.5 | / | 26.6 | / | / | / |
Baichuan 13B - Base | 130.0 | 51.62 | 52.4 | / | 26.6 | / | / | / |
InternLM 7B | 70.0 | 51.0 | 53.4 | 37.6 | 31.2 | / | / | / |
InternLM Chat 7B 8K | 70.0 | 50.8 | 53.2 | 42.5 | 31.2 | / | / | / |
ChatGLM2-6B | 62.0 | 47.86 | 51.7 | / | 32.37 | / | / | / |
Qwen2.5-0.5B | 5.0 | 47.5 | / | / | 41.6 | 19.5 | 20.3 | / |
LLaMA 13B | 130.0 | 46.94 | / | 33.9 | 17.8 | / | / | / |
Stable LM Zephyr 3B | 30.0 | 45.9 | 30.34 | / | 52.54 | 12.2 | 37.86 | 6.64 |
Qwen2-0.5B | 4.0 | 45.4 | 58.2 | / | 58.5 | 10.7 | 28.4 | / |
LLaMA2 7B | 70.0 | 45.3 | / | 29.3 | 14.6 | / | / | / |
Qwen-1.8B | 18.0 | 45.3 | / | / | 32.3 | / | / | / |
GLM-130B | 1300.0 | 44.8 | 44.0 | / | / | / | / | / |
Ziya-LLaMA-13B-Pretrain-v1 | 130.0 | 43.9 | 30.2 | 27.2 | / | / | / | / |
OpenLLaMA 13B | 130.0 | 42.4 | 24.7 | 24.0 | / | / | / | / |
Baichuan 7B | 70.0 | 42.3 | 42.8 | 34.44 | 9.7 | / | / | / |
Gemma 2B | 20.0 | 42.3 | / | 24.2 | 17.7 | 11.8 | 35.2 | / |
Gemma 2B - It | 20.0 | 42.3 | / | 24.2 | 17.7 | 11.8 | 35.2 | / |
Stable LM 2 - 1.6B | 16.0 | 38.93 | / | / | 17.82 | / | / | / |
RecurrentGemma-2B | 27.0 | 38.4 | / | 23.8 | 13.4 | 11.8 | / | / |
Phi-1.5 | 13.0 | 37.6 | / | / | 40.2 | / | / | / |
DeepSeek Coder-6.7B Instruct | 67.0 | 37.2 | / | / | 62.8 | 28.6 | 46.9 | / |
ChatGLM-6B | 62.0 | 36.9 | 38.9 | / | 4.82 | / | / | / |
LLaMA 7B | 70.0 | 35.1 | 27.1 | 23.9 | 11.0 | / | / | / |
MOSS | 160.0 | 27.4 | 33.13 | 26.8 | / | / | / | / |
OPT | 1750.0 | 25.2 | 25.0 | 24.2 | / | / | / | / |
Pythia | 120.0 | 25.1 | 26.2 | 25.3 | / | / | / | / |
TinyLlama | 11.0 | 24.3 | 25.02 | / | 2.27 | / | / | / |
CodeGemma-7B | 70.0 | / | / | / | 44.2 | 19.9 | / | / |
CodeGemma-7B-IT | 70.0 | / | / | / | 41.2 | 20.9 | / | / |
CodeGemma-2B | 20.0 | / | / | / | 41.2 | 20.9 | / | / |
WizardLM-2-70B | 70.0 | / | / | / | / | / | / | 8.92 |
WizardLM-2-7B | 70.0 | / | / | / | / | / | / | 8.28 |
WizardLM-2 8x22B | 1760.0 | / | / | / | / | / | / | 9.12 |
DeepSeek-R1-Lite-Preview | | / | / | / | / | 91.6 | / | / |
CPM-Bee | 100.0 | / | 54.1 | / | / | / | / | / |
Aquila-7B | 70.0 | / | 25.5 | 25.58 | / | / | / | / |
Phi-1 | 13.0 | / | / | / | / | / | / | / |
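Data note: all figures come from papers or from evaluation results published on GitHub, primarily official papers; some figures come from third-party evaluations.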
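For readers who want to work with these scores programmatically, the following is a minimal sketch, assuming that `/` in a score cell means the result was not reported. The helper names `parse_score` and `rank_by` are hypothetical, and `ROWS` holds only a few (model, MMLU, GSM8K) values copied from the table for illustration.

```python
from typing import List, Optional, Tuple

# A few rows copied from the table: (model, MMLU, GSM8K).
# Assumption: "/" marks a score that the table does not report.
ROWS = [
    ("GPT-4", "86.4", "87.1"),
    ("Qwen2-72B", "84.2", "89.5"),
    ("Llama3-70B", "79.5", "/"),
    ("CodeGemma-7B", "/", "44.2"),
]


def parse_score(cell: str) -> Optional[float]:
    """Convert a table cell to a float, mapping "/" or a blank cell to None."""
    cell = cell.strip()
    return None if cell in ("", "/") else float(cell)


def rank_by(rows, column: int) -> List[Tuple[str, float]]:
    """Return (model, score) pairs sorted by score, highest first.

    Rows with no reported score in the chosen column are skipped
    rather than treated as zero.
    """
    scored = [(row[0], parse_score(row[column])) for row in rows]
    reported = [(name, score) for name, score in scored if score is not None]
    return sorted(reported, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Column 1 is MMLU and column 2 is GSM8K in this sample.
    for name, score in rank_by(ROWS, column=2):
        print(f"{name}: {score}")
```

Skipping unreported cells instead of scoring them as zero keeps a model with no published result from appearing at the bottom of a ranking it never took part in.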