Model Details and Parameters
DeepSeekMoE is a mixture-of-experts (MoE) large language model open-sourced by DeepSeek, the large-model company under the quant firm High-Flyer (幻方量化), and, as far as is publicly known, the first open-source MoE large model from China.
The model has 16.4B parameters in total, but only 2.8B of them are activated for any single inference pass, so its inference cost is roughly that of a 3B-scale dense model, while its performance is on par with 7B-scale dense models.
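The gap between total and activated parameters comes from sparse expert routing: a learned gate sends each token to only a few expert networks, so most expert weights sit idle on any given forward pass. Below is a minimal PyTorch sketch of generic top-k gating; the expert count, hidden sizes, and k are illustrative placeholders, not DeepSeekMoE's actual configuration (which also includes shared experts and fine-grained expert segmentation).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoELayer(nn.Module):
    """Illustrative top-k MoE layer (not DeepSeekMoE's exact design)."""

    def __init__(self, d_model=512, d_ff=1024, n_experts=8, k=2):
        super().__init__()
        self.k = k
        self.gate = nn.Linear(d_model, n_experts, bias=False)  # router
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):  # x: (n_tokens, d_model)
        scores = F.softmax(self.gate(x), dim=-1)          # routing probabilities
        topk_w, topk_idx = scores.topk(self.k, dim=-1)    # keep k experts per token
        topk_w = topk_w / topk_w.sum(dim=-1, keepdim=True)  # renormalize weights
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = (topk_idx == e)                        # tokens routed to expert e
            token_ids, slot = mask.nonzero(as_tuple=True)
            if token_ids.numel() == 0:
                continue                                  # expert idle this batch
            out[token_ids] += topk_w[token_ids, slot].unsqueeze(-1) * expert(x[token_ids])
        return out

layer = TopKMoELayer()
print(layer(torch.randn(16, 512)).shape)  # torch.Size([16, 512])
```

Because only k of n_experts expert MLPs run per token, per-token compute scales with k while total model capacity scales with n_experts, which is exactly why 16.4B total parameters can cost only as much as 2.8B at inference time.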
DeepSeekMoE 16B Chat is the chat-optimized variant of the model.
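For reference, a usage sketch with Hugging Face transformers; the repo id deepseek-ai/deepseek-moe-16b-chat is an assumption here, so check the official model card for the exact id and recommended settings.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repo id; verify against the official model card.
model_id = "deepseek-ai/deepseek-moe-16b-chat"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # bf16 halves memory vs. fp32
    device_map="auto",           # spread layers across available devices
    trust_remote_code=True,      # the repo ships custom MoE modeling code
)

messages = [{"role": "user", "content": "Briefly introduce yourself."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True))
```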
Evaluation results are as follows:
Metric | # Shots | LLaMA2-7B SFT | DeepSeek 7B Chat | DeepSeekMoE 16B Chat |
---|---|---|---|---|
Total Params | N/A | 6.7B | 6.9B | 16.4B |
Activated Params | N/A | 6.7B | 6.9B | 2.8B |
FLOPs per 4K Tokens | N/A | 187.9T | 183.5T | 74.4T |
HellaSwag (Acc.) | 0-shot | 67.9 | 71.0 | 72.2 |
PIQA (Acc.) | 0-shot | 76.9 | 78.4 | 79.7 |
ARC-easy (Acc.) | 0-shot | 69.7 | 70.2 | 69.9 |
ARC-challenge (Acc.) | 0-shot | 50.8 | 50.2 | 50.0 |
BBH (EM) | 3-shot | 39.3 | 43.1 | 42.2 |
RACE-middle (Acc.) | 5-shot | 63.9 | 66.1 | 64.8 |
RACE-high (Acc.) | 5-shot | 49.6 | 50.8 | 50.6 |
DROP (EM) | 1-shot | 40.0 | 41.7 | 33.8 |
GSM8K (EM) | 0-shot | 63.4 | 62.6 | 62.2 |
MATH (EM) | 4-shot | 13.5 | 14.7 | 15.2 |
HumanEval (Pass@1) | 0-shot | 35.4 | 45.1 | 45.7 |
MBPP (Pass@1) | 3-shot | 27.8 | 39.0 | 46.2 |
TriviaQA (EM) | 5-shot | 60.1 | 59.5 | 63.3 |
NaturalQuestions (EM) | 0-shot | 35.2 | 32.7 | 35.1 |
MMLU (Acc.) | 0-shot | 50.0 | 49.7 | 47.2 |
WinoGrande (Acc.) | 0-shot | 65.1 | 68.4 | 69.0 |
CLUE-WSC (EM) | 5-shot | 48.4 | 66.2 | 68.2 |
CEval (Acc.) | 0-shot | 35.1 | 44.7 | 40.0 |
CMMLU (Acc.) | 0-shot | 36.9 | 51.2 | 49.3 |
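Note how the FLOPs row tracks activated rather than total parameters: DeepSeekMoE's 74.4T is about 41% of DeepSeek 7B Chat's 183.5T, matching the 2.8B/6.9B ratio of activated parameters. A quick check using only figures from the table above:

```python
# Values taken directly from the table above.
flops = {"DeepSeek 7B Chat": 183.5, "DeepSeekMoE 16B Chat": 74.4}  # T FLOPs per 4K tokens
active = {"DeepSeek 7B Chat": 6.9, "DeepSeekMoE 16B Chat": 2.8}    # activated params, billions

print(flops["DeepSeekMoE 16B Chat"] / flops["DeepSeek 7B Chat"])    # ~0.405
print(active["DeepSeekMoE 16B Chat"] / active["DeepSeek 7B Chat"])  # ~0.406
# Compute cost scales with *activated* parameters, not total parameters.
```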
A detailed write-up is available at: https://www.datalearner.com/blog/1051704952803167
The model is licensed for free commercial use.