Top Model Benchmark: NormScore - Livebench
Rank | Model Name | NormScore - Livebench | Agentic Coding | Coding | Data Analysis | Instruction Following (IF) | Language | Mathematics | Reasoning |
---|---|---|---|---|---|---|---|---|---|
1 | Claude 4 Opus | 65.926 | 61.854 | 64.279 | 68.221 | 63.611 | 69.585 | 67.974 | 66.945 |
2 | Claude 4 Sonnet | 59.511 | 30.926 | 68.342 | 65.945 | 62.716 | 61.693 | 65.923 | 65.050 |
3 | Claude 3.7 Sonnet | 57.830 | 45.838 | 64.900 | 61.713 | 62.087 | 58.365 | 55.784 | 58.352 |
4 | GPT-4.5 Preview | 55.833 | 27.061 | 66.462 | 60.747 | 58.694 | 58.954 | 58.757 | 64.380 |
5 | Grok 3 Beta | 53.041 | 19.329 | 64.279 | 58.205 | 68.786 | 49.599 | 54.187 | 57.118 |
6 | DeepSeek V3.1 | 52.191 | 20.986 | 60.216 | 61.200 | 66.141 | 43.521 | 61.935 | 52.367 |
7 | GPT-4.1 | 52.055 | 19.329 | 63.960 | 66.856 | 62.524 | 49.943 | 54.166 | 53.017 |
8 | ChatGPT-4o | 50.828 | 19.329 | 67.704 | 67.074 | 58.360 | 45.549 | 48.402 | 57.907 |
9 | Claude 3.5 Sonnet | 49.348 | 27.061 | 64.581 | 58.090 | 56.260 | 50.788 | 43.491 | 50.849 |
10 | Qwen2.5 Max | 48.310 | 9.388 | 58.351 | 66.332 | 61.182 | 53.474 | 48.996 | 45.510 |
11 | GPT-4.1 Mini | 48.048 | 12.702 | 63.020 | 59.642 | 57.066 | 34.557 | 50.924 | 64.275 |
12 | Mistral Medium 3 | 47.190 | 20.986 | 53.683 | 56.340 | 57.942 | 41.271 | 51.866 | 49.873 |
13 | Llama 4 Maverick 17B 128E Instruct | 45.985 | 9.388 | 47.403 | 51.480 | 61.473 | 44.940 | 52.278 | 52.078 |
14 | GPT-4o | 43.537 | 14.359 | 60.534 | 62.607 | 52.721 | 40.975 | 35.926 | 46.600 |
15 | Gemini 2.0 Flash Lite | 42.450 | 4.970 | 51.803 | 66.102 | 62.212 | 30.878 | 47.523 | 38.076 |
16 | Hunyuan Turbos | 40.985 | 3.314 | 43.995 | 47.302 | 61.802 | 30.953 | 49.634 | 46.068 |
17 | Gemma 3 27B | 40.269 | 9.388 | 42.752 | 38.143 | 60.831 | 37.435 | 45.258 | 41.343 |
18 | Mistral Large | 39.234 | 1.657 | 54.926 | 53.176 | 55.130 | 37.634 | 36.244 | 40.448 |
19 | Qwen2.5 72B Instruct Turbo | 38.957 | 3.314 | 49.955 | 51.395 | 52.285 | 33.752 | 44.800 | 40.568 |
20 | Mistral Small | 38.475 | 14.359 | 43.373 | 52.796 | 51.680 | 31.664 | 33.235 | 44.220 |
21 | GPT-4.1 Nano | 36.307 | 9.388 | 55.228 | 41.440 | 46.686 | 27.019 | 36.438 | 42.509 |
22 | Claude 3.5 Haiku | 35.509 | 6.627 | 46.479 | 53.032 | 50.222 | 35.685 | 30.253 | 30.858 |
23 | Command R Plus | 27.517 | 1.657 | 23.710 | 44.996 | 46.747 | 28.105 | 19.835 | 25.715 |
24 | Command R | 24.988 | 1.657 | 22.786 | 36.477 | 45.116 | 25.156 | 15.836 | 24.272 |