Top Model Benchmark: NormScore - LiveBench
| Rank | Model Name | NormScore - LiveBench | Agentic Coding | Coding | Data Analysis | IF (Instruction Following) | Language | Mathematics | Reasoning |
|---|---|---|---|---|---|---|---|---|---|
| 1 | GPT-5 | 75.837 | 70.963 | 67.891 | 78.433 | 77.992 | 76.211 | 78.433 | 78.433 |
| 2 | Claude 4 Opus | 63.987 | 50.421 | 72.452 | 70.052 | 68.721 | 74.555 | 68.948 | 46.025 |
| 3 | Qwen 3 235B A22B Instruct 2507 | 60.115 | 21.476 | 65.412 | 67.828 | 66.381 | 62.454 | 68.861 | 70.643 |
| 4 | Claude 4 Sonnet | 59.221 | 29.879 | 77.032 | 67.801 | 67.716 | 66.653 | 66.713 | 44.719 |
| 5 | DeepSeek V3.1 | 58.863 | 28.946 | 67.512 | 67.198 | 73.991 | 58.449 | 68.515 | 48.046 |
| 6 | Kimi K2 Instruct | 58.453 | 24.277 | 70.692 | 65.559 | 72.329 | 62.145 | 64.667 | 51.327 |
| 7 | Qwen 3 Coder 480B A35B Instruct | 57.743 | 41.084 | 72.093 | 66.850 | 65.000 | 61.704 | 58.255 | 44.614 |
| 8 | Claude 3.7 Sonnet | 57.062 | 40.150 | 73.152 | 63.327 | 67.074 | 63.244 | 56.615 | 39.983 |
| 9 | GPT-5 Chat | 56.065 | 17.741 | 75.613 | 66.497 | 64.016 | 61.045 | 63.446 | 51.439 |
| 10 | GPT-4.5 Preview | 54.922 | 23.343 | 74.912 | 62.561 | 63.405 | 63.255 | 58.930 | 44.430 |
| 11 | Grok 3 Beta | 52.748 | 16.807 | 72.452 | 59.526 | 74.288 | 53.439 | 54.589 | 39.781 |
| 12 | GPT-4.1 | 52.076 | 19.608 | 72.093 | 68.918 | 67.538 | 53.796 | 53.616 | 36.096 |
| 13 | DeepSeek V3 0324 | 52.035 | 18.675 | 67.872 | 63.787 | 71.469 | 47.044 | 61.969 | 36.148 |
| 14 | ChatGPT-4o | 50.672 | 19.608 | 76.313 | 69.120 | 63.046 | 49.641 | 47.784 | 39.761 |
| 15 | Claude 3.5 Sonnet | 49.116 | 23.343 | 72.793 | 59.553 | 60.729 | 55.280 | 44.373 | 35.245 |
| 16 | Qwen2.5 Max | 48.269 | 5.602 | 65.771 | 68.027 | 66.085 | 57.852 | 49.577 | 31.446 |
| 17 | Mistral Medium 3 | 46.889 | 18.675 | 60.509 | 58.997 | 62.563 | 44.758 | 51.926 | 34.146 |
| 18 | GPT-4.1 Mini | 46.757 | 9.337 | 71.033 | 61.936 | 61.633 | 37.479 | 50.445 | 43.628 |
| 19 | Llama 4 Maverick 17B 128E Instruct | 45.576 | 8.403 | 53.430 | 52.195 | 66.418 | 48.271 | 52.427 | 35.768 |
| 20 | GPT-4o | 43.314 | 11.205 | 68.231 | 64.829 | 56.915 | 44.516 | 35.902 | 32.544 |
| 21 | Gemini 2.0 Flash Lite | 43.045 | 5.602 | 58.390 | 68.081 | 67.194 | 33.658 | 47.484 | 26.314 |
| 22 | Command A | 41.887 | 7.470 | 53.450 | 48.854 | 72.705 | 37.640 | 39.425 | 29.570 |
| 23 | Hunyuan Turbos | 40.963 | 3.735 | 49.589 | 48.978 | 66.801 | 34.069 | 49.604 | 30.898 |
| 24 | Gemma 3 27B | 39.775 | 5.602 | 48.188 | 39.517 | 65.642 | 40.951 | 44.747 | 27.874 |
| 25 | Mistral Large | 39.765 | 1.867 | 61.910 | 55.114 | 59.565 | 41.106 | 37.171 | 27.518 |
| 26 | Qwen2.5 72B Instruct Turbo | 39.174 | 3.735 | 56.307 | 52.786 | 56.468 | 36.685 | 44.957 | 27.766 |
| 27 | Mistral Small | 37.922 | 11.205 | 48.888 | 54.359 | 55.793 | 34.642 | 33.630 | 30.115 |
| 28 | Claude 3.5 Haiku | 36.259 | 7.470 | 52.389 | 54.980 | 54.206 | 39.031 | 30.089 | 21.281 |
| 29 | GPT-4.1 Nano | 35.791 | 5.602 | 62.951 | 44.596 | 50.469 | 29.263 | 35.936 | 28.819 |
| 30 | Gemma 3 12B | 34.817 | 1.867 | 41.507 | 31.821 | 64.687 | 31.012 | 41.041 | 23.061 |
| 31 | Command R Plus | 28.038 | 1.867 | 26.725 | 47.369 | 50.488 | 30.769 | 19.982 | 17.587 |
| 32 | Gemma 3n E4B IT | 26.893 | 1.867 | 30.965 | 16.394 | 56.741 | 25.164 | 27.430 | 17.597 |
| 33 | Command R | 25.565 | 1.867 | 25.684 | 38.369 | 48.774 | 27.808 | 16.265 | 16.758 |
| 34 | Gemma 3 4B | 23.128 | 0.000 | 15.463 | 18.080 | 55.733 | 14.992 | 26.631 | 15.888 |
| 35 | Gemma 3n E2B IT | 21.258 | 1.867 | 16.182 | 13.220 | 50.274 | 15.200 | 22.323 | 15.694 |
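The leaderboard can be re-ranked per category directly from its Markdown source. The sketch below parses a small table in the same format (three rows copied from the leaderboard, with an abbreviated column set) and sorts it by the Coding column; the `parse_markdown_table` helper is an illustrative assumption, not part of any LiveBench tooling:

```python
# Parse a Markdown leaderboard table into dicts and re-rank by one category.
TABLE = """\
| Rank | Model Name | NormScore | Coding |
|---|---|---|---|
| 1 | GPT-5 | 75.837 | 67.891 |
| 2 | Claude 4 Opus | 63.987 | 72.452 |
| 4 | Claude 4 Sonnet | 59.221 | 77.032 |
"""

def parse_markdown_table(text):
    """Return a list of row dicts keyed by the header cells."""
    lines = [line.strip() for line in text.strip().splitlines()]
    header = [c.strip() for c in lines[0].strip("|").split("|")]
    rows = []
    for line in lines[2:]:  # skip the |---| delimiter row
        cells = [c.strip() for c in line.strip("|").split("|")]
        rows.append(dict(zip(header, cells)))
    return rows

rows = parse_markdown_table(TABLE)
# Cell values parse as strings, so convert to float before sorting.
by_coding = sorted(rows, key=lambda r: float(r["Coding"]), reverse=True)
print([r["Model Name"] for r in by_coding])
# → ['Claude 4 Sonnet', 'Claude 4 Opus', 'GPT-5']
```

The same helper works on the full table above; sorting by any single category (e.g. Agentic Coding or Reasoning) produces a noticeably different ordering than the overall NormScore rank.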