Top models ranked by: NormScore - LiveBench
Rank | Model | NormScore - LiveBench | Agentic Coding | Coding | Data Analysis | Instruction Following | Language | Mathematics | Reasoning |
---|---|---|---|---|---|---|---|---|---|
1 | GPT-5 High | 78.593 | 76.671 | 76.290 | 77.117 | 78.557 | 79.713 | 79.914 | 80.642 |
2 | GPT-5 Medium | 75.495 | 61.284 | 74.262 | 78.351 | 79.332 | 77.618 | 77.537 | 79.340 |
3 | gpt-5 | 75.060 | 61.284 | 69.917 | 78.351 | 79.332 | 77.618 | 77.537 | 79.340 |
4 | GPT-5 Low | 74.905 | 70.673 | 73.458 | 73.929 | 79.263 | 76.993 | 73.441 | 74.320 |
5 | o3 High | 73.209 | 61.024 | 77.707 | 71.085 | 76.842 | 74.199 | 73.421 | 77.767 |
6 | o3 Pro High | 71.294 | 43.030 | 77.803 | 73.401 | 76.603 | 78.408 | 73.152 | 77.767 |
7 | Claude 4.1 Opus Thinking | 70.859 | 49.028 | 74.970 | 77.129 | 71.706 | 71.076 | 78.721 | 76.556 |
8 | Gemini 2.5 Pro Preview | 70.805 | 51.635 | 74.568 | 75.786 | 74.476 | 71.889 | 76.476 | 72.495 |
9 | Claude 4 Opus Thinking | 70.413 | 49.028 | 74.262 | 76.440 | 72.037 | 73.326 | 76.230 | 74.320 |
10 | o3 Medium | 69.353 | 45.638 | 78.913 | 72.211 | 75.221 | 71.152 | 69.765 | 74.754 |
11 | Claude 4 Sonnet Thinking | 69.096 | 43.290 | 74.568 | 74.950 | 71.751 | 70.219 | 73.541 | 78.245 |
12 | GPT-5 Mini High | 68.777 | 33.902 | 67.295 | 77.646 | 76.556 | 72.705 | 78.087 | 75.119 |
13 | o4-Mini High | 68.356 | 43.551 | 81.038 | 73.263 | 75.737 | 62.782 | 73.140 | 72.381 |
14 | Grok 4 | 68.200 | 31.816 | 72.252 | 74.437 | 69.640 | 75.149 | 76.732 | 80.322 |
15 | GPT-5 Mini | 67.837 | 43.551 | 73.860 | 77.486 | 75.146 | 65.850 | 73.869 | 67.886 |
16 | Gemini 2.5 Pro (Max Thinking) | 67.784 | 32.337 | 74.874 | 77.722 | 69.032 | 75.495 | 72.844 | 77.447 |
17 | DeepSeek V3.1 Thinking | 67.188 | 29.469 | 71.237 | 73.228 | 76.612 | 69.200 | 76.313 | 74.481 |
18 | Qwen 3 235B A22B Thinking 2507 | 66.850 | 23.992 | 68.099 | 82.147 | 80.231 | 69.339 | 69.989 | 75.211 |
19 | DeepSeek R1 | 66.605 | 37.553 | 72.347 | 76.957 | 71.352 | 63.482 | 73.502 | 74.822 |
20 | Gemini 2.5 Pro | 65.352 | 18.776 | 71.640 | 77.882 | 70.078 | 74.628 | 72.164 | 76.990 |
21 | Claude 3.7 Sonnet Thinking | 65.234 | 39.900 | 74.166 | 74.561 | 72.542 | 68.485 | 68.067 | 62.569 |
22 | Claude 4 Opus | 63.774 | 45.116 | 74.568 | 70.235 | 69.917 | 75.792 | 68.129 | 46.367 |
23 | Sonoma Sky Alpha | 62.932 | 20.341 | 69.017 | 73.300 | 64.563 | 71.154 | 67.408 | 79.683 |
24 | o4-Mini Medium | 62.845 | 29.990 | 75.181 | 72.675 | 72.946 | 59.019 | 69.661 | 64.463 |
25 | Gemini 2.5 Flash | 61.367 | 26.339 | 64.367 | 74.966 | 70.921 | 58.335 | 72.482 | 64.509 |
26 | DeepSeek R1 | 60.991 | 26.079 | 77.095 | 73.769 | 71.803 | 53.823 | 67.001 | 63.391 |
27 | GLM 4.5 | 60.900 | 31.816 | 61.132 | 70.697 | 72.774 | 61.454 | 70.627 | 57.184 |
28 | Qwen 3 235B A22B Thinking | 60.883 | 16.690 | 67.295 | 73.231 | 78.256 | 58.291 | 68.852 | 64.029 |
29 | GPT-5 Mini Low | 60.834 | 37.814 | 67.200 | 71.025 | 70.422 | 55.828 | 64.624 | 61.246 |
30 | Qwen 3 235B A22B Instruct 2507 | 60.358 | 18.776 | 67.295 | 68.122 | 67.547 | 63.797 | 68.093 | 71.377 |
31 | Claude 4 Sonnet | 59.735 | 29.469 | 79.315 | 68.021 | 68.895 | 67.654 | 65.958 | 45.067 |
32 | Gemini 2.5 Flash Preview | 59.668 | 26.079 | 61.132 | 68.602 | 70.436 | 60.501 | 70.445 | 60.355 |
33 | Grok 3 Mini Beta (High) | 59.625 | 24.775 | 55.179 | 67.845 | 70.208 | 59.023 | 66.102 | 71.970 |
34 | Qwen 3 32B | 59.618 | 13.039 | 65.076 | 72.370 | 76.011 | 54.392 | 68.791 | 68.251 |
35 | Kimi K2 Instruct | 59.027 | 23.992 | 72.750 | 65.890 | 73.621 | 63.226 | 63.976 | 51.730 |
36 | DeepSeek V3.1 | 59.013 | 25.818 | 69.420 | 67.597 | 75.303 | 59.434 | 67.818 | 48.604 |
37 | Qwen 3 Coder 480B A35B Instruct | 57.533 | 35.728 | 74.166 | 67.192 | 66.113 | 62.883 | 57.714 | 44.839 |
38 | Claude 3.7 Sonnet | 56.399 | 32.077 | 75.277 | 63.470 | 68.262 | 64.127 | 55.931 | 40.343 |
39 | GPT-5 Chat | 56.257 | 14.865 | 77.803 | 66.857 | 65.145 | 62.137 | 62.881 | 51.867 |
40 | GLM 4.5 Air | 55.790 | 16.429 | 58.510 | 69.323 | 70.340 | 44.043 | 68.126 | 64.326 |
41 | GPT-5 Nano | 55.712 | 35.989 | 66.396 | 68.939 | 66.600 | 42.859 | 60.902 | 52.643 |
42 | Qwen 3 30B A3B | 55.649 | 14.865 | 48.098 | 70.380 | 74.233 | 54.079 | 65.510 | 58.576 |
43 | GPT-4.5 Preview | 55.092 | 20.602 | 77.095 | 62.817 | 64.510 | 64.326 | 58.362 | 44.702 |
44 | GPT-5 Nano High | 55.069 | 30.251 | 61.343 | 65.545 | 64.548 | 41.365 | 61.986 | 62.866 |
45 | Grok Code Fast | 54.361 | 26.079 | 65.190 | 72.760 | 63.985 | 44.310 | 59.789 | 54.948 |
46 | Gemini 2.5 Flash Lite Preview (Thinking) | 53.696 | 7.302 | 60.022 | 68.554 | 75.086 | 51.909 | 60.832 | 52.095 |
47 | GPT-5 Minimal | 53.615 | 26.339 | 71.735 | 66.775 | 68.506 | 51.617 | 50.729 | 45.067 |
48 | Grok 3 Beta | 53.251 | 16.690 | 74.568 | 59.559 | 75.581 | 54.233 | 54.028 | 39.865 |
49 | DeepSeek V3 0324 | 52.564 | 18.516 | 69.821 | 64.429 | 72.762 | 47.678 | 61.346 | 36.373 |
50 | GPT-4.1 | 52.233 | 16.690 | 74.166 | 69.231 | 68.705 | 54.644 | 53.215 | 36.464 |
51 | ChatGPT-4o | 50.802 | 16.690 | 78.511 | 69.424 | 64.133 | 50.314 | 47.452 | 40.093 |
52 | GPT OSS 120b | 50.161 | 13.039 | 59.525 | 58.980 | 59.131 | 40.577 | 59.191 | 63.755 |
53 | Claude 3.5 Sonnet | 49.189 | 20.602 | 74.874 | 59.660 | 61.763 | 55.960 | 43.821 | 35.506 |
54 | GPT-5 Mini Minimal | 48.842 | 24.514 | 68.405 | 63.584 | 62.973 | 39.704 | 44.236 | 45.204 |
55 | Qwen2.5 Max | 48.817 | 5.737 | 67.697 | 68.161 | 67.281 | 58.729 | 49.054 | 31.650 |
56 | Mistral Medium 3 | 47.391 | 18.516 | 62.338 | 59.727 | 63.617 | 45.386 | 51.364 | 34.479 |
57 | GPT-4.1 Mini | 47.352 | 9.389 | 73.056 | 62.447 | 62.701 | 38.070 | 50.106 | 44.177 |
58 | Llama 4 Maverick 17B 128E Instruct | 45.630 | 5.737 | 54.873 | 51.995 | 67.586 | 49.117 | 51.975 | 36.008 |
59 | Phi-4 Reasoning Plus | 45.145 | 5.477 | 61.343 | 55.600 | 65.240 | 29.958 | 53.078 | 47.508 |
60 | DeepSeek R1 Distill Llama 70B | 45.033 | 7.302 | 47.199 | 63.222 | 62.377 | 36.846 | 50.158 | 49.129 |
61 | GPT-4o | 43.817 | 11.214 | 70.223 | 65.270 | 57.887 | 45.163 | 35.571 | 32.654 |
62 | Gemini 2.0 Flash Lite | 43.507 | 5.477 | 60.118 | 68.361 | 68.378 | 34.167 | 47.086 | 26.493 |
63 | Sonoma Dusk Alpha | 43.483 | 13.039 | 57.286 | 60.052 | 59.359 | 46.912 | 37.697 | 34.867 |
64 | Command A | 42.406 | 7.562 | 54.969 | 49.266 | 73.999 | 38.055 | 39.084 | 29.847 |
65 | GPT-5 Nano Low | 41.679 | 7.302 | 51.543 | 56.605 | 60.019 | 32.014 | 47.899 | 38.518 |
66 | Hunyuan Turbos | 41.468 | 3.651 | 51.026 | 49.310 | 68.030 | 34.599 | 49.208 | 31.398 |
67 | Gemma 3 27B | 40.302 | 5.737 | 49.610 | 39.796 | 66.760 | 41.573 | 44.482 | 28.273 |
68 | Mistral Large | 40.236 | 1.825 | 63.754 | 55.515 | 60.617 | 41.605 | 36.684 | 27.793 |
69 | Qwen2.5 72B Instruct Turbo | 39.602 | 3.651 | 58.089 | 52.930 | 57.473 | 37.174 | 44.546 | 27.998 |
70 | DeepSeek R1 Distill Qwen 32B | 38.417 | 5.477 | 47.600 | 51.707 | 49.662 | 30.346 | 51.428 | 36.441 |
71 | Mistral Small | 38.338 | 11.214 | 50.319 | 54.574 | 56.740 | 35.124 | 33.202 | 30.463 |
72 | Claude 3.5 Haiku | 36.689 | 7.302 | 53.859 | 55.388 | 55.095 | 39.665 | 29.820 | 21.518 |
73 | GPT-4.1 Nano | 36.433 | 5.737 | 64.768 | 45.741 | 51.369 | 29.883 | 35.872 | 29.231 |
74 | Gemma 3 12B | 35.315 | 1.825 | 42.739 | 32.406 | 65.777 | 31.480 | 40.828 | 23.503 |
75 | GPT-5 Nano Minimal | 28.826 | 1.825 | 49.725 | 46.189 | 50.552 | 16.473 | 27.238 | 15.288 |
76 | Command R Plus | 28.405 | 1.825 | 27.485 | 48.078 | 51.347 | 31.214 | 19.716 | 17.776 |
77 | Gemma 3n E4B IT | 27.315 | 1.825 | 31.925 | 16.626 | 57.703 | 25.609 | 27.336 | 18.026 |
78 | Command R | 25.910 | 1.825 | 26.470 | 38.928 | 49.628 | 28.218 | 16.010 | 16.908 |
79 | Gemma 3 4B | 23.454 | 0.000 | 15.867 | 18.228 | 56.681 | 15.212 | 26.598 | 16.247 |
80 | Gemma 3n E2B IT | 21.596 | 1.825 | 16.670 | 13.682 | 51.079 | 15.396 | 22.255 | 16.156 |
81 | GPT-5 Pro | 19.504 | 0.000 | 73.152 | 0.000 | 0.000 | 0.000 | 0.000 | 81.258 |