Top models by NormScore - LiveBench
Rank | Model | NormScore - LiveBench | Agentic Coding | Coding | Data Analysis | IF (Instruction Following) | Language | Math | Reasoning |
---|---|---|---|---|---|---|---|---|---|
1 | o3 High | 74.606 | 74.212 | 75.215 | 71.340 | 77.797 | 72.344 | 73.286 | 76.098 |
2 | Gemini 2.5 Pro Preview | 72.099 | 63.610 | 72.176 | 76.395 | 75.370 | 70.018 | 76.488 | 71.004 |
3 | o3 Pro High | 71.398 | 47.708 | 75.308 | 73.638 | 77.531 | 76.424 | 73.066 | 76.118 |
4 | Claude 4 Opus Thinking | 71.004 | 56.543 | 71.881 | 76.887 | 72.886 | 71.476 | 76.215 | 72.769 |
5 | o3 Medium | 70.134 | 54.776 | 76.383 | 72.455 | 76.095 | 69.395 | 69.555 | 73.148 |
6 | Claude 4 Sonnet Thinking | 69.464 | 49.475 | 72.176 | 75.326 | 72.575 | 68.370 | 73.558 | 76.587 |
7 | o4-Mini High | 69.010 | 51.241 | 78.439 | 73.623 | 76.694 | 61.296 | 73.076 | 70.822 |
8 | Gemini 2.5 Pro Preview (2025-06-05 Max Thinking) | 68.159 | 38.873 | 72.473 | 78.231 | 69.822 | 73.546 | 72.626 | 75.783 |
9 | DeepSeek R1 | 66.833 | 42.407 | 70.027 | 77.365 | 72.168 | 61.920 | 73.511 | 73.228 |
10 | Claude 3.7 Sonnet Thinking | 65.915 | 47.708 | 71.788 | 74.983 | 73.344 | 66.674 | 68.175 | 61.236 |
11 | Gemini 2.5 Pro Preview | 65.148 | 21.204 | 69.343 | 78.398 | 70.927 | 72.688 | 72.003 | 75.364 |
12 | Claude 4 Opus | 64.222 | 51.241 | 72.176 | 70.449 | 70.748 | 73.822 | 68.283 | 45.384 |
13 | o4-Mini Medium | 62.924 | 33.572 | 72.770 | 72.941 | 73.806 | 57.597 | 69.695 | 63.080 |
14 | Gemini 2.5 Flash Preview | 61.488 | 30.038 | 62.302 | 75.343 | 71.769 | 56.730 | 72.565 | 63.129 |
15 | DeepSeek R1 | 60.872 | 28.272 | 74.622 | 74.024 | 72.611 | 52.462 | 67.116 | 62.053 |
16 | Qwen 3 235B A22B | 60.636 | 17.669 | 65.137 | 73.589 | 79.174 | 56.900 | 68.972 | 62.651 |
17 | Grok 3 Mini Beta (High) | 59.968 | 30.038 | 53.410 | 68.008 | 71.036 | 57.486 | 66.121 | 70.480 |
18 | Gemini 2.5 Flash Preview | 59.562 | 28.272 | 59.172 | 68.737 | 71.324 | 58.875 | 70.477 | 59.089 |
19 | Claude 4 Sonnet | 59.363 | 30.038 | 76.771 | 68.193 | 69.742 | 65.827 | 66.154 | 44.103 |
20 | Qwen 3 32B | 59.361 | 14.136 | 62.989 | 72.622 | 76.882 | 52.972 | 68.893 | 66.824 |
21 | Claude 3.7 Sonnet | 56.709 | 37.106 | 72.863 | 63.681 | 69.028 | 62.372 | 56.038 | 39.475 |
22 | Qwen 3 30B A3B | 55.504 | 15.903 | 46.556 | 70.600 | 75.155 | 52.743 | 65.763 | 57.309 |
23 | GPT-4.5 Preview | 55.002 | 22.971 | 74.622 | 62.933 | 65.221 | 62.643 | 58.643 | 43.757 |
24 | Grok 3 Beta | 53.064 | 17.669 | 72.176 | 59.841 | 76.486 | 52.787 | 54.264 | 39.049 |
25 | DeepSeek V3.1 | 52.351 | 19.436 | 67.582 | 64.235 | 73.606 | 46.387 | 61.560 | 35.601 |
26 | GPT-4.1 | 52.028 | 17.669 | 71.788 | 69.331 | 69.483 | 53.189 | 53.574 | 35.696 |
27 | ChatGPT-4o | 50.566 | 17.669 | 75.993 | 69.533 | 64.863 | 48.902 | 47.802 | 39.231 |
28 | Claude 3.5 Sonnet | 49.146 | 22.971 | 72.473 | 59.882 | 62.559 | 54.399 | 43.917 | 34.704 |
29 | Qwen2.5 Max | 48.672 | 7.068 | 65.526 | 68.404 | 68.061 | 57.133 | 49.277 | 30.968 |
30 | GPT-4.1 Mini | 47.199 | 10.601 | 70.713 | 62.350 | 63.438 | 37.024 | 50.530 | 43.211 |
31 | Mistral Medium 3 | 47.171 | 19.436 | 60.339 | 59.436 | 64.390 | 44.139 | 51.463 | 33.728 |
32 | Llama 4 Maverick 17B 128E Instruct | 45.597 | 7.068 | 53.113 | 52.429 | 68.346 | 47.825 | 52.332 | 35.264 |
33 | Phi-4 Reasoning Plus | 44.845 | 5.301 | 59.376 | 55.498 | 66.021 | 29.180 | 53.413 | 46.463 |
34 | DeepSeek R1 Distill Llama 70B | 44.704 | 7.068 | 45.685 | 63.292 | 63.101 | 35.854 | 50.216 | 48.102 |
35 | GPT-4o | 43.640 | 12.369 | 67.971 | 65.245 | 58.611 | 43.913 | 35.745 | 31.948 |
36 | Gemini 2.0 Flash Lite | 43.284 | 5.301 | 58.190 | 68.485 | 69.215 | 33.207 | 47.403 | 25.916 |
37 | Hunyuan Turbos | 41.238 | 3.534 | 49.390 | 49.293 | 68.788 | 33.586 | 49.580 | 30.714 |
38 | Gemma 3 27B | 40.320 | 7.068 | 48.019 | 39.772 | 67.678 | 40.388 | 44.919 | 27.660 |
39 | Mistral Large | 39.900 | 1.767 | 61.709 | 55.473 | 61.289 | 40.427 | 36.749 | 27.214 |
40 | Qwen2.5 72B Instruct Turbo | 39.351 | 3.534 | 56.226 | 53.086 | 58.173 | 36.143 | 44.810 | 27.411 |
41 | Mistral Small | 38.256 | 12.369 | 48.705 | 54.680 | 57.470 | 34.122 | 33.206 | 29.797 |
42 | DeepSeek R1 Distill Qwen 32B | 38.190 | 5.301 | 46.074 | 52.127 | 50.289 | 29.513 | 51.548 | 35.720 |
43 | Claude 3.5 Haiku | 36.398 | 7.068 | 52.132 | 55.340 | 55.829 | 38.534 | 29.957 | 21.007 |
44 | GPT-4.1 Nano | 36.339 | 7.068 | 62.691 | 45.035 | 51.920 | 29.066 | 36.513 | 28.571 |
45 | Command R Plus | 28.190 | 1.767 | 26.603 | 47.743 | 51.919 | 30.321 | 19.676 | 17.382 |
46 | Command R | 25.704 | 1.767 | 25.622 | 38.670 | 50.157 | 27.378 | 15.957 | 16.524 |
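The aggregate NormScore ranking above can also be re-sliced by any single category, since the per-category scores often disagree with the overall order (for example, o4-Mini High posts the highest Coding score in the table, 78.439, despite sitting at rank 7 overall). Below is a minimal sketch of that kind of re-ranking, assuming pandas is available; only three rows are copied from the table for brevity, and the column names simply mirror the header above.

```python
# Minimal sketch (not from the source): load a few leaderboard rows and
# re-rank them by a single LiveBench category instead of NormScore.
import pandas as pd

# Rows copied verbatim from the table above (rank 1, 2, and 7).
rows = [
    ("o3 High",                74.606, 74.212, 75.215, 71.340, 77.797, 72.344, 73.286, 76.098),
    ("Gemini 2.5 Pro Preview", 72.099, 63.610, 72.176, 76.395, 75.370, 70.018, 76.488, 71.004),
    ("o4-Mini High",           69.010, 51.241, 78.439, 73.623, 76.694, 61.296, 73.076, 70.822),
]
columns = ["Model", "NormScore", "Agentic Coding", "Coding",
           "Data Analysis", "IF", "Language", "Math", "Reasoning"]

df = pd.DataFrame(rows, columns=columns)

# Sort by the Coding column: o4-Mini High moves to the top even though
# its aggregate NormScore is the lowest of the three.
print(df.sort_values("Coding", ascending=False)[["Model", "Coding", "NormScore"]])
```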