Top models, ranked by NormScore - LiveBench
Rank | Model | NormScore - LiveBench | Agentic Coding | Coding | Data Analysis | Instruction Following | Language | Math | Reasoning |
---|---|---|---|---|---|---|---|---|---|
1 | o3 High | 74.416 | 75.198 | 74.690 | 70.842 | 77.254 | 72.556 | 72.775 | 75.551 |
2 | Gemini 2.5 Pro Preview | 71.968 | 64.295 | 70.992 | 75.861 | 74.844 | 70.194 | 75.954 | 71.649 |
3 | Claude 4 Opus Thinking | 71.033 | 57.902 | 71.379 | 76.350 | 72.377 | 71.600 | 75.683 | 73.378 |
4 | o3 Medium | 70.102 | 55.646 | 75.849 | 71.949 | 75.564 | 69.613 | 69.069 | 73.733 |
5 | Claude 4 Sonnet Thinking | 69.497 | 50.759 | 71.673 | 74.800 | 72.068 | 68.579 | 73.044 | 77.194 |
6 | o4-Mini High | 68.986 | 52.262 | 77.891 | 73.109 | 76.158 | 61.483 | 72.566 | 71.387 |
7 | Gemini 2.5 Pro Preview (2025-06-05 Max Thinking) | 68.074 | 39.479 | 71.967 | 77.684 | 69.335 | 73.707 | 72.119 | 76.307 |
8 | Claude 3.7 Sonnet Thinking | 65.873 | 48.503 | 71.286 | 74.460 | 72.832 | 66.876 | 67.699 | 61.800 |
9 | Gemini 2.5 Pro Preview | 65.074 | 21.807 | 68.858 | 77.851 | 70.432 | 72.865 | 71.500 | 75.940 |
10 | DeepSeek R1 | 64.970 | 31.207 | 69.538 | 76.825 | 71.665 | 62.045 | 72.998 | 73.755 |
11 | Claude 4 Opus | 64.257 | 52.639 | 71.673 | 69.958 | 70.254 | 74.036 | 67.806 | 45.810 |
12 | o4-Mini Medium | 62.919 | 34.592 | 72.262 | 72.432 | 73.291 | 57.839 | 69.208 | 63.635 |
13 | Gemini 2.5 Flash Preview | 61.360 | 30.832 | 61.187 | 74.817 | 71.268 | 56.912 | 72.059 | 63.568 |
14 | DeepSeek R1 | 60.850 | 29.327 | 74.101 | 73.507 | 72.104 | 52.611 | 66.648 | 62.539 |
15 | Qwen 3 235B A22B | 60.569 | 18.423 | 64.682 | 73.075 | 78.621 | 57.033 | 68.491 | 63.181 |
16 | Grok 3 Mini Beta (High) | 59.890 | 30.455 | 53.037 | 67.533 | 70.540 | 57.643 | 65.659 | 71.078 |
17 | Gemini 2.5 Flash Preview | 59.555 | 29.327 | 58.759 | 68.258 | 70.826 | 59.024 | 69.985 | 59.586 |
18 | Claude 4 Sonnet | 59.423 | 31.583 | 76.236 | 67.717 | 69.255 | 66.067 | 65.692 | 44.504 |
19 | Qwen 3 32B | 59.283 | 14.663 | 62.549 | 72.116 | 76.345 | 53.180 | 68.412 | 67.395 |
20 | Claude 3.7 Sonnet | 56.673 | 37.975 | 72.354 | 63.237 | 68.547 | 62.597 | 55.647 | 39.808 |
21 | Qwen 3 30B A3B | 55.454 | 16.543 | 46.231 | 70.108 | 74.630 | 52.942 | 65.304 | 57.838 |
22 | GPT-4.5 Preview | 54.959 | 23.687 | 74.101 | 62.494 | 64.766 | 62.856 | 58.234 | 44.198 |
23 | Grok 3 Beta | 53.018 | 18.423 | 71.673 | 59.423 | 75.952 | 52.945 | 53.885 | 39.534 |
24 | DeepSeek V3.1 | 52.294 | 20.303 | 67.110 | 63.786 | 73.092 | 46.518 | 61.130 | 35.955 |
25 | GPT-4.1 | 51.961 | 18.423 | 71.286 | 68.847 | 68.998 | 53.374 | 53.200 | 35.988 |
26 | ChatGPT-4o | 50.521 | 18.423 | 75.463 | 69.048 | 64.410 | 49.122 | 47.469 | 39.575 |
27 | Claude 3.5 Sonnet | 49.100 | 23.687 | 71.967 | 59.464 | 62.123 | 54.588 | 43.610 | 34.998 |
28 | Qwen2.5 Max | 48.522 | 7.144 | 65.069 | 67.927 | 67.586 | 57.361 | 48.933 | 31.267 |
29 | Mistral Medium 3 | 47.138 | 20.303 | 59.917 | 59.021 | 63.941 | 44.307 | 51.104 | 33.995 |
30 | GPT-4.1 Mini | 47.079 | 10.904 | 70.220 | 61.915 | 62.995 | 37.203 | 50.177 | 43.494 |
31 | Llama 4 Maverick 17B 128E Instruct | 45.467 | 7.144 | 52.742 | 52.063 | 67.869 | 48.015 | 51.967 | 35.620 |
32 | Phi-4 Reasoning Plus | 44.739 | 5.640 | 58.961 | 55.110 | 65.560 | 29.314 | 53.040 | 46.802 |
33 | DeepSeek R1 Distill Llama 70B | 44.629 | 7.520 | 45.366 | 62.851 | 62.660 | 36.017 | 49.866 | 48.434 |
34 | GPT-4o | 43.572 | 12.783 | 67.496 | 64.790 | 58.202 | 44.108 | 35.496 | 32.300 |
35 | Gemini 2.0 Flash Lite | 43.164 | 5.640 | 57.784 | 68.007 | 68.731 | 33.389 | 47.072 | 26.159 |
36 | Hunyuan Turbos | 41.116 | 3.760 | 49.045 | 48.948 | 68.308 | 33.843 | 49.234 | 30.865 |
37 | Gemma 3 27B | 40.190 | 7.144 | 47.684 | 39.495 | 67.205 | 40.639 | 44.605 | 27.823 |
38 | Mistral Large | 39.770 | 1.880 | 61.279 | 55.086 | 60.862 | 40.592 | 36.492 | 27.444 |
39 | Qwen2.5 72B Instruct Turbo | 39.238 | 3.760 | 55.834 | 52.716 | 57.767 | 36.282 | 44.498 | 27.661 |
40 | Mistral Small | 38.191 | 12.783 | 48.365 | 54.298 | 57.068 | 34.309 | 32.974 | 30.006 |
41 | DeepSeek R1 Distill Qwen 32B | 38.140 | 5.640 | 45.752 | 51.763 | 49.938 | 29.706 | 51.188 | 36.138 |
42 | Claude 3.5 Haiku | 36.339 | 7.520 | 51.768 | 54.953 | 55.439 | 38.798 | 29.747 | 21.131 |
43 | GPT-4.1 Nano | 36.153 | 7.144 | 61.573 | 44.721 | 51.557 | 29.293 | 36.258 | 28.721 |
44 | Command R Plus | 28.108 | 1.880 | 26.418 | 47.410 | 51.556 | 30.501 | 19.539 | 17.507 |
45 | Command R | 25.641 | 1.880 | 25.443 | 38.400 | 49.807 | 27.597 | 15.845 | 16.648 |
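If you want to work with the leaderboard above programmatically rather than read it by eye, a minimal Python sketch like the one below can load and summarize it. This is not from the source: the file name `livebench_normscore.md` and the `parse_leaderboard` helper are hypothetical placeholders, and the comparison against a plain category mean is purely illustrative, since the table does not say how NormScore is computed.

```python
# Minimal sketch (not from the source): parse the markdown leaderboard above
# and print the reported NormScore next to a plain mean of the category scores.
# "livebench_normscore.md" is a hypothetical file containing the table as-is.
import statistics

CATEGORIES = ["Agentic Coding", "Coding", "Data Analysis",
              "Instruction Following", "Language", "Math", "Reasoning"]

def parse_leaderboard(path: str) -> list[dict]:
    """Parse the markdown table into a list of row dicts keyed by column name."""
    with open(path, encoding="utf-8") as f:
        lines = [ln.strip() for ln in f if "|" in ln]
    header = [c.strip() for c in lines[0].strip("|").split("|")]
    rows = []
    for line in lines[2:]:  # skip the header and separator rows
        cells = [c.strip() for c in line.strip("|").split("|")]
        row = dict(zip(header, cells))
        for col in ["NormScore - LiveBench", *CATEGORIES]:
            row[col] = float(row[col])
        rows.append(row)
    return rows

if __name__ == "__main__":
    board = parse_leaderboard("livebench_normscore.md")
    # The reported NormScore tracks the plain mean of the seven categories
    # closely but is not identical row-by-row, so the mean is only a rough proxy.
    for row in board[:3]:
        mean = statistics.mean(row[c] for c in CATEGORIES)
        print(f"{row['Model']:<30} NormScore={row['NormScore - LiveBench']:.3f} "
              f"category mean={mean:.3f}")
```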