PROGRAMMING

A comparison of how well different AI tools perform web programming tasks in a coding environment

Rank	Rank Spread (Upper-Lower)	Model	Score	95% CI (±)	Votes	Organization	License
1	1◄─►1	claude-opus-4-5-20251101-thinking-32k	1504	+10/-10	7,543	Anthropic	Proprietary
2	2◄─►5	gpt-5.2-high	1475	+16/-16	1,691	OpenAI	Proprietary
3	2◄─►5	claude-opus-4-5-20251101	1467	+9/-9	7,900	Anthropic	Proprietary
4	2◄─►6	gemini-3-pro	1462	+8/-8	14,043	Google	Proprietary
5	2◄─►6	gemini-3-flash	1454	+9/-9	8,389	Google	Proprietary
6	4◄─►6	glm-4.7	1445	+10/-10	5,650	Z.ai	MIT
7	7◄─►10	minimax-m2.1-preview	1414	+9/-9	7,201	MiniMax	MIT
8	7◄─►10	gemini-3-flash (thinking-minimal)	1412	+10/-10	5,430	Google	Proprietary
9	7◄─►15	gpt-5.2	1399	+15/-15	1,632	OpenAI	Proprietary
10	7◄─►15	gpt-5-medium	1397	+12/-12	3,929	OpenAI	Proprietary
11	9◄─►15	gpt-5.1-medium	1392	+9/-9	6,594	OpenAI	Proprietary
12	9◄─►15	claude-opus-4-1-20250805	1392	+8/-8	9,124	Anthropic	Proprietary
13	9◄─►15	claude-sonnet-4-5-20250929-thinking-32k	1390	+8/-8	11,001	Anthropic	Proprietary
14	9◄─►15	claude-sonnet-4-5-20250929	1386	+8/-8	12,662	Anthropic	Proprietary
15	9◄─►16	deepseek-v3.2-thinking	1377	+11/-11	3,552	DeepSeek	MIT
16	15◄─►19	glm-4.6	1358	+8/-8	8,890	Z.ai	MIT
17	14◄─►19	mimo-v2-flash	1337	+18/-18	1,039	Xiaomi	MIT
17	16◄─►19	gpt-5.1	1355	+8/-8	9,917	OpenAI	Proprietary
18	16◄─►20	mimo-v2-flash (non-thinking)	1351	+10/-10	3,943	Xiaomi	MIT
19	16◄─►21	gpt-5.2-codex	1344	+13/-13	2,500	OpenAI	Proprietary
20	18◄─►21	gpt-5.1-codex	1334	+9/-9	6,661	OpenAI	Proprietary
21	19◄─►21	kimi-k2-thinking-turbo	1333	+8/-8	9,556	Moonshot	Modified MIT
22	22◄─►23	minimax-m2	1316	+8/-8	8,997	MiniMax	Apache 2.0
23	22◄─►26	deepseek-v3.2	1299	+10/-10	4,581	DeepSeek	MIT
24	23◄─►26	claude-haiku-4-5-20251001	1298	+8/-8	10,767	Anthropic	Proprietary
25	23◄─►26	deepseek-v3.2-exp	1289	+10/-10	5,133	DeepSeek	MIT
26	23◄─►26	qwen3-coder-480b-a35b-instruct	1287	+8/-8	10,516	Alibaba	Apache 2.0
27	27◄─►29	KAT-Coder-Pro-V1	1262	+15/-15	1,956	KwaiKAT	Proprietary
28	27◄─►30	gpt-5.1-codex-mini	1247	+17/-17	1,538	OpenAI	Proprietary
29	27◄─►30	grok-4-1-fast-reasoning	1240	+11/-11	5,127	xAI	Proprietary
30	28◄─►32	mistral-large-3	1225	+20/-20	1,037	Mistral	Apache 2.0
31	30◄─►32	gemini-2.5-pro	1209	+13/-13	3,454	Google	Proprietary
32	30◄─►32	grok-4.1-thinking	1208	+19/-19	1,266	xAI	Proprietary
33	33◄─►34	grok-4-fast-reasoning	1156	+22/-22	970	xAI	Proprietary
34	33◄─►35	grok-code-fast-1	1143	+21/-21	1,017	xAI	Proprietary
35	34◄─►35	devstral-medium-2507	1101	+22/-22	1,020	Mistral	Proprietary
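
The table does not say how the Rank Spread column is derived. One reading that reproduces the spreads of the first six rows is an interval-overlap rule: the best possible rank is one plus the number of models whose lower bound lies above this model's upper bound, and the worst possible rank is one plus the number of models whose upper bound lies above this model's lower bound. The sketch below is an assumption, not the leaderboard's documented method; `rank_spread` is a hypothetical helper, and only the top six rows are used because no lower-ranked interval reaches them.

```python
# Scores and symmetric 95% CI half-widths copied from the first six table rows.
rows = [
    ("claude-opus-4-5-20251101-thinking-32k", 1504, 10),
    ("gpt-5.2-high", 1475, 16),
    ("claude-opus-4-5-20251101", 1467, 9),
    ("gemini-3-pro", 1462, 8),
    ("gemini-3-flash", 1454, 9),
    ("glm-4.7", 1445, 10),
]


def rank_spread(rows):
    """Return {model: (best_rank, worst_rank)} under the assumed overlap rule."""
    spreads = {}
    for name, score, ci in rows:
        lower, upper = score - ci, score + ci
        # Models whose whole interval sits above ours are certainly ranked higher.
        clearly_better = sum(
            1 for n, s, c in rows if n != name and s - c > upper
        )
        # Models whose interval reaches above our lower bound might be ranked higher.
        possibly_better = sum(
            1 for n, s, c in rows if n != name and s + c > lower
        )
        spreads[name] = (1 + clearly_better, 1 + possibly_better)
    return spreads


spreads = rank_spread(rows)
for name, (best, worst) in spreads.items():
    print(f"{name}: {best} to {worst}")
```

For these six rows the rule yields 1 to 1, 2 to 5, 2 to 5, 2 to 6, 2 to 6 and 4 to 6, matching the Rank Spread column.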

Battle Count for Each Combination of Models (without Ties)

Confidence Intervals on Model Strength (Elo)

Elo scores computed from battle counts. Error bars = 95% CI via Bootstrapping.
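
The caption above can be illustrated with a minimal sketch: fit a strength per model from (winner, loser) battle records, map it to an Elo-like scale, then resample the battles with replacement to get a 95% percentile interval. The Bradley-Terry fit, the base of 1000, the 400-point scale, the helper names `fit_ratings` and `bootstrap_ci`, and the toy battle data are all assumptions made for illustration; the page does not describe its actual pipeline.

```python
import math
import random
from collections import Counter


def fit_ratings(battles, models, iters=100, base=1000.0):
    """Fit Bradley-Terry strengths by iterative updates, return Elo-like ratings."""
    wins = Counter(battles)  # (winner, loser) -> number of wins
    strength = {m: 1.0 for m in models}
    for _ in range(iters):
        updated = {}
        for i in models:
            won = sum(wins[(i, j)] for j in models if j != i)
            denom = sum(
                (wins[(i, j)] + wins[(j, i)]) / (strength[i] + strength[j])
                for j in models if j != i
            )
            # Floor keeps the logarithm finite if a model has no wins in a resample.
            updated[i] = max(won / denom, 1e-6) if denom > 0 else strength[i]
        strength = updated
    # Centre the ratings so that their mean equals `base`.
    log_mean = sum(math.log10(s) for s in strength.values()) / len(strength)
    return {m: base + 400.0 * (math.log10(s) - log_mean) for m, s in strength.items()}


def bootstrap_ci(battles, models, rounds=200, seed=0):
    """95% percentile interval per model from resampled battle lists."""
    rng = random.Random(seed)
    samples = {m: [] for m in models}
    for _ in range(rounds):
        resample = rng.choices(battles, k=len(battles))
        for m, r in fit_ratings(resample, models).items():
            samples[m].append(r)
    ci = {}
    for m, vals in samples.items():
        vals.sort()
        ci[m] = (vals[int(0.025 * (len(vals) - 1))], vals[int(0.975 * (len(vals) - 1))])
    return ci


# Toy data (hypothetical): non-tied battles between three placeholder models.
battles = (
    [("a", "b")] * 60 + [("b", "a")] * 40
    + [("b", "c")] * 55 + [("c", "b")] * 45
    + [("a", "c")] * 70 + [("c", "a")] * 30
)
models = ["a", "b", "c"]
ratings = fit_ratings(battles, models)
ci = bootstrap_ci(battles, models)
for m in models:
    print(m, round(ratings[m]), [round(x) for x in ci[m]])
```

Models with fewer battles produce wider resampled intervals, which is consistent with the larger ± values next to low vote counts in the table.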

Average Win Rate Against All Other Models (Uniform Sampling and No Ties)
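
As a rough illustration of this chart title, the sketch below derives an average win rate from ratings alone: each model's predicted win probability against every other model is averaged with equal weight per opponent, and ties are ignored. The logistic formula with base 10 and a 400-point scale is the conventional Elo expectation and is assumed here; the chart itself may be built from observed battles instead. The three scores are taken from the top of the table, and `expected_win` and `average_win_rate` are hypothetical helper names.

```python
def expected_win(r_a, r_b):
    """Conventional Elo expectation that A beats B (assumed scale: 400, base 10)."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def average_win_rate(ratings):
    """Mean predicted win probability against every other model, equally weighted."""
    result = {}
    for a, r_a in ratings.items():
        others = [r_b for b, r_b in ratings.items() if b != a]
        result[a] = sum(expected_win(r_a, r_b) for r_b in others) / len(others)
    return result


# Scores copied from the first three table rows.
ratings = {
    "claude-opus-4-5-20251101-thinking-32k": 1504,
    "gpt-5.2-high": 1475,
    "claude-opus-4-5-20251101": 1467,
}
avg = average_win_rate(ratings)
for name, rate in avg.items():
    print(f"{name}: {rate:.3f}")
```

Because every pairing contributes one win in total, the averages across n models always sum to n/2, which is a quick sanity check on any such chart.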

Fraction of Model A Wins for All Non-tied A vs. B Battles
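
A pairwise win-fraction matrix of this kind can be tallied directly from battle records. The sketch below assumes a hypothetical record format of `(model_a, model_b, outcome)` with outcome `"a"`, `"b"` or `"tie"`; tied battles are dropped before counting, as the title indicates. The record format, the helper name `win_fraction`, and the sample data are illustrative assumptions.

```python
from collections import defaultdict


def win_fraction(records):
    """Return {(x, y): fraction of non-tied x-vs-y battles that x won}."""
    wins = defaultdict(int)
    total = defaultdict(int)
    for a, b, outcome in records:
        if outcome == "tie":
            continue  # ties are excluded from both numerator and denominator
        total[(a, b)] += 1
        total[(b, a)] += 1
        winner, loser = (a, b) if outcome == "a" else (b, a)
        wins[(winner, loser)] += 1
    return {pair: wins[pair] / n for pair, n in total.items()}


# Hypothetical records: x beats y twice, y beats x once, one tie.
records = [
    ("x", "y", "a"),
    ("x", "y", "a"),
    ("x", "y", "b"),
    ("x", "y", "tie"),
]
frac = win_fraction(records)
print(frac)
```

Each pair of cells `(x, y)` and `(y, x)` sums to 1, so the matrix is fully determined by its upper triangle.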