EOY 2025: Will open LLMs match closed-source LLMs on coding to within 50 ELO points?
6
Ṁ843
2026
37% chance

On December 31, 2025, will the LMSys code arena's best closed-source LLM outperform the best open-weights LLM by less than 50 points?

As of July 27, 2024 the gap is 58 ELO points.
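For intuition about what a gap of this size means, here is a minimal sketch that converts a rating gap into an expected head-to-head win rate. It assumes the standard Elo logistic formula with a scale of 400; whether the LMSys leaderboard scores follow exactly this convention is an assumption, and `expected_score` is a hypothetical helper name.

```python
def expected_score(gap: float, scale: float = 400.0) -> float:
    """Expected win rate of the higher-rated model under the standard Elo formula.

    `gap` is the rating difference in ELO points. The 400-point scale is the
    conventional Elo choice and is assumed here, not taken from the question.
    """
    return 1.0 / (1.0 + 10.0 ** (-gap / scale))


if __name__ == "__main__":
    # Compare the July 2024 gap with the market's threshold.
    for gap in (58, 50):
        print(f"gap={gap}: expected win rate ~ {expected_score(gap):.3f}")
```

Under that assumption, both 58 and 50 points correspond to the stronger model winning somewhat under 60% of head-to-head comparisons, so the threshold is only slightly tighter than the gap at the time of writing.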

If LMSys ceases to exist or to evaluate models, I will resolve to 50%.

If a model is open-weights but the LMSys eval uses an API (e.g. deepseekv2-API), it still qualifies as open-weights, unless I get evidence that the API version was different enough to affect this question, in which case I would resolve to 50%.
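The resolution rules can be sketched as a small function. This is an illustration under stated assumptions, not an official resolver: the function name `resolve`, the use of `None` to signal that no usable score exists, and the string return values are all hypothetical. It also applies the clarification from the comments that an open model overtaking the best closed model resolves YES.

```python
from typing import Optional

THRESHOLD = 50.0  # ELO points, taken from the question text


def resolve(best_closed: Optional[float], best_open: Optional[float]) -> str:
    """Return "YES", "NO", or "50%" for the given code-arena ratings.

    Passing None for either rating stands in for the fallback cases
    (LMSys no longer evaluates models, or the API version of an open
    model is judged too different); that encoding is an assumption.
    """
    if best_closed is None or best_open is None:
        return "50%"
    gap = best_closed - best_open
    # "By less than 50 points" is strict, so a gap of exactly 50 is NO.
    # A negative gap (open model ahead) is also less than 50, hence YES.
    return "YES" if gap < THRESHOLD else "NO"


if __name__ == "__main__":
    # Illustrative ratings only; 58 is the gap quoted for July 27, 2024.
    print(resolve(1058.0, 1000.0))
```

With the 58-point gap quoted for July 2024, the sketch returns NO; any gap below 50, including a negative one, returns YES.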

Chart from https://x.com/maximelabonne/status/1779801605702836454. It shows all-question ELO, whereas this market resolves on coding-only ELO, but the trend is similar.


https://x.com/amebagpt/status/1836875571906666836

The LMSYS main arena gap over time (1st vs 2nd, not necessarily open source)

If no one objects, I'll update the question to read: "We'll go along with any LMSys evaluation updates: e.g. if there's a code-hard or code-style control, we'll use whatever the fanciest LMSys eval ends up being, as long as it's code-only."

For clarification: if an open-source LLM overtakes the best closed-source one, will the market resolve as "Yes"?

Yes

bought Ṁ10 YES

Thanks for the clarification. I would buy "yes". I expect that even in the worst case, open source will advance at a similar speed to closed source. I also think the Arena will eventually saturate, which would artificially shrink the gap between the top tiers.