EOY 2025: Will open LLMs perform at least as well as 50 Elo below closed-source LLMs on coding?


Ṁ7041 · Jan 1 · 57% chance


On December 31, 2025, will the best closed-source LLM on the LMSYS Chatbot Arena coding leaderboard outperform the best open-weights LLM by less than 50 points? (If the open-weights model is ahead, this resolves YES.)

As of July 27, 2024, the gap is 58 Elo points.
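For intuition about what these gaps mean head to head, here is a minimal sketch. It assumes the arena ratings follow the standard Elo convention (base 10, scale 400), which is how Chatbot Arena presents its scores. Treat it as an approximation. The function name `expected_win_rate` is hypothetical.

```python
def expected_win_rate(rating_gap: float) -> float:
    """Expected score of the higher-rated model under the standard Elo logistic curve
    (assumed base 10, scale 400)."""
    return 1.0 / (1.0 + 10 ** (-rating_gap / 400.0))


# Compare the July 2024 gap, the market threshold, and parity.
for gap in (58, 50, 0):
    print(f"{gap:>3} points -> {expected_win_rate(gap):.3f}")
# roughly 0.583, 0.571 and 0.500 respectively
```

Under this assumption, closing the gap from 58 to below 50 points moves the closed model's expected head-to-head score from about 58% to about 57%.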

If LMSYS ceases to exist or stops evaluating models, I will resolve to 50%.

If a model is open-weights but LMSYS evaluates it through an API (e.g. deepseekv2-API), it still counts as open-weights, unless I get evidence that the API version differs enough to affect this question. In that case I would resolve to 50%.
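The resolution rules can be summarized as a small function. This is a minimal sketch, not the creator's procedure. `resolve`, `best_closed` and `best_open` are hypothetical names. The inputs are assumed to be the coding-only ratings on December 31, 2025. `None` stands in for LMSYS no longer evaluating models. The sketch uses the strict "less than 50" from the question text, so how an exact 50-point gap resolves is an assumption here.

```python
from typing import Optional

THRESHOLD = 50.0  # Elo points, from the question text


def resolve(best_closed: Optional[float], best_open: Optional[float]) -> str:
    """Return "YES", "NO" or "50%" for hypothetical end-of-2025 coding-arena ratings.

    None for either rating stands in for LMSYS having stopped evaluating models.
    """
    if best_closed is None or best_open is None:
        return "50%"
    gap = best_closed - best_open  # negative when the open-weights model leads
    # A negative gap (open weights ahead) is below the threshold, so it also resolves YES.
    return "YES" if gap < THRESHOLD else "NO"


# Illustrative numbers only, not real leaderboard data.
print(resolve(1300, 1242))  # 58-point gap -> NO
print(resolve(1000, 1100))  # open-weights model ahead by 100 -> YES
```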

Chart from https://x.com/maximelabonne/status/1779801605702836454. It shows overall Elo across all questions, whereas this market resolves on coding-only Elo. The trend is similar.

Update 2025-05-28 (PST) (AI summary of creator comment): The creator has indicated that the market title has been updated to clarify the resolution criteria. This was in response to a user's question about how the market resolves, particularly when considering the Elo difference between open-source and closed-source models. Please refer to the updated market title for the most precise definition of the resolution condition.

#Technical AI Timelines

#LLMs

#Chatbot Arena Leaderboard

#Programming

#AI Alignment


8 Comments


bought Ṁ1,500 YES

It would be pretty surprising to me if this did not resolve positively. Currently the gap is 30 points with the default settings and 15 points without style control. I guess Gemini 3 could do very well.

bought Ṁ5 YES

What if open-source models beat closed-source models by more than 50 points? For example, o5 is at 1000 Elo and DeepSeek R2 is at 1100 Elo. What will it resolve to?

@JamesJohnson I updated the title to make this clear.

https://x.com/amebagpt/status/1836875571906666836

The LMSYS main arena gap over time (1st vs. 2nd place, not necessarily open source)


If no one objects, I'll update the question to read: "We'll go along with any LMSYS evaluation updates: e.g. if there's a code-hard / code-style control etc., we'll use whatever the fanciest LMSYS eval ends up being, as long as it's code-only."

For clarification: if an open-source LLM overtakes the closed-source one, will the market resolve YES?

Yes

bought Ṁ10 YES

Thanks for the clarification. I would buy YES. I expect that even in the worst case, open source will advance at a similar speed to closed source. I think the Arena will eventually saturate and artificially shrink the gap between the top tiers.
