Will an Open Source LLM Surpass any GPT-4 model in Elo Rating on Chatbot Arena on December 31, 2024?

Ṁ11k

Dec 31

96% chance

This market is centered on whether an open source Large Language Model (LLM) will achieve a higher Elo rating than OpenAI’s GPT-4 on the Chatbot Arena platform by the end of 2024.

Chatbot Arena utilizes a crowdsourced, randomized battle platform where user votes contribute to computing Elo ratings. This specific market will be resolved based on the Elo ratings of the models as reported by Chatbot Arena.
The link to the website: https://chat.lmsys.org/
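
For readers unfamiliar with Elo, the sketch below shows the textbook update applied after a single head-to-head vote. It is an illustration only: the function names, the K-factor of 32, and the starting ratings are assumptions, and the leaderboard's actual computation may differ from this simple form.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return new ratings after one battle.

    score_a is 1.0 if A wins the vote, 0.0 if B wins, 0.5 for a tie.
    k = 32 is an assumed K-factor, not a value taken from Chatbot Arena.
    """
    exp_a = expected_score(rating_a, rating_b)
    new_a = rating_a + k * (score_a - exp_a)
    # B's actual and expected scores are the complements of A's.
    new_b = rating_b + k * ((1.0 - score_a) - (1.0 - exp_a))
    return new_a, new_b


# Two equally rated models; A wins one vote.
print(update_elo(1000.0, 1000.0, 1.0))
```

With equal starting ratings the expected score is 0.5, so a single win moves the winner up by half the K-factor and the loser down by the same amount.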

The resolution will consider the highest Elo rating recorded for an open source LLM compared to GPT-4’s Elo rating as of December 31, 2024.

Only the Elo ratings will be used to determine the outcome; other benchmarks such as MT-Bench or MMLU scores will not be considered.

The latest update of the leaderboard by December 20, 2024, will be used for the final assessment.

Update 2024-12-12 (PST) (AI summary of creator comment):
- The market will consider any GPT-4 model variant present on the Chatbot Arena leaderboard, not just a specific version
- An open source model needs to surpass any of the GPT-4 models listed on the leaderboard to resolve as YES

Update 2024-12-12 (PST) (AI summary of creator comment):
- For a YES resolution, an open source model must surpass any of the GPT-4 model variants listed on the leaderboard
- Multiple GPT-4 model variants may be present on the leaderboard, and all will be considered for comparison
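
The resolution rule in the updates above can be read as "the best open source model must rate higher than at least one GPT-4 variant". A minimal sketch of that reading follows. The model names and ratings are invented placeholders, the `startswith("gpt-4")` name check is an assumption about how variants would be identified, and the set of open source models is supplied by hand; none of this comes from the actual leaderboard.

```python
def resolves_yes(leaderboard: dict[str, float], open_source: set[str]) -> bool:
    """Check the 'surpass any GPT-4 variant' reading of the market.

    leaderboard maps model name to Elo rating; open_source lists the names
    treated as open source (a manual judgment, not derived from the data).
    """
    # Assumption: GPT-4 variants are recognizable by a "gpt-4" name prefix.
    gpt4_ratings = [r for name, r in leaderboard.items()
                    if name.lower().startswith("gpt-4")]
    open_ratings = [r for name, r in leaderboard.items() if name in open_source]
    if not gpt4_ratings or not open_ratings:
        return False
    # Surpassing *any* variant means beating the lowest-rated one.
    return max(open_ratings) > min(gpt4_ratings)


# Hypothetical snapshot with made-up names and ratings.
snapshot = {
    "gpt-4-variant-a": 1300.0,
    "gpt-4-variant-b": 1200.0,
    "open-model-x": 1250.0,
}
print(resolves_yes(snapshot, {"open-model-x"}))
```

Under the stricter reading, where the open source model must beat every GPT-4 variant, the last line of the function would compare against `max(gpt4_ratings)` instead.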

#Technology

#AI

#Technical AI Timelines

#OpenAI

16 Comments

why is this market so high? I don't understand

@Bayesian see comments

@FedorShabashev I read them, I still don't get it

command R is way lower in Elo than gpt4o currently is. does gpt4o not count as gpt-4? idk

The resolution will consider the highest Elo rating recorded for an open source LLM compared to GPT-4’s Elo rating as of December 31, 2024.

@Bayesian there are multiple GPT-4 models on the LLM arena (OpenAI is constantly releasing new versions).
The market is predicting whether any open source model is going to be ranked higher than any of the GPT-4 models.
Note that the market was created a long time before models like Llama and Qwen were published.

@FedorShabashev yeah, so I still don't get it. If, on December 20th or 31st or whatever, the current best OpenAI model in the GPT-4 family is better than all the open source models, this resolves NO?

@Bayesian An open source model needs to surpass any of the GPT-4 models listed on the leaderboard to resolve as YES

@FedorShabashev oh ok. why isn't this resolved YES already then? It says "by december" and this already happened

@Bayesian because: "The latest update of the leaderboard by December 20, 2024, will be used for the final assessment."

bought Ṁ1,000 YES

@FedorShabashev so the market title could be changed to "Will an Open Source LLM Surpass any GPT-4 model in Elo Rating on Chatbot Arena on December 20, 2024"?

@Bayesian yes.

Let me update

technically I can only resolve on December 31st

https://huggingface.co/CohereForAI/c4ai-command-r-plus

Which GPT-4?

predicts YES

@Jacy Any of them
