Which of these models will have an Elo rating on LMArena (formerly LMSYS) by the end of January 2025?

Premium

Ṁ73k

resolved Feb 17

OpenAI's o1: Resolved YES

DeepSeek's r1: Resolved YES

OpenAI's o1 Pro: Resolved

Gemini 2 (flagship): Resolved

If a model has a score on the LMArena leaderboard on or before January 31, 2025, its answer resolves YES.

Gemini 2.0 (flagship) resolves YES if the model on the leaderboard is one Google DeepMind implies is its best Gemini 2.0 version, whatever it is called.

#Technology

#AI

#Technical AI Timelines

#LLMs

#LMSYS

17 Comments

Gemini 2 (flagship)

@MP @mods I believe this resolves No?

@MP This is a snapshot of LM Arena as of February 2. The last update was January 27.

filled a Ṁ50 Gemini 2 (flagship) YES at 24% order

https://x.com/GeminiApp/status/1885071572228333670

Google implies this is now their flagship model, with 1.5 Pro being deprecated and 2.0 Pro not yet out. It only needs confirmation that this snapshot is present on LM Arena.

Looks like the version that is currently live on the app is not the same snapshot as the five-week-old December experimental version after all (see Logan Kilpatrick's response):

https://x.com/legit_rumors/status/1885114851854369194

The API docs have been updated with a new endpoint for the released 2.0 Flash: https://ai.google.dev/gemini-api/docs/models/gemini#gemini-2.0-flash

The latest Arena leaderboard update confirms it's a new snapshot (see image).

Ergo, the 2.0 model recognized as the flagship, released on Jan 30, did not appear on the Arena until February, so this should resolve NO.

@MP R1 resolves to YES.

bought Ṁ5,000 DeepSeek's r1 YES

https://x.com/lmarena_ai/status/1882749951924715578

OpenAI's o1

bought Ṁ9,000 OpenAI's o1 YES

@MP resolves Yes

opened a Ṁ103 OpenAI's o1 NO at 90% order

o1 has a rating now!

bought Ṁ500 Gemini 2 (flagship) YES

Another question: you say the Gemini 2.0 answer resolves according to the "best" model. Does this include a "Thinking Mode"? If Gemini 2.0 has an Elo rating but a "Gemini 2.0 Thinking Mode" (analogous to "Gemini 2.0 Flash Thinking Mode") has been announced without one yet, does the Gemini 2 answer resolve Yes or No?

bought Ṁ350 Gemini 2 (flagship) YES

For r1, would "r1-lite" count? Would "r1-preview" count?

For Gemini, would "Gemini-2.0-Exp" count, or does it have to be "Gemini-2.0" without the "-Exp" marker?

@MP Friendly ping! Would be lovely to get a clarification on the resolution criteria.