Which of these models will have an Elo rating on LMArena (formerly known as LMSYS) by the end of January 2025?
16
Ṁ73k
resolved Feb 17
OpenAI's o1: Resolved YES
DeepSeek's r1: Resolved YES
OpenAI's o1 Pro: Resolved NO
Gemini 2 (flagship): Resolved NO

If a model has a score on the LMArena leaderboard on or before January 31, 2025, the respective market resolves to YES.

Gemini 2.0 (flagship) resolves to YES if Google DeepMind implies that the model is their best Gemini 2.0 version, whatever that is called.
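
The date rule above can be sketched in a few lines of Python. This is only an illustration of the stated cutoff, not how the market was actually resolved. The function name `resolves_yes` and its input `first_scored_on` are hypothetical, and the "flagship" judgment for Gemini 2.0 is not modeled.

```python
from datetime import date
from typing import Optional

# Cutoff taken from the market description: January 31, 2025 or earlier.
DEADLINE = date(2025, 1, 31)


def resolves_yes(first_scored_on: Optional[date]) -> bool:
    """Return True if the model had a leaderboard score by the deadline.

    `first_scored_on` is a hypothetical input: the date the model first
    appeared with a score on the leaderboard, or None if it never has.
    """
    return first_scored_on is not None and first_scored_on <= DEADLINE


# Placeholder dates for illustration, not tied to any specific model.
print(resolves_yes(date(2025, 1, 27)))  # scored before the deadline
print(resolves_yes(date(2025, 2, 2)))   # scored only after the deadline
print(resolves_yes(None))               # never scored
```

The comparison is inclusive, so a score that first appears on January 31 itself still counts as YES.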

Gemini 2 (flagship)

@MP @mods I believe this resolves No?

@MP

This is a snapshot of LM Arena as of February 2. The last update was January 27.

filled a Ṁ50 Gemini 2 (flagship) YES at 24% order

https://x.com/GeminiApp/status/1885071572228333670

Google implies this is their flagship model now, with 1.5 Pro being deprecated and 2.0 Pro not out yet. Only needs confirmation that this snapshot is present on LM Arena.

Looks like the version that is currently live on the app is not the same snapshot as the five-week-old December experimental version after all (see Logan Kilpatrick's response):

https://x.com/legit_rumors/status/1885114851854369194

The API docs have been updated with a new endpoint for the released 2.0 Flash: https://ai.google.dev/gemini-api/docs/models/gemini#gemini-2.0-flash

The latest Arena leaderboard update confirms it's a new snapshot (see image).

Ergo, a 2.0 model recognized as a flagship released on Jan 30 was not present on the arena until February, so it should resolve to NO.

@MP R1 resolves to YES.

bought Ṁ5,000 DeepSeek's r1 YES

OpenAI's o1
bought Ṁ9,000 OpenAI's o1 YES

@MP resolves Yes

opened a Ṁ103 OpenAI's o1 NO at 90% order

o1 has a rating now!

bought Ṁ500 Gemini 2 (flagship) YES

Another question: You say the Gemini 2.0 market resolves according to the "best" model. Does this include the "Thinking Mode"? So if Gemini 2.0 has an Elo, but a "Gemini 2.0 Thinking Mode" (analogous to "Gemini 2.0 Flash Thinking Mode") has been announced but does not have an Elo yet, will the Gemini 2 question resolve Yes or No?

bought Ṁ350 Gemini 2 (flagship) YES

For r1, would "r1-lite" count? Would "r1-preview" count?

For Gemini, would "Gemini-2.0-Exp" count, or does it have to be "Gemini-2.0" without the "-Exp" marker?

@MP Friendly ping! Would be lovely to get a clarification on the resolution criteria.
