
The question covers any current or future OpenAI models vs. any competitor models.
If a language model exists that is undoubtedly the most accurate, reliable, capable, and powerful, that model will win. If there is dispute as to which is more powerful, a significant popularity/accessibility advantage will decide the winner. There must be public access for it to be eligible.
See previous market for more insight into my resolution plan: /Gen/will-there-be-an-ai-language-model
2024 recap: capabilities were "similar". Both Google and OpenAI models tied for first place on LLM Arena. OpenAI won because of their popularity/market dominance.
Update 2025-11-27 (PST) (AI summary of creator comment): Creator will not resolve based on current Gemini 3 lead alone. Will wait until end of year to allow for:
Potential new OpenAI model releases
Further discussion on whether the lead is "strong enough"
Assessment of whether there is dispute about which model is more powerful
Creator leans toward YES if OpenAI releases no new models by end of year.
Update 2025-11-27 (PST) (AI summary of creator comment): Creator will not resolve early despite Gemini 3.0's current lead. The bar for early resolution is higher than the bar for determining a winner at end-of-year assessment. Creator still leans YES but will wait before resolving.
Update 2025-12-06 (PST) (AI summary of creator comment): Creator distinguishes between early resolution criteria vs end-of-year resolution criteria:
Early resolution requires a model that is so obviously better it takes a huge chunk of market share from ChatGPT (which still has ~80% market share)
End-of-year resolution (Dec 31) will be based on whatever is the best model, with popularity only acting as a tie-breaker rather than a necessary component
Creator acknowledges Gemini/Claude dominate ChatGPT for top-end use, but notes most people either don't know or don't care that they are better.
Update 2025-12-08 (PST) (AI summary of creator comment): Title updated to reflect end-of-year resolution: "At the end of 2025"
Current assessment: Creator believes Gemini and Claude are sufficiently ahead of ChatGPT based on all metrics.
Resolution plan:
Market will resolve YES unless OpenAI releases a new model that top scores before year end
Not resolving early to give OpenAI time to release a potential new model
Market is meant to measure if OpenAI has been "strongly surpassed" - would be inappropriate to resolve YES if OpenAI releases a superior model (e.g., GPT-6) shortly after, as that would indicate they were only "beat to release" rather than truly surpassed
Evidence considered: Benchmarks, stock market activity, and OpenAI's "code red" all indicate OpenAI knows they are no longer the leader
Update 2025-12-08 (PST) (AI summary of creator comment): Creator clarifies the "strongly surpassed" criterion:
Market will resolve YES unless OpenAI releases a new public model before year-end that demonstrates they were never really "strongly surpassed"
If OpenAI was merely "beat to release" but maintained their lead behind the scenes, this would resolve NO
There is no specific metric (absolute or relative) for determining "strongly surpassed"
The current lead by competitors (Gemini/Claude) is sufficient to resolve YES if OpenAI cannot produce evidence of having something better in development
Example: If a competitor releases something only 0.1% better and OpenAI releases a superior model shortly after, OpenAI was never truly "strongly surpassed" - they were just beat to release
Update 2025-12-11 (PST) (AI summary of creator comment): For OpenAI's GPT-5.2 (or any new OpenAI model) to affect resolution:
Must be publicly released
Must be independently tested/scored
Must be at least top scoring to show OpenAI wasn't strongly surpassed
If the new OpenAI model is only barely best in class, resolution may be more complicated. Otherwise, the market resolves YES if these conditions aren't met (a rough sketch of this logic follows below).
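For illustration only (this is not part of the official criteria), the conditions above can be sketched as a rough decision procedure. The field names and return labels below are hypothetical, and the creator's actual judgment is holistic rather than mechanical:

```python
# Illustrative sketch of the resolution logic summarized above.
# The field names and return labels are hypothetical; the creator's
# actual decision is a holistic judgment call, not a formula.

def resolve(new_openai_model):
    """Return the likely resolution given OpenAI's newest release (or None)."""
    if new_openai_model is None:
        # No new release by Dec 31: the current Gemini/Claude lead stands.
        return "YES"

    # All three conditions must hold for the release to count at all.
    eligible = (
        new_openai_model["publicly_released"]
        and new_openai_model["independently_scored"]
        and new_openai_model["top_scoring"]
    )
    if not eligible:
        return "YES"  # OpenAI was strongly surpassed

    if new_openai_model["only_barely_best_in_class"]:
        # Creator: "things might get a little more complicated"
        return "COMPLICATED"

    return "NO"  # OpenAI showed they were never strongly surpassed


# Example: a hypothetical GPT-5.2 that is public and scored but not #1.
print(resolve({
    "publicly_released": True,
    "independently_scored": True,
    "top_scoring": False,
    "only_barely_best_in_class": False,
}))  # -> YES
```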
Update 2025-12-11 (PST) (AI summary of creator comment): Creator clarifies how GPT-5.2 (or any new OpenAI model) will affect resolution:
Since Gemini 3.0 has been out for a while and 5.2 is a "catch up model," if Gemini is ahead on benchmarks, this shows Gemini strongly surpassed OpenAI when it was released
If Google releases an update same day that beats GPT-5.2, that would also count
At year end, if there is a clear non-OpenAI leader, market resolves YES
If GPT-5.2 is a really close number 1 model where nobody can determine which is better, creator will probably still resolve YES
Creator notes the "strongly surpassed" language is meant to avoid situations where OpenAI has achieved AGI/ASI but loses due to release schedules. Since OpenAI has shown signs they're worried (code red, etc.), if the new model isn't a top scorer, it will be clear OpenAI was passed.
Update 2025-12-11 (PST) (AI summary of creator comment): Creator clarifies the "strongly surpassed" criterion:
OpenAI must prove they weren't surpassed behind the scenes (i.e., their unreleased models are better than competitors' new releases)
If OpenAI releases a new model that isn't a top scorer, this proves they were surpassed
The "strongly" qualifier means definitely knowing OpenAI was passed, not just tied for first or ahead behind the scenes but withholding for a big release
Last year resolved in OpenAI's favor despite Claude having a few-point lead, because OpenAI hadn't released in a while
This time there will be a clear picture of OpenAI's best most recent capabilities - if it's not top tier, they were clearly surpassed
Update 2025-12-11 (PST) (AI summary of creator comment): Creator clarifies evaluation approach for GPT-5.2 (or any new OpenAI model):
Will grade OpenAI harder since they have the most recent release and most time to tune
Evaluation is holistic but only on language capabilities (no image/video/etc.)
No specific weights to any particular benchmarks
If experts/industry leaders consensus says GPT-5.2 is inferior to Gemini/Claude, market can resolve YES even if GPT-5.2 is comparable on lmarena
Update 2025-12-12 (PST) (AI summary of creator comment): Creator clarifies that GPT-5.2's higher rank on WebDev alone is not sufficient to resolve NO. The model is not #1 on the relevant leaderboard at this time.
@SqrtMinusOne OpenAI is no longer the dominant model at the top. It has been strongly exceeded by many other models, and there's no realistic way it will regain dominance over all other models before December 31, 2025.
@FaroukRice it needs to be publicly released, independently tested/scored, and be good enough to show that they weren't strongly surpassed (i.e., it needs to be at least top scoring)
Otherwise we still resolve YES. If it's kind of good but only barely best in class, things might get a little more complicated... hopefully it's either very impressive or hot trash
@Gen Is there any outcome where ChatGPT models are surpassed, but it doesn't meet the threshold of "strongly surpassed"? Or is any surpassing sufficient?
Ex: If Gemini 3.0 is very slightly better than 5.2 on some benchmarks but tied on others, is this sufficient for "strongly surpassed" and a YES?
@Gen Related, how possible is a 50/50 if Gemini 3 was obviously better than GPT-5/5.1 but 5.2 is rather close?
@Panfilo my interpretation of the resolution criteria is that it doesn't matter, since we are looking at any sufficient OpenAI model, i.e. right now 5.2.
@FaroukRice because Gemini has been out for a while and 5.2 is a catch-up model, if Gemini is ahead on benchmarks that will show to me that when Gemini was released it strongly surpassed OpenAI (was largely indisputably better than any model OpenAI were developing for public release)
if Google released an update the same day that was ahead of GPT-5.2, that would count. At the end of the year, if there is a clear non-OpenAI leader then this market will resolve YES.
The "strongly surpassed" encapsulates a bunch of weird rules from last year, where we resolved NO even though Claude was ahead on benchmarks by 1-2 pts because OpenAI were still leaders behind the scenes. I'm trying to be as precise as possible about real outcomes rather than explaining the language at this point, but there's a lot you can go back and read if you want. One of the guiding principles has always been to avoid a situation where OpenAI have essentially achieved AGI/ASI but this resolves against them because of release schedules. They have basically done everything we thought they could (code red, etc.) to indicate they're worried, so if the new one isn't a top scorer - we know 100% they were passed.
If it's a really close number 1 model and nobody can really determine which is better, I'll probably still resolve YES... I hope people can follow the reasoning
Happy to continue to discuss, I recommend not making huge bets if you're not comfortable with how I'm explaining things
@Bayesian yeah I shouldn't have written a half-baked reply, my bad
Bottom line is, at this point OpenAI have to prove they weren't surpassed behind the scenes (that is, their unreleased models are better than the new releases by competitors). If they release something that isn't a top scorer, that's obviously not the case.
Last year it resolved in their favour despite Claude or whoever having a few-point lead, because OpenAI hadn't released in a while. This time we will have a clear picture of their best most recent capabilities, and if it's not top-tier then clearly they were surpassed.
This is where the "strongly" comes from; it is more about definitely knowing they were passed, and not just tied for first, or ahead behind the scenes but withholding for a big release
@Gen In this case, Gemini 3 and Opus 4.5 were both released within the past few weeks. If GPT-5.2 scores roughly similar to both of them, would this count as other labs surpassing or merely catching up with OpenAI?
I also think all of these models will outperform the other two in certain tasks. Is your evaluation holistic or based off a few key benchmarks (ex: lmarena)?
@SolarFlare "roughly similar" is hard to say, because it'll depend on what that looks like. I'm inclined to go harder when grading OpenAI because they have the most recent release and the most time to tune things
The evaluation is holistic but only on language capabilities (no image/video/etc.). No specific weights to any particular benchmarks. If all of the experts (or, all of the people on the TIME cover) held hands and said GPT-5.2 was trash, and that Gemini/Claude have it beat, then it can still resolve YES even if it's comparable on lmarena
Market clarification; you should read this:
The title is honestly not well written at this point; we are following a bunch of rules defined in the previous year's market. The important ones are:
This market will resolve YES if there is an obviously better chatbot at the end of the year, or
it could have early resolved YES if there was an unambiguous non-OpenAI leader prior to the end of the year (but there was discussion about giving OpenAI time to release a retaliation model, provided it happens in the same year)
This is what we are following right now. The year is basically over, so I'm updating the title to say "At the end of 2025".
I believe that Gemini and Claude are sufficiently ahead of ChatGPT. All metrics point to this being the case. I expect that unless OpenAI release a new model which top scores, it should resolve YES.
If this is highly contested I am happy to discuss it, but I think it's very reasonable given the prior year's criteria, which we follow, and the only reason I am not resolving it early is that there is still time for OpenAI to prove that they are "top of class". It would be extremely cringe for this market to resolve based on a temporary lead by competitors when it is supposed to measure the fall of OpenAI as a leader. If OpenAI released GPT-6 next week and crushed benchmarks, it wouldn't make sense to say they were "strongly surpassed" for this transient period between releases.
Given the benchmarks, stock market activity, and the "code red", it seems clear that OpenAI knows they are no longer the leader, and this market will resolve YES unless they produce something (a new public model) that indicates they were never really "strongly surpassed", i.e., they were merely beat to release but maintained their lead behind the scenes.
@Gen "could have early resolved YES if there was an unambiguous non-OpenAI leader" - that would be better (my opinion)
