Will AI be capable of producing an Annals-quality math paper for $100k by March 2030?
123 traders · Ṁ150k volume · Closes 2030 · 49% chance

I (Tamay Besiroglu) bet the mathematician Daniel Litt that the best AI models in March 2030 would be capable of generating math papers in Number Theory at the quality level of papers published in the Annals today (i.e., 2025). https://x.com/tamaybes/status/1899262088369106953?s=46

The AI would receive no detailed guidance relevant to the mathematics research and would be required to accomplish the task autonomously.

The AI system(s) are granted a total budget of $100k in inference compute per paper.

This bet would be resolved on the basis of Daniel Litt’s judgment.

  • Update 2025-03-21 (PST) (AI summary of creator comment): Novel Research Requirement Clarification:

    • For a YES resolution, the AI must perform novel research autonomously, not just produce a paper that could pass as research.

  • Update 2025-03-23 (PST) (AI summary of creator comment): Budget Currency: The $100k inference compute budget is expressed in nominal dollars (current currency) with no inflation adjustment.

  • Update 2025-05-17 (PST) (AI summary of creator comment): The creator endorsed an interpretation (via a previously posted ChatGPT response to a user's question) regarding the market's resolution. This endorsement suggests:

    • The market generally requires demonstrating repeatable capability in generating Annals-quality math papers.

    • However, a single, exceptionally significant autonomous achievement by an AI (such as proving the Riemann hypothesis) before 2030 would also be considered sufficient for a YES resolution.

I think AlphaEvolve is positive news for this market.

I think one possible way of producing a qualifying result is to first find (in a similar fashion to AlphaEvolve) some construction that was previously considered likely impossible, and then write a paper about the implications of this new construction (not sufficient by itself; likely too mechanistic and not very insightful) for various things.

I am not a mathematician, so I am not sure my idea makes sense. But in any case, I feel like AlphaEvolve is progress toward YES.
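
A minimal sketch of the propose-evaluate-select loop gestured at above, on a toy target: the largest subset of {0, ..., 63} with no 3-term arithmetic progression, a toy relative of the extremal-combinatorics problems these systems have been applied to. This is an illustration, not AlphaEvolve's actual pipeline; in the real system the mutation step is an LLM editing candidate programs and the evaluator is problem-specific. All names here are made up:

```python
import random

# Toy AlphaEvolve-style loop: evolve a subset of {0, ..., N-1} with no
# 3-term arithmetic progression, maximizing its size.
N = 64

def has_3ap(s: set) -> bool:
    """True if s contains a 3-term arithmetic progression a, b, 2b - a."""
    elems = sorted(s)
    for i, a in enumerate(elems):
        for b in elems[i + 1:]:
            if 2 * b - a in s:
                return True
    return False

def score(s: set) -> int:
    """Fitness: set size, with invalid (AP-containing) candidates penalized."""
    return -1 if has_3ap(s) else len(s)

def mutate(s: set) -> set:
    """Toggle one random element (stand-in for an LLM-proposed edit)."""
    child = set(s)
    child ^= {random.randrange(N)}
    return child

def evolve(generations: int = 5000, pool_size: int = 16) -> set:
    pool = [set() for _ in range(pool_size)]
    for _ in range(generations):
        parent = max(random.sample(pool, 4), key=score)  # tournament selection
        child = mutate(parent)
        weakest = min(range(pool_size), key=lambda i: score(pool[i]))
        if score(child) >= score(pool[weakest]):
            pool[weakest] = child  # replace the weakest member
    return max(pool, key=score)

if __name__ == "__main__":
    best = evolve()
    print(f"AP-free set of size {len(best)}: {sorted(best)}")
```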

sold Ṁ17 NO

@qumeric It definitely is progress, but the approach feels fundamentally expensive. 2030 is a long way away, though.

@TamayBesiroglu Is this bet about a single occurrence, i.e., does a single published paper resolve this market YES, or does the capability need to be repeatable (and have been shown to be repeatable)? I suppose it's the latter, since you wrote 'capable of generating math papers' (plural).

@CalibratedNeutral Here's ChatGPT's answer to your question based on its reading of the terms.

@TamayBesiroglu You are the market creator, so why are you posting a ChatGPT answer?

@CalibratedNeutral I understand it to be about a capability, so repeatable. The bet is meant to be a proxy for “do the best models have absolute advantage over top mathematicians in doing math research?” That said, if an AI autonomously proves the Riemann hypothesis or something before 2030, I think you should expect the market to resolve YES.

@DanielLittQCSn That is reasonable. Thank you

@CalibratedNeutral ChatGPT is good at reading comprehension.

I'm a bit worried about a YES resolution that just constructs a solution to some Diophantine equation or something.

Is the $100k inference budget in today's currency value? The currency should also be stated, e.g., USD.

@TomBurns nominal dollars

It is very entertaining to see the opinions of all the people who think this won't be possible by 2030

Is it correct that a YES resolution also requires the AI to do novel research, not just write a paper that could pass as research?

@zsig Yes

bought Ṁ1,000 NO

80% is completely absurd. I am sure that AI will be capable of writing parts of such papers by then, but current models cannot even write correct, essentially trivial papers.

opened a Ṁ2,500 YES at 70% order

@LocalGlobal 10K at 70%?

@LocalGlobal This feels to me a bit like "Those sea lions that were trained to build motorcycles a few years back can source parts and draft designs, sure, but their welding technique is still terrible!"

Without a deeper model of what drives AI progress, the phrase "currently models cannot" is just not predictive of anything.

bought Ṁ10 YES

This particular example is an indictment of the state of peer review and scholarship in ML more than anything else.

@jgyou I agree with this. The paper should not have passed the first round of peer review. I was most impressed by the fact that it could autonomously construct such a paper, graphics and all. This is another case where the exaggerated claim serves as a good prediction of near-future capabilities.

5 years is a very long time, even if the exponential improvement we have consistently seen for the past 6 years doesn't hold the whole way through.

@Haiku I think these responses might just be overestimating the standards that are expected of a workshop paper.

bought Ṁ50 YES

@LocalGlobal 5 years ago LLMs could not even write a full sentence.

@SimoneRomeo 5 years ago was 2020, the year GPT-3 came out. We had models writing poems in a certain style in 2016, I believe.

@mathvc If it were the beginning of 2020, right before the release of GPT-3, would you have thought that 5 years later a model would be able to write this:

https://chatgpt.com/share/680b6c9d-d638-8012-bc32-0cf36cae46eb

@SimoneRomeo No, I would not be surprised. Most of the text is essentially copied from the training data and the rest is pure bluffing. For example, part 4 of the proof sketches is nonsense: there is, by definition, no Siegel zero in a fixed finite range of moduli. There is certainly much more nonsense, but I don't want to bother checking. I have seen much more impressive LLM output, tbh.

Also, it does not even look like a math paper: the introduction / background / main results / future work layout is quite common in the experimental sciences but is never used in (pure) mathematics. There are also no actual proofs, which are the actual content of any math paper. If a human had written this, that alone would be enough to conclude it is obvious crankery. This is almost certainly fixable with better prompting and/or scale, but it shows some issues.
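
For context on the Siegel zero objection above, here is a sketch of the standard definition (treat the exact constant and normalization as a paraphrase). A Siegel zero is defined relative to a growing modulus, which is why "in a fixed finite range of moduli" makes the notion vacuous:

```latex
% A Siegel (exceptional) zero is a real zero \beta of a Dirichlet
% L-function L(s, \chi), for a real primitive character \chi mod q,
% lying unusually close to s = 1:
\[
  L(\beta, \chi) = 0, \qquad \beta > 1 - \frac{c}{\log q},
\]
% for a small absolute constant c > 0. Over any fixed finite range of
% moduli q \le Q, one can shrink c until no zero satisfies the
% inequality, so the notion only has content asymptotically in q.
```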

The Sakana example actually increased my odds, since it wrote a coherent multi-page technical text, which is the first time I've seen something like this.

I stand by my belief that 80% is crazy high. I guess the people betting like this think a singularity is very likely.