I'm going to make this about me, and not bet in this market.
This is like my superhuman math market but with a much lower bar. Instead of needing to solve any math problem a team of Fields medalists can solve, the AI just needs to be able to solve any math problem I personally can solve.
And I'm further operationalizing that as follows: by January 10, will any commenter be able to pose a math problem that the frontier models fail to give the right answer to but that I can solve? If so, this resolves NO. If, as I currently suspect, no such problem can be found, it resolves YES.
(In case it helps calibrate, I have an undergrad math/CS degree and a PhD in algorithmic game theory and I do math for fun but am emphatically not a mathematician and am pretty average at it compared to my hypernerd non-mathematician friends. I think I'm a decent benchmark to use for the spirit of the question we're asking here.)
FAQ
1. Which frontier models exactly?
Whatever's available on the mid-level paid plans from OpenAI, Anthropic, and Google DeepMind. Currently that's GPT-5.2-Thinking, Claude Opus 4.5, and Gemini 3 Pro.
2. What if only one frontier model gets it?
That suffices.
3. Is the AI allowed to search the web?
TBD. When posing the problems I plan to tell the AI not to search the web. I believe it's reliable in not secretly doing so, but we can talk about either (a) how to be more sure of that or (b) deciding that searching is fair game and we just need to find ungoogleable problems.
4. What if the AI is super dumb but I happen to be even dumber?
I'm allowed to get hints from humans and even use AI myself. I'll use my judgment on whether my human brain meaningfully contributed to getting the right answer and whether I believe I would've gotten there on my own with about two full days of work. If so, it counts as a human victory if I get there but the AIs didn't.
5. Does the AI have to one-shot it?
Yes. Even if all it takes is an "are you sure?" to nudge the AI into giving the right answer, that doesn't count. Unless...
6. What if the AI needs a nudge that I also need?
This is implied by FAQ 4, but to spell it out: if I'm certain I would've given the same wrong answer as the AI, then the AI needing the same nudge I'd need means I don't count as having bested it on that problem.
7. Does it count if I beat the AI for non-math reasons?
It does not. For example, maybe the problem involves a diagram drawn in crayon that the AI fails to parse correctly. That wouldn't count. Problems can include diagrams, but they have to be presented cleanly.
8. Can the AI use tools like writing and running code?
Yes, since we're not asking about LLMs specifically, it makes sense to count those tools as part of the AI.
(I'll add to the FAQ as more clarifying questions are asked.)
[ignore auto-generated clarifications below this line; nothing's official till I add it to the FAQ]
Problems can be presented informally, correct? Are you allowed to search the internet yourself? Are physics problems allowed? Process optimization questions?
@AlexRosence5a Hmm, yeah, got ideas for a better title? Something along the lines of "does ai pareto-dominate stem people at math?" maybe?
Or more direct: will we find a math problem i can solve that ai can't?
@spiderduckpig Yes, I think it makes sense to treat that as allowed. This isn't asking about LLMs specifically. So those tools count as part of the AI.
@DZC Yes, I'd like to call that too cheap. Great to clarify though!
And I don't think the math problems should have to be pure text. Just written up cleanly.