What will be the best performance on FrontierMath Tier 4 by December 31st 2025?
0 - 10%: 11%
10 - 20%: 35%
20 - 30%: 28%
30 - 40%: 6%
40 - 50%: 4%
50 - 60%: 3%
60 - 70%: 3%
70 - 80%: 3%
80 - 90%: 3%
90 - 100%: 3%

This market resolves to the best performance by an AI system on FrontierMath Tier 4 as of December 31st, 2025. Results listed at https://epoch.ai/frontiermath, under the Tier 4 section, are the ones accepted for the purpose of this market. Performance is measured as Pass@1 accuracy.

At market creation (the day the benchmark was officially announced), the best model is o4-mini (high), with a score of 6.25%.
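For readers unfamiliar with the metric, here is a rough sketch of how a Pass@1 score could be computed and mapped onto the answer ranges of this market. Everything in it is illustrative: the function names `pass_at_1` and `bucket` are hypothetical, the 3-out-of-48 run is a made-up example chosen only because it reproduces 6.25%, and the handling of boundary scores (lower bound inclusive, 100% in the top range) is an assumption, not a rule stated by the market.

```python
def pass_at_1(solved: list[bool]) -> float:
    """Percentage of problems solved when each problem gets a single attempt."""
    if not solved:
        raise ValueError("need at least one problem")
    return 100.0 * sum(solved) / len(solved)


def bucket(score_pct: float) -> str:
    """Map a percentage score to one of the ten answer ranges.

    Assumption: a score on a boundary (e.g. exactly 10%) counts toward the
    higher range, and 100% falls in the top range. The market does not say.
    """
    if not 0.0 <= score_pct <= 100.0:
        raise ValueError("score must be between 0 and 100")
    low = min(int(score_pct // 10) * 10, 90)
    return f"{low} - {low + 10}%"


# Hypothetical run: 3 problems solved out of 48, one attempt each.
example_run = [True] * 3 + [False] * 45
score = pass_at_1(example_run)
print(score, bucket(score))
```

Under these assumptions, a score like the one quoted above would land in the lowest range, so the market is effectively asking how far above that starting point the best system gets by the end of the year.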


My intuition is that you can't get much better results on this benchmark just by scaling current methods, and that no one will implement a new method before the end of 2025.

I think it can still be pushed a fair bit, but I agree that probably no one will implement a new method before the end of 2025. Do you think the first system to reach 30% will use a new method?

@Bayesian Yeah. But I didn't do much research on the questions; I just know they are distinct from anything in trainable datasets and require a lot of reasoning steps. I think we need a new method that helps AIs better generalize what they have learned and the skills they have picked up across different domains.