Will AIs beat human experts in question-answering on the GPQA benchmark before January 1st, 2027? | Manifold

95% chance

From the abstract of the GPQA paper:

We present GPQA, a challenging dataset of 448 multiple-choice questions written by domain experts in biology, physics, and chemistry. We ensure that the questions are high-quality and extremely difficult: experts who have or are pursuing PhDs in the corresponding domains reach 65% accuracy (74% when discounting clear mistakes the experts identified in retrospect), while highly skilled non-expert validators only reach 34% accuracy, despite spending on average over 30 minutes with unrestricted access to the web (i.e., the questions are "Google-proof"). The questions are also difficult for state-of-the-art AI systems, with our strongest GPT-4 based baseline achieving 39% accuracy.

This question resolves to YES if a credible paper, blog post, or other document indicates that at least one AI system scored above 74.0% on the GPQA dataset before January 1st, 2027, and NO otherwise. I will exclude scores that appear to have been obtained by cheating, for example by training on the test set.
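To make the 74.0% bar concrete, here is a minimal sketch of the arithmetic, assuming the score is reported on the full 448-question set described in the abstract. The function names are hypothetical and chosen for illustration; they are not part of the market's rules. Integer arithmetic is used so that a score of exactly 74.0% is not misjudged by floating-point rounding.

```python
# Illustrative sketch only: names and defaults are assumptions, not market rules.
# The threshold is stored in tenths of a percent (740 == 74.0%) to stay in integers.

def exceeds_threshold(correct: int, total: int, threshold_tenths: int = 740) -> bool:
    """Return True if correct/total is strictly greater than the threshold."""
    return correct * 1000 > threshold_tenths * total


def min_correct_to_exceed(total: int, threshold_tenths: int = 740) -> int:
    """Smallest number of correct answers that strictly exceeds the threshold."""
    return (total * threshold_tenths) // 1000 + 1


if __name__ == "__main__":
    total_questions = 448  # dataset size quoted in the abstract
    needed = min_correct_to_exceed(total_questions)
    print(f"Correct answers needed out of {total_questions}: {needed}")
    print(f"Accuracy at that count: {needed / total_questions:.2%}")
```

Under these assumptions, 74.0% of 448 is 331.52, so a system would need at least 332 correct answers; 331 correct (about 73.9%) would fall short. A result reported on a subset of the questions would need the same calculation with that subset's size.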

#Technology

#Technical AI Timelines

Related questions

Will an AI achieve >85% performance on the FrontierMath benchmark before 2028?

What will be the best AI performance on Humanity's Last Exam by December 31st 2025?

Will AI pass the Winograd schema challenge by the end of 2025?

In what year will AI achieve a score of 95% or higher on the GPQA benchmark?

Will an AI achieve >85% performance on the FrontierMath benchmark before 2027?

Will an AI system beat humans in the GAIA benchmark before the end of 2025?

Will an AI model achieve superhuman ELO on Codeforces by the 31 December 2025?

Will AI top level capabilities generally be judged by question and answer benchmarks in 2029?

Will AI beat top Magic the Gathering human player before the end of 2026?

Will an AI achieve >80% performance on the FrontierMath benchmark before 2027?
