Will Claude 4 achieve over 95% on the MMLU-Pro benchmark by end of 2025?
12 traders · Ṁ395 · Closes Dec 31
12% chance
This market predicts whether Anthropic's next-generation Claude 4 model will score above 95% on the MMLU-Pro benchmark before December 31, 2025. MMLU-Pro is a harder, enhanced version of the Massive Multitask Language Understanding (MMLU) benchmark. Like MMLU, it tests AI models with multiple-choice questions across many subjects, but it offers ten answer options per question instead of four and places more weight on reasoning-heavy questions. As of April 2025, Claude 3.7 Sonnet scores around 83% on MMLU-Pro. For comparison, OpenAI's o1, the current record holder on the original MMLU, scores just over 90% there; because the two benchmarks differ in difficulty, those figures are not directly comparable. A score above 95% on MMLU-Pro would represent a significant breakthrough in AI capabilities, potentially surpassing average human expert performance on these tests.
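The gap between roughly 83% and the 95% threshold is easier to appreciate as an error rate: a model must go from missing about 17% of questions to missing fewer than 5%. The sketch below is a hypothetical illustration, not the market's official resolution procedure. It assumes the score is plain accuracy in percent and reads "exceeding 95%" as a strict comparison. The helper names (`accuracy_percent`, `resolves_yes`, `error_reduction`) are invented for this example.

```python
# Hypothetical helpers illustrating the market's threshold arithmetic.
# Assumption: the benchmark score is plain multiple-choice accuracy in percent.

RESOLUTION_THRESHOLD = 95.0  # percent, taken from the market description


def accuracy_percent(correct: int, total: int) -> float:
    """Return multiple-choice accuracy as a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * correct / total


def resolves_yes(score_percent: float) -> bool:
    """Assumed reading of 'exceeding 95%': strictly greater than the threshold."""
    return score_percent > RESOLUTION_THRESHOLD


def error_reduction(from_score: float, to_score: float) -> float:
    """Fraction of the remaining errors that must be eliminated to move
    from from_score to to_score (both in percent)."""
    old_error = 100.0 - from_score
    new_error = 100.0 - to_score
    return (old_error - new_error) / old_error


if __name__ == "__main__":
    # Going from ~83% to 95% means removing about 70.6% of remaining errors.
    print(round(error_reduction(83.0, 95.0) * 100, 1))  # 70.6
    # Exactly 95% (e.g. 19 of 20 correct) would not count as "exceeding" 95%.
    print(resolves_yes(accuracy_percent(19, 20)))  # False
```

Under this reading, reaching the threshold requires eliminating roughly seven in ten of the errors that remain at the April 2025 score, which is why the description frames it as a significant breakthrough.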
Related questions
Will Claude become a Pokémon Master by the end of 2025?
60% chance
Will Claude 3.5 Opus be available via API by end of 2025?
4% chance
Will an open-source LLM under 10B parameters surpass Claude 3.5 Haiku by EOY 2025?
99% chance
Will Claude 3.5 Opus beat OpenAI's best released model on the arena.lmsys.org leaderboard?
9% chance
Will Claude MCP have equivalent functionality to a Claude Computer Use module by EOY 2025?
57% chance
Will a text model achieve 100% performance on the MMLU in five years?
28% chance
MMLU 99% #3: Will SOTA for MMLU (average) pass 99% by the start of 2026?
6% chance
Will AI achieve 85% or higher on the Humanity's Last Exam benchmark before 2030?
77% chance
MMLU 99% #5: Will SOTA for MMLU (average) pass 99% by the start of 2028?
44% chance
MMLU 99% #4: Will SOTA for MMLU (average) pass 99% by the start of 2027?
8% chance