What is Grok 4 Heavy's performance on METR's task length evaluation?
16
Ṁ895 · Jan 1
0 to 1.5 Hours: 3%
1.5 to 2 Hours: 47%
2 to 2.5 Hours: 36%
2.5 to 3 Hours: 11%
More than 3 Hours: 3%
Resolves based on METR's measurement of the 50% time horizon: the duration of tasks that the model can complete with a 50% success rate.
https://metr.org/blog/2025-03-19-measuring-ai-ability-to-complete-long-tasks/
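For intuition on how a measured horizon would map onto the answers above, here is a rough sketch. It assumes, following the general approach in the linked METR post, that success probability is modeled as a logistic function of the log of task length; the 50% horizon is then the length at which that curve crosses 0.5. The function names, the coefficients in the usage note, and the treatment of exact boundary values are all hypothetical, since the market does not specify them.

```python
def horizon_minutes(intercept: float, slope: float) -> float:
    """Task length in minutes where the fitted success probability is 50%.

    Assumes p(success) = sigmoid(intercept + slope * log2(minutes)),
    with slope < 0 (longer tasks are harder). The sigmoid equals 0.5
    when its argument is 0, so log2(minutes) = -intercept / slope.
    """
    return 2 ** (-intercept / slope)


def answer_bucket(hours: float) -> str:
    """Map a time horizon in hours to one of this market's answers.

    Assumption: upper bounds are inclusive (exactly 3.0 hours is not
    "More than 3 Hours"). The market does not state how ties resolve.
    """
    if hours <= 1.5:
        return "0 to 1.5 Hours"
    if hours <= 2.0:
        return "1.5 to 2 Hours"
    if hours <= 2.5:
        return "2 to 2.5 Hours"
    if hours <= 3.0:
        return "2.5 to 3 Hours"
    return "More than 3 Hours"
```

As a made-up example, coefficients of intercept 7.0 and slope -1.0 would give a horizon of 128 minutes, which falls in "2 to 2.5 Hours". These numbers are illustrative only and are not a measurement of any model.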
The Grok 4 market is here:
https://manifold.markets/AffineTyped/what-is-grok-4s-performance-on-metr
Related questions
Will GPT-5.1 have a longer METR time horizon than Gemini 3?
21% chance
How many parameters does Grok 3 have?
Grok 4.20's METR 50% time horizon
Opus 4.5's METR time horizon beats GPT-5.1's?
80% chance
Grok 5's METR 50% time horizon
Grok 4.2 (xAI) release date
When will Grok 3 weights become publicly available?