MANIFOLD
What is Grok 4 Heavy's performance on METR's task length evaluation?
7
Ṁ120
2026
0 to 1.5 Hours: 39%
1.5 to 2 Hours: 20%
2 to 2.5 Hours: 19%
2.5 to 3 Hours: 16%
More than 3 Hours: 6%

Resolves based on METR's measurement of the duration of tasks that Grok 4 Heavy can complete with a 50% success rate.

https://metr.org/blog/2025-03-19-measuring-ai-ability-to-complete-long-tasks/
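As a rough illustration of how a reported 50% time horizon would map onto the answer ranges listed for this market, here is a minimal sketch. The function name `bucket_for` is hypothetical, and the boundary handling is an assumption: the market does not say which range a value of exactly 1.5, 2, 2.5, or 3 hours falls into, so the sketch places it in the lower range.

```python
import math

# Upper bound (in hours) and label for each bounded answer range on this market.
# ASSUMPTION: a value exactly on a boundary counts toward the lower range;
# the market description does not specify this.
UPPER_BOUNDS = [
    (1.5, "0 to 1.5 Hours"),
    (2.0, "1.5 to 2 Hours"),
    (2.5, "2 to 2.5 Hours"),
    (3.0, "2.5 to 3 Hours"),
]


def bucket_for(horizon_hours: float) -> str:
    """Return the answer range containing a 50% time horizon given in hours."""
    if math.isnan(horizon_hours) or horizon_hours < 0:
        raise ValueError("time horizon must be a non-negative number of hours")
    for upper, label in UPPER_BOUNDS:
        if horizon_hours <= upper:
            return label
    # Anything strictly above 3 hours falls in the open-ended range.
    return "More than 3 Hours"
```

For example, a measured horizon of 2.25 hours would land in "2 to 2.5 Hours" under these assumptions.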

Grok 4 Market here:

https://manifold.markets/AffineTyped/what-is-grok-4s-performance-on-metr

#Grok

Related questions

How well will Grok 4 do on Frontier Math?
Will Grok 4 achieve over 69% on SimpleBench? 23% chance (-20% 1d)
Open-source OpenAI model beats Grok 4 on LMArena? 6% chance
Will Grok 4 Top the Chatbot Leaderboard? 27% chance (-9% 1d)
What is Grok 4's performance on METR's task length evaluation?
Grok 4 before 2026? 99% chance
