What is Grok 4 Heavy's performance on METR's task length evaluation? | Manifold

16

Ṁ895

Jan 1

0 to 1.5 Hours: 3%
1.5 to 2 Hours: 47%
2 to 2.5 Hours: 36%
2.5 to 3 Hours: 11%
More than 3 Hours: 3%

Resolves based on METR's measurement of the length of tasks that Grok 4 Heavy can complete with a 50% success rate (its 50% time horizon).

https://metr.org/blog/2025-03-19-measuring-ai-ability-to-complete-long-tasks/

Grok 4 Market here:

https://manifold.markets/AffineTyped/what-is-grok-4s-performance-on-metr
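
As a rough illustration of how a measured 50% time horizon would map onto the answer ranges of this market, here is a minimal sketch. The function name `resolve_bucket` is hypothetical, and the boundary handling (an exact 1.5, 2, 2.5, or 3 hours falls into the lower range) is an assumption; the market description does not say how ties are resolved.

```python
# Hypothetical helper, not part of Manifold or METR tooling.
# Each entry is (inclusive upper bound in hours, answer label from this market).
BUCKETS = [
    (1.5, "0 to 1.5 Hours"),
    (2.0, "1.5 to 2 Hours"),
    (2.5, "2 to 2.5 Hours"),
    (3.0, "2.5 to 3 Hours"),
]


def resolve_bucket(horizon_hours: float) -> str:
    """Return the answer label for a 50% time horizon given in hours."""
    if horizon_hours < 0:
        raise ValueError("time horizon cannot be negative")
    for upper, label in BUCKETS:
        # Assumption: upper bounds are inclusive, so exactly 3.0 hours
        # is not "More than 3 Hours".
        if horizon_hours <= upper:
            return label
    return "More than 3 Hours"


if __name__ == "__main__":
    # Example input only; this is not a reported measurement.
    print(resolve_bucket(1.75))
```

If METR reports the horizon in minutes, divide by 60 before calling the function.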

bought Ṁ20 1.5 to 2 Hours NO

They’re not gonna measure it, I reckon.

Related questions

Will GPT-5.1 have a longer METR time horizon than Gemini 3?

Grok 4.20's METR 50% time horizon

Grok 5's METR 50% time horizon

When will Grok 3 weights become publicly available?

How many parameters does Grok 3 have?

Opus 4.5's METR time horizon beats GPT-5.1's?

Grok 4.2 (xAI) release date
