Will AI pass the Winograd schema challenge by the end of 2025?
2026
85% chance

https://en.wikipedia.org/wiki/Winograd_schema_challenge

Resolves positively if a computer program exists that can solve Winograd schemas as well as an educated, fluent-in-English human can.

Press releases making such a claim do not count; the system must be subjected to adversarial testing and succeed.

(Failures on sentences that a human would also consider ambiguous will not prevent this market from resolving positively.)

/IsaacKing/will-ai-pass-the-winograd-schema-ch

/IsaacKing/will-ai-pass-the-winograd-schema-ch-1d7f8b4ad30e

/IsaacKing/will-ai-pass-the-winograd-schema-ch-35f9dca7fa7d

/IsaacKing/will-ai-pass-the-winograd-schema-ch-d574a4067e75


So roughly >=97% on Winograd? Also this is Winograd and not Winogrande, right?

predicts NO
  1. I think a human could do better than 97%? If AI seems to be right on the boundary, I'll use myself as a benchmark. If it does at least as well as me, the market resolves YES. (I get to look up any words I don't know the definitions of, but I don't get to look up anything else.)

  2. Yes.
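For readers who want the boundary case spelled out, here is a minimal sketch of the comparison described above: score the model and a human benchmark on the same schemas, skip the ones a human would also find ambiguous, and resolve YES if the model does at least as well. The `Item` fields and the `resolves_yes` helper are hypothetical names made up for illustration. This is not an official resolution procedure, and it leaves out the adversarial-testing requirement entirely.

```python
from dataclasses import dataclass


@dataclass
class Item:
    """One Winograd schema question with graded answers (hypothetical fields)."""
    model_correct: bool
    human_correct: bool
    ambiguous: bool = False  # True if a human would also find the sentence ambiguous


def accuracy(flags):
    """Fraction of True values in a non-empty list of booleans."""
    return sum(flags) / len(flags)


def resolves_yes(items):
    """Return True if the model matches or beats the human on unambiguous items."""
    # Assumption: ambiguous items are simply dropped before scoring.
    scored = [it for it in items if not it.ambiguous]
    if not scored:
        raise ValueError("no unambiguous items to score")
    model = accuracy([it.model_correct for it in scored])
    human = accuracy([it.human_correct for it in scored])
    return model >= human
```

With this sketch, a model that misses only a schema flagged as ambiguous is not penalized for it, while a model that misses an unambiguous schema the human gets right falls below the benchmark.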

predicts YES

@IsaacKing Wikipedia reports 94-96% for humans; the Winogrande paper reports 97%.

predicts NO

@SneakySly That market appears to be about the Winoground test for image models. This market is about the Winograd test for language models. They're entirely different things, they just have a similar name. (I assume Winoground was a pun based on Winograd.)

@IsaacKing Ahhh, my mistake!