How much cheaper to use will o3-equivalent or better models get before 2026?
Dec 31
≥2x: 88%
≥5x: 65%
≥10x: 60%
≥30x: 39%
≥100x: 23%

Any model with publicly known benchmark scores and inference costs counts, not just OpenAI's o-series.

I will consider a model to be "o3-equivalent or better" if it scores ≥25% on FrontierMath (o3 scored 25.2%) and performs similarly on other benchmarks.

(Note that o3's exact inference costs in the configuration used for benchmarking are currently unknown, IIUC; this market description will be updated with exact figures if they become public. The market can still resolve without exact figures if, e.g., OpenAI announces an o4 that is "10x cheaper" at roughly the same performance.)
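
As a rough illustration of how the criteria above might combine, here is a minimal Python sketch. It is only one reading of the description, not the market creator's actual resolution procedure. The function name, the cost arguments, and any numbers passed to it are hypothetical; only the 25% FrontierMath bar and the ≥2x to ≥100x thresholds come from this page. The "performs similarly on other benchmarks" condition is a judgment call and is not modeled.

```python
# Thresholds are the market's answer options; everything else is hypothetical.
THRESHOLDS = [2, 5, 10, 30, 100]
FRONTIERMATH_BAR = 25.0  # percent, taken from the market description


def resolved_buckets(baseline_cost, candidate_cost, frontiermath_score):
    """Return the ">=Nx cheaper" thresholds a candidate model would satisfy.

    baseline_cost and candidate_cost are assumed to be in the same unit
    (for example, dollars per benchmark run); o3's real figure is unknown.
    """
    if baseline_cost <= 0 or candidate_cost <= 0:
        raise ValueError("costs must be positive")
    if frontiermath_score < FRONTIERMATH_BAR:
        return []  # not "o3-equivalent or better" under this reading
    ratio = baseline_cost / candidate_cost
    return [t for t in THRESHOLDS if ratio >= t]
```

For instance, with a made-up baseline cost of 100, a candidate cost of 8, and a 26% score, the ratio is 12.5x, so the sketch returns [2, 5, 10].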


This may be hard to resolve because inference costs for a given benchmark score or task can vary so much.