Will OpenAI's next major LLM (after GPT-4) solve more than 2 of the first 5 new Project Euler problems?

Mini

Ṁ1841

Jan 2

63% chance

Background: Project Euler is a series of challenging mathematical/computer programming problems intended for computational problem-solving using computer algorithms. Each problem requires more than just mathematical insights to solve; it often requires the design and implementation of efficient algorithms. This benchmark assesses the problem-solving capabilities of LLMs in fields requiring high levels of mathematical and algorithmic understanding.

Question: Will the next major release of an OpenAI LLM solve more than 2 of the first 5 new Project Euler problems released after the model’s official public debut?

Resolution Criteria: For this question, the "next major release of an OpenAI LLM" is defined as the next model from OpenAI that satisfies at least one of the following criteria:

It is consistently called "GPT-4.5" or "GPT-5" by OpenAI staff members.
It is estimated to have been trained using more than 10^26 FLOP according to a credible source.
It is considered to be the successor to GPT-4 according to more than 70% of my Twitter followers, as revealed by a Twitter poll (if one is taken).

This question will resolve to "YES" if this LLM solves more than 2 of the first 5 Project Euler problems released after its launch. For each problem, the outcome is taken from the first public document or comment describing an attempt to get the LLM to solve it, under the following conditions: the LLM is allowed up to three attempts to provide a correct solution, with a total time limit of 3 hours of computational 'thinking' across all attempts. Network errors resulting in partial responses will not be counted. Resolution will rely on public documentation or a credible report detailing the model's performance on these specific problems.
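As an illustration only, the resolution rule above can be sketched in Python. The names `Attempt`, `ProblemRecord`, `problem_solved`, and `resolves_yes` are hypothetical and not part of the question. The sketch assumes that the 3-hour budget applies per problem across its attempts, and that an attempt which pushes the running total past 3 hours does not count; the criteria do not spell out either point.

```python
from dataclasses import dataclass
from datetime import date

MAX_ATTEMPTS = 3          # up to three attempts per problem
MAX_THINKING_HOURS = 3.0  # assumed to be a per-problem budget across attempts
PROBLEMS_CONSIDERED = 5   # first 5 problems released after the model's debut
SOLVED_THRESHOLD = 2      # YES requires strictly more than 2 solved


@dataclass
class Attempt:
    correct: bool
    thinking_hours: float


@dataclass
class ProblemRecord:
    released: date
    attempts: list[Attempt]  # complete responses only; network failures excluded


def problem_solved(record: ProblemRecord) -> bool:
    """Return True if a correct answer appears within the attempt and time limits."""
    hours_used = 0.0
    for attempt in record.attempts[:MAX_ATTEMPTS]:
        hours_used += attempt.thinking_hours
        if hours_used > MAX_THINKING_HOURS:
            return False  # assumption: going over budget ends the evaluation
        if attempt.correct:
            return True
    return False


def resolves_yes(records: list[ProblemRecord], model_debut: date) -> bool:
    """Apply the 'more than 2 of the first 5 new problems' rule."""
    # Only problems released strictly after the model's public debut are eligible.
    eligible = sorted(
        (r for r in records if r.released > model_debut),
        key=lambda r: r.released,
    )
    first_five = eligible[:PROBLEMS_CONSIDERED]
    solved = sum(problem_solved(r) for r in first_five)
    return solved > SOLVED_THRESHOLD
```

Under this reading, 3 solved problems out of the first 5 is the minimum for a "YES" resolution, and problems published before the model's debut are ignored regardless of whether the model solves them.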

#OpenAI

#LLMs

#Project Euler

1 Comment

So far, GPT-5 has solved around half of the most recent problems, according to MathArena.

But all of those problems were published before GPT-5's release, so none of them count toward this question yet.
