What will be the best score on the GAIA benchmark before 2025? | Manifold

46% chance

This question resolves to the state-of-the-art average score achieved by an AI system on the GAIA benchmark (test set, not validation set), including any post-training enhancements but excluding any human assistance. Resolution will be based on credible, publicly available results published before January 1st, 2025. The primary source will be the official leaderboard, but other sources, including but not limited to arXiv preprints and papers, may also be considered.
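The resolution rule above can be sketched as a filter followed by a maximum. This is only an illustration of the stated criteria, not an official resolution procedure: the `Result` record shape, its field names, and the `resolution_value` helper are all hypothetical, and the single data point is the March 15th, 2024 score quoted in the background section, assumed here to be an unassisted test-set result.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, Optional

# Results must be publicly available strictly before this date.
CUTOFF = date(2025, 1, 1)


@dataclass(frozen=True)
class Result:
    """One reported GAIA result (hypothetical record shape, not an official schema)."""
    system: str
    average_score: float   # average score, in percent
    published: date        # date the result became publicly available
    split: str             # "test" or "validation"
    human_assisted: bool   # True if humans helped during evaluation


def resolution_value(results: Iterable[Result]) -> Optional[float]:
    """Return the best eligible average score, or None if nothing qualifies."""
    eligible = [
        r.average_score
        for r in results
        if r.split == "test"            # validation-set scores do not count
        and not r.human_assisted        # human assistance is excluded
        and r.published < CUTOFF        # must predate January 1st, 2025
    ]
    return max(eligible) if eligible else None


# The only score stated in the question text. Treating it as an unassisted
# test-set result is an assumption made for this example.
known = Result("GPT-4-turbo based system", 32.33, date(2024, 3, 15), "test", False)
print(resolution_value([known]))
```

Judging whether a source is "credible" is left to the resolver and is deliberately not modeled here.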

Background Information:

See GAIA:
GAIA is a benchmark that aims to evaluate next-generation LLMs (LLMs with augmented capabilities due to added tooling, efficient prompting, access to search, etc.). (See the GAIA paper for more details.) GAIA consists of more than 450 non-trivial questions, each with an unambiguous answer, requiring different levels of tooling and autonomy to solve. It is therefore divided into 3 levels, where level 1 should be breakable by very good LLMs and level 3 indicates a strong jump in model capabilities. Each level is divided into a fully public dev set for validation and a test set with private answers and metadata.
As of March 15th, 2024, the best score is 32.33%, achieved by a GPT-4-turbo-based system.

Part of the AI Benchmarks series by the AI Safety Student Team at Harvard on evaluations of AI models against technical benchmarks. Full list of questions:

#Technology

#Technical AI Timelines

Related questions

What will be the best normalized score achieved on the original 7 RE-Bench tasks by December 31st 2025?

What will be the best score on Cybench by December 31st 2025?

What will be the best score on the WebArena benchmark before 2025?

What will be the best score on the SWE-Bench (unassisted) benchmark before 2025?

What will be the best score on the InterCode (Bash) benchmark before 2025?

Will an AI achieve >85% performance on the FrontierMath benchmark before 2028?

What will be the best score on the GPQA benchmark before 2025?

Will an AI agent system be able to score at least 40% on level 3 tasks in the GAIA benchmark before 2025?

Will an AI score over 10% on FrontierMath Benchmark in 2025?

What will the top score on Humanity's Last Exam be when it is released?
