What will be the best score on the SWE-Bench (unassisted) benchmark before 2025?

This question will resolve to the state-of-the-art accuracy achieved on the SWE-bench (unassisted) benchmark by an AI system, including any post-training enhancements but excluding any human assistance, based on credible publicly available results prior to January 1st, 2025. The primary source will be the official leaderboard, but other credible sources, including but not limited to arXiv preprints and papers, may also be considered.

Background information:

See SWE-bench.

SWE-bench is a dataset that tests a system's ability to solve GitHub issues automatically. It collects 2,294 issue-pull-request pairs from 12 popular Python repositories; evaluation is performed by unit-test verification, using post-PR behavior as the reference solution. See the SWE-bench paper for details.
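For concreteness, the headline percentage is simply the fraction of task instances whose generated patch makes the previously failing tests pass without breaking the previously passing ones. The sketch below illustrates that scoring logic only, under assumed field names (`FAIL_TO_PASS`, `PASS_TO_PASS`) and an assumed `run_tests` callback; it is not the official evaluation harness.

```python
# Minimal sketch of SWE-bench-style scoring: an instance counts as "resolved"
# when the model's patch makes every previously failing test pass while keeping
# the previously passing tests green. Field names and the run_tests callback
# are illustrative assumptions, not the official harness.
from typing import Callable, Iterable, Mapping


def is_resolved(
    instance: Mapping[str, Iterable[str]],
    run_tests: Callable[[list], dict],
) -> bool:
    """True if all FAIL_TO_PASS tests now pass and PASS_TO_PASS tests still pass."""
    fail_to_pass = list(instance["FAIL_TO_PASS"])   # tests the gold PR fixes
    pass_to_pass = list(instance["PASS_TO_PASS"])   # tests that must not regress
    results = run_tests(fail_to_pass + pass_to_pass)  # maps test name -> passed?
    return all(results[t] for t in fail_to_pass) and all(results[t] for t in pass_to_pass)


def benchmark_score(instances, runner_for) -> float:
    """Fraction of instances resolved (e.g. 0.1386 for a reported 13.86%)."""
    resolved = sum(is_resolved(inst, runner_for(inst)) for inst in instances)
    return resolved / len(instances)


if __name__ == "__main__":
    # Toy example: one instance and a fake runner that reports every test passing.
    toy_instance = {"FAIL_TO_PASS": ["test_fix"], "PASS_TO_PASS": ["test_existing"]}
    fake_runner = lambda inst: (lambda tests: {t: True for t in tests})
    print(benchmark_score([toy_instance], fake_runner))  # -> 1.0
```

A real evaluation would check out each repository at the issue's base commit, apply the model's patch, and run the relevant test suite; the scoring step itself reduces to the comparison above.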

The best reported result as of March 15th, 2024 is Devin's 13.86%. The best entry on the official leaderboard is Claude 2 + BM25 Retrieval at 1.96%.

Part of the AI Benchmarks series by the AI Safety Student Team at Harvard, covering evaluations of AI models against technical benchmarks. Full list of questions:
