Will an AI be able to speedrun any popular video game faster than the human WR by the end of 2024?
232 · Ṁ26k · Dec 31 · 12% chance

This question resolves to "YES" if an AI agent has learned to speedrun at least one popular category (≥100 unique runners on speedrun.com or another leaderboard) of any video game released before 2022, and has finished at least one run with a better time than any human speedrunner at the time.

Native PC or emulated console games are both fine.

Criteria for resolution:

  • The AI must be capable of speedrunning the game in real time and learn to do so without direct human assistance (learning from, e.g., YouTube videos is fine). A traditional TAS (tool-assisted speedrun) does not count.

  • The AI must not receive any information about the game that a human speedrunner wouldn't be able to know during a run, e.g., by watching RAM values while playing. Ideally, the AI should only receive the game's pixels (possibly downscaled or otherwise processed) and maybe audio as input, but this is not a strict requirement.

  • The AI should ideally follow all the rules of the specific game and category it is running, as listed on the game's speedrun.com page (or elsewhere). If there are minor rule breaks, but the AI's run is still obviously much more impressive than the most comparable human WR, I may choose to ignore this requirement.

  • It must be a full-game, non-segmented run, not an IL (individual level) speedrun.

  • The human world record in the category must be over 3 minutes long. Very short speedruns don't count.
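The mechanical parts of these criteria can be written down as a checklist. The sketch below is a hypothetical encoding (the CandidateRun fields are illustrative and not taken from speedrun.com or any API); it deliberately leaves out the judgment calls, such as minor rule breaks or what counts as direct human assistance, which stay with the resolver.

```python
from dataclasses import dataclass


# Hypothetical description of one candidate AI run; field names are illustrative.
@dataclass
class CandidateRun:
    unique_runners: int        # unique runners in the category's leaderboard
    release_year: int          # year the game was released
    full_game: bool            # False for IL (individual level) runs
    segmented: bool            # True if the run was stitched together from segments
    human_wr_seconds: float    # human WR at the time of the AI run
    ai_time_seconds: float     # the AI's real-time run
    uses_ram_values: bool      # True if the agent reads game memory during the run
    is_precomputed_tas: bool   # True for a hand-built input file (traditional TAS)


def meets_hard_criteria(run: CandidateRun) -> bool:
    """Check only the mechanical conditions; rule compliance is left to judgment."""
    return (
        run.unique_runners >= 100
        and run.release_year < 2022
        and run.full_game
        and not run.segmented
        and run.human_wr_seconds > 3 * 60          # WR must be over 3 minutes
        and run.ai_time_seconds < run.human_wr_seconds
        and not run.uses_ram_values
        and not run.is_precomputed_tas
    )


example = CandidateRun(unique_runners=150, release_year=1998, full_game=True,
                       segmented=False, human_wr_seconds=600.0,
                       ai_time_seconds=590.0, uses_ram_values=False,
                       is_precomputed_tas=False)
print(meets_hard_criteria(example))  # True
```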

I will use my best judgment to resolve this based on the criteria above and will not bet myself.


@traders I've made a multiple choice version of this question with different years here:
https://manifold.markets/NoUsernameSelected/when-will-an-ai-be-able-to-speedrun


What counts as direct human assistance?

@Celene Broadly speaking it can learn from human-generated data during training, but otherwise plays autonomously at run time.

sold Ṁ9 NO

@NoUsernameSelected Would "reading a text file of inputs" count? What about "being trained to memorize all the necessary frame-perfect inputs"?

Btw, if anyone knows of projects that might qualify for this question (even if imperfectly with all the resolution minutiae), do feel free to drop them in the comments here. I'd be very interested to see if anything comes close.


I think the ideal game to do that with (given the constraints that it has more than 100 runners on SRC, is listed as a full-game run, and has a WR above 3 minutes) would be Cookie Clicker, for reasons that should be pretty obvious.

But I don't think anyone will actually do this, and it won't be feasible for 90% of games in this timespan.

Seems like:
- This would take quite a lot of effort to train, so there is a good chance nobody even makes the attempt.
- It would be easier to beat the top speedrun of a less-popular game, but regardless, announcing "AI has beaten the human world record for game X" would immediately attract tons of human competition from the speedrunning community. So, even if the AI beats the current human record, humans would quite likely win it back and remain the reigning champions by the end of 2024.
- If the AI developed some amazing new technique to skip levels, etc., then humans could probably just copy that technique. So I am thinking that in order to have a durable edge, the AI must be capable of faster inputs, TAS-level precision, or something else that humans couldn't imitate. Personally, I am doubtful that modern AI systems are consistent enough to be more precise than a dedicated human speedrunner.

@JacksonWagner The criteria as written only seem to require that the run is faster than any human runs at the time. So would still resolve YES even if a human run subsequently beats the AI run.

@NLeseul Can confirm that's how I intended to resolve.

@NoUsernameSelected Might be a good idea to edit the title, in that case: "Will an AI have broken a human WR" would be more accurate than "Will an AI be able to speedrun faster than the human WR by end of 2024"

predicts YES

@JacksonWagner That already seemed like the natural reading to me, fwiw. Plus the description says "better time than any human speedrunner at the time." (emphasis added)

predicts NO

@jack I agree that I failed to closely read the description. But IMO the title is misleading.

"Will Trump lead Biden in the polls by election day?", "Will gas cars outsell electric cars by 2050?", etc. --> seems to imply that what matters is performance at the end of the time period.

"Will Trump lead Biden in the polls at any time between now and Nov 2024?", "Will gas cars outsell electric cars in any year of the next 25 years?" --> very different question

predicts YES

@JacksonWagner No, "by" means anytime before the end of the time period.

https://www.merriam-webster.com/dictionary/by

: not later than

be there by 2 p.m.

predicts NO

If someone does this with deep RL, they have to define a reward function, which is usually specific to the particular game.

Obviously a reward function like "how closely does your sequence of inputs match the existing TAS?" is cheating, but a reward function like "if you beat the whole game, you get an amount of reward depending on how fast you completed it, no other feedback" seems impossibly sparse.

There's probably a lot of shades of grey as to whether particular reward functions in the spectrum between those extremes count as "direct human assistance".
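To make the sparsity concrete, here is a minimal sketch of the "reward only at the end, scaled by speed" extreme. It assumes 60 frames per second and a hypothetical reference time (e.g. the human WR) used to normalize the reward; neither comes from any particular project. In a 10-minute run, only one of 36,000 frames carries any signal, and an agent that never finishes sees nothing at all.

```python
def sparse_completion_reward(finished: bool, elapsed_seconds: float,
                             reference_seconds: float) -> float:
    """Return 0 on every frame except the final frame of a completed run."""
    if not finished:
        return 0.0
    # Faster runs score higher; matching the reference time scores exactly 1.0.
    return reference_seconds / elapsed_seconds


def episode_rewards(total_frames: int, finished: bool, fps: int,
                    reference_seconds: float) -> list[float]:
    """Per-frame rewards for one episode under the sparse scheme."""
    rewards = []
    for frame in range(1, total_frames + 1):
        done = finished and frame == total_frames
        rewards.append(sparse_completion_reward(done, frame / fps, reference_seconds))
    return rewards


# A completed 10-minute run at 60 fps, normalized by a 10-minute reference time.
rewards = episode_rewards(36_000, finished=True, fps=60, reference_seconds=600.0)
nonzero = sum(1 for r in rewards if r != 0.0)
print(nonzero, rewards[-1])  # 1 1.0
```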

predicts YES

@Multicore I think using a game with lots of small screens (like Katana Zero) and giving a reward after each screen sounds feasible, and I don't think that counts as direct human assistance.
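A per-screen reward along those lines might look like the sketch below. It assumes the current screen index can be detected from pixels (e.g. by spotting screen transitions), which is a hypothetical helper not shown here; the per-frame penalty of 0.001 is an illustrative value that favors clearing screens sooner rather than just eventually. Rewarding only screens beyond the furthest one reached keeps the agent from farming reward by walking back and forth across a screen boundary.

```python
def screen_progress_reward(furthest_screen: int, curr_screen: int,
                           time_penalty: float = 0.001) -> float:
    """+1 per screen beyond the furthest reached so far, minus a per-frame penalty."""
    new_screens = max(0, curr_screen - furthest_screen)
    return new_screens - time_penalty


def episode_return(screen_per_frame: list[int], time_penalty: float = 0.001) -> float:
    """Sum the shaped reward over a sequence of (hypothetically detected) screen indices."""
    total = 0.0
    furthest = screen_per_frame[0]
    for screen in screen_per_frame[1:]:
        total += screen_progress_reward(furthest, screen, time_penalty)
        furthest = max(furthest, screen)
    return total


slow = episode_return([0, 0, 0, 1, 1, 2])  # 5 frames to reach screen 2
fast = episode_return([0, 1, 2])           # 2 frames to reach screen 2
farming = episode_return([0, 1, 0, 1])     # re-entering screen 1 earns nothing extra
print(fast > slow)  # True
```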

Tangentially related:

predicts YES

This is a trivial YES; the criteria are too weak to filter out an AI that can generate TAS-grade runs.

@L The criteria weren't supposed to filter out TAS-grade runs? I just want runs that can be created automatically (as opposed to painstakingly by hand, as in a regular TAS) and that are executable in real time (robust to differences in RNG that might occur from run to run).

They wouldn't quite be "TAS-grade" in the sense of being the fastest possible run with the best fixed RNG seed, just better than what the best humans can do.

@NoUsernameSelected If an AI can learn from YouTube, I wonder if it could just exactly copy a TAS it finds there (assuming no RNG issues in that particular game). That seems like it would fit the requirements. I guess it's still non-trivial to implement, but probably doable.
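Whether a copied input sequence survives depends on RNG. The toy model below is a hypothetical one-pit track, not any real game: it records the inputs of a run on one level and replays them open-loop (like playing back a TAS file), and compares that with a policy that looks at the screen every frame. In this model the replay only finishes when the pit lands in the same place as in the recorded run, while the reactive policy finishes every layout.

```python
import random
from typing import Callable, List

TRACK_LENGTH = 10
Controller = Callable[[int, int, int], str]  # (frame, position, pit) -> "walk" or "jump"


def make_level(seed: int) -> int:
    """Place one pit; the seed stands in for the game's RNG."""
    return random.Random(seed).randint(2, TRACK_LENGTH - 2)


def play(pit: int, controller: Controller) -> bool:
    """Walk (+1) or jump (+2) each frame; landing on the pit ends the run."""
    position, frame = 0, 0
    while position < TRACK_LENGTH:
        position += 2 if controller(frame, position, pit) == "jump" else 1
        if position == pit:
            return False
        frame += 1
    return True


def reactive_policy(frame: int, position: int, pit: int) -> str:
    """Closed-loop: see where the pit is on this run and jump right before it."""
    return "jump" if position + 1 == pit else "walk"


def record_inputs(pit: int, policy: Controller) -> List[str]:
    """Log the inputs a policy produced on one level, like saving an input file."""
    inputs: List[str] = []

    def logging_policy(frame: int, position: int, level_pit: int) -> str:
        action = policy(frame, position, level_pit)
        inputs.append(action)
        return action

    play(pit, logging_policy)
    return inputs


def replay(inputs: List[str]) -> Controller:
    """Open-loop: ignore the screen and press the recorded buttons in order."""
    return lambda frame, position, pit: inputs[frame] if frame < len(inputs) else "walk"


plan = record_inputs(make_level(seed=1), reactive_policy)
for seed in range(5):
    pit = make_level(seed)
    print(seed, pit, play(pit, replay(plan)), play(pit, reactive_policy))
```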

For what it's worth, I believe a common practice when training agents on simple games like the Atari titles in the Arcade Learning Environment is a technique called "sticky actions," where on each frame there's a small chance that the agent's new input is ignored and its previous input is repeated instead. That helps ensure the agent is actually responding dynamically to what happens on the screen rather than just memorizing a long sequence of inputs.

Might not be a terrible idea to require any agent that attempts this challenge to have "sticky actions" or something similar enabled, to reject anything that's just a fancy TAS memorizer.
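A sticky-actions wrapper is short. The sketch below is a standalone illustration of the idea behind ALE's repeat_action_probability (0.25 is the commonly used value, e.g. the default in Gymnasium's v5 Atari environments); the wrapped step function is a hypothetical stand-in for an emulator step, not a real library call, and 0 is assumed to be the "no input" action.

```python
import random
from typing import Callable, Optional


class StickyActions:
    """With probability p, repeat the previously executed action instead of the new one."""

    def __init__(self, step_fn: Callable[[int], object], p: float = 0.25,
                 seed: Optional[int] = None, noop: int = 0) -> None:
        self.step_fn = step_fn     # hypothetical emulator step: action -> result
        self.p = p
        self.rng = random.Random(seed)
        self.prev_action = noop    # action treated as "held" before the first input

    def step(self, action: int) -> object:
        if self.rng.random() < self.p:
            action = self.prev_action  # the agent's new input is ignored this frame
        self.prev_action = action
        return self.step_fn(action)


executed = []
env = StickyActions(lambda a: executed.append(a), p=0.25, seed=0)
for a in [1, 2, 3, 4, 5, 6, 7, 8]:
    env.step(a)
print(executed)  # some requested inputs replaced by the one executed before them
```

A memorized, frame-perfect input file replayed through such a wrapper can fall out of sync after any repeated frame, whereas an agent that picks each input from the current frame can correct for it.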

How does this interact with games like Tetris or Chessmaster? The full-game vs. IL distinction seems unclear for them.

predicts NO

@TorenqazzquimbyDarby probably none of these examples has "≥100 unique runners on speedrun.com or another leaderboard" -- I think these leaderboards focus on games that can be meaningfully played against time.

@Sjlver
https://www.speedrun.com/tetrisnes?h=100_Lines_Level_0_Start&x=n2y59gmk
≥100 runners, ≥3 minutes, and I'd imagine existing AI already beats human times.
Let me know if I'm missing something.

predicts NO

@TorenqazzquimbyDarby nice :-) I stand corrected.

@TorenqazzquimbyDarby Good question. I wasn't really planning on resolving this on anything that's basically a board game in video game form tbh, or otherwise as simple as Tetris. Something like the original Doom might be good enough (though Doom doesn't have 100 runners on speedrun.com).

Not really covered in my rules I guess, but I'll go by personal judgment here.