Will we reach "weak AGI" by the end of 2025?
2025 · 30% chance

At the time of market creation, the community prediction on Metaculus for the "Date Weakly General AI is Publicly Known" is Oct 18, 2025: https://www.metaculus.com/questions/3479/date-weakly-general-ai-is-publicly-known/

The market will resolve YES if, in retrospect by the end of 2025, this has indeed been achieved (and is agreed upon by industry experts); the market will resolve NO if not.


I'd be interested in betting on something like this, but I can't accept the resolution criterion "is agreed upon by industry experts": industry experts are biased. You should instead define what you mean by AGI in a way that is scientifically testable, and resolve based on whether any AI can pass the test.

A weak test (but a good example) would be the Turing test. A given AI will or will not pass the Turing test, without regard to what industry experts say.

@BrandonNorman The "Turing test" is not very well defined, and some versions have already been beaten by ChatGPT on multiple occasions, depending on what you consider a "Turing test".

https://humsci.stanford.edu/feature/study-finds-chatgpts-latest-bot-behaves-humans-only-better

https://www.nature.com/articles/d41586-023-02361-7

@ProjectVictory I don't disagree, but a specific version of the Turing test could be well defined for the purpose of this resolution, or we could use some other test. I just don't like the idea of "industry experts": they will be biased toward making claims beyond the capability of the product they are selling.

It's like asking the car salesman how good the car is.

@block blast While it’s true that many LLMs may contain SAT Math questions in their training datasets, the focus of their training is on understanding patterns and generating coherent responses rather than strictly adhering to any particular educational standard. Excluding certain types of questions for the sake of compliance with Metaculus criteria may not be a priority for developers, especially if those questions contribute to the model's overall performance and versatility in math-related tasks.

This is a very vague resolution criterion.

I'd bet all current LLMs include SAT Math questions somewhere in their training data, and nobody will bother to exclude those questions from a training run just to satisfy some Metaculus criterion.

@ahalekelly The author clarified below that they are going to rely on expert consensus, not Metaculus's strict criteria.

If you're 100% sure that we will have "weak AGI" by 2025, then you should be able to give a clear answer for how many watts it will take, and should have no problem betting on this market. If you can't answer that question, then you are basically just guessing and have no certainty that it will occur by 2025, or at all.

@PatrickDelaney Has anyone claimed 100% confidence? The range seems to be 25% to 75%.

@MartinRandall Sorry, I'm unclear on what you mean. "X% confidence" sounds like "X confidence level," a statistical term. Did you mean confidence level? Whereas "100% sure" is vernacular, and could mean either "I as an individual am filling an order at 100% on YES at this time" or "I am very sure of this particular belief and hold no numerical measurement of it in any dimension," i.e., I just believe it.

@PatrickDelaney It'd be very strange to quantify 100% on a forecasting site and not mean a quantity.

Don't we already have it? Just give ChatGPT access to a real-life robot's API and have it self-reflect.

@xxx As someone who works in NLP and lives with a roboticist, this doesn't work for many reasons. Chief among them is that the network doesn't know what does and doesn't work in a robot, so it will have no understanding of why something failed, and therefore gains from reflection will be minimal. There's also a broader issue of using language models as controllers for continuous high dimensional tasks, where even very slight imprecision leads to wildly incorrect answers. This is in contrast to something like standard language tasks where there are many potential correct answers with a lot of fuzziness around each one.

predicts NO

What does "in retrospect" mean?

Is this guaranteed to resolve at a particular date?

predicts NO

@NoaNabeshima Yes, I will resolve this market when it closes on Jan 1, 2026. Looking back, if I feel the resolution criteria are met, I will resolve the market YES; otherwise I will resolve it NO.

@VictorLi Suppose that no one actually tries the Loebner Silver Turing test and there is some disagreement among industry experts about whether it would be passed if tried, but you think it would be passed. Could this resolve YES?

Suppose some industry experts think it would be passed and you think it would be passed, but others aren't sure. Could this resolve YES?

predicts NO

@VictorLi Could this resolve YES if industry experts think it would be passed but it hasn't been attempted?

@NoaNabeshima If I believe it qualifies as "weak AGI" and a majority of industry experts concur, then I will resolve YES; otherwise it will resolve NO.

Granted, "a majority of industry experts" is a vague measure, but I will use common sense about whether or not there is consensus. FWIW, I basically expect it to be weak AGI, but I'm betting NO because I doubt the industry experts will recognise it as such.

predicts NO

@VictorLi So if it definitely doesn't solve Montezuma's Revenge, but expert consensus is that it's "weak AGI", does this resolve NO or YES?

predicts NO

All tasks have been met except the Turing test and unification (the former is beyond trivial; the latter is somewhat subjective).

@Gigacasting Montezuma's Revenge is not solved to that degree.

I basically expect GPT-5 to be "weak AGI", and there is an argument to be made that GPT-4 already qualifies. My main uncertainty is whether the experts will reach a consensus, since I expect a lot of them to keep moving the goalposts.

Stitch DreamerV3 onto GPT-4 and it's done.