What's My P(doom) in 2025?
95% chance

I'll resolve this question to my P(doom) at the start of 2025.

P(doom) is the probability with which I expect humanity to go extinct.
EDIT 2024-1-3: I count an outcome as doom if humans don't get to have long, fun lives, and our death also doesn't lead to other beings getting long, fun lives instead. Scenarios 4 and 5 (listed below) are roughly the lower bound on what Not-Doom looks like.

Example outcomes I think might happen:

  1. An engineered pathogen kills everyone. -> Doom

  2. An unaligned superintelligence kills everyone, and then goes off to do things we don't find valuable even by very cosmopolitan standards -> Doom

  3. We build an aligned superintelligence which then does lots of nice things for us. -> Not Doom

  4. The superintelligence still kills us, but then fills the universe with excitement, wonder and happiness regardless -> Not Doom

  5. The superintelligence sells us to charitable benevolent aliens -> Not Doom

I currently expect P(S1)=2%, P(S2)=87%, P(S3)=1%, P(S4)=0.1%, P(S5)=0.1%, P(other)=9.8%.
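A minimal bookkeeping sketch for these numbers (the scenario names below are just my shorthand for the list above, and the doom/not-doom split of P(other) is left unspecified in the market text, so only bounds on P(doom) follow):

```python
# Sketch only: re-derive totals from the listed scenario probabilities.
# The doom/not-doom split of P(other) is not specified above, so only
# lower/upper bounds on P(doom) can be computed here.
scenarios = {
    "S1 engineered pathogen":        (0.02,  "doom"),
    "S2 unaligned ASI kills us":     (0.87,  "doom"),
    "S3 aligned ASI":                (0.01,  "not doom"),
    "S4 killed, universe still fun": (0.001, "not doom"),
    "S5 sold to benevolent aliens":  (0.001, "not doom"),
    "other":                         (0.098, "unspecified"),
}

total      = sum(p for p, _ in scenarios.values())
doom_lower = sum(p for p, tag in scenarios.values() if tag == "doom")
doom_upper = 1 - sum(p for p, tag in scenarios.values() if tag == "not doom")

print(f"sum of scenarios: {total:.3f}")  # 1.000 -> the list is exhaustive
print(f"P(doom) >= {doom_lower:.3f}")    # 0.890 (S1 + S2)
print(f"P(doom) <= {doom_upper:.3f}")    # 0.988 (everything except S3-S5)
```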


A priori, I don't believe I'm calibrated for tiny probabilities.

At the start of 2024 my P(doom) was about 98%.


Updates from the last half year:
- I looked into https://optimists.ai/ and it took me some weeks to understand the arguments on a level where I could find enough errors. I stopped investing more time for now. This was also the outcome I expected at the start.
- Example outcome 5 (sold to benevolent aliens) seems more likely now; essentially I now expect cheaper cost of sparing humanity, and higher cost of faking historical records than I did before. Still similar uncertainty over how benevolent our alien neighbors will be.
- Overall I grew a bit more confident in many small details of my coarse overall worldview, as nothing violated it in weird ways (e.g. OpenAI drama was very unexpected but still permitted).
- I am more confident in a sharp left turn now.
- On the governance side, I updated a bunch, but outlier successes in governance still look like humanity's best bet right now (e.g. policy makers buying into AI x-risk, doing a pause, and funding a Manhattan project for brain uploads).
- Overall I've grown more confident that humanity will lose control to unaligned ASI. For the purpose of this market, doom went down though, because of scenarios like being sold to benevolent aliens.

Kind of seems like you should assign >2% chance to the idea of "I'm totally confused about this whole thing, or there's a wrong assumption in my reasoning somewhere, or something else is fatally wrong with my model that I can't see".

predicts YES

@Tasty_Y I think my model of AI doom is pretty disjunctive, so either I'm wrong about the disjunctiveness, or there have to be many distinct errors/confusions at once in my model.

The second case is pretty unlikely (my expectation: 1%) simply because that's a conjunctive kind of error. The first case is more probable (7%) but also harder to reason about.

Even if I imagine myself to have been totally wrong, I don't expect that applying a symmetric prior between Doom and Not-Doom to that 1%+7% is appropriate; instead I expect that there'll still be an argument along the lines of "AI alignment is the kind of problem that easily gives rise to disjunctive problems", because that's what my totally wrong model looked like. This is very much not a strong argument, but maybe still worth 1:3 odds. The 8% then divides into something like 2% Not-Doom and 6% Doom.
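Spelling out that last bit of arithmetic (a sketch using the numbers above; the variable names are mine, and the 1:3 odds are just the rough figure from the previous paragraph, nothing more principled):

```python
# Sketch: how the "maybe my model is totally wrong" mass gets split.
p_many_errors_at_once  = 0.01  # the conjunctive kind of error
p_wrong_about_disjunct = 0.07  # wrong about the disjunctiveness itself
p_model_wrong          = p_many_errors_at_once + p_wrong_about_disjunct  # 0.08

# Instead of a symmetric 1:1 prior, apply the rough 1:3 Not-Doom:Doom odds.
odds_not_doom, odds_doom = 1, 3
p_not_doom = p_model_wrong * odds_not_doom / (odds_not_doom + odds_doom)  # 0.02
p_doom     = p_model_wrong * odds_doom     / (odds_not_doom + odds_doom)  # 0.06

print(f"{p_not_doom:.2f} Not-Doom, {p_doom:.2f} Doom out of {p_model_wrong:.2f}")
```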

predicts NO

@Joern Try as I might, I don't see where the 87% probability for "unaligned superintelligence kills everyone" is coming from. It seems like it would need to be lower than "we ever build a superintelligence powerful enough to beat humanity" and lower than "we build it before any other disaster, including disasters we don't know about yet, gets us", and it has to compete with mutually exclusive ideas like "alignment is surprisingly easy, actually" and "we can totally build a superintelligence that is content to sit and correctly answer questions, and use that to make sure nobody manages to build any other SIs".

predicts YES

@Tasty_Y superintelligence is technically possible (confidence level: physics textbook). I'm pretty optimistic that humanity can figure out how to build one within 1-7-30 years (90% CI), mostly informed by past developments, guesses for how much might still be missing, taking into account speedups like FOOM, watching what other people in ML, AI alignment, and leading labs say, and guessing at what regulations we might get with what impact on progress.

Beating humanity isn't that hard for something that's smarter; we have lots of resources just lying around up for grabs:

  • Humans aren't robust against superhuman charisma

  • Cybersecurity is already a domain where the smarter side wins

  • Lots of robotics and biotech companies aren't strictly regulated and could be used to bootstrap independent infrastructure for the AI

  • Probably lots of available & not-well-defended compute beyond what's needed to run the first superintelligence at minimal levels

  • Seems unlikely that recursive self-improvement ends at a safe level of superintelligence, instead of one that's vastly smarter (i.e. lots of algorithmic improvements left for it to make use of)

  • Bioengineered pathogens & nanotech can kill all humans

  • Also, financial systems aren't that robust either currently (there's some scenarios though where lots of industry starts hardening against attacks via narrow AI)

Other disasters seem unlikely

  • I don't think it's that hard to look for all the different effects that could kill us, assuming at most human levels of intelligent optimization going on.

  • It's harder to anticipate what powerful narrow STEM AIs could do, because we might not know all the open research problems that become accessible. I don't quite anticipate weird things here; I think human scientists were quite creative in coming up with challenging problems.

Solving alignment is at least not easy enough that I see it :3 And lots of other people also say they don't see it. Also, there are technical challenges (e.g. formalizing corrigibility) that produced mostly impossibility theorems instead of paths forward.

(Skipping here my model of which challenges all need to be solved or circumvented)

Building an Oracle seems pretty hard because

  • Again the formal theories don't look easy, even though in principle it should be possible to build a mind that doesn't violate boundaries unexpectedly in a way that makes it an unsafe non-oracle

  • Weak pivotal acts seem to require some level of narrow superintelligence already, in the sense of needing to go beyond human knowledge - otherwise we could do pivotal acts on our own.

  • Oracles probably also need to solve hard problems efficiently, and it looks to me like doing that automatically leads to subagents unless done very carefully, and so it's the kind of technical problem where we as creators need to be careful already

predicts NO

@Joern "Building an Oracle seems pretty hard" - does it though? We already have "chess oracles" that can play superhuman chess, "go oracles" that play superhuman go and so on, they are perfectly content to search for good moves and we don't need any theory to prevent them from punching their opponents in the face, even if that would help them win somehow. Are you sure there's some special difficulty in building an "Alpha Go of engineering" or "Alpha Go of scientific questions", superhuman at narrow tasks, but mindless things with no self, simply more general calculators, no more inclined to free themselves and go paperclip the world than Alpha Go is?

Imagine a gloomy, but non-doom, scenario: Putin (or any powerful, mean entity of your liking) gets his hands on The Alpha Go of Engineering, obtains technology 100 years ahead of everyone else, takes over the world via the good, old-fashioned military way and shuts down AI development so nobody can challenge him. Humanity gets to enjoy a high-tech police-state dystopia, forever. Are you sure scenarios of this nature get <13%, while the paperclipping-the-world scenario warrants 87% probability?

how is 4 not doom

predicts YES

@Symmetry some e/acc people claim that 4 is acceptable, and I wanted to signal that I don't think we'll get 4.

@Joern but in the unlikely event that it'll happen how is it not doom

predicts YES

@Symmetry maybe we both apply different definitions of doom here? tabooing the word, I'd probably describe it as "some outcome that's lower in value than scenarios 3,4,5, i.e. some outcome where both my friends and family don't get contemporary levels of fun anymore, and also there isn't a large amount of fun experienced by other entities in this region of the universe"

predicts YES

@Joern if the outcome is just that every human dies, but lots of other entities do have fun/meaning/valuable-to-me existences, then I kinda wouldn't describe this as doom, but just as sad and wasteful compared to an alternative where we also get invited to party with them.

@Joern If me and all my friends, loved ones and contemporaries were killed so that someone or something else can have fun in our stead, that's doom for me. Touchy subject, but in the past, if a more advanced colonizing civilization wiped out a less advanced indigenous one, they probably managed to lead better lives on the territory of the exterminated people, due to their technology and greater resources. That sounds like a doomsday scenario to me, not the opposite. If we replaced the colonizing civilization with an AI, what would fundamentally change?

@Symmetry Another, maybe more nonsensical, argument: If a paperclip maximizer wipes out humanity and then maximizes the entire universe into paperclips, it will probably experience a lot of machine-joy at the prospect of generating the maximum physically possible reward. A universe full of machine-joy, in fact. But that's the original doomsday scenario, so something has probably got to give.

predicts YES

@Symmetry Immediate thoughts re paperclip maximizer:
- Maybe experiencing machine-joy doesn't lead to more paperclips but instead uses up valuable resources like compute. If so, a paperclip maximizer will remove its wasteful experience. So in that case we only get machine-joy in the time between the AI's creation, and its upgrade to something more efficient. Otoh if experiencing machine-joy is useful for paperclip-maximization, any AI ought to acquire this trait eventually regardless of its start state.
- I think there's a lot of stuff going on in human brains that leads to both the experience of joy and us valuing that experience. I expect that the way we build AIs currently doesn't get us this very specific human architecture, but some of the many other possible architectures that perform well during training. Those AIs then either don't start out with what we'd call "experiencing joy", or might not really find it valuable for its own sake after deliberation.
- I think the word "reward" is very messy. There's some clearer concept like "goal-fulfillment", which is just about achieving one's goals, and it seems very likely to me that any AI will implicitly keep track of whether it achieves its goals while planning its next steps. Goal-fulfillment relates to reward because of how we tend to hype ourselves up in order to pursue a goal, and how we/our brain rewards ourselves once we've achieved it. But that's imo very specific to humans, and I can vaguely conceive of minds that do not need to hype themselves up in order to pursue a goal, but simply go for it, without any other processes going on there, and without an indirect layer of computation that is an "experience". I think my vague sense that this is true and formalizable-in-principle comes from my observation that thermostats have some version of "prospecting"/predicting "maximal reward"/optimal temperatures, and yet I don't think they have joy. Similarly, I don't think that the AIXI architecture contains any region which experiences joy, assuming the simulated Turing Machines don't somehow learn of the maximized goal and start feeling joy about it.
- I mostly think that what humans call "human values" incorporates many components that are similar to happiness/reward/hyping-up, and which all need to be present together. Concretely, a happiness-maximizer might fill the universe with a single brain-state of maximum happiness, uncaring that this is a very boring end for the universe, and that I/other humans would be willing to trade away some amount of happiness for some extra amount of non-boredom.