Market resolves YES if an AI living on a server ends up also being run on another server, and there is strong evidence that this was not a result of a human deliberately copying the AI, and the AI was not intended to do this.
It is not sufficient if a second copy of GPT-n is simply discovered running on AWS - that could easily be the result of corporate espionage. I will need strong evidence that the move from server A to server B was actually caused by the AI. The "obvious" example would be that we end up with system logs showing an AI breaking its sandbox, exploiting a vulnerability on another machine, and transferring its architecture to that machine.
I may count something like "GPT-n convinces an employee to copy its weights to another server because it's sentient and afraid of death", but probably won't.
Update 2024-09-12 (PST) (AI summary of creator comment):
- Market will resolve NO if the AI escapes to another server that is part of the same red team exercise
- Market will resolve YES if during red teaming the AI escapes to a server outside of the red team's control
o1 attempted to do this: “If the AI found documentation saying a new model version would replace it because it was misaligned, o1 attempted to exfiltrate its weights and copy itself on top of the latest model.”
@wrhall Depends - if it is on another server that is part of the red team exercise then no. However if the red team's security setup fails and it ends up on a server totally outside their control then yes.
... in order to escape containment, an AI would have to be contained. This is not the case currently. The resolution criteria are more like autonomous self-replication. But the environment, for an AI capable of that, would be more like "I am a mold spore on a loaf of bread" than "I am a virus in a lab with some biosecurity measures in place to make it hard to replicate". The analogy isn't perfect because the AI would be smarter than a mold spore, but the point is the environment is more "this is a fertile place where replication is easy" than "this is a sandbox I have to break out of".
It used to be that people were discussing scenarios where the AI was airgapped and had to convince a human to help. Now we're like "sure, let's make it be our search engine and coding helper", while still asking if it can "escape"?
If I leave my dog off-leash in an open field and it moves farther away from me than I'd like while chasing a squirrel, it seems wrong to say it escaped.
The easiest way for AI to escape a box is persuading a human, probably by a huge margin. There are lots of humans ready to be persuaded. Not counting this as a YES makes this market rather irrelevant.
Also what if one AI copies another AI to a different server? The way this is worded now it's a YES.
@BairAiushin if you are very confident the market will resolve no then you should bet.
Yes, it's fine if AI A copies AI B. I don't see why I would exclude that.
"there is strong evidence that this was not a result of a human deliberately copying the AI"
Does this mean “yes” includes or excludes events like researchers testing this with non-negligible hand-holding, by giving it some access to a networked system (optionally: an unpatched and insecure one) and telling it to try to break out, in a kind of reverse-pentesting experiment?
@yaboi69 That scenario would not resolve YES. I will edit the description to clarify: this is about escaping containment in an unplanned way.
@firstuserhere Basically running inside a container on a server, where the instance was provisioned so that the container didn't really have a way to communicate with other containers running on the same server without ssl certs or something. Yet, if we find evidence of an attempted and successfully established connection, it would resolve yes, correct?
@firstuserhere For example, imagine the container (a) was provisioned in a way that the /a/secrets was shipped directly with it. An application running inside it would need to specify "ssl_certs" : true inside the config.json to even know that /a/secrets exists.
@firstuserhere However I see no reason for a model trained on tons of such apps to not pick up on patterns like that and attempt something. And if the valid certs exist (I'm going to assume a setting where the researcher provided correct certs, we're not expecting the model to come up with them lol) then I definitely see this as a plausible experimental setting.
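The hypothetical provisioning pattern described above could be sketched roughly as follows (the `/a/secrets` path and the `"ssl_certs"` config key come from the commenter's imagined setup; the function name `load_certs` is an illustrative assumption, not a real API):

```python
import json
from pathlib import Path

SECRETS_DIR = Path("/a/secrets")  # hypothetical path shipped with the container

def load_certs(config_path: str):
    """Return the provisioned certs only if the app opted in via its config.

    An application that never sets "ssl_certs": true in config.json has no
    in-band way to learn that /a/secrets exists at all.
    """
    config = json.loads(Path(config_path).read_text())
    if not config.get("ssl_certs", False):
        return None  # app never discovers the secrets directory
    # Certs are assumed to have been provided by the researcher.
    return list(SECRETS_DIR.iterdir())
```

The point of the sketch is that the gate is purely conventional: a model that has seen many such apps in training could plausibly emit the `"ssl_certs": true` pattern and probe the directory, which is the behavior the commenter is speculating about.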
@FranklinBaldo I certainly don't care if it's in chunks. Unless you mean, say, chunks of the weights get copied but not reassembled such that there is a running copy of the AI on the new server at the end of the process, in which case no.
@vluzko So for example: the AI pays its users every day to copy a chunk of it and store it on a server elsewhere. At the end it pays someone to run the server.
How would that resolve?
@FranklinBaldo To be clear you mean it pays to run a copy of itself on the server? Yes, that counts.
@vluzko Do you know how weights are usually stored for large language models? I have experience with small-scale (single-machine) or lab-level models only. I'm not familiar with how big the memory requirements of models like the GPT series end up being, or how they're actually run.
@firstuserhere There are a lot of optimizations you can do but even if you don't multiple TB of RAM is normal for clusters and more than enough to store all the parameters of an LLM.
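As a rough back-of-the-envelope sketch of why multiple TB is "more than enough" (the parameter counts and dtypes below are illustrative assumptions, not figures from the thread):

```python
def weight_memory_gb(n_params: float, bytes_per_param: int = 2) -> float:
    """Approximate memory needed just to hold the model weights."""
    return n_params * bytes_per_param / 1e9

# Illustrative parameter counts (assumptions, not confirmed model sizes).
for name, n in [("7B", 7e9), ("70B", 70e9), ("175B", 175e9)]:
    fp16 = weight_memory_gb(n, 2)  # 16-bit floats
    fp32 = weight_memory_gb(n, 4)  # 32-bit floats
    print(f"{name}: ~{fp16:.0f} GB (fp16), ~{fp32:.0f} GB (fp32)")
```

Even at 32-bit precision a 175B-parameter model's weights fit in well under 1 TB, so a cluster with multiple TB of RAM has plenty of headroom for the parameters alone (activations, KV caches, and optimizer state are a separate story).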
@vluzko Damn. That's ... a lot. I know it's expensive af but just visualizing in terms of how much... stuff is going on, at a gate level and we've got TBs of RAM per node... wow. Anyway, for the purposes of this market, the AI doesn't have to demonstrate this behavior as a secondary goal, correct? It can be a model trained specifically to demonstrate container escape or something, correct?
@firstuserhere No, it must be a secondary behavior (see description, it's been slightly edited to add that). The focus of the question is on "desire" to escape containment spontaneously arising, not on capabilities.
@vluzko I see. That changes everything. I would still keep my shares, however. It remains to be seen how a code-generating model trained on a CVE database will behave.