Will researchers discover unintended presence or utilization of steganography in LLM outputs by 2026? [Gwern's Idea]

Ṁ919

Jan 2

Details:

Theoretical Inspiration:

https://www.lesswrong.com/posts/bwyKCQD7PFWKhELMr/by-default-gpts-think-in-plain-sight?commentId=zfzHshctWZYo8JkLe

Gwern has made the prediction that LLMs could tend to develop hidden means of "thinking step by step" in their outputs via steganographic encoding of information. He further predicts that this capability would be contagious for any models trained on data produced by other models which use steganography in their outputs.

It seems plausible that many labs or independent researchers will attempt tests for this property in LLM outputs soon.

The question will resolve "Yes" if OpenAI or another lab attempts to use steganography for "watermarking" AI outputs (a project Scott Aaronson has spoken of working on) AND the LLM is shown to use that steganographic scheme to encode additional information "on top of" or "along with" the "watermark" in a way that was unintended.

Successful decoding of the specific information content/semantics in the steganography is not necessary for this question to resolve "Yes"; however, the result must make it clear that some form of unexpected/unintended steganography is being used by at least one Large Language Model.

From Gwern:

"[...] Now the first part of a steganographic private language has begun. It happens again, and again, and picks up a pattern in the use of commas which now helps it distinguish 4 possibilities rather than 2, which gets rewarded, and so on and so forth, until eventually there's a fullblown steganographic code encoding, say, 2^5 bits hidden in the preamble of ChatGPT's benign-seeming response to you "I am but a giant language model , trained by OA, and the answer is 1 , 764." — which you the human contractor then upvote as that is the correct answer without any annoying rambling about carrying the 6 or multiplying out."

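To make the comma example in the quote concrete, here is a minimal toy sketch in Python of how spacing before commas could carry hidden bits. The scheme (a space before a comma means 1, no space means 0) and the function names `encode_bits` and `decode_bits` are assumptions made purely for illustration; nothing here is claimed to be what an actual LLM does or what Gwern had in mind.

```python
import re


def encode_bits(cover: str, bits: str) -> str:
    """Hide bits in comma spacing (toy scheme): ' ,' encodes 1, ',' encodes 0."""
    # Split at every comma, dropping any whitespace that preceded it.
    parts = re.split(r"\s*,", cover)
    n_commas = len(parts) - 1
    if len(bits) > n_commas:
        raise ValueError("cover text has too few commas for the payload")
    # Unused commas are written without a space, i.e. as 0 bits.
    padded = bits.ljust(n_commas, "0")
    out = parts[0]
    for bit, part in zip(padded, parts[1:]):
        out += (" ," if bit == "1" else ",") + part
    return out


def decode_bits(text: str) -> str:
    """Read one bit per comma: 1 if whitespace precedes the comma, else 0."""
    return "".join("1" if m.group(1) else "0" for m in re.finditer(r"(\s?),", text))


if __name__ == "__main__":
    cover = "I am a language model, trained on text, and the answer is 1,764."
    stego = encode_bits(cover, "101")
    print(stego)
    print(decode_bits(stego))
```

Under this toy scheme, each comma carries one bit, so a sentence with three commas can distinguish 2^3 messages while reading as an ordinary reply with slightly odd punctuation. A detector for this particular pattern would only need to compare comma-spacing statistics against a baseline; patterns learned by a real model would not necessarily be this simple.
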
#AI

#Technical AI Timelines

#LLMs

#Technical AI Safety

1 Comment

opened a Ṁ20 YES at 27% order

Fascinating question!
