By 2027 will there be a well-accepted training procedure(s) for making AI honest?
2027 · 16% chance
  • If there's serious concern that the procedure produces AI that's better at lying, it doesn't count

  • Calibration can be part of the solution, but the current work in that direction isn't enough (a minimal sketch of what "calibration" means here follows this list)

  • A procedure that seems promising but hasn't been subjected to much scrutiny doesn't count

  • A formal proof that it's honest according to a well-accepted definition counts

  • No requirement that it be a "single" procedure in the sense of a single training loop
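
To make the calibration criterion above concrete, here is a minimal sketch of one standard way to measure calibration, expected calibration error (ECE): the average gap between a model's stated confidence and its observed accuracy, weighted by how many predictions fall in each confidence bin. This is an illustration of the concept, not part of this market's resolution criteria; the function name and the toy numbers are assumptions made for the example.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: bin predictions by stated confidence, then average the
    |confidence - accuracy| gap per bin, weighted by bin size."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        # Half-open bins (lo, hi]; a confidence of exactly 0.0 is
        # ignored, which is fine for a sketch.
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(confidences[mask].mean() - correct[mask].mean())
            ece += mask.mean() * gap
    return ece

# Toy usage: a model that says "90%" but is right only 60% of the
# time shows a large gap in its top bin (ECE here is 0.25).
conf = [0.9, 0.9, 0.9, 0.9, 0.9, 0.6, 0.6, 0.3]
right = [1, 1, 1, 0, 0, 1, 0, 0]
print(expected_calibration_error(conf, right))
```

A well-calibrated model would drive this toward zero, but as the criterion above notes, low ECE alone wouldn't be enough to resolve this market YES.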

The idea here is to capture scenarios where we can't prove the procedure produces honest AI (perhaps because honesty hasn't been formalized), but there's been extensive investigation and no one has found a way that the procedure obviously breaks or gets Goodharted (or perhaps it does, but only on some odd edge cases).


Love to see number go down on all my safety markets while it goes up on all the capabilities ones. Good luck everyone.