Will there be a well accepted formal definition of value alignment for AI by 2030?
2030 · 25% chance

Well-accepted: if there's a definition accepted by even 25% of the research community, I'll resolve yes. If there are multiple similar-but-competing definitions that together cover 50% of the community, I'll also resolve yes.

Oct 3, 9:12pm: By "formal definition of value alignment" I mean there is a particular mathematical property we can write out such that we're reasonably confident that an AI with that property would in fact be value aligned in the colloquial sense.
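To illustrate the kind of property I mean (a toy sketch only, not a resolution criterion): fix some human utility function $U_H$ and call a policy $\pi$ value-aligned up to $\varepsilon$ if

$$\forall s:\quad \mathbb{E}[U_H \mid \pi, s] \;\ge\; \max_{\pi'} \mathbb{E}[U_H \mid \pi', s] - \varepsilon,$$

i.e., from every state $s$, the AI forfeits at most $\varepsilon$ of the expected human utility that any policy could achieve. The hard part is that nothing like $U_H$ is actually written down; a real candidate definition would have to characterize it or infer it rather than hand-specify it (proposals like PreDCA attempt the latter).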


I think there’s a >1/3 chance that, if we’re alive in 2030, there’s such a definition (e.g., perhaps a descendant of PreDCA)

Will there be a well accepted formal definition of value alignment for people by 2030?

Will there be a well accepted formal definition of value alignment for companies by 2030?

Will there be a well accepted formal definition of value alignment for nations by 2030?

We have our answer.

So far we have:

  • add “black person” to 7% of prompts

  • ban reference to Ukrainian cities

  • refuse to release weights, to better profit from selling API queries

  • only allow misspelled references to public figures

I’d say it’s going great, definitely not a bunch of barnacles attaching themselves to the 100,000-ton ship that is AI progress

(This is just a more advanced version of the “what if the car has to decide between swerving to hit 8 grandmas or one stroller” grift.

None of these scenarios or philosophies will matter.

AI will be so powerful that a single actor can cause immense destruction, whether from weapons design, propaganda/psy-ops, or the like, long before it “accidentally” violates some ham-fisted “moral principles” encoded in some supposedly “safe” system.

There are no agreed-upon moral codes for anything else in life; the people who claim to do “AI ethics” are rarely people you’d trust to manage a small team.)