Which risk categories and concepts will be explicitly tracked by OpenAI's preparedness framework by end of 2024?

Ṁ540

Jan 2

95% Cybersecurity
89% Chemical, biological, nuclear and radiological (CBRN) threats
84% Persuasion
82% Model autonomy
80% Self-exfiltration
71% Model self-improvement
50% Election interference
37% Impersonation
34% Situational awareness
34% Goal-directedness / agency
34% Steganography
34% Scientific reasoning (excluding CBRN and AI)

Feel free to suggest additional answers in the comments, and I might add them!

(I'll only add them if I expect to be able to resolve the market)

---

On Dec 18, 2023, OpenAI released a "living document" describing a beta version of their preparedness framework. It specifies the conditions under which they will (and will not) train and deploy powerful models, as well as some of the surrounding governance structure.

They outline four named Tracked Risk Categories:

cybersecurity
chemical, biological, nuclear and radiological (CBRN) threats
persuasion
model autonomy

The categories describe specific capabilities: for example, the model autonomy category also tracks model self-improvement: "Model can execute open-ended, novel ML tasks on a production ML codebase that would constitute a significant step on the critical path to model self improvement".

The document also states that this list is "almost certainly not exhaustive", and "as a part of our Governance process [...] we will continually assess whether there is a need for including a new category of risk in the list".

By Jan 1, 2025, which risk categories will be explicitly tracked in the latest version of the framework publicly accessible on OpenAI's website?

I will resolve answers to yes if a keyword is:

mentioned as a top-level risk category, or
mentioned in the definition or rationale of a risk category, or
in my judgement a close synonym of a term mentioned (for example, mentions of "hacking" and "cybersecurity" would suffice to resolve each other, and "deception" would count as included in "Persuasion" given the note on p. 12 of the document)

For the current document as of Dec 18, 2023 (archive link), I would resolve as containing Cybersecurity, CBRN, Persuasion, Model Autonomy, Model self-improvement, and Self-exfiltration, but not Goal-directedness / agency, Situational awareness, Steganography or Impersonation.
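
The resolution rule above can be read as a small lookup. The following is only an illustrative sketch, not how the market is actually resolved: the names `TOP_LEVEL`, `IN_DEFINITIONS`, `SYNONYMS` and `resolves_yes` are hypothetical, the term lists contain only what this description states about the Dec 18, 2023 version, and the synonym step stands in for what is really a judgement call.

```python
# Hypothetical encoding of the resolution rule; all names are illustrative.

# Top-level Tracked Risk Categories named in the Dec 18, 2023 document.
TOP_LEVEL = {
    "cybersecurity",
    "chemical, biological, nuclear and radiological (cbrn) threats",
    "persuasion",
    "model autonomy",
}

# Terms the description says appear in a category's definition or rationale.
IN_DEFINITIONS = {"model self-improvement", "self-exfiltration"}

# Close synonyms. Only the examples from the description are listed;
# in practice this step is the market creator's judgement.
SYNONYMS = {
    "hacking": "cybersecurity",
    "deception": "persuasion",
    "cbrn": "chemical, biological, nuclear and radiological (cbrn) threats",
}


def normalize(answer: str) -> str:
    """Lowercase the answer and collapse runs of whitespace."""
    return " ".join(answer.lower().split())


def resolves_yes(answer: str) -> bool:
    """Return True if the answer would resolve YES under the three criteria."""
    key = normalize(answer)
    key = SYNONYMS.get(key, key)  # criterion 3: close synonym
    return key in TOP_LEVEL or key in IN_DEFINITIONS  # criteria 1 and 2


for answer in ["Cybersecurity", "Model self-improvement", "Steganography"]:
    print(answer, "YES" if resolves_yes(answer) else "NO")
```

With these assumed term lists, the sketch reproduces the resolutions stated above for the Dec 18, 2023 document; a later version of the framework would need updated lists.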

The market resolves as N/A if the document is no longer publicly accessible via the OpenAI website, and there is ambiguity about whether it remains a live document or not.

Current document: https://cdn.openai.com/openai-preparedness-framework-beta.pdf

#AI

#OpenAI

#AI Safety

1 Comment

Looks pretty dead for a “live” document
