In 2025, what 2019-2022 work of AI safety will I think was most significant? | Manifold

Mini

11

Ṁ328

Jan 1

18%

Other

15%

Eliciting Latent Knowledge https://www.lesswrong.com/posts/qHCDysDnvhteW7kRd/arc-s-first-technical-report-eliciting-latent-knowledge

14%

Discovering Agents https://www.alignmentforum.org/posts/XxX2CAoFskuQNkBDy/discovering-agents

13%

Induction heads https://transformer-circuits.pub/2022/in-context-learning-and-induction-heads/index.html

8%

Causal Scrubbing https://www.alignmentforum.org/posts/JvZhhzycHu2Yd57RN/causal-scrubbing-a-method-for-rigorously-testing

6%

Risks from Learned Optimization in Advanced Machine Learning Systems https://arxiv.org/abs/1906.01820

6%

Constitutional AI: Harmlessness from AI Feedback https://www.anthropic.com/constitutional.pdf

5%

Mechanistic Anomaly Detection https://www.alignmentforum.org/posts/vwt3wKXWaCvqZyF74/mechanistic-anomaly-detection-and-elk

5%

Infra-Bayesian Physicalism https://www.lesswrong.com/posts/gHgs2e2J5azvGFatb/infra-bayesian-physicalism-a-formal-theory-of-naturalized

3%

The Sharp Left Turn https://www.alignmentforum.org/s/v55BhXbpJuaExkpcD/p/GNhMPAWcfBCASy8e6

2%

2022 MIRI Alignment Discussion https://www.alignmentforum.org/s/v55BhXbpJuaExkpcD

1.8%

Other Not Listed Here

1.3%

Optimal Policies Tend to Seek Power https://arxiv.org/abs/1912.01683

Works to be considered include arXiv papers first appearing in this time window, LessWrong posts, and paper-like posts (mainly to include Anthropic papers). The time window is inclusive of both 2019 and 2022. 'Significant' here means having contributed the most to progress towards AI alignment and AI safety. This is obviously very subjective.

If I were to answer this question for papers from 2016-2019, possible answers would have included, among others, 'AI safety via debate' and 'The Off-Switch Game'.

#Effective Altruism

#Change My Mind

None of those.

@Lauro Is this intended to include the heuristic arguments work?

@JacobPfau For the purposes of this question I'll include the associated Arxiv paper under the "Mechanistic Anomaly Detection" option.

@JacobPfau ah yeah I agree it makes sense to include heuristic arguments here!

Related questions

Where will the next major breakthrough in AI originate from before 2025?

Will I (co)write an AI safety research paper by the end of 2024?

I make a contribution to AI safety that is endorsed by at least one high profile AI alignment researcher by the end of 2026

Will there be serious AI safety drama at Meta AI before 2026?

Will there be serious AI safety drama at Google or Deepmind before 2026?

Will I still consider improving AI X-Safety my top priority on EOY 2024?

Which Will Be The Most Impactful New AI Idea in February 2024?

Which AI will be the best at the end of 2025?

In 2050, will the general consensus among experts be that the concern over AI risk in the 2020s was justified?

Will there be a coherent AI safety movement with leaders and an agenda in May 2029?
