Will Jacob Pfau think that the Eliciting Latent Knowledge research program has achieved something important by October 20, 2026?

Ṁ2357

2026

31% chance

(Inspired by and partially copied from tailcalled's question series)

Eliciting latent knowledge (ELK) is a research direction described by Paul Christiano, Ajeya Cotra, and Mark Xu: "[a] prediction model 'knows' facts (like 'the camera was tampered with') that are not visible on camera but would change our evaluation of the predicted future if we learned them. How can we train this model to report its latent knowledge of off-screen events?" This is particularly problematic when a model's ontology differs from human ontology, in which case a reporter may learn to merely report what a human would think (cf. this section). Solving ELK seems likely to be helpful for other problems in AI safety, while also being a necessary condition for other AI safety proposals (cf. this section).

In 4 years, I will evaluate ELK and decide whether there have been any important good results since today. I will probably ask some alignment researchers working in this area, such as Paul Christiano or Jacob Hilton, for advice about the assessment, unless the answer is dead obvious. If in 4 years it seems to me that solving ELK is much less important to AI safety than I thought in 2022, I will resolve this question as NO.

About me: I am a PhD student working on AI safety and NLP at the NYU alignment research group. I am currently working on creating a benchmark quantifying the hardness of ELK for language modelling and code generation. Proposed solutions to ELK performing better on my (possible-future-)benchmark than current RLHF/CoT methods would very likely be insufficient for me to resolve this market YES. Solutions to ELK have already been proposed; if a refinement of one of these solutions is demonstrated to suffice for solving ELK, I would very likely resolve this question YES.

#AI

#AI Safety

#AI Alignment

2 Comments

Related: https://manifold.markets/LeoGao/will-the-worstcase-elk-problem-be-s

Related questions