Will tailcalled think that the Infrabayesianism alignment research program has achieved something important by October 20th, 2026?
34% chance

The Infrabayesianism research program by Vanessa Kosoy and Diffractor combines utility maximization with minimax (worst-case) reasoning at a deep mathematical level. The hope is that it may help deconfuse notions of agency, in particular by providing strong foundations for learning theory with provable regret bounds, and by addressing issues related to embedded agency.
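As a rough sketch of the core decision rule (my paraphrase, simplified, not necessarily the program's exact formalism): rather than a single prior, the agent entertains a convex set of possible environments $\mathcal{M}$ (an infradistribution) and picks the policy that maximizes worst-case expected utility over that set:

$$\pi^* \in \arg\max_{\pi} \; \min_{\mu \in \mathcal{M}} \; \mathbb{E}^{\pi}_{\mu}[U].$$

The worst-case $\min$ is what is supposed to let you get regret bounds that hold even when the true environment isn't fully captured by any single hypothesis the agent can represent (non-realizability).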

In 4 years, I will evaluate Infrabayesianism and decide whether it has produced any important results since today. I will probably ask some of the alignment researchers I most respect (such as John Wentworth or Steven Byrnes) for advice about the assessment, unless the answer is dead-obvious.

About me: I have been following AI and alignment research on and off for years, and I have a somewhat reasonable mathematical background for evaluating it. I tend to have an informal sense of the viability of various alignment proposals, though that sense may well be wrong.

At the time of making the prediction market, my impression is that Infrabayesianism is lost in math. I was excited about it when it first came out as it sounded like it could solve a bunch of fundamental problems, but I had trouble working through all the math. I've been chewing on it on and off for a while, and have gained more understanding, but I still don't fully get it yet. I have become less optimistic over time, as I've realized that e.g. the update rule depends on your utility function. It seems insufficiently compositional to me.
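To spell out the update-rule issue as I understand it (schematically, and possibly lossy relative to the actual sequence): when updating on an observation $E$, each belief element $(m, b)$ (a measure plus an affine term) gets mapped roughly like

$$(m, b) \;\longmapsto\; \big(m|_E,\; b + \mathbb{E}_m[\mathbf{1}_{\neg E}\, g]\big)$$

(up to renormalization), where $g$ is the utility that would have been obtained on the branches ruled out by $E$. Folding the off-branch utility into the belief state is what makes the update depend on the utility function, and it is part of why the framework feels insufficiently compositional to me.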

More on Infrabayesianism:

https://www.lesswrong.com/posts/zB4f7QqKhBHa5b37a/introduction-to-the-infra-bayesianism-sequence


Could you list some past AI alignment failures/successes, in your view?

@vluzko Alex Turner's power-seeking theorems seem worthwhile to me: https://www.lesswrong.com/s/fSMbebQyR4wheRrvk

As does the finding that model-based AI robustly avoids wireheading (I lost the link again; I should probably write up my view on it to further promote the viewpoint, idk).

I guess "factorization approaches" i.e. HCH, debate, etc, would be an example of a dead end/failure.
