By 2026 will any RL agent with learned causal models of its environment achieve superhuman performance on >=10 Atari environments?

Atari environment: standard ALE (https://paperswithcode.com/dataset/arcade-learning-environment)

"Superhuman performance": I'm using the common industry definition, meaning the agent eventually achieves performance better than a particular human baseline (the human normalized score). Please note that this is not the best score achieved by any human. Yeah this is confusing, but it is common terminology and I'm sticking to it.

Learned causal model:

  • The agent has a model of the environment

  • The model is a causal model in the sense that there is an explicit causal diagram associated with it

  • The causal diagrams are not complete bipartite graphs

  • The causal diagrams are at least sometimes close to minimal (<=150% of the minimum number of arrows required); see the sketch after this list for roughly what I have in mind

  • The diagrams are not hardcoded or otherwise substantially provided to the model

  • The environments are not annotated with additional causal information. Just the standard ALE outputs.
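
To make the diagram criteria a bit more concrete, here is roughly the kind of check I have in mind for resolution. Everything below is a hypothetical sketch: the minimal arrow count would have to be worked out per game by hand, and it is only used for judging, never given to the agent.

```python
import numpy as np

def passes_diagram_checks(adjacency: np.ndarray, min_edges: int) -> bool:
    """Hypothetical resolution check on a learned causal diagram.

    adjacency[i, j] == 1 means an arrow from variable i (e.g. at time t)
    to variable j (e.g. at time t+1). min_edges is the minimum number of
    arrows a correct diagram for that game needs.
    """
    n_from, n_to = adjacency.shape
    n_edges = int(adjacency.sum())

    # Not a complete bipartite graph ("everything causes everything").
    not_complete_bipartite = n_edges < n_from * n_to

    # Close to minimal: at most 150% of the minimum number of arrows.
    close_to_minimal = n_edges <= 1.5 * min_edges

    return not_complete_bipartite and close_to_minimal
```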

Additional context: this is/was the focus of my research. I think doing it without access to ground truth causal diagrams is hard.

This doesn't resolve the question, but the recent paper ([2303.07109] Transformer-based World Models Are Happy With 100k Interactions (arxiv.org)) reports superhuman performance on 8/10 environments by building a world model using transformers with causal masking. It's limited to 100k interactions, so presumably with more training and interactions it could reach 10, and somebody might try to fit a causal diagram to the learned world model. They use Atari 100k, but I'm assuming it's similar enough to, and if anything more stringent than, the environment you linked. So my uneducated (for causal models) guess is this is more likely than it was before.
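
To be clear on terminology: "causal masking" in that paper is the usual autoregressive attention mask, where each timestep can only attend to earlier timesteps. That constrains temporal ordering, which is not the same thing as an explicit causal diagram over environment variables. A minimal illustration in plain NumPy (not their code):

```python
import numpy as np

# Autoregressive ("causal") attention mask for a sequence of length T:
# position i may attend to positions 0..i and never to the future.
T = 5
causal_mask = np.tril(np.ones((T, T), dtype=bool))
print(causal_mask.astype(int))
# [[1 0 0 0 0]
#  [1 1 0 0 0]
#  [1 1 1 0 0]
#  [1 1 1 1 0]
#  [1 1 1 1 1]]
```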

Let me know if I'm missing the point of your question and this advance is irrelevant.

What if the causal model is not meaningfully used by the action model? Like, what if there's a causal modeler duct-taped to a non-causal RL agent?
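
For example, schematically (all names hypothetical), I mean something like this:

```python
# Hypothetical: a causal-model learner bolted onto a standard model-free agent,
# where the policy never actually uses the learned causal diagram.
class DuctTapedAgent:
    def __init__(self, rl_agent, causal_modeler):
        self.rl_agent = rl_agent              # e.g. an off-the-shelf DQN/PPO agent
        self.causal_modeler = causal_modeler  # learns a causal diagram on the side

    def act(self, observation):
        # Action selection ignores the causal model entirely.
        return self.rl_agent.act(observation)

    def observe(self, transition):
        self.rl_agent.observe(transition)
        self.causal_modeler.update(transition)  # diagram is learned but never consulted
```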