Will reinforcement learning overtake LMs on math before 2028?
2028 · 63% chance

Will a state-of-the-art model on Hendrycks' MATH be trained with more FLOPs on RL than on LM objectives? A purely RL model counts as well, of course.

RL encompasses anything involving online learning, expert iteration, and the like. If this ends up being difficult to call because of some breakthrough in decision-transformer-style conditional imitation learning (i.e. something between RL and LMs), I will probably cancel the market as ambiguous.

When models approach 100% accuracy on MATH, a similar successor natural-language math dataset will be used instead.


https://deepmind.google/discover/blog/ai-solves-imo-problems-at-silver-medal-level/

I'd guess this took something like 1–10 trillion tokens' worth of FLOPs.

predicts NO

Would you count LM regularization terms computed during the RL phase as part of the LM share? This may actually be hard to disentangle.

predicts YES

@Thomas42 That's a bit tricky, but I'd say KL penalties toward the base LM should just be counted as part of the RL compute. That's not an LM loss anyway.

If this question ends up hinging on some edge case, like a method which does continued LM training during RL where the relative compute contributions are unclear, I'll probably resolve N/A.
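To make the KL penalty point above concrete, here is a minimal, hypothetical sketch (the function name and values are illustrative, not from any particular library): the penalty is computed from log-probabilities of both the policy and the frozen base LM, so it requires a base-model forward pass, but it shapes the RL reward rather than optimizing a next-token-prediction loss.

```python
import numpy as np

def kl_shaped_reward(reward, policy_logprobs, base_logprobs, beta=0.1):
    """Per-token reward minus a KL penalty toward the frozen base LM.

    policy_logprobs / base_logprobs: log-probabilities each model assigns
    to the sampled tokens (one value per token). The difference is the
    standard sampled-token estimate of the KL divergence.
    """
    per_token_kl = policy_logprobs - base_logprobs
    return reward - beta * per_token_kl

# Toy example: three sampled tokens, constant reward of 1.0.
policy_lp = np.array([-1.0, -0.5, -2.0])
base_lp = np.array([-1.2, -0.7, -1.0])
shaped = kl_shaped_reward(np.array([1.0, 1.0, 1.0]), policy_lp, base_lp, beta=0.1)
print(shaped)
```

On this accounting, both forward passes (policy and base) would land in the RL compute bucket, since neither contributes gradients to an LM objective.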

I think the first question one should ask is whether anything will overtake LMs. The probability that one specific technology is the one doing the overtaking should then be below that base probability. I place the first probability at around 50%, so I am comfortable betting against this at the current price.

Why are you defining RL as online learning? Online learning encompasses more than RL. Why not define it in terms of states, actions, and rewards?

predicts YES

@vluzko I wanted to exclude decision-transformer-type stuff. Maybe it would have been fairer to title the question 'Will online learning overtake offline learning for LMs on Math...', but I went for something more eye-catching.

I'm interested in this because of the limited supply of imitation-learning data available. I also think offline learning has different safety properties.

predicts YES

Would be curious to hear why everyone's on NO here. 2028 is five years out, Epoch AI estimates ~4x/yr compute scaling, and text data runs out by EOY 2024. That leaves three years' worth of compute scaling that needs to go somewhere else.
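Taking the comment's numbers at face value (4x/yr growth, data exhausted end of 2024, market closes 2028), the implied surplus is easy to check:

```python
# Back-of-the-envelope check of the figures cited above (both are the
# commenter's assumptions, not established facts).
growth_per_year = 4   # Epoch AI compute-scaling estimate cited in the comment
years = 3             # roughly EOY 2024 through 2028

extra_compute = growth_per_year ** years
print(f"{extra_compute}x compute beyond what LM pretraining data can absorb")
```

So the argument is that on the order of 64x of compute growth would have no fresh text to train on, and RL-style objectives are one natural place for it to go.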