GPT next-token-prediction models led to the o-series reasoning models.
When will the next paradigm be released?
This is a fuzzy question, but the new paradigm must be a significant advance that differs in some fundamental way. For example, smarter models or larger context windows would not count. Multi-modality mostly doesn't count (e.g., a model that generates video or accepts audio).
You would need a breakthrough on the order of a near-infinite context window or an autonomous agent API.
"Multi-modal mostly doesn't count"
I assume this means sufficient multi-modality could count? For example, suppose a breakthrough allowed training on text, image, and audio data to improve performance across all modalities (e.g., feeding it more YouTube videos improves its next-token prediction, though likely less efficiently than training on more text data). Would that be sufficient? If not, what would?