Will a new deep learning paradigm replace the transformer by the end of 2024?
Ṁ1395 · Jan 1 · 8% chance

Will a new neural architecture, or an entirely different machine learning method, displace the query-key-value attention-based architectures currently dominant in large language models? Or will large language models (and, more generally, foundation models across modalities) continue to scale up transformers? Fundamentally, this new method must not employ layers of self-attention or cross-attention, yet must show scaling laws more promising than those of transformer-based LLMs. It must be commonly recognized by practitioners as superior to transformer methods, and multiple state-of-the-art open- and closed-source models must employ it. From the invention of the transformer, it took a few years for it to become universally adopted; however, with the current attention on foundation models, adoption of a better approach should be significantly swifter.
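For reference, the mechanism the question rules out — query, key, value attention — can be sketched minimally. This is an illustrative NumPy sketch of standard scaled dot-product attention; the shapes and toy data are assumptions, not part of the market description:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V — the core op of transformer LLMs."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)               # (n_q, n_k) similarity logits
    scores -= scores.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # rows sum to 1
    return weights @ V                            # weighted mix of value vectors

# Toy example: 3 query tokens attending over 3 key/value tokens of width 4.
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((3, 4)) for _ in range(3))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (3, 4)
```

The cost that motivates replacement candidates is visible here: the `Q @ K.T` step compares every token with every other token, so compute and memory grow quadratically with sequence length.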


Personally I have a bit of faith in this concept:

https://arxiv.org/abs/2312.00752
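That link is Mamba, a selective state-space model. At its core is a linear recurrence rather than attention; below is a hypothetical minimal sketch of that backbone recurrence (Mamba additionally makes the parameters input-dependent, i.e. "selective" — not shown here), with illustrative names and toy values:

```python
import numpy as np

def ssm_scan(A, B, C, u):
    """Linear state-space recurrence: x_t = A x_{t-1} + B u_t, y_t = C x_t.
    Runs in O(sequence length) with constant-size state, unlike attention's
    quadratic all-pairs token comparison."""
    x = np.zeros(A.shape[0])
    ys = []
    for u_t in u:                 # sequential form; real models use a parallel scan
        x = A @ x + B * u_t       # state update
        ys.append(C @ x)          # readout
    return np.array(ys)

# Toy example: 2-dim state decaying by 0.9 per step, scalar input impulse.
A = np.eye(2) * 0.9
B = np.ones(2)
C = np.ones(2)
y = ssm_scan(A, B, C, [1.0, 0.0, 0.0])
print(y)  # [2.   1.8  1.62] — the impulse response decays geometrically
```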

predicts NO

@Ebcc1 Definitely on my radar!

predicts YES

@Supermaxman What do you think about it and other possibilities?

predicts NO

@Ebcc1 Watching to see how peers receive it at ICLR: https://openreview.net/forum?id=AL1fq05o7H