When will a non-Transformer model become the top open source LLM?
  • In 2024: 3%

  • In 2025 or earlier: 22%

  • In 2026 or earlier: 62%

  • In 2027 or earlier: 63%

  • In 2028 or earlier: 65%

  • In 2029 or earlier: 68%

  • In 2030 or earlier: 70%

In 2024 the field of AI language models is dominated by Transformers. Many research papers propose alterations and new architectures, but none of them has successfully competed with the Llamas and their open-source friends.

The market will resolve positively as soon as the first place on the Hugging Face Open LLM Leaderboard (https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard) is taken by a non-Transformer model.

If Hugging Face disappears or falls into obscurity by the time this market is resolved, another similar ranking will be used.

The following modifications to a Transformer model are not enough to consider it a new model for the purposes of this question:

  • Changes to activation function

  • Extra dense layers

  • Changes to normalization/dropout/...

  • Changes to the number of heads/keys/queries etc.

  • Minor changes to how attention components are calculated (e.g. adding a bias or applying some non-linearity)

  • Using Transformers in an ensemble

  • Other changes such that Wikipedia still categorizes the new model as a Transformer

An attention-based model in which attention is applied not to pairs of positions but to some other domain, especially if this leads to a significant improvement in efficiency, qualifies as a significant change for the purposes of this question.
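As a point of reference, here is a minimal NumPy sketch (mine, not the market creator's) of the standard pairwise attention this question treats as the Transformer baseline: every position attends to every other position through an n-by-n score matrix. An architecture that replaces this positional score matrix with attention over some other domain is the kind of change that would qualify.

```python
import numpy as np

def scaled_dot_product_attention(q, k, v):
    """Standard Transformer attention: every position attends to every
    other position, producing an (n, n) score matrix."""
    n, d = q.shape
    scores = q @ k.T / np.sqrt(d)                       # (n, n) pairwise scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)      # softmax over positions
    return weights @ v                                  # (n, d) mixed values

# Toy usage: 8 positions, 4-dimensional heads (illustrative numbers only).
rng = np.random.default_rng(0)
q = rng.standard_normal((8, 4))
k = rng.standard_normal((8, 4))
v = rng.standard_normal((8, 4))
print(scaled_dot_product_attention(q, k, v).shape)      # (8, 4)
```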

I do not bet on my own questions.


In the 90s an incredible amount of effort went into doing 3D occlusion using clever approaches that avoided the gargantuan memory cost of a floating point depth buffer. Now we just pay for a depth buffer.

We're just gonna pay N^2 compute (and linear memory)

@HastingsGreer N^2 might be a tough call if you want it to write a novel, or simulate a person living for decades.

Also, sub-N^2 scaling is not a necessary condition for this market to resolve positively.
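To make that trade-off concrete, here is a back-of-envelope sketch with hypothetical model dimensions (the hidden size, layer count, and precision are my assumptions, not numbers from the thread): the attention score-matrix work grows quadratically with context length, while the KV cache needed at inference grows only linearly.

```python
# Back-of-envelope scaling of vanilla attention with context length n
# (hypothetical model dimensions, chosen only for illustration).
d_model = 4096       # hidden size
n_layers = 32        # number of Transformer layers
bytes_per_val = 2    # fp16

for n in (8_000, 128_000, 1_000_000):
    # Score-matrix work per layer: ~n^2 * d multiply-adds -> quadratic in n.
    attn_flops = 2 * n * n * d_model * n_layers
    # KV cache at inference: 2 (K and V) * n * d per layer -> linear in n.
    kv_cache_gb = 2 * n * d_model * n_layers * bytes_per_val / 1e9
    print(f"n={n:>9,}  attention FLOPs ~ {attn_flops:.2e}  KV cache ~ {kv_cache_gb:.1f} GB")
```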

bought Ṁ10 of "In 2025 or earlier" NO

@OlegEterevsky

"N^2 might be a tough call if you want it to write a novel, or simulate a person living for decades."

Gemini 1.5's 1M context window convinced me transformers can scale pretty well for that.

@singer I don't know the technical details, but I'm assuming it's not technically a "traditional" Transformer.