When will a non-Transformer model become the top open source LLM?
  • In 2024: 3%

  • In 2025 or earlier: 22%

  • In 2026 or earlier: 62%

  • In 2027 or earlier: 63%

  • In 2028 or earlier: 65%

  • In 2029 or earlier: 68%

  • In 2030 or earlier: 70%

In 2024 the field of AI language models is dominated by Transformers. Many research papers propose alterations and new architectures, but none of them has successfully competed with the Llamas and their open-source friends.

The market will resolve positively as soon as the first place on the Hugging Face Open LLM Leaderboard (https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard) is taken by a non-Transformer model.

If Hugging Face disappears or falls into obscurity by the time this market is resolved, another similar ranking will be used.

The following modifications to a Transformer model are not enough to consider it a new model for the purposes of this question:

  • Changes to activation function

  • Extra dense layers

  • Changes to normalization/dropout/...

  • Changes to the number of heads/keys/queries etc.

  • Minor changes to how attention components are calculated (e.g. adding a bias or applying some non-linearity)

  • Using Transformers in an ensemble

  • Other changes such that Wikipedia still categorizes the new model as a Transformer

An attention-based model in which attention is applied not to pairs of positions but to some other domain, especially if this leads to a significant improvement in efficiency, qualifies as a significant change for the purposes of this question.
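As a point of reference, here is a minimal NumPy sketch (mine, not the market creator's) of the standard pairwise attention this question treats as the Transformer baseline: every position attends to every other position through an n-by-n score matrix. An architecture that replaces this positional score matrix with attention over some other domain is the kind of change that would qualify.

```python
import numpy as np

def scaled_dot_product_attention(q, k, v):
    """Standard Transformer attention: every position attends to every
    other position, producing an (n, n) score matrix."""
    n, d = q.shape
    scores = q @ k.T / np.sqrt(d)                       # (n, n) pairwise scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)      # softmax over positions
    return weights @ v                                  # (n, d) mixed values

# Toy usage: 8 positions, 4-dimensional heads (illustrative numbers only).
rng = np.random.default_rng(0)
q = rng.standard_normal((8, 4))
k = rng.standard_normal((8, 4))
v = rng.standard_normal((8, 4))
print(scaled_dot_product_attention(q, k, v).shape)      # (8, 4)
```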

I do not bet on my own questions.


In the 90s an incredible amount of effort went into doing 3D occlusion using clever approaches that avoided the gargantuan memory cost of a floating point depth buffer. Now we just pay for a depth buffer.

We're just gonna pay N^2 compute (and linear memory)

@HastingsGreer N^2 might be a tough call if you want it to write a novel, or simulate a person living for decades.

Also, sub-N^2 scaling is not a necessary condition for this market to resolve positively.
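To make that trade-off concrete, here is a back-of-envelope sketch with hypothetical model dimensions (the hidden size, layer count, and precision are my assumptions, not numbers from the thread): the attention score-matrix work grows quadratically with context length, while the KV cache needed at inference grows only linearly.

```python
# Back-of-envelope scaling of vanilla attention with context length n
# (hypothetical model dimensions, chosen only for illustration).
d_model = 4096       # hidden size
n_layers = 32        # number of Transformer layers
bytes_per_val = 2    # fp16

for n in (8_000, 128_000, 1_000_000):
    # Score-matrix work per layer: ~n^2 * d multiply-adds -> quadratic in n.
    attn_flops = 2 * n * n * d_model * n_layers
    # KV cache at inference: 2 (K and V) * n * d per layer -> linear in n.
    kv_cache_gb = 2 * n * d_model * n_layers * bytes_per_val / 1e9
    print(f"n={n:>9,}  attention FLOPs ~ {attn_flops:.2e}  KV cache ~ {kv_cache_gb:.1f} GB")
```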

bought Ṁ10 of "In 2025 or earlier" NO

@OlegEterevsky

"N^2 might be a tough call if you want it to write a novel, or simulate a person living for decades."

Gemini 1.5's 1M context window convinced me transformers can scale pretty well for that.

@singer I don't know the technical details, but I'm assuming it's not technically a "traditional" Transformer.