A pure binary LLM will exist by end of 2024

A pure binary neural net is a neural network represented as pure combinational logic. Naively unrolling multi-bit floating-point/integer multiplication to binary does not count; the weights and activations must be binary. I will arbitrarily declare that integer weights of 3 bits or fewer are permitted to be unrolled. But note that the whole model, end to end, must be reduced to logic gates.
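To illustrate (this sketch is my own, not part of the market terms): with weights and activations restricted to {-1, +1} and encoded as single bits, a dot product reduces to an XNOR followed by a popcount, both of which are plain combinational logic.

```python
# Minimal sketch (my own illustration): a binarized dot product and neuron.
# Values in {-1, +1} are encoded as bits {0, 1}; agreeing signs contribute +1,
# disagreeing signs contribute -1, so the dot product is XNOR + popcount.

def binary_dot(w_bits: int, x_bits: int, n: int) -> int:
    """Dot product of n {-1,+1} values packed as bits (+1 -> 1, -1 -> 0)."""
    matches = ~(w_bits ^ x_bits) & ((1 << n) - 1)  # XNOR: 1 wherever signs agree
    agree = bin(matches).count("1")                # popcount
    return 2 * agree - n                           # agreements minus disagreements

def binary_neuron(w_bits: int, x_bits: int, n: int, threshold: int = 0) -> int:
    """Binarized activation: output bit is 1 iff the dot product clears the threshold."""
    return 1 if binary_dot(w_bits, x_bits, n) >= threshold else 0

# 4-element vectors with two matching and two mismatching signs -> dot product 0
print(binary_dot(0b1011, 0b0111, 4))     # 0
print(binary_neuron(0b1011, 0b0111, 4))  # 1 (0 >= default threshold of 0)
```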

For example, [Unrolling Ternary Neural Networks](https://arxiv.org/abs/1909.04509) almost satisfies the definition but uses patches and hence does not quite count. (Also, I'm interested in language models, not image models.)

It does not matter how the model was trained, only that it has adequate accuracy when in binarized form.

Resolves YES if a pure binary language model exists with bits-per-byte (BPB) on The Pile better than or equal to GPT-2's 1.225 BPB. It does not need to be publicly accessible as long as it is reported by a credible source (DeepMind, OpenAI, EleutherAI, etc.).

Resolves NO if there is no credible report of such a model.
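For concreteness, here is how I understand the bits-per-byte metric in the criterion above (my own illustration, not part of the market terms): cross-entropy in bits, normalized by UTF-8 byte count rather than by token count.

```python
import math

# Sketch (my own, illustrative): bits-per-byte from per-token negative log-likelihoods.
def bits_per_byte(token_nll_nats: list[float], text: str) -> float:
    """Total NLL over a document, converted from nats to bits, divided by its byte length."""
    total_bits = sum(token_nll_nats) / math.log(2)   # nats -> bits
    num_bytes = len(text.encode("utf-8"))
    return total_bits / num_bytes

# A candidate model would need Pile BPB <= 1.225 (GPT-2's score) to qualify.
```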


We have less than a year left. I have sold my stake in this market and will not bet further on it in case it ends up being subjective.

My personal efforts in this space have not been as successful as I had hoped. Personally, I think the market is ~well priced? I will be interested to see how this resolves.

Trit weights https://arxiv.org/abs/2402.17764

Still 8-bit activations, so it does not qualify, but a sparse weight matrix should compact down much more nicely.
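For anyone curious why ternary weights are attractive here, a rough sketch (my own, not from the paper): a {-1, 0, +1} weight either adds, subtracts, or skips its input, so the matrix-vector product needs no multiplier at all. The 8-bit activations are what keep it from being pure binary under this market's definition.

```python
import numpy as np

# Sketch (my own illustration): a ternary matrix-vector product as masked additions.
def ternary_matvec(W: np.ndarray, x: np.ndarray) -> np.ndarray:
    """W has entries in {-1, 0, +1}; each output is a sum of added/subtracted inputs."""
    out = np.zeros(W.shape[0], dtype=x.dtype)
    for i in range(W.shape[0]):
        out[i] = x[W[i] == 1].sum() - x[W[i] == -1].sum()
    return out

W = np.array([[1, 0, -1],
              [0, 1, 1]])
x = np.array([3, 5, 2])
print(ternary_matvec(W, x))   # [1 7]
```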

predicts YES

Looks like we will be getting 3-bit quantized LLaMA soon:
- https://arxiv.org/abs/2210.17323
- https://news.ycombinator.com/item?id=35107058

Now all that remains to resolve this market is to somehow quantize the softmaxes, and then unroll the whole thing to combinational logic.
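One hedged sketch of what that might look like (entirely my own, not from the GPTQ paper): clamp the integer logit differences to a small range and replace exp() with a lookup table, since a fixed-width table plus a fixed-width divider maps directly to a ROM and combinational logic.

```python
# Sketch (my own illustration): an integer-only, lookup-table softmax.
SCALE = 256                                                     # fixed-point output scale
EXP_LUT = [int(round(SCALE * 2 ** (-d))) for d in range(16)]    # 2**(-d) for d in [0, 15]

def lut_softmax(logits: list[int]) -> list[int]:
    """Integer-only softmax over integer logits, interpreted in log2 units."""
    m = max(logits)
    weights = [EXP_LUT[min(m - z, 15)] for z in logits]   # clamped table lookup
    total = sum(weights)
    return [(w * SCALE) // total for w in weights]        # probabilities in 1/256 units

print(lut_softmax([5, 3, 0]))   # roughly [199, 49, 6] out of 256
```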

predicts YES

Ultra-low Precision Multiplication-free Training for Deep Neural Networks: https://arxiv.org/abs/2302.14458

1 sign bit, 4 exponent bits. Looks like it works on transformer language models. I am unclear, however, on how they handle the softmaxes. To resolve this market, the softmaxes would need to be fully transformed to combinational logic.
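Under my reading of the abstract (sign plus exponent, no mantissa), each weight is a signed power of two, so multiplying an integer activation by it is just a shift and a conditional negation. The names and the bias value below are my own illustration, not the paper's.

```python
# Sketch (my own illustration): multiplication by a sign+exponent weight as shift + negate.
def shift_mul(activation: int, sign: int, exponent: int, bias: int = 8) -> int:
    """Multiply activation by (-1)**sign * 2**(exponent - bias) using only shifts."""
    e = exponent - bias
    product = activation << e if e >= 0 else activation >> -e
    return -product if sign else product

print(shift_mul(20, sign=0, exponent=10))   # 20 * 2**2  = 80
print(shift_mul(20, sign=1, exponent=7))    # -(20 * 2**-1) = -10
```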

predicts YES

https://arxiv.org/abs/2212.09720

We are beginning to get down to 4-bit weights (see the sketch after this comment).

However, note that even if the weights were 3-bit, the model would need to be fully reduced to combinational logic, including any softmaxes, etc., to resolve YES.
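For reference, the basic round-to-nearest version of 4-bit weight quantization (a minimal sketch of the idea only; GPTQ itself uses a more sophisticated, error-compensating procedure):

```python
import numpy as np

# Sketch (my own illustration): round-to-nearest 4-bit weight quantization.
def quantize_4bit(w: np.ndarray):
    """Map float weights to signed 4-bit integers in [-8, 7] plus one float scale."""
    scale = np.abs(w).max() / 7.0
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.array([0.31, -0.07, 0.52, -0.49], dtype=np.float32)
q, s = quantize_4bit(w)
print(q)                  # e.g. [ 4 -1  7 -7]
print(dequantize(q, s))   # approximate reconstruction of w
```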