Give me feedback on my latest paper
Ṁ1,000 / Ṁ1,000 bounty left

I'm currently working on a PhD. The main thing I've produced so far is this paper, which outlines my hypothesis that Transformer models (the basis of LLMs) are the best extant bounded approximation of Solomonoff Induction, along with some brief arguments for why that matters.

Some things I'll award this bounty for:

  • Feedback, positive or negative

  • Ideas for follow-up papers

  • Ideas for experiments/practical work that would help confirm/disconfirm my hypothesis

  • Links to relevant work

  • Anything else relevant to my work that I find helpful!

I'll award every response I find helpful, and more helpful comments will get more money. I might increase the bounty if I value your contribution enough. Feel free to make offers like "I'll write up my thoughts in a public blog post for Ṁ10,000".


And a more high-level thing: it seems implausible that regular languages are the upper limit of the computational tasks transformers can perform reliably, given that they emulate human language not-that-awfully. Human language is... well, technically it's mildly context-sensitive due to some rarer patterns, but at minimum, the fact that regular mechanisms perform badly on it, and that you need context-free ones to model basically anything worthwhile, suggests transformers' context-free performance can't be that bad. (A concrete example of the regular/context-free gap is sketched below.)

(For computational linguistics aficionados: yes, you can circumvent this with derivation trees, but I dare you to declare that LLMs build the neural-network equivalent of derivation trees. I dare you.)
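To make that gap concrete, here's the canonical context-free-but-not-regular task, Dyck-1 (balanced brackets), as a minimal illustrative recogniser (my sketch, nothing from the paper). The unbounded counter is exactly what a finite-state mechanism lacks: finitely many states cannot track arbitrary nesting depth.

```python
def is_balanced(s: str) -> bool:
    """Recognise Dyck-1: strings of '(' and ')' with balanced nesting."""
    depth = 0
    for ch in s:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # a ')' with nothing open
                return False
        else:
            return False  # symbol outside the alphabet
    return depth == 0

assert is_balanced("(()())") and not is_balanced("(()")
```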

"with the decimal expansion of the neuron's value (in some base)"

Another bit of penny-pinching: isn't it a terminological contradiction to assume base ≠ 10 and still speak of a "decimal" expansion? What comes after the comma (point, for you American weirdos) in, say, (101.1011)₂ isn't its "decimal" part, because the number isn't written in the decimal system. Or is a different meaning of the word "base" meant?
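To make the terminology concrete, here's a tiny illustrative helper (mine, not the paper's; it assumes base ≤ 10) that emits the fractional digits of a number in a given base; "base-b expansion" or "radix expansion" would be the neutral phrasing the paper wants:

```python
def fractional_digits(x: float, base: int = 2, n_digits: int = 8) -> str:
    """Digits of x's fractional part in the given base (base <= 10 assumed).

    fractional_digits(0.6875, base=2) == "1011", i.e. 0.6875 = (0.1011) in base 2.
    """
    frac = x - int(x)
    digits = []
    for _ in range(n_digits):
        frac *= base
        d = int(frac)
        digits.append(str(d))
        frac -= d
        if frac == 0:
            break
    return "".join(digits)

assert fractional_digits(0.6875, base=2) == "1011"
```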

"For simplicity, we will assume that M uses a binary alphabet, but these results generalise to any alphabet."

This is really penny-pinching, but "any" should probably be "any larger": unary alphabets have some weird properties that disappear in larger alphabets (for instance, it is trivial to prove that any context-free language over a unary alphabet is regular, which is obviously false for non-unary alphabets). Alternatively, if you do think your finding applies to the unary alphabet, an explanation of why should be there, even a short one (for n-ary alphabets with n > 2, the coding explanation of why "if binary works, this works" is well known; a sketch follows below). Of course, a Turing machine with only one symbol is, how do I put it, not exactly interesting, but still.
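For completeness, the coding argument in one breath: fix a block code of width ceil(log2 n) bits per symbol, and any machine over an n-ary alphabet is simulated by one over {0, 1}. A minimal illustrative sketch (mine), assuming n ≥ 2:

```python
from math import ceil, log2

def binary_block_code(alphabet: str) -> dict[str, str]:
    """Fixed-width binary code for an n-symbol alphabet (n >= 2).

    Each symbol becomes ceil(log2 n) bits, which is why results proved
    for binary alphabets lift to larger ones (but not down to unary).
    """
    width = ceil(log2(len(alphabet)))
    return {sym: format(i, f"0{width}b") for i, sym in enumerate(alphabet)}

code = binary_block_code("abc")              # {'a': '00', 'b': '01', 'c': '10'}
encoded = "".join(code[s] for s in "abcab")  # '0001100001'
```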

High-level comments:

Talking about unbounded computation in reference to NNs, the neural tangent kernel seems relevant. What does the NTK for a transformer look like? (Google gives only sparse hits for "neural tangent kernel transformer"; [potential ref](https://arxiv.org/abs/2006.14548).) Is the story that the infinite-width regime would act even more like SolInd? Would it get around the limitations of Section 5? Food for thought.
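One way to poke at this empirically: the finite-width ("empirical") NTK can be computed for any differentiable model, transformers included. Below is a minimal illustrative sketch (my code, not the paper's; it assumes a scalar-output PyTorch model in which every parameter participates in the forward pass):

```python
import torch

def empirical_ntk(model: torch.nn.Module,
                  x1: torch.Tensor,
                  x2: torch.Tensor) -> torch.Tensor:
    """Empirical NTK entry K(x1, x2) = <df/dtheta(x1), df/dtheta(x2)>."""
    def grad_vec(x: torch.Tensor) -> torch.Tensor:
        out = model(x).sum()  # assumes an (effectively) scalar output
        grads = torch.autograd.grad(out, list(model.parameters()))
        return torch.cat([g.reshape(-1) for g in grads])
    return grad_vec(x1) @ grad_vec(x2)

# Toy usage; swapping in a small transformer works the same way.
model = torch.nn.Sequential(torch.nn.Linear(4, 8), torch.nn.Tanh(),
                            torch.nn.Linear(8, 1))
x1, x2 = torch.randn(1, 4), torch.randn(1, 4)
print(empirical_ntk(model, x1, x2))
```

Infinite-width NTK theory says this kernel freezes at initialisation; tracking how far a real transformer's empirical NTK drifts during training might be one concrete handle on the question above.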

Medium-level comments:

Sections 4.1-4.2: Feels like these two subsections are part of the same argument, and could be combined.

Section 5.1: Feels like this subsection is titled more broadly than its contents indicate. Are there other examples of tasks that transformers fail on that are reminiscent of things a SolInd should get right?

Low-level comments:

My sense is that "transformers" is not usually capitalized in the literature.

"identification of patterns identified" feels like the same root word is being used too close to itself.

"incomputable" you use this word a few times, it seems like you are referencing the uncomputability of Solomonoff's induction. Why not state this more formally and cite a proof of the relevant theorem? To be even more constructively pedantic, is it even really true to say it is "incomputable without infinite memory and the capability to execute an infinite number of programs in a finite time period". What does infinite memory even mean?

"Any input string s given to M will result in some output string x". Unless M never halts.

"every computable string" -> shouldn't this just be, more simply, "every string"?