Will software-side AI scaling appear to be suddenly discontinuous before 2025?
18% chance

This market is taken from a comment by @L. I assume by "software side scaling" they mean something like "discontinuous progress in algorithmic efficiency", and that is what I will use to resolve this market.

See this paper for an explanation of algorithmic efficiency: https://arxiv.org/abs/2005.04305. tl;dr: the efficiency of an algorithm is the number of FLOPs required, using that algorithm, to reach a given performance target.
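
As a rough illustration (mine, not the paper's; the FLOP counts below are hypothetical), the efficiency gain between two training runs that reach the same target is just a ratio of training FLOPs:

```python
def efficiency_gain(flops_old: float, flops_new: float) -> float:
    """Algorithmic efficiency gain: how many times fewer training FLOPs
    the new method needs to hit the same performance target."""
    return flops_old / flops_new

# Hypothetical numbers: an older SOTA run used 1e21 training FLOPs,
# a newer method reaches the same benchmark score with 1e20.
print(efficiency_gain(1e21, 1e20))  # 10.0, i.e. a 10x gain
```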

Current estimates are that algorithmic efficiency doubles every ~2 years. This market resolves YES if there is a 10x algorithmic efficiency gain within any six-month period before 2025, for a SOTA model in a major area of ML research (RL, sequence generation, translation, etc.).
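
For scale (my own back-of-the-envelope arithmetic, not part of the market description): with a 2-year doubling time, six months of trend-level progress buys only about a 1.19x gain, so a 10x gain in six months would imply a doubling time of under two months during that window.

```python
import math

DOUBLING_TIME_YEARS = 2.0   # assumed baseline doubling time for algorithmic efficiency
WINDOW_YEARS = 0.5          # the six-month window in the resolution criterion

# Gain expected from the existing trend over one window.
trend_gain = 2 ** (WINDOW_YEARS / DOUBLING_TIME_YEARS)
print(f"trend gain over 6 months: {trend_gain:.2f}x")                  # ~1.19x

# Doubling time implied by a 10x gain inside one window.
implied_doubling_months = 12 * WINDOW_YEARS * math.log(2) / math.log(10)
print(f"implied doubling time: {implied_doubling_months:.1f} months")  # ~1.8 months
```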

It must be a SOTA model on a reasonably broad benchmark, meaning it takes 1/10 the FLOPs to achieve SOTA performance at the end of the six-month period, and it can't be, for instance, performance on a single Atari environment or even a single small language benchmark.

In short: it needs to be such a significant jump that no one can reasonably argue that there wasn't a massive jump, and it needs to be on something people actually care about.

I am using "algorithmic efficiency" rather than "increase in SOTA" because it's harder to define "discontinuity" in terms of final performance: a "2x increase" is perfectly reasonable in RL but nonsensical for a classification task where SOTA is already 90%.

It seems plausible to me that OAI/Anthropic have amassed far more high-quality data (transcripts, contractor-generated data, textbooks, etc.) than they had previously, and that by training on such data in addition to other algorithmic improvements they'd get a 10x efficiency gain.

I wouldn't consider this an 'algorithmic' gain, but would like @vluzko to confirm this.

Another edge case: what if GPT-5 comes out, and then within 6 months a distilled model comes out, e.g. a GPT-5-Turbo, which costs 10x less compute to train and achieves equivalent performance? Presumably this should not count, since the original GPT-5's compute was necessary.

I find this question very interesting in principle, but these edge cases plus the absence of public information mean that the resolution of this question seems noise-dominated.

I will not count a gain purely from better data, but I will accept models that don't train on the exact same dataset (so there is room for improvements from data to sneak into the resolution) as long as they are more than 10x better. For example, if Model A has a curated dataset and is 20x more efficient than Model B, I will probably count that, because I don't expect data alone to account for the jump from 10x to 20x. This is ambiguous and subjective, of course; if people think I should resolve that scenario N/A instead, I'm okay with doing that.
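
One way to read that heuristic, as a sketch with hypothetical numbers (the 2x allowance for data is my own placeholder, not a figure from the description): credit the algorithmic side with whatever remains after a generous allowance for better data, then check that against the 10x bar.

```python
def counts_toward_yes(observed_gain: float,
                      max_data_only_gain: float = 2.0,   # hypothetical allowance
                      threshold: float = 10.0) -> bool:
    """Rough paraphrase of the resolution heuristic: divide out a generous
    allowance for data improvements, then compare what's left to the 10x bar."""
    algorithmic_part = observed_gain / max_data_only_gain
    return algorithmic_part >= threshold

# Hypothetical: Model A has a curated dataset and is 20x more efficient than Model B.
print(counts_toward_yes(20.0))  # True with a 2x data allowance; still a judgment call
```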

The cost of training the initial model will be counted toward a distilled model's total.

predicts NO

Does Alpaca's new fine-tuning based approach count? https://crfm.stanford.edu/2023/03/13/alpaca.html

@jonsimon This isn't SOTA, and obviously the algorithmic cost of a model includes the cost of any pre-training.

predicts NO

@vluzko Ok so that's a "no" then, since the teacher model still needs to be expensively pretrained. Thanks!

@jonsimon yes, sorry, it is a no and I should explicitly state that

The GPT-4 paper showed extremely consistent scaling behavior, with no hint of discontinuities of any kind. Definite update towards No.

predicts NO

What was the specific comment from L that led to this market?

What sorts of speedups count here? Would a 10x inference-time speedup from something like CALM resolve this to YES?

What about a new model architecture that achieves a 10x speedup but is only useful for a single special-purpose task, like image segmentation?

What about a massive model where a ton of priors are baked into it upfront, so that it's very fast to train but so slow at inference time that it's impractical to use for anything in the real world?

@jonsimon I feel that the description already answers these questions.

  • Inference-time speedup: no; algorithmic efficiency is a measure of training compute, as stated in the abstract of the linked paper.

  • Single special-purpose task: no; the description requires a reasonably broad benchmark in a major area of research.

  • Slow at inference: this is fine; inference time is not included in the measure. All that matters is reducing training-time FLOPs.

@vluzko Thank you for clarifying. In that case this market seems way too biased towards yes. Why expect a smooth (exponential) trend that's held for a decade to suddenly go discontinuous?

@jonsimon Do you mean biased towards no? It resolves yes if the trend breaks.

To the more general point: a market about an extreme outcome is biased in some sense but not in a way that matters to me. My goal in asking this question isn't to be "fair" to both yes and no outcomes - it's to get information about this particular outcome. Modifying the criteria to make it more likely to resolve yes would make the market worse at that. I do generally try to have many questions covering different levels of extreme outcome, and I will add more algorithmic efficiency questions in the future, but this question is in fact asking the thing I want it to ask.

predicts NO

@vluzko This wasn't a criticism of how you constructed your market, it was a comment on how other market participants have behaved thus far.

If I'm understanding the linked paper correctly, algorithmic efficiency has increased at a steady exponential rate for a decade. So the sort of discontinuity this market is asking about is basically implying that the trend goes dramatically super-exponential, at least briefly. That seems... really unlikely a priori? Like, why would anyone expect that to happen specifically within the next two years when it hasn't happened in the prior 10?

predicts YES

@jonsimon reasons

predicts YES

In other words: I don't think I want to speed up the possibility I see by talking about it. It's obvious enough if you're the type to think of such things, and the real question in my head is whether anyone who knows what they're doing will figure out how to get a combination of ideas working that makes it actually run. The thing I'd really like to express here is: y'all, please be aware that things are likely to scale suddenly.

Also, for what it's worth, the effect I'm expecting would move somewhat sigmoidally, and would be a multiplicative factor on algorithm capabilities. I don't expect it to max out at "magically perfect intelligent system" the way the traditional MIRI viewpoint does; I don't think this is a recursive self-improvement magic button or anything like that. If anything, I think even after this improvement AI will still not be smart enough to reliably come up with new ideas on the level needed to make this kind of improvement in the first place.

But yeah, I expect something specific in how training works to change in ways that drastically speed up learning.

@L post a sha256 hash of the idea then

predicts NO

@L this isn't something as simple as "only do training updates on data that the model is uncertain about" or something silly like that, right? If it's something more technically substantial then yeah, I'm with @vluzko: write it down and hash it, with the original text to be revealed if/when said breakthrough happens.
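
A minimal sketch of that commit-and-reveal scheme in Python (the idea text here is obviously a placeholder, not @L's actual prediction):

```python
import hashlib

# Commit: write the idea down and publish only its SHA-256 hash.
idea = "placeholder text describing the claimed training speedup"  # hypothetical
commitment = hashlib.sha256(idea.encode("utf-8")).hexdigest()
print(commitment)

# Reveal: later, post the original text; anyone can recompute the hash
# and check it against the commitment posted earlier.
def verify(revealed_text: str, published_hash: str) -> bool:
    return hashlib.sha256(revealed_text.encode("utf-8")).hexdigest() == published_hash

print(verify(idea, commitment))  # True
```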

predicts YES

@vluzko
09fcd0ced432f385ee7c5f7d3f193312c671811b1ef5c22bfecb7cc7d7e78873

predicts NO

@L sweeet looking forward to the eventual big reveal

lol

@NathanHelmBurger why "lol"? Some not-yet-released research you know about?

@jonsimon yes, that is my reason for betting yes. I have specific reason to believe a significant improvement is possible, and that people who know what they're doing already know about it.

Looks like that tag didn't go through… but also it looks impossible to tag someone with just one letter.

@ian Yeah I was wondering why the user wasn't coming up