By 2025 will there be a competitive large language model with >50% of the total training data generated from a large language model?

Large meaning >= 20b parameters.
Competitive meaning the benchmark results are close to, or better than, those of a model trained on only human text.

Computer-generated text does not count in general; it has to be the output of a language model. For example, converting code to a compiler's intermediate representation and training only on that would not count.

Processed text is valid, as long as it's sourced from the language model.

Multiple stages of training are fine, so a training period on only human text is acceptable; as long as AI-generated training examples make up more than 50% of the total training examples over the entire run, this will resolve YES.
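
To illustrate the counting rule with made-up numbers (just a sketch, not a claim about any real model), the fraction is taken over all stages combined:

```python
# Hypothetical two-stage run; token counts are illustrative only.
stages = [
    {"source": "human", "tokens_b": 300},  # stage 1: human-written text
    {"source": "llm",   "tokens_b": 400},  # stage 2: LLM-generated text
]

llm = sum(s["tokens_b"] for s in stages if s["source"] == "llm")
total = sum(s["tokens_b"] for s in stages)
print(f"LLM-generated fraction: {llm / total:.0%}")  # 57% -> would resolve YES
```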

The market resolves on March 1st, 2025, to account for announcements of models trained in the second half of 2024.


A model extraction attack is enough to resolve this YES, right? Or any kind of distillation process where we train a model and use its output to train another model?

Does Constitutional AI count?

predicts YES

@MartinRandall It would have to be on text, not on logits, so something like Alpaca and friends would be fine if scaled up to 750B tokens, but a traditional student/teacher setup would not.

Constitutional AI would count, yeah, if it was >50% of the total.
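
To make the text-vs-logits distinction concrete (a rough sketch with placeholder shapes, not anyone's actual training code): training on the teacher's sampled text is ordinary next-token cross-entropy on a synthetic corpus, while classic student/teacher distillation matches the teacher's logit distribution and so wouldn't count here.

```python
import torch
import torch.nn.functional as F

V = 32000  # illustrative vocab size

def loss_on_generated_text(student_logits, teacher_sampled_tokens):
    """Counts for this market: the teacher only contributes text (token ids),
    and the student is trained with ordinary next-token cross-entropy on it."""
    return F.cross_entropy(student_logits.view(-1, V),
                           teacher_sampled_tokens.view(-1))

def loss_on_teacher_logits(student_logits, teacher_logits, temperature=1.0):
    """Does not count: classic distillation matches the teacher's full
    probability distribution (logits) rather than training on its text."""
    s = F.log_softmax(student_logits / temperature, dim=-1)
    t = F.softmax(teacher_logits / temperature, dim=-1)
    return F.kl_div(s, t, reduction="batchmean")
```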