Will it be possible to disentangle most of the features learned by a model comparable to GPT-3 this decade? (1k subsidy)
2031 · 56% chance


https://chat.openai.com/share/543c2953-982b-4ef0-8ba8-967068140987

☝️ Seems difficult; a much bigger model than GPT-2.

bought Ṁ5 YES at 57%

@VAPOR Essentially a link to the autointerp work by OpenAI, i.e. Bills et al. (2023) (link).

@EliezerYudkowsky trade on your current estimate?

@firstuserhere What is a disentangled feature?

@EliezerYudkowsky something that represents a single property of the data

@firstuserhere That is not enough for me to figure out how this market will be judged.

@EliezerYudkowsky It is quite fuzzy, I agree, and there are many different definitions for features.

Here I refer to a basic set of meaningful directions in the activation space from which more complex directions can be built. These directions can be converted into human-understandable concepts (the "can" allows for the existence of features which are not human-understandable), and the model actually learns and uses these directions as general ways to represent the properties of the input data.

The question, then, is whether it will be possible to cleanly separate out these directions and convert them into human-understandable concepts, for most of the properties of the data that the model is capable of representing and using.
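For concreteness, here is a minimal sketch of one way such directions might be extracted: training a sparse autoencoder on the model's activations, so that each decoder column becomes a candidate feature direction. This is only an illustration; the shapes and hyperparameters (d_model, n_features, the L1 coefficient) are hypothetical, and this is one common technique, not necessarily the method this market would be judged by.

```python
# Sketch: recovering candidate "feature directions" from model activations
# with a sparse autoencoder. Hypothetical sizes and hyperparameters;
# one illustrative approach, not the market's resolution criterion.
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int, n_features: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, n_features)
        self.decoder = nn.Linear(n_features, d_model)

    def forward(self, acts: torch.Tensor):
        # Sparse feature activations; each decoder column is a
        # candidate "meaningful direction" in activation space.
        codes = torch.relu(self.encoder(acts))
        recon = self.decoder(codes)
        return recon, codes

d_model, n_features = 768, 4096  # hypothetical sizes
sae = SparseAutoencoder(d_model, n_features)
opt = torch.optim.Adam(sae.parameters(), lr=1e-4)
l1_coeff = 1e-3  # sparsity pressure: few features active per input

acts = torch.randn(1024, d_model)  # stand-in for real model activations
for _ in range(100):
    recon, codes = sae(acts)
    # Reconstruction keeps the directions faithful to the model;
    # the L1 penalty pushes each feature toward a single concept.
    loss = (recon - acts).pow(2).mean() + l1_coeff * codes.abs().mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

# sae.decoder.weight[:, i] is the i-th candidate feature direction;
# interpreting it (e.g. via its top-activating inputs) is the hard part.
```

Whether the resulting directions are "disentangled" in the sense above then reduces to whether each one tracks a single, nameable property of the data.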

@firstuserhere Does "human-understandable" mean "at least one human understood it", or "all humans understood it", or something else?

@a2bb It would be more precise to say human-interpretable than understandable, but saying "understandable" in the text above makes that text easier for me to parse.