Is gpt-3.5-turbo a Mixture of Experts (MoE)?

In a blog post (https://152334h.github.io/blog/non-determinism-in-gpt-4/), 152334H argues that gpt-3.5-turbo may be a Mixture of Experts because it, like GPT-4, exhibits nondeterminism at temperature=0, whereas the original GPT-3 model, davinci, generates deterministic outputs.
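For concreteness, this is roughly how one could probe for that nondeterminism (a minimal sketch, assuming the current openai Python client and an OPENAI_API_KEY in the environment; the prompt, trial count, and token limit are arbitrary choices, not from the post):

```python
# Repeatedly sample the same prompt at temperature=0 and count distinct outputs.
from openai import OpenAI

client = OpenAI()

def distinct_completions(model: str, prompt: str, n_trials: int = 5) -> set[str]:
    """Collect the set of completions returned for identical requests at temperature=0."""
    outputs = set()
    for _ in range(n_trials):
        resp = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
            temperature=0,
            max_tokens=128,
        )
        outputs.add(resp.choices[0].message.content)
    return outputs

outputs = distinct_completions("gpt-3.5-turbo", "Explain why the sky is blue in one paragraph.")
# One unique completion is what a deterministically served dense model would give;
# more than one is the temperature=0 nondeterminism the blog post points to.
print(f"{len(outputs)} distinct completions at temperature=0")
```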

In this context, I will define MoE as any form of sparsity in which only a subset of the model's parameters is activated on each forward pass.
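To illustrate what that kind of sparsity means, here is a toy top-k routed layer in PyTorch; the expert count, sizes, and routing rule are purely hypothetical and not a claim about gpt-3.5-turbo's architecture:

```python
import torch
import torch.nn as nn

class TopKMoELayer(nn.Module):
    """Toy MoE layer: a router picks top_k experts per token, so only that
    subset of expert parameters participates in a given forward pass."""
    def __init__(self, d_model: int = 64, n_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList([nn.Linear(d_model, d_model) for _ in range(n_experts)])
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (tokens, d_model)
        gate_logits = self.router(x)
        weights, picks = gate_logits.topk(self.top_k, dim=-1)  # per-token expert choices
        weights = weights.softmax(dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = picks[:, slot] == e  # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out
```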

Resolves to YES if public evidence is sufficient to prove that gpt-3.5-turbo is a MoE model, NO if public evidence is sufficient to prove that gpt-3.5-turbo is a dense model, and N/A if sufficient evidence either way cannot be found by 2026. I will subjectively determine the sufficiency of the evidence.

If the underlying model behind gpt-3.5-turbo is changed, the model type of the new model (MoE or dense) will be used to resolve the market.
