MixtureOfExperts

ThoughtStorms Wiki

Context : Transformers, NeuralNetworks, LanguageModels

A mixture of experts replaces one large neural network / language model with a number of smaller "expert" networks, plus a cheaper gating (switching) network that decides which experts to use for each input. In fact, AFAICT, this amounts to partitioning a large neural net and "switching off" the weights and calculations that aren't relevant to the current input, so only a fraction of the parameters are active at any one time.
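
To make that concrete, here's a minimal sketch of a top-k gated MoE layer in PyTorch. The names (MoELayer, num_experts, top_k) and the sizes are made up for illustration, not taken from any particular library: a small linear router scores the experts for each token, and only the top-scoring experts actually run.

```python
# Minimal sketch of a sparsely-gated mixture-of-experts layer (illustrative, not a real library API).
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    def __init__(self, dim, hidden_dim, num_experts=8, top_k=2):
        super().__init__()
        # Each "expert" is a small feed-forward network.
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(dim, hidden_dim), nn.ReLU(), nn.Linear(hidden_dim, dim))
            for _ in range(num_experts)
        ])
        # The cheap "switching circuit": a linear router scoring each expert per token.
        self.router = nn.Linear(dim, num_experts)
        self.top_k = top_k

    def forward(self, x):                        # x: (num_tokens, dim)
        scores = self.router(x)                  # (num_tokens, num_experts)
        weights, chosen = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)     # normalise over the chosen experts only
        out = torch.zeros_like(x)
        # Only the chosen experts run for each token; the rest stay "switched off".
        for i, expert in enumerate(self.experts):
            token_idx, slot_idx = (chosen == i).nonzero(as_tuple=True)
            if token_idx.numel() == 0:
                continue
            out[token_idx] += weights[token_idx, slot_idx].unsqueeze(-1) * expert(x[token_idx])
        return out

layer = MoELayer(dim=64, hidden_dim=256)
tokens = torch.randn(10, 64)
print(layer(tokens).shape)                       # torch.Size([10, 64])
```

In transformer MoE models this kind of layer typically stands in for the feed-forward block inside each transformer layer, so most of the parameters live in the experts but only a couple of experts run per token.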

https://huggingface.co/blog/moe

A million experts? MixtureOfAMillionExperts