MoE architectures
We have documentation on the MoE (mixture-of-experts) models we use for inference. Fairseq implements MoE following the GShard model, for which we have a summary doc. We also have docs on Fairseq’s 15B-parameter MoE language model.
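For readers new to GShard-style MoE, the core routing idea can be illustrated with a small sketch. This is not Fairseq's code: the function names (`top2_gate`, `moe_forward`) and the toy experts are hypothetical, and the sketch assumes plain top-2 gating while leaving out expert capacity limits, stochastic dispatch of the second expert, and the auxiliary load-balancing loss.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top2_gate(gate_logits):
    """Pick the two highest-scoring experts for one token.

    Returns [(expert_index, weight), (expert_index, weight)], with the two
    weights renormalized so that they sum to 1.
    """
    probs = softmax(gate_logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    first, second = ranked[0], ranked[1]
    norm = probs[first] + probs[second]
    return [(first, probs[first] / norm), (second, probs[second] / norm)]


def moe_forward(token, gate_logits, experts):
    """Weighted sum of the outputs of the two selected experts."""
    out = [0.0] * len(token)
    for idx, weight in top2_gate(gate_logits):
        expert_out = experts[idx](token)
        out = [o + weight * e for o, e in zip(out, expert_out)]
    return out


# Hypothetical toy experts: each one just scales its input.
experts = [
    lambda x: [v * 1.0 for v in x],
    lambda x: [v * 2.0 for v in x],
    lambda x: [v * 3.0 for v in x],
    lambda x: [v * 4.0 for v in x],
]

# In a real model the gate logits come from a learned projection of the token.
gate_logits = [0.0, 2.0, 1.0, -1.0]
routing = top2_gate(gate_logits)
output = moe_forward([1.0, -1.0], gate_logits, experts)
```

Only the two selected experts run for each token, which is why an MoE model can have a large total parameter count while keeping per-token compute close to that of a much smaller dense model.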