MoE Inference

Existing frameworks

We document each of the other inference frameworks we are benchmarking:

  • Fairseq MoE
    • Parallelism in Fairseq MoE (inference)
    • Synchronization/communication collectives
    • Profiling & Benchmarking Evaluation
  • DeepSpeed-MoE
    • Parallelism in DeepSpeed-MoE
    • Synchronization/communication collectives
    • Pretrained MoE model
    • Benchmarking results
  • FasterTransformer MoE
    • NVIDIA Triton Inference Server
    • Parallelism in FasterTransformer
    • Optimizations
    • Pretrained MoE model
    • Benchmarking results

© Copyright 2022, the authors. Revision 46cc660a.
