Mixture of Experts #19

Open · 3 of 11 tasks
xrsrke opened this issue Oct 25, 2023 · 0 comments
xrsrke commented Oct 25, 2023

APIs

import torch.nn as nn

from pipegoose.distributed import ParallelContext
from pipegoose.nn.expert_parallel import ExpertParallel, ExpertLoss

parallel_context = ParallelContext.from_torch(expert_parallel_size=8)

# user-defined expert module, router, and noise policy
# (CustomExpert, CustomRouter, CustomNoisePolicy are placeholders)
mlp = CustomExpert()
router = CustomRouter()
noise_policy = CustomNoisePolicy()
loss_func = nn.CrossEntropyLoss()

# "model" is a pretrained 🤗 transformers model to be converted into an MoE
model = ExpertParallel(
    model,
    expert=mlp,
    router=router,
    noise_policy=noise_policy,
    enable_tensor_parallelism=True,
    parallel_context=parallel_context,
).parallelize()

# wrap the base loss so the MoE auxiliary losses can be added, weighted by aux_weight
loss_func = ExpertLoss(loss_func, aux_weight=0.1)
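ExpertLoss wraps the base loss so that the MoE auxiliary terms can be added on top of the task loss, weighted by aux_weight. For the "Loss function (include aux and z loss)" TODO below, a minimal sketch of the two standard auxiliary terms, the load-balancing loss from Switch Transformers and the router z-loss from ST-MoE, might look like the following in plain PyTorch; the function names and the top-1 dispatch assumption are illustrative, not pipegoose's actual implementation.

import torch
import torch.nn.functional as F

def load_balancing_loss(router_logits, expert_indices):
    # Switch Transformers aux loss: num_experts * sum_i (fraction_i * mean_prob_i)
    num_experts = router_logits.shape[-1]
    probs = F.softmax(router_logits, dim=-1)                   # [num_tokens, num_experts]
    dispatch = F.one_hot(expert_indices, num_experts).float()  # top-1 assignment assumed
    fraction_per_expert = dispatch.mean(dim=0)                 # f_i: share of tokens sent to each expert
    mean_prob_per_expert = probs.mean(dim=0)                   # P_i: mean router probability per expert
    return num_experts * torch.sum(fraction_per_expert * mean_prob_per_expert)

def router_z_loss(router_logits):
    # ST-MoE z-loss: penalizes large gate logits to keep routing numerically stable
    return torch.logsumexp(router_logits, dim=-1).square().mean()

The combined objective would then be roughly the task loss plus aux_weight times the load-balancing loss, plus a separately weighted z-loss.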

TODOs

  • Top-1, Top-2 router (see the router sketch after this list)

  • ExpertParallel (turn a 🤗 transformers model into an MoE automatically)

  • Does an expert's output embedding need to be multiplied by its corresponding router probability?

  • Make ExpertParallel work with data parallelism

    • Create a new process group for experts across the data parallelism dimension
    • Register a backward hook to synchronize gradients of the same expert across the data parallelism dimension
  • Optionally apply tensor parallelism to an expert layer

  • Make ExpertParallel work with pipeline parallelism

  • Make ExpertParallel work with ZeRO-1

  • Loss function (include aux loss and z-loss; see the sketch under APIs above)

  • Move inputs to the target expert's device
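As referenced in the "Top-1, Top-2 router" item above, a top-k router can be sketched in plain PyTorch as a linear gate followed by softmax and top-k selection. The interface below (returning top-k expert indices, renormalized gate weights, and the raw logits for the auxiliary losses) is an assumption for illustration, not pipegoose's actual router contract; a noise policy would typically perturb the gate logits before the top-k selection.

from torch import nn
import torch.nn.functional as F

class TopKRouter(nn.Module):
    def __init__(self, d_model, num_experts, k=2):
        super().__init__()
        self.gate = nn.Linear(d_model, num_experts, bias=False)  # token -> expert scores
        self.k = k  # k=1 for top-1 routing, k=2 for top-2 routing

    def forward(self, hidden_states):
        # hidden_states: [num_tokens, d_model]
        router_logits = self.gate(hidden_states)                 # [num_tokens, num_experts]
        probs = F.softmax(router_logits, dim=-1)
        topk_weights, topk_indices = probs.topk(self.k, dim=-1)  # [num_tokens, k]
        # renormalize so each token's selected expert weights sum to 1
        topk_weights = topk_weights / topk_weights.sum(dim=-1, keepdim=True)
        return topk_indices, topk_weights, router_logits

The raw router_logits returned here are what the auxiliary losses sketched under APIs would consume.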

Engineering Reading

  • Pipeline MoE: A Flexible MoE Implementation with Pipeline Parallelism
  • DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale
  • DeepSpeed-TED: A Hybrid Tensor-Expert-Data Parallelism Approach to Optimize Mixture-of-Experts Training
  • MegaBlocks: Efficient Sparse Training with Mixture-of-Experts
  • FasterMoE: Modeling and Optimizing Training of Large-Scale Dynamic Pre-Trained Models

MoE Reading

  • Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity
  • GLaM: Efficient Scaling of Language Models with Mixture-of-Experts
  • ST-MoE: Designing Stable and Transferable Sparse Expert Models
  • Mixture-of-Experts with Expert Choice Routing
xrsrke added the "help wanted" label on Oct 25, 2023
xrsrke self-assigned this on Oct 25, 2023
xrsrke removed the "help wanted" label on Nov 14, 2023