diffulex.moe
diffulex.moe contains Mixture-of-Experts configuration helpers, router
metadata, token dispatchers, top-k routing, and fused expert execution layers.
Current public support is conservative: model code should use the
package-level builders and the documented MoE GEMM options rather than
selecting MoE implementations or dispatcher internals directly.
| Module | Role |
|---|---|
| diffulex.moe.config | Reads MoE-related attributes from model configs. |
| diffulex.moe.dispatcher | Token dispatcher implementations and dispatcher factory. |
| diffulex.moe.layer | Fused MoE layer implementations and layer factory. |
| diffulex.moe.metadata | Router, dispatcher, and expert execution metadata dataclasses. |
| diffulex.moe.topk | Top-k router implementations and router factory. |
diffulex.moe.config
This module normalizes model-config differences so MoE code can ask for common concepts such as expert count, experts-per-token, sparse-layer placement, and intermediate size.
| Symbol | Purpose |
|---|---|
| | Reads total expert count. |
| | Reads top-k experts per token. |
| | Reads MoE hidden size. |
| | Determines whether a layer index should use MoE. |
Use these helpers rather than reading raw HF config attributes directly.
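As an illustration of what this kind of normalization involves, here is a minimal sketch. It is not the diffulex implementation: the function names are hypothetical, and the attribute names (`num_experts`, `num_local_experts`, `n_routed_experts`, `num_experts_per_tok`, `decoder_sparse_step`, `mlp_only_layers`) are assumed examples of how different model families label the same concept.

```python
from types import SimpleNamespace

# Assumed aliases for the same concept across model families (illustrative only).
_NUM_EXPERTS_KEYS = ("num_experts", "num_local_experts", "n_routed_experts")
_TOP_K_KEYS = ("num_experts_per_tok",)


def first_attr(config, names, default=None):
    """Return the first attribute in `names` that is set on `config`."""
    for name in names:
        value = getattr(config, name, None)
        if value is not None:
            return value
    return default


def num_experts(config):
    """Hypothetical helper: total expert count, 0 for dense models."""
    return first_attr(config, _NUM_EXPERTS_KEYS, default=0)


def experts_per_token(config):
    """Hypothetical helper: top-k experts per token, 0 for dense models."""
    return first_attr(config, _TOP_K_KEYS, default=0)


def is_moe_layer(config, layer_idx):
    """Hypothetical helper: sparse-layer placement under an assumed rule."""
    if num_experts(config) <= 0:
        return False
    # Layers explicitly listed as dense MLPs never use MoE.
    if layer_idx in (getattr(config, "mlp_only_layers", None) or ()):
        return False
    # Every `step`-th layer is sparse; a missing or zero step means every layer.
    step = getattr(config, "decoder_sparse_step", 1) or 1
    return (layer_idx + 1) % step == 0


# Two stand-in configs that name the expert count differently.
mixtral_like = SimpleNamespace(num_local_experts=8, num_experts_per_tok=2)
qwen_like = SimpleNamespace(
    num_experts=60, num_experts_per_tok=4,
    decoder_sparse_step=2, mlp_only_layers=[0],
)
```

Callers then ask `num_experts(cfg)` or `is_moe_layer(cfg, i)` without knowing which family the config came from, which is the point of routing all reads through one module.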
diffulex.moe.dispatcher
This package moves tokens between ranks for expert execution. The dispatcher factory chooses an implementation based on config.
| Symbol | Purpose |
|---|---|
| | Abstract dispatcher contract. |
| | Output structure returned by dispatchers. |
| | Factory for the configured dispatcher backend. |
| | Reference all-to-all dispatcher used by internal experiments. |
Use the dispatcher factory from model code. Direct dispatcher selection is not a public tuning surface for normal serving or benchmark runs.
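To make the dispatch and combine steps concrete, the following is a single-process sketch with no cross-rank communication. The function names and data layout are hypothetical and chosen for illustration; a real dispatcher would exchange tokens between ranks rather than group them in a local dictionary.

```python
from collections import defaultdict


def dispatch(topk_ids):
    """Group (token, slot) pairs by expert id.

    topk_ids[t] lists the experts selected for token t.
    """
    per_expert = defaultdict(list)
    for token, experts in enumerate(topk_ids):
        for slot, expert in enumerate(experts):
            per_expert[expert].append((token, slot))
    return dict(per_expert)


def combine(num_tokens, per_expert, expert_outputs, topk_weights):
    """Scatter expert outputs back to token order as a weighted sum."""
    out = [0.0] * num_tokens
    for expert, pairs in per_expert.items():
        for (token, slot), value in zip(pairs, expert_outputs[expert]):
            out[token] += topk_weights[token][slot] * value
    return out


# Three scalar "tokens", three toy experts, top-2 routing.
hidden = [1.0, 2.0, 3.0]
topk_ids = [[0, 1], [1, 2], [0, 2]]
topk_weights = [[0.5, 0.5], [0.25, 0.75], [1.0, 0.0]]

plan = dispatch(topk_ids)
# Toy expert e multiplies its input by (e + 1).
expert_outputs = {
    e: [hidden[token] * (e + 1) for token, _ in pairs]
    for e, pairs in plan.items()
}
combined = combine(len(hidden), plan, expert_outputs, topk_weights)
```

The grouping returned by `dispatch` is what lets each expert run once over a contiguous batch; `combine` reverses that permutation and applies the routing weights.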
diffulex.moe.layer
This package executes expert MLPs after routing. It provides naive, tensor-parallel, expert-parallel, and optional vLLM-backed implementations behind a factory function.
| Symbol | Purpose |
|---|---|
| | Factory for MoE blocks. |
| | Base fused MoE layer contract. |
| | Shared expert MLP helper. |
| | Reference fused MoE implementation. |
| | Tensor-parallel fused MoE implementation. |
| | Expert-parallel fused MoE implementation. |
Model layers should call the factory rather than instantiate implementation classes directly.
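The factory pattern described here can be sketched as follows. Every name in this block (`ParallelConfig`, `build_moe_block`, the class names) is hypothetical, and the selection rule is an assumption made for illustration, not the rule diffulex applies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParallelConfig:
    """Hypothetical parallelism settings."""
    tensor_parallel_size: int = 1
    expert_parallel_size: int = 1


class FusedMoEBase:
    """Hypothetical base contract shared by all implementations."""

    def __init__(self, num_experts, top_k):
        self.num_experts = num_experts
        self.top_k = top_k


class NaiveMoE(FusedMoEBase):
    pass


class TensorParallelMoE(FusedMoEBase):
    pass


class ExpertParallelMoE(FusedMoEBase):
    pass


def build_moe_block(num_experts, top_k, parallel):
    """Pick an implementation from the parallel config (assumed rule)."""
    if parallel.expert_parallel_size > 1:
        cls = ExpertParallelMoE
    elif parallel.tensor_parallel_size > 1:
        cls = TensorParallelMoE
    else:
        cls = NaiveMoE
    return cls(num_experts, top_k)


# Model code only ever calls the factory.
block = build_moe_block(8, 2, ParallelConfig(tensor_parallel_size=2))
```

Because model code depends only on the factory and the base contract, the selection rule can change without touching any model definition.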
diffulex.moe.metadata
This module defines structured metadata passed between routers, dispatchers, and expert execution layers.
| Symbol | Purpose |
|---|---|
| | Router output metadata. |
| | Base dispatcher metadata. |
| | Metadata needed while executing experts. |
| | Dispatcher lifecycle stage enum. |
Use these dataclasses to keep dispatcher and expert-layer contracts explicit.
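A small sketch of what an explicit contract can look like is given below. The class names, fields, and stage names are hypothetical and do not reflect the actual diffulex definitions; the point is only that a typed record plus a stage enum makes out-of-order use fail loudly.

```python
from dataclasses import dataclass
from enum import Enum, auto


class DispatchStage(Enum):
    """Hypothetical lifecycle stages, in the order they must occur."""
    CREATED = auto()
    DISPATCHED = auto()
    COMBINED = auto()


@dataclass(frozen=True)
class RouterOutput:
    """Hypothetical router result: chosen experts and their weights per token."""
    topk_ids: tuple
    topk_weights: tuple


@dataclass
class DispatchState:
    """Hypothetical dispatcher metadata carrying the router output and stage."""
    router: RouterOutput
    stage: DispatchStage = DispatchStage.CREATED

    def advance(self, to):
        # Only allow moving to the immediately following stage.
        if to.value != self.stage.value + 1:
            raise ValueError(f"cannot go from {self.stage.name} to {to.name}")
        self.stage = to


router_out = RouterOutput(topk_ids=((0, 1),), topk_weights=((0.5, 0.5),))
state = DispatchState(router_out)
state.advance(DispatchStage.DISPATCHED)
state.advance(DispatchStage.COMBINED)
```

Skipping a stage, for example combining before dispatching, raises `ValueError` instead of silently producing wrong output.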
diffulex.moe.topk
This package selects experts for each token. It provides top-k router implementations and a factory used by MoE layers.
| Symbol | Purpose |
|---|---|
| | Base router contract. |
| | Router output dataclass. |
| | Factory for configured router behavior. |
| | Standard top-k router. |
| | Router with group-limited expert selection. |
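The difference between standard and group-limited top-k selection can be shown with a pure-Python sketch. The function names are hypothetical, and two details are assumptions rather than documented diffulex behavior: scores are softmax probabilities renormalized over the selected experts, and each group is scored by its best expert.

```python
import math


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def topk_route(logits, k):
    """Pick the k highest-probability experts and renormalize their weights."""
    probs = softmax(logits)
    ids = sorted(range(len(probs)), key=lambda i: (-probs[i], i))[:k]
    norm = sum(probs[i] for i in ids)
    return ids, [probs[i] / norm for i in ids]


def group_limited_route(logits, k, num_groups, topk_groups):
    """Keep only the best `topk_groups` groups, then pick top-k inside them."""
    probs = softmax(logits)
    size = len(probs) // num_groups
    # Assumed group score: the best expert probability in the group.
    group_score = [max(probs[g * size:(g + 1) * size]) for g in range(num_groups)]
    kept = sorted(range(num_groups), key=lambda g: (-group_score[g], g))[:topk_groups]
    allowed = [i for g in kept for i in range(g * size, (g + 1) * size)]
    ids = sorted(allowed, key=lambda i: (-probs[i], i))[:k]
    norm = sum(probs[i] for i in ids)
    return ids, [probs[i] / norm for i in ids]


# Eight experts in four groups of two.
logits = [2.0, 1.0, 0.5, 0.0, 1.5, 1.4, -1.0, 3.0]
plain_ids, plain_w = topk_route(logits, k=2)
group_ids, group_w = group_limited_route(logits, k=2, num_groups=4, topk_groups=1)
```

With these logits the plain router picks experts 7 and 0, while the group-limited router keeps only the last group and picks experts 7 and 6. Restricting selection to a few groups bounds how many groups a token's experts can span.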