diffulex.moe

diffulex.moe contains Mixture-of-Experts (MoE) configuration helpers, router metadata, token dispatchers, top-k routing, and fused expert execution layers.

Current public support is conservative: model code should use the package-level builders and the documented MoE GEMM options instead of selecting MoE implementations or dispatcher internals directly.

Module                     Role
diffulex.moe.config        Reads MoE-related attributes from model configs.
diffulex.moe.dispatcher    Token dispatcher implementations and dispatcher factory.
diffulex.moe.layer         Fused MoE layer implementations and layer factory.
diffulex.moe.metadata      Router, dispatcher, and expert execution metadata dataclasses.
diffulex.moe.topk          Top-k router implementations and router factory.

diffulex.moe.config

This module normalizes model-config differences so MoE code can ask for common concepts such as expert count, experts-per-token, sparse-layer placement, and intermediate size.

Symbol                       Purpose
get_num_experts              Reads the total expert count.
get_num_experts_per_tok      Reads the number of experts selected per token (top-k).
get_moe_intermediate_size    Reads the intermediate size of each expert MLP.
is_moe_layer                 Determines whether a layer index should use MoE.

Use these helpers rather than reading raw Hugging Face (HF) config attributes directly, since attribute names differ between model families.
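As an illustration of what such normalization involves, the sketch below reimplements the four helpers in pure Python. It is hypothetical, not diffulex's code: the attribute fallbacks assume common Hugging Face layouts (Qwen2-MoE `num_experts`, `decoder_sparse_step`, `mlp_only_layers`; DeepSeek-V2/V3 `n_routed_experts`, `first_k_dense_replace`, `moe_layer_freq`; Mixtral `num_local_experts`), and the real lookup order may differ.

```python
from types import SimpleNamespace

# Hypothetical sketch mirroring the diffulex.moe.config helper names. The
# attribute fallbacks are assumptions about common HF configs, not the
# library's actual lookup order.

def get_num_experts(cfg) -> int:
    # Qwen2-MoE: num_experts, DeepSeek-V2/V3: n_routed_experts, Mixtral: num_local_experts.
    for name in ("num_experts", "n_routed_experts", "num_local_experts"):
        value = getattr(cfg, name, None)
        if value is not None:
            return int(value)
    return 0  # no expert count found: treat the model as dense

def get_num_experts_per_tok(cfg) -> int:
    return int(getattr(cfg, "num_experts_per_tok", 0) or 0)

def get_moe_intermediate_size(cfg) -> int:
    # Mixtral-style configs have no MoE-specific size, so fall back to the FFN size.
    size = getattr(cfg, "moe_intermediate_size", None)
    return int(size if size is not None else cfg.intermediate_size)

def is_moe_layer(cfg, layer_idx: int) -> bool:
    if get_num_experts(cfg) == 0:
        return False
    # DeepSeek-style placement: first k layers dense, then every freq-th layer is MoE.
    first_dense = getattr(cfg, "first_k_dense_replace", None)
    if first_dense is not None:
        freq = getattr(cfg, "moe_layer_freq", 1)
        return layer_idx >= first_dense and layer_idx % freq == 0
    # Qwen2-MoE-style placement: explicit dense layers plus a sparse step.
    if layer_idx in getattr(cfg, "mlp_only_layers", []):
        return False
    step = getattr(cfg, "decoder_sparse_step", 1)
    return (layer_idx + 1) % step == 0

qwen_like = SimpleNamespace(num_experts=60, num_experts_per_tok=4, moe_intermediate_size=1408,
                            decoder_sparse_step=1, mlp_only_layers=[], intermediate_size=5632)
deepseek_like = SimpleNamespace(n_routed_experts=64, num_experts_per_tok=6, moe_intermediate_size=1408,
                                first_k_dense_replace=1, moe_layer_freq=1, intermediate_size=10944)
mixtral_like = SimpleNamespace(num_local_experts=8, num_experts_per_tok=2, intermediate_size=14336)
dense_like = SimpleNamespace(intermediate_size=11008)
```

Model code then asks `is_moe_layer(cfg, i)` once per decoder layer and never needs to know which family-specific attribute answered.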

diffulex.moe.dispatcher

This package moves tokens between ranks for expert execution. The dispatcher factory chooses an implementation based on config.

Symbol                    Purpose
TokenDispatcher           Abstract dispatcher contract.
DispatcherOutput          Output structure returned by dispatchers.
build_token_dispatcher    Factory for the configured dispatcher backend.
NaiveA2ADispatcher        Reference all-to-all dispatcher used by internal experiments.

Use the dispatcher factory from model code. Direct dispatcher selection is not a public tuning surface for normal serving or benchmark runs.
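To show the bookkeeping an all-to-all dispatcher performs, here is a single-process simulation. It is a hypothetical sketch, not `NaiveA2ADispatcher`: it assumes experts are placed on ranks in equal contiguous blocks, models hidden states as scalars, and invents the helper names `owner_rank`, `dispatch`, `run_experts`, and `combine`.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# Hypothetical single-process simulation of all-to-all dispatch and combine.
# Not diffulex's NaiveA2ADispatcher; expert placement and the route tuple
# layout are assumptions made for illustration.

Route = Tuple[int, int, int, int]  # (source_rank, token_index, topk_slot, expert_id)

def owner_rank(expert_id: int, num_experts: int, ep_size: int) -> int:
    # Assumes experts are split into equal contiguous blocks, one block per rank.
    return expert_id // (num_experts // ep_size)

def dispatch(topk_ids_per_rank: List[List[List[int]]], num_experts: int,
             ep_size: int) -> List[List[Route]]:
    """Group every (token, expert) pair by the rank that owns the expert."""
    inbox: Dict[int, List[Route]] = defaultdict(list)
    for src, topk_ids in enumerate(topk_ids_per_rank):
        for tok, experts in enumerate(topk_ids):
            for slot, e in enumerate(experts):
                inbox[owner_rank(e, num_experts, ep_size)].append((src, tok, slot, e))
    return [inbox[r] for r in range(ep_size)]

def run_experts(received, tokens_per_rank, expert_fn: Callable[[int, float], float]):
    # Each destination rank applies the owning expert to every token it received.
    return [[expert_fn(e, tokens_per_rank[src][tok]) for (src, tok, _slot, e) in routes]
            for routes in received]

def combine(received, outputs, topk_weights_per_rank):
    """Return expert outputs to their source ranks, scaled by router weights."""
    result = [[0.0] * len(weights) for weights in topk_weights_per_rank]
    for routes, outs in zip(received, outputs):
        for (src, tok, slot, _e), y in zip(routes, outs):
            result[src][tok] += topk_weights_per_rank[src][tok][slot] * y
    return result

# Two ranks, four experts: experts 0-1 live on rank 0, experts 2-3 on rank 1.
tokens = [[1.0, 2.0], [3.0]]                  # scalar "hidden states" per rank
topk_ids = [[[0, 3], [1, 2]], [[2, 3]]]
topk_weights = [[[0.5, 0.5], [0.75, 0.25]], [[1.0, 0.0]]]

received = dispatch(topk_ids, num_experts=4, ep_size=2)
send_counts = [len(r) for r in received]      # [2, 4]
outputs = run_experts(received, tokens, lambda e, x: (e + 1) * x)
combined = combine(received, outputs, topk_weights)  # [[2.5, 4.5], [9.0]]
```

The per-destination counts (`send_counts` here) are what a real backend would pass as split sizes to its collective; the route tuples play the role of the metadata needed to undo the permutation during combine.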

diffulex.moe.layer

This package executes expert MLPs after routing. It provides naive, tensor-parallel, expert-parallel, and optional vLLM-backed implementations behind a factory function.

Symbol             Purpose
build_moe_block    Factory for MoE blocks.
FusedMoE           Base fused MoE layer contract.
SharedExpertMLP    Shared expert MLP helper.
NaiveFusedMoE      Reference fused MoE implementation.
TPFusedMoE         Tensor-parallel fused MoE implementation.
EPFusedMoE         Expert-parallel fused MoE implementation.

Model layers should call the factory rather than instantiate implementation classes directly.
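The computation every implementation must reproduce can be written as a small pure-Python reference: each token's output is the router-weighted sum of its selected experts' outputs, plus the shared expert's output when one exists. This is a hypothetical sketch, not `NaiveFusedMoE`; the SwiGLU expert form, list-of-lists weights, and the helpers `swiglu_mlp`, `random_mlp`, and `naive_moe` are assumptions.

```python
import math
import random
from collections import defaultdict

# Hypothetical pure-Python reference for a naive MoE block. Not diffulex's
# NaiveFusedMoE: the SwiGLU expert form, list-of-lists weights, and
# "always add the shared expert" are assumptions for illustration.

def matvec(w, x):
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

def silu(v):
    return v / (1.0 + math.exp(-v))

def swiglu_mlp(params, x):
    # params = (gate [inter][hidden], up [inter][hidden], down [hidden][inter])
    gate, up, down = params
    h = [silu(g) * u for g, u in zip(matvec(gate, x), matvec(up, x))]
    return matvec(down, h)

def random_mlp(rng, hidden, inter):
    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]
    return (mat(inter, hidden), mat(inter, hidden), mat(hidden, inter))

def naive_moe(tokens, topk_ids, topk_weights, experts, shared=None):
    out = [[0.0] * len(x) for x in tokens]
    # Bucket (token, weight) pairs by expert, mimicking how a fused kernel
    # gathers each expert's tokens into one contiguous batch.
    buckets = defaultdict(list)
    for t, (ids, ws) in enumerate(zip(topk_ids, topk_weights)):
        for e, w in zip(ids, ws):
            buckets[e].append((t, w))
    for e, items in buckets.items():
        for t, w in items:
            y = swiglu_mlp(experts[e], tokens[t])
            out[t] = [o + w * yi for o, yi in zip(out[t], y)]  # weighted scatter-add
    if shared is not None:
        # A shared expert processes every token regardless of routing.
        for t, x in enumerate(tokens):
            out[t] = [o + s for o, s in zip(out[t], swiglu_mlp(shared, x))]
    return out

rng = random.Random(0)
hidden, inter = 4, 6
experts = [random_mlp(rng, hidden, inter) for _ in range(3)]
shared = random_mlp(rng, hidden, inter)
tokens = [[rng.uniform(-1.0, 1.0) for _ in range(hidden)] for _ in range(3)]
topk_ids = [[0, 2], [1, 0], [2, 1]]
topk_weights = [[0.6, 0.4], [0.5, 0.5], [0.9, 0.1]]
out = naive_moe(tokens, topk_ids, topk_weights, experts, shared)
```

Tensor- and expert-parallel variants change where the expert weights and tokens live, not this per-token result, which is why a naive reference is useful for checking them.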

diffulex.moe.metadata

This module defines structured metadata passed between routers, dispatchers, and expert execution layers.

Symbol                     Purpose
RouterMetadata             Router output metadata.
DispatchMetadata           Base dispatcher metadata.
ExpertExecutionMetadata    Metadata needed while executing experts.
DispatcherStage            Dispatcher lifecycle stage enum.

Use these dataclasses to keep dispatcher and expert-layer contracts explicit.
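The sketch below shows how explicit dataclasses plus a stage enum let a dispatcher reject malformed router output and out-of-order calls. Only the class names come from this page; every field, the enum members, and the `advance` helper are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import List

# Hypothetical shapes for the metadata contracts. Class names match
# diffulex.moe.metadata; all fields and enum members are assumptions.

class DispatcherStage(Enum):
    IDLE = auto()
    DISPATCHED = auto()
    COMBINED = auto()

@dataclass(frozen=True)
class RouterMetadata:
    topk_ids: List[List[int]]        # [num_tokens][top_k] expert indices
    topk_weights: List[List[float]]  # matching routing weights

    def __post_init__(self):
        if len(self.topk_ids) != len(self.topk_weights):
            raise ValueError("ids and weights must cover the same tokens")
        for ids, ws in zip(self.topk_ids, self.topk_weights):
            if len(ids) != len(ws):
                raise ValueError("each token needs one weight per selected expert")

@dataclass
class DispatchMetadata:
    router: RouterMetadata
    send_counts: List[int]  # number of routed tokens sent to each rank
    stage: DispatcherStage = DispatcherStage.IDLE

@dataclass(frozen=True)
class ExpertExecutionMetadata:
    local_expert_ids: List[int]   # experts resident on this rank
    tokens_per_expert: List[int]  # batch size handed to each local expert

def advance(meta: DispatchMetadata, expected: DispatcherStage, nxt: DispatcherStage) -> None:
    # Guard against combining before dispatch, or dispatching twice.
    if meta.stage is not expected:
        raise RuntimeError(f"expected stage {expected.name}, got {meta.stage.name}")
    meta.stage = nxt
```

Validating in `__post_init__` moves shape errors to the point where metadata is created, rather than surfacing later as an index error inside a kernel or collective.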

diffulex.moe.topk

This package selects experts for each token. It provides top-k router implementations and a factory used by MoE layers.

Symbol                    Purpose
TopKRouter                Base router contract.
TopKOutput                Router output dataclass.
build_topk_router         Factory for configured router behavior.
NaiveTopKRouter           Standard top-k router.
GroupLimitedTopKRouter    Router with group-limited expert selection.