diffulex.distributed
diffulex.distributed builds and stores the distributed topology used by
tensor-parallel, data-parallel, expert-parallel, and sequence-parallel groupings.
The rest of the engine reads this state instead of reconstructing process groups
locally.
| Module | Role |
|---|---|
| diffulex.distributed.parallel_state | Validates topology inputs, initializes torch distributed groups, and exposes the current parallel state. |
diffulex.distributed.parallel_state
This module centralizes distributed process-group setup. It computes world size
from configured parallel dimensions, resolves topology, creates rank groups, and
stores a ParallelState object for later use by layers, MoE code, and model
runners.
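The setup steps described above can be sketched in plain Python. This is a minimal illustration, not the module's actual code: `TopologySketch` and `build_rank_groups` are hypothetical names, and the sketch assumes that world size is the product of the tensor-parallel and data-parallel sizes, with tensor-parallel ranks laid out contiguously. The real module may combine dimensions differently, especially for expert and sequence parallelism.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TopologySketch:
    """Hypothetical stand-in for a frozen description of the rank layout."""

    tp_size: int
    dp_size: int

    @property
    def world_size(self) -> int:
        # Assumption: each rank sits in exactly one TP group and one DP group,
        # so the world size is the product of the two dimensions.
        return self.tp_size * self.dp_size


def build_rank_groups(topo: TopologySketch) -> dict[str, list[list[int]]]:
    """Enumerate rank groups, assuming TP is the fastest-varying dimension."""
    if topo.tp_size < 1 or topo.dp_size < 1:
        raise ValueError("parallel sizes must be positive")
    # TP groups: contiguous blocks of tp_size ranks.
    tp_groups = [
        list(range(dp * topo.tp_size, (dp + 1) * topo.tp_size))
        for dp in range(topo.dp_size)
    ]
    # DP groups: ranks that share the same position inside their TP block.
    dp_groups = [
        list(range(tp, topo.world_size, topo.tp_size))
        for tp in range(topo.tp_size)
    ]
    return {"tp": tp_groups, "dp": dp_groups}


groups = build_rank_groups(TopologySketch(tp_size=2, dp_size=2))
```

In a torch-based setup, each rank list would typically be passed to `torch.distributed.new_group(ranks=...)` to create the corresponding process group. The sketch stops at the rank lists so that it runs without a distributed backend.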
| Symbol | Purpose |
|---|---|
| | Frozen description of global rank layout. |
| | Tensor/data/sequence-parallel rank grouping for dense model execution. |
| | Expert-parallel rank grouping for MoE paths. |
| | Aggregates rank, world size, process groups, and parallel-layout metadata. |
| | Computes effective world size from configured parallel dimensions. |
| | Creates and stores the runtime parallel state. |
| | Returns the active parallel state. |
| | Clears global parallel state, mainly for tests or process teardown. |
| | Builds a state object without full runtime initialization. |
Callers should fetch the existing state rather than deriving rank groups from configuration again. That keeps layer code and MoE code aligned with the engine worker topology.
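The fetch-instead-of-rebuild pattern can be illustrated with a small module-level holder. All names here (`StateSketch`, `init_state`, `get_state`, `reset_state`) are hypothetical placeholders chosen for the example. They mirror the roles listed in the table (create and store, return, clear) but are not the module's real signatures.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StateSketch:
    """Hypothetical stand-in for the stored parallel state."""

    rank: int
    world_size: int


# Single process-wide slot; None means "not initialized yet".
_STATE: Optional[StateSketch] = None


def init_state(rank: int, world_size: int) -> StateSketch:
    """Create the state once and store it for later readers."""
    global _STATE
    if not 0 <= rank < world_size:
        raise ValueError(f"rank {rank} is outside world size {world_size}")
    _STATE = StateSketch(rank=rank, world_size=world_size)
    return _STATE


def get_state() -> StateSketch:
    """Return the stored state; fail loudly if setup never ran."""
    if _STATE is None:
        raise RuntimeError("parallel state is not initialized")
    return _STATE


def reset_state() -> None:
    """Clear the stored state, e.g. between tests."""
    global _STATE
    _STATE = None
```

With this shape, layer code calls `get_state()` and reads `rank` or `world_size` from the shared object, so every caller observes the same layout that the worker set up. Raising when the state is missing makes an ordering mistake visible immediately rather than letting a caller derive its own, possibly different, grouping.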