diffulex.distributed

diffulex.distributed builds and stores the distributed topology used by tensor-parallel, data-parallel, expert-parallel, and sequence-parallel-style groupings. The rest of the engine reads this stored state instead of reconstructing process groups locally.

| Module | Role |
| --- | --- |
| `diffulex.distributed.parallel_state` | Validates topology inputs, initializes torch distributed groups, and exposes the current parallel state. |

diffulex.distributed.parallel_state

This module centralizes distributed process-group setup. It computes world size from configured parallel dimensions, resolves topology, creates rank groups, and stores a ParallelState object for later use by layers, MoE code, and model runners.
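As a rough illustration of the "world size from parallel dimensions" and "rank groups" steps, the following is a minimal sketch, not the module's actual code. `MeshSketch`, its field names, and the rank ordering (tensor-parallel ranks contiguous, data-parallel ranks strided) are assumptions made for the example; the real layout classes may order ranks differently and also cover expert and sequence parallelism.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class MeshSketch:
    """Hypothetical stand-in for a frozen rank-layout description."""

    data_parallel_size: int
    tensor_parallel_size: int

    def __post_init__(self) -> None:
        # Validate topology inputs before any groups are derived.
        if self.data_parallel_size < 1 or self.tensor_parallel_size < 1:
            raise ValueError("parallel sizes must be positive integers")

    @property
    def world_size(self) -> int:
        # World size is the product of the configured parallel dimensions.
        return self.data_parallel_size * self.tensor_parallel_size

    def tensor_parallel_groups(self) -> List[List[int]]:
        # Assumed ordering: each tensor-parallel group is a contiguous rank block.
        tp = self.tensor_parallel_size
        return [
            list(range(d * tp, (d + 1) * tp))
            for d in range(self.data_parallel_size)
        ]

    def data_parallel_groups(self) -> List[List[int]]:
        # Ranks holding the same tensor-parallel shard form one data-parallel group.
        tp = self.tensor_parallel_size
        return [list(range(t, self.world_size, tp)) for t in range(tp)]


mesh = MeshSketch(data_parallel_size=2, tensor_parallel_size=4)
print(mesh.world_size)                # 8
print(mesh.tensor_parallel_groups())  # [[0, 1, 2, 3], [4, 5, 6, 7]]
print(mesh.data_parallel_groups())    # [[0, 4], [1, 5], [2, 6], [3, 7]]
```

In a real `torch.distributed` setup, each rank list of this kind would typically be passed to `torch.distributed.new_group(ranks=...)` once, and the resulting handles kept in the shared state object.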

| Symbol | Purpose |
| --- | --- |
| `WorldMesh` | Frozen description of global rank layout. |
| `BaseModelParallelLayout` | Tensor/data/sequence-parallel rank grouping for dense model execution. |
| `MoEParallelLayout` | Expert-parallel rank grouping for MoE paths. |
| `ParallelState` | Aggregates rank, world size, process groups, and parallel-layout metadata. |
| `get_world_size` | Computes effective world size from configured parallel dimensions. |
| `init_parallel_state` | Creates and stores the runtime `ParallelState`. |
| `fetch_parallel_state` | Returns the active parallel state. |
| `reset_parallel_state` | Clears global parallel state, mainly for tests or process teardown. |
| `build_parallel_state_for_test` | Builds a state object without full runtime initialization. |

Callers should fetch the existing state rather than deriving rank groups from configuration again. That keeps layer code and MoE code aligned with the engine worker topology.
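The initialize-once, fetch-everywhere pattern can be sketched with a small stand-in. Everything below is hypothetical: `StateSketch`, `init_state`, `fetch_state`, `reset_state`, and `shard_width` only mirror the roles described in the table, and the real signatures and error behavior of `init_parallel_state`, `fetch_parallel_state`, and `reset_parallel_state` are not shown on this page.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class StateSketch:
    """Hypothetical stand-in for the stored parallel state."""

    rank: int
    world_size: int


# Process-wide holder; written once at startup and read by all other code.
_HOLDER: Dict[str, StateSketch] = {}


def init_state(rank: int, world_size: int) -> StateSketch:
    # Assumption for this sketch: initializing twice is treated as an error.
    if "state" in _HOLDER:
        raise RuntimeError("parallel state is already initialized")
    if not 0 <= rank < world_size:
        raise ValueError("rank must satisfy 0 <= rank < world_size")
    _HOLDER["state"] = StateSketch(rank=rank, world_size=world_size)
    return _HOLDER["state"]


def fetch_state() -> StateSketch:
    # Callers read the stored state instead of re-deriving it from config.
    if "state" not in _HOLDER:
        raise RuntimeError("parallel state is not initialized")
    return _HOLDER["state"]


def reset_state() -> None:
    # Clears the stored state, for example between test cases.
    _HOLDER.clear()


def shard_width(hidden_size: int) -> int:
    # Layer-side code asks the shared state for the topology it needs.
    state = fetch_state()
    return hidden_size // state.world_size
```

With this shape, a worker would call `init_state(rank=1, world_size=4)` once during startup, and layer code such as `shard_width(4096)` would then see the same topology as every other component. Tests can call `reset_state()` between cases so that one case does not leak its topology into the next.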