# diffulex.distributed `diffulex.distributed` builds and stores the distributed topology used by tensor parallel, data parallel, expert parallel, and sequence-parallel style groupings. The rest of the engine reads this state instead of reconstructing process groups locally. | Module | Role | | --- | --- | | `diffulex.distributed.parallel_state` | Validates topology inputs, initializes torch distributed groups, and exposes the current parallel state. | ## diffulex.distributed.parallel_state This module centralizes distributed process-group setup. It computes world size from configured parallel dimensions, resolves topology, creates rank groups, and stores a `ParallelState` object for later use by layers, MoE code, and model runners. | Symbol | Purpose | | --- | --- | | `WorldMesh` | Frozen description of global rank layout. | | `BaseModelParallelLayout` | Tensor/data/sequence-parallel rank grouping for dense model execution. | | `MoEParallelLayout` | Expert-parallel rank grouping for MoE paths. | | `ParallelState` | Aggregates rank, world size, process groups, and parallel-layout metadata. | | `get_world_size` | Computes effective world size from configured parallel dimensions. | | `init_parallel_state` | Creates and stores the runtime `ParallelState`. | | `fetch_parallel_state` | Returns the active parallel state. | | `reset_parallel_state` | Clears global parallel state, mainly for tests or process teardown. | | `build_parallel_state_for_test` | Builds a state object without full runtime initialization. | Callers should fetch the existing state rather than deriving rank groups from configuration again. That keeps layer code and MoE code aligned with the engine worker topology.