# diffulex.distributed

`diffulex.distributed` builds and stores the distributed topology behind the
tensor-parallel, data-parallel, expert-parallel, and sequence-parallel-style
groupings. The rest of the engine reads this stored state instead of
reconstructing process groups locally.

| Module | Role |
| --- | --- |
| `diffulex.distributed.parallel_state` | Validates topology inputs, initializes torch distributed groups, and exposes the current parallel state. |

## diffulex.distributed.parallel_state

This module centralizes distributed process-group setup. It computes the world
size from the configured parallel dimensions, resolves the topology, creates rank
groups, and stores a `ParallelState` object for later use by layers, MoE code, and
model runners.
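The world-size and rank-group steps can be illustrated with a small, self-contained sketch. It is not the `diffulex` implementation: `ToyMesh` and `rank_groups` are hypothetical names, and the rank ordering (tensor parallel innermost, then sequence parallel, then data parallel) is an assumed convention. The sketch only shows how an effective world size follows from the parallel dimensions and how each dimension partitions the ranks into groups.

```python
from __future__ import annotations

from dataclasses import dataclass
from math import prod


@dataclass(frozen=True)
class ToyMesh:
    """Hypothetical stand-in for a world mesh: the size of each parallel dimension."""
    tp: int = 1  # tensor-parallel size
    sp: int = 1  # sequence-parallel size
    dp: int = 1  # data-parallel size

    def __post_init__(self) -> None:
        # Reject non-positive sizes up front, analogous to validating topology inputs.
        for name in ("tp", "sp", "dp"):
            if getattr(self, name) < 1:
                raise ValueError(f"{name} must be >= 1, got {getattr(self, name)}")

    @property
    def world_size(self) -> int:
        # The effective world size is the product of all parallel dimensions.
        return prod((self.tp, self.sp, self.dp))


def rank_groups(mesh: ToyMesh) -> dict[str, list[list[int]]]:
    """Partition ranks 0..world_size-1 into one list of groups per dimension."""
    tp, sp, dp = mesh.tp, mesh.sp, mesh.dp

    def rank(d: int, s: int, t: int) -> int:
        # Assumed ordering: TP varies fastest, then SP, then DP.
        return (d * sp + s) * tp + t

    return {
        # Ranks sharing (dp, sp) coordinates and differing only in TP index.
        "tp": [[rank(d, s, t) for t in range(tp)] for d in range(dp) for s in range(sp)],
        # Ranks sharing (dp, tp) coordinates and differing only in SP index.
        "sp": [[rank(d, s, t) for s in range(sp)] for d in range(dp) for t in range(tp)],
        # Ranks sharing (sp, tp) coordinates and differing only in DP index.
        "dp": [[rank(d, s, t) for d in range(dp)] for s in range(sp) for t in range(tp)],
    }
```

For example, `rank_groups(ToyMesh(tp=2, dp=2))` yields TP groups `[[0, 1], [2, 3]]` and DP groups `[[0, 2], [1, 3]]`. In a real run each rank list would be passed to `torch.distributed.new_group`, which every process in the job must call for every group, including groups it is not a member of.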

| Symbol | Purpose |
| --- | --- |
| `WorldMesh` | Frozen (immutable) description of the global rank layout. |
| `BaseModelParallelLayout` | Tensor/data/sequence-parallel rank grouping for dense model execution. |
| `MoEParallelLayout` | Expert-parallel rank grouping for MoE paths. |
| `ParallelState` | Aggregates rank, world size, process groups, and parallel-layout metadata. |
| `get_world_size` | Computes the effective world size from the configured parallel dimensions. |
| `init_parallel_state` | Creates and stores the runtime `ParallelState`. |
| `fetch_parallel_state` | Returns the active parallel state. |
| `reset_parallel_state` | Clears the global parallel state, mainly for tests or process teardown. |
| `build_parallel_state_for_test` | Builds a state object without full runtime initialization. |
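The init/fetch/reset trio suggests a module-level singleton. The sketch below shows that lifecycle pattern under stated assumptions: `ToyParallelState`, `init_state`, `fetch_state`, and `reset_state` are hypothetical names, and the choice to raise on double initialization or on fetching before initialization is an assumption, not confirmed behavior of `init_parallel_state` or `fetch_parallel_state`.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ToyParallelState:
    """Hypothetical, minimal stand-in for a ParallelState record."""
    rank: int
    world_size: int
    tp_size: int = 1

    @property
    def tp_rank(self) -> int:
        # Assumes TP is the innermost (fastest-varying) dimension of the rank layout.
        return self.rank % self.tp_size


# Process-wide slot holding the single active state (None until initialized).
_STATE: Optional[ToyParallelState] = None


def init_state(rank: int, world_size: int, tp_size: int = 1) -> ToyParallelState:
    """Validate inputs, build the state once, and store it globally."""
    global _STATE
    if _STATE is not None:
        raise RuntimeError("parallel state already initialized; reset it first")
    if tp_size < 1 or world_size % tp_size != 0:
        raise ValueError("world_size must be a positive multiple of tp_size")
    if not 0 <= rank < world_size:
        raise ValueError(f"rank {rank} out of range for world_size {world_size}")
    _STATE = ToyParallelState(rank, world_size, tp_size)
    return _STATE


def fetch_state() -> ToyParallelState:
    """Return the stored state; fail loudly if nothing was initialized."""
    if _STATE is None:
        raise RuntimeError("parallel state not initialized")
    return _STATE


def reset_state() -> None:
    """Drop the stored state, e.g. between tests or at process teardown."""
    global _STATE
    _STATE = None
```

Raising on a second `init_state` call, rather than silently overwriting, guards against two code paths building inconsistent topologies in the same process; tests call `reset_state` to start from a clean slate.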

Callers should fetch the existing state rather than deriving rank groups from
configuration again. That keeps layer code and MoE code aligned with the engine
worker topology.
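On the consuming side, a layer only needs a few fields from the stored state, such as its tensor-parallel rank and size, to decide which slice of a weight it owns. The sketch below is hypothetical: `TPView` and `column_shard` are illustrative names, and the attribute names on the real `ParallelState` are not confirmed here. In the engine, the values would come from `fetch_parallel_state()` rather than being recomputed from configuration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class TPView:
    """Hypothetical subset of a parallel state that a sharded layer reads."""
    tp_rank: int
    tp_size: int

    def __post_init__(self) -> None:
        # A rank must lie inside its own tensor-parallel group.
        if self.tp_size < 1 or not 0 <= self.tp_rank < self.tp_size:
            raise ValueError(f"invalid tp_rank={self.tp_rank} for tp_size={self.tp_size}")


def column_shard(out_features: int, view: TPView) -> range:
    """Return the output-feature indices owned by this TP rank (even split)."""
    if out_features % view.tp_size != 0:
        raise ValueError(f"{out_features} features not divisible by tp_size={view.tp_size}")
    per_rank = out_features // view.tp_size
    start = view.tp_rank * per_rank
    return range(start, start + per_rank)
```

For instance, with 8 output features and `TPView(tp_rank=1, tp_size=4)`, the rank owns indices 2 and 3. Because every layer reads the same stored rank and size, all shards line up with the process groups the workers actually created.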