# diffulex.layer `diffulex.layer` contains reusable neural-network layers and backend adapters used by model implementations. The package keeps tensor-parallel layout, optional LoRA handling, rotary embeddings, activation fusion, and vLLM-backed fallbacks outside individual model files. | Module | Role | | --- | --- | | `diffulex.layer.activation` | Fused gated activations with native and optional vLLM-backed paths. | | `diffulex.layer.embed_head` | Tensor-parallel vocabulary embeddings and LM heads. | | `diffulex.layer.layernorm` | RMSNorm and fused add-RMSNorm wrappers. | | `diffulex.layer.linear` | Replicated, column-parallel, row-parallel, QKV, and merged linear layers with LoRA support. | | `diffulex.layer.rotary_embedding` | Rotary embedding construction and application helpers. | | `diffulex.layer.vllm_backend` | Runtime toggles and lazy accessors for optional vLLM layer implementations. | ## diffulex.layer.activation This module provides gated activation blocks used by MLP implementations. It prefers vLLM fused operators when enabled and available, then falls back to native PyTorch implementations. | Symbol | Purpose | | --- | --- | | `SiluAndMul` | SiLU-gated activation block for SwiGLU-style MLPs. | | `GeluAndMul` | GELU-tanh gated activation block. | Use these modules in model code instead of open-coding chunk/split activation logic. ## diffulex.layer.embed_head This module handles vocabulary sharding for embeddings and output projection. It gathers or reduces tensor-parallel outputs as needed so model code can share the same layer abstractions across single-GPU and tensor-parallel execution. | Symbol | Purpose | | --- | --- | | `VocabParallelEmbedding` | Sharded embedding table with tensor-parallel rank handling. | | `ParallelLMHead` | Output head built on the same vocabulary-parallel layout. | Use these layers when model vocab weights need to be partitioned across tensor parallel ranks. ## diffulex.layer.layernorm This module provides RMSNorm implementations with optional fused vLLM paths. The wrapper keeps model code independent of the selected backend. | Symbol | Purpose | | --- | --- | | `RMSNorm` | RMSNorm module with optional fused add+norm path. | Use `RMSNorm` in model implementations when the checkpoint architecture expects RMS normalization. ## diffulex.layer.linear This module contains the common linear-layer variants used by model families. It combines tensor-parallel splitting/gathering with optional LoRA weight loading hooks. | Symbol | Purpose | | --- | --- | | `LoRAMixin` | Adapter-loading behavior shared by linear variants. | | `LinearBase` | Common base class for Diffulex linear layers. | | `ReplicatedLinear` | Non-sharded linear layer. | | `ColumnParallelLinear` | Column-sharded tensor-parallel linear layer. | | `MergedColumnParallelLinear` | Column-parallel layer for merged projections. | | `QKVParallelLinear` | Specialized QKV projection layer. | | `RowParallelLinear` | Row-sharded tensor-parallel linear layer. | Choose the layer variant that matches the checkpoint's weight layout and the model's tensor-parallel split. ## diffulex.layer.rotary_embedding This module builds and applies rotary position embeddings. It includes standard rotary embeddings, partial rotary embeddings, Gemma-style proportional rotary scaling, and adapters to vLLM rotary implementations. | Symbol | Purpose | | --- | --- | | `RotaryEmbedding` | Standard rotary embedding module. | | `PartialRotaryEmbedding` | Rotary embedding for models that rotate only part of the head dimension. | | `Gemma4ProportionalRotaryEmbedding` | Gemma-style proportional rotary scaling. | | `VllmRotaryEmbeddingAdapter` | Adapter around vLLM rotary implementations. | | `get_rope` | Cached rotary embedding factory. | | `get_gemma4_proportional_rope` | Cached Gemma proportional rotary factory. | Use the factory helpers instead of constructing rotary modules manually when model code should share cache behavior. ## diffulex.layer.vllm_backend This module owns the runtime switch for optional vLLM-backed common layers. It keeps imports lazy so environments without the relevant vLLM symbols can still import Diffulex. | Symbol | Purpose | | --- | --- | | `set_vllm_layers_enabled` | Enables or disables vLLM-backed layer paths. | | `is_vllm_layers_enabled` | Reports whether vLLM-backed paths are active. | | `clear_vllm_layer_caches` | Clears cached backend lookups. | | `get_vllm_silu_and_mul_cls` / `get_vllm_gelu_and_mul_cls` | Lazy activation backend accessors. | | `get_vllm_rmsnorm_cls` | Lazy RMSNorm backend accessor. | | `get_vllm_rope_fn` | Lazy rotary backend accessor. | Use these helpers inside layer wrappers, not directly from model code.