diffulex.layer
diffulex.layer contains reusable neural-network layers and backend adapters
used by model implementations. The package keeps tensor-parallel layout,
optional LoRA handling, rotary embeddings, activation fusion, and vLLM-backed
fallbacks outside individual model files.
| Module | Role |
|---|---|
| diffulex.layer.activation | Fused gated activations with native and optional vLLM-backed paths. |
| diffulex.layer.embed_head | Tensor-parallel vocabulary embeddings and LM heads. |
| diffulex.layer.layernorm | RMSNorm and fused add-RMSNorm wrappers. |
| diffulex.layer.linear | Replicated, column-parallel, row-parallel, QKV, and merged linear layers with LoRA support. |
| diffulex.layer.rotary_embedding | Rotary embedding construction and application helpers. |
| diffulex.layer.vllm_backend | Runtime toggles and lazy accessors for optional vLLM layer implementations. |
diffulex.layer.activation
This module provides gated activation blocks used by MLP implementations. It prefers vLLM fused operators when they are enabled and available, and otherwise falls back to native PyTorch implementations.
| Symbol | Purpose |
|---|---|
| | SiLU-gated activation block for SwiGLU-style MLPs. |
| | GELU-tanh gated activation block. |
Use these modules in model code instead of open-coding chunk/split activation logic.
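As a rough illustration of what a SiLU-gated block computes, the sketch below uses plain Python lists instead of tensors. The function names `silu` and `silu_and_mul` are hypothetical and are not taken from Diffulex, and the assumption that the gate occupies the first half of the last dimension is one common convention rather than something this page confirms.

```python
import math


def silu(x: float) -> float:
    """SiLU (swish): x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))


def silu_and_mul(row: list[float]) -> list[float]:
    """Split the row into [gate | up] halves and return silu(gate) * up.

    Assumption: the gate is the first half and the up projection the second.
    """
    if len(row) % 2 != 0:
        raise ValueError("last dimension must be even")
    half = len(row) // 2
    gate, up = row[:half], row[half:]
    return [silu(g) * u for g, u in zip(gate, up)]


# A fused gate/up projection of width 4 yields an activation of width 2.
out = silu_and_mul([0.0, 1.0, 2.0, 3.0])
```

Keeping this logic in one shared module means a fused backend kernel can replace it without touching each model's MLP code.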
diffulex.layer.embed_head
This module handles vocabulary sharding for embeddings and output projection. It gathers or reduces tensor-parallel outputs as needed so model code can share the same layer abstractions across single-GPU and tensor-parallel execution.
| Symbol | Purpose |
|---|---|
| | Sharded embedding table with tensor-parallel rank handling. |
| | Output head built on the same vocabulary-parallel layout. |
Use these layers when a model's vocabulary weights need to be partitioned across tensor-parallel ranks.
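The vocabulary-sharding idea can be sketched without any distributed runtime. In the example below each rank owns a contiguous slice of the vocabulary, returns zero rows for tokens it does not own, and a sum over ranks stands in for the all-reduce. All names (`shard_range`, `local_lookup`, `all_reduce_sum`) are hypothetical, and the even, contiguous split is an assumption for illustration, not a statement about how Diffulex partitions weights.

```python
def shard_range(vocab_size: int, tp_rank: int, tp_size: int) -> tuple[int, int]:
    """Return the [start, end) vocabulary slice owned by one rank."""
    if vocab_size % tp_size != 0:
        raise ValueError("vocab_size must divide evenly in this sketch")
    per_rank = vocab_size // tp_size
    start = tp_rank * per_rank
    return start, start + per_rank


def local_lookup(table_shard, token_ids, start, end):
    """Look up owned tokens; emit zero rows for tokens owned by other ranks."""
    dim = len(table_shard[0])
    return [
        list(table_shard[t - start]) if start <= t < end else [0.0] * dim
        for t in token_ids
    ]


def all_reduce_sum(per_rank_outputs):
    """Stand-in for a tensor-parallel all-reduce: element-wise sum over ranks."""
    return [
        [sum(vals) for vals in zip(*rows)]
        for rows in zip(*per_rank_outputs)
    ]


vocab_size, dim, tp_size = 8, 2, 2
full_table = [[float(i), float(i) * 10.0] for i in range(vocab_size)]
token_ids = [5, 0, 3]

partials = []
for rank in range(tp_size):
    start, end = shard_range(vocab_size, rank, tp_size)
    partials.append(local_lookup(full_table[start:end], token_ids, start, end))

# Summing the per-rank partial results reproduces the unsharded lookup.
embedded = all_reduce_sum(partials)
```

An output head on the same layout would work in the opposite direction: each rank produces logits for its own vocabulary slice, and the slices are gathered rather than summed.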
diffulex.layer.layernorm
This module provides RMSNorm implementations with optional fused vLLM paths. The wrapper keeps model code independent of the selected backend.
| Symbol | Purpose |
|---|---|
| RMSNorm | RMSNorm module with optional fused add+norm path. |
Use RMSNorm in model implementations when the checkpoint architecture expects
RMS normalization.
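For readers unfamiliar with the two paths, here is a minimal pure-Python sketch of RMS normalization and of the add-then-normalize variant. The function names and the `eps` default are assumptions made for this example; they do not describe the signature of the Diffulex `RMSNorm` module.

```python
import math


def rms_norm(x, weight, eps=1e-6):
    """Scale x by the reciprocal root-mean-square, then by a learned weight."""
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]


def add_rms_norm(x, residual, weight, eps=1e-6):
    """Add the residual first, normalize the sum, and return both results.

    The un-normalized sum is returned so the caller can carry it forward
    as the next residual.
    """
    summed = [a + b for a, b in zip(x, residual)]
    return rms_norm(summed, weight, eps), summed


normed, new_residual = add_rms_norm([1.0, 2.0], [2.0, 2.0], [1.0, 1.0], eps=0.0)
```

A fused implementation performs the addition and the normalization in a single pass over the data, which is why the wrapper exposes them as one operation.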
diffulex.layer.linear
This module contains the common linear-layer variants used by model families. It combines tensor-parallel splitting/gathering with optional LoRA weight loading hooks.
| Symbol | Purpose |
|---|---|
| | Adapter-loading behavior shared by linear variants. |
| | Common base class for Diffulex linear layers. |
| | Non-sharded linear layer. |
| | Column-sharded tensor-parallel linear layer. |
| | Column-parallel layer for merged projections. |
| | Specialized QKV projection layer. |
| | Row-sharded tensor-parallel linear layer. |
Choose the layer variant that matches the checkpoint’s weight layout and the model’s tensor-parallel split.
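The difference between column and row sharding is easiest to see on a tiny matrix. The sketch below simulates the ranks in a loop and uses list concatenation and summation as stand-ins for all-gather and all-reduce. The helper names and the `[out_features][in_features]` weight layout are assumptions for illustration only; LoRA handling is omitted.

```python
def matvec(weight, x):
    """y = W @ x for a weight stored as [out_features][in_features]."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


def column_parallel(weight, x, tp_size):
    """Shard output rows across ranks, then concatenate the per-rank outputs."""
    per_rank = len(weight) // tp_size
    out = []
    for rank in range(tp_size):
        shard = weight[rank * per_rank:(rank + 1) * per_rank]
        out.extend(matvec(shard, x))  # concatenation stands in for all-gather
    return out


def row_parallel(weight, x, tp_size):
    """Shard the input dimension across ranks, then sum the partial outputs."""
    per_rank = len(x) // tp_size
    total = [0.0] * len(weight)
    for rank in range(tp_size):
        lo, hi = rank * per_rank, (rank + 1) * per_rank
        shard = [row[lo:hi] for row in weight]
        partial = matvec(shard, x[lo:hi])
        total = [t + p for t, p in zip(total, partial)]  # sum stands in for all-reduce
    return total


W = [[1.0, 2.0, 3.0, 4.0],
     [5.0, 6.0, 7.0, 8.0]]
x = [1.0, 0.0, -1.0, 2.0]

reference = matvec(W, x)
col_result = column_parallel(W, x, tp_size=2)
row_result = row_parallel(W, x, tp_size=2)
```

Both shardings reproduce the unsharded result here. In practice a column-parallel layer is often followed by a row-parallel one so that the intermediate activation can stay sharded and only one reduction is needed per pair.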
diffulex.layer.rotary_embedding
This module builds and applies rotary position embeddings. It includes standard rotary embeddings, partial rotary embeddings, Gemma-style proportional rotary scaling, and adapters to vLLM rotary implementations.
| Symbol | Purpose |
|---|---|
| | Standard rotary embedding module. |
| | Rotary embedding for models that rotate only part of the head dimension. |
| | Gemma-style proportional rotary scaling. |
| | Adapter around vLLM rotary implementations. |
| | Cached rotary embedding factory. |
| | Cached Gemma proportional rotary factory. |
Use the factory helpers instead of constructing rotary modules manually when model code should share cache behavior.
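The following sketch shows one way a cached factory and a partial rotary application could fit together. It is not the Diffulex implementation: `get_cos_sin_table` and `apply_rotary` are hypothetical names, the half-split pairing of dimensions is one common convention chosen for the example, and Gemma-style proportional scaling is left out.

```python
import math
from functools import lru_cache


@lru_cache(maxsize=None)
def get_cos_sin_table(rotary_dim: int, max_position: int, base: float = 10000.0):
    """Build (and cache) per-position cos/sin values for rotary_dim // 2 frequencies."""
    inv_freq = [base ** (-2.0 * i / rotary_dim) for i in range(rotary_dim // 2)]
    return tuple(
        tuple((math.cos(pos * f), math.sin(pos * f)) for f in inv_freq)
        for pos in range(max_position)
    )


def apply_rotary(vec, position, rotary_dim, max_position=64):
    """Rotate the first rotary_dim entries; pass the remaining entries through.

    Assumption: entry i is paired with entry i + rotary_dim // 2.
    """
    table = get_cos_sin_table(rotary_dim, max_position)[position]
    half = rotary_dim // 2
    x1, x2 = vec[:half], vec[half:rotary_dim]
    rot1 = [a * c - b * s for a, b, (c, s) in zip(x1, x2, table)]
    rot2 = [b * c + a * s for a, b, (c, s) in zip(x1, x2, table)]
    return rot1 + rot2 + list(vec[rotary_dim:])


# Head dimension 6, of which only the first 4 entries are rotated.
q = [1.0, 0.0, 0.0, 1.0, 7.0, 9.0]
rotated = apply_rotary(q, 3, 4)
```

Because the factory is memoized on its arguments, every layer that requests the same configuration shares one table instead of rebuilding it.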
diffulex.layer.vllm_backend
This module owns the runtime switch for optional vLLM-backed common layers. It keeps imports lazy so environments without the relevant vLLM symbols can still import Diffulex.
| Symbol | Purpose |
|---|---|
| | Enables or disables vLLM-backed layer paths. |
| | Reports whether vLLM-backed paths are active. |
| | Clears cached backend lookups. |
| | Lazy activation backend accessors. |
| | Lazy RMSNorm backend accessor. |
| | Lazy rotary backend accessor. |
Use these helpers inside layer wrappers, not directly from model code.
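The toggle-plus-lazy-accessor pattern described here can be sketched with the standard library alone. Every name below is hypothetical, and the stdlib `math` module stands in for the optional dependency so the example runs anywhere; it does not reflect the actual Diffulex or vLLM symbols.

```python
import importlib
from functools import lru_cache

_use_optional_backend = False  # hypothetical module-level switch


def set_backend_enabled(enabled: bool) -> None:
    """Flip the switch and drop cached lookups so the next call re-resolves."""
    global _use_optional_backend
    _use_optional_backend = bool(enabled)
    _lookup.cache_clear()


def backend_enabled() -> bool:
    """Report whether the optional backend path is active."""
    return _use_optional_backend


@lru_cache(maxsize=None)
def _lookup(module_name: str, symbol: str):
    """Import lazily; return None when the module or symbol is missing."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, symbol, None)


def get_backend_symbol(module_name: str, symbol: str):
    """Return the optional implementation, or None so callers use the native path."""
    if not _use_optional_backend:
        return None
    return _lookup(module_name, symbol)


def sqrt_with_fallback(x: float) -> float:
    """Wrapper pattern: prefer the optional backend, else fall back to native code."""
    impl = get_backend_symbol("math", "sqrt")  # "math" stands in for the optional package
    return impl(x) if impl is not None else x ** 0.5
```

Because the import happens inside the accessor rather than at module load, a missing package or symbol only disables the accelerated path instead of breaking the import of the whole layer package.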