
Pure-PyTorch lightweight Mamba with multi-dilated causal conv front-end

Project description

lite-mamba


A minimal, pure-PyTorch implementation of Mamba with a multi-dilated causal depthwise conv front-end. No CUDA/Triton build needed; works on CPU or GPU with standard PyTorch ops.

Dual Framework Support: Includes both PyTorch and TensorFlow implementations with identical core logic and mathematical formulations, so you can use either framework depending on your environment.

Install

pip install torch einops
pip install lite-mamba

TensorFlow path (optional):

pip install "lite-mamba[tensorflow]"

Usage

PyTorch

from lite_mamba import Mamba, PTCNMamba, STCNMamba, DPWCMamba
import torch

x = torch.randn(2, 128, 512)  # (batch, seq, d_model)
m = Mamba(d_model=512, d_conv=3, conv_dilations=(1, 2, 4, 8))
y = m(x)
print(y.shape)  # (2, 128, 512)

TensorFlow

from lite_mamba import TFPTCNMamba
import tensorflow as tf

x = tf.random.normal((2, 128, 512))  # (batch, seq, d_model)
m = TFPTCNMamba(d_model=512, d_conv=3, conv_dilations=(1, 2, 4, 8))
y = m(x)
print(y.shape)  # (2, 128, 512)

Conv front-end variants

  • PTCNMamba: identical to the default Mamba class; mixes parallel dilated depthwise conv branches via learned softmax gates.
  • STCNMamba: runs the same depthwise conv layers in sequence (no gating); each branch output feeds the next to create a deterministic dilation stack.
  • DPWCMamba: pairs each depthwise branch with a pointwise (1×1) conv before the gating mix, adding extra channel mixing without stacking more layers.

All variants expose the same constructor signature (d_model, d_state, conv_dilations, etc.) and streaming helpers (allocate_inference_cache, step). Swap them simply by changing the imported class name:

from lite_mamba import STCNMamba

m = STCNMamba(d_model=512, d_state=16, conv_dilations=(1, 2, 4))

Use DPWCMamba for richer channel interactions in each branch, and STCNMamba when you want a straightforward sequential dilation pipeline (e.g., for debugging or reproducing the behavior of stacked TCN layers).
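
For example, swapping in DPWCMamba is the same one-line change (a minimal sketch using the shared constructor signature shown above):

import torch
from lite_mamba import DPWCMamba

# Same constructor arguments as the other variants; only the class name changes.
m = DPWCMamba(d_model=512, d_state=16, conv_dilations=(1, 2, 4))
x = torch.randn(2, 128, 512)  # (batch, seq, d_model)
y = m(x)                      # (2, 128, 512)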

Baseline helper

BaselineMamba mirrors the upstream state-spaces/mamba block: a single depthwise causal convolution followed by the SSM parameter projection, selective scan recurrence, and streaming helpers. baseline_mamba is a thin functional alias that instantiates the class with the same defaults so you can reproduce the reference layout without duplicating constructor arguments.

from lite_mamba import BaselineMamba

m = BaselineMamba(d_model=512, d_conv=3)
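
The functional alias works the same way (a sketch, assuming baseline_mamba forwards its keyword arguments to the class constructor, as described above):

from lite_mamba import baseline_mamba

# Same defaults as BaselineMamba; only the call site differs.
m = baseline_mamba(d_model=512, d_conv=3)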

TensorFlow variants

The TensorFlow implementations mirror the same core logic and mathematical formulations as the PyTorch versions:

  • TFBaselineMamba - Single-branch baseline matching reference Mamba
  • TFPTCNMamba - Parallel dilated TCN branches with learned gating (default TFMamba)
  • TFSTCNMamba - Stacked/sequential dilated TCN branches
  • TFDPWCMamba - Depthwise + pointwise convolution branches

All TensorFlow variants support:

  • Same constructor API as PyTorch versions
  • Streaming inference via allocate_inference_cache() and step()
  • Custom dilated causal convolutions
  • Identical SSM discretization and selective scan

import tensorflow as tf
from lite_mamba import TFBaselineMamba, TFSTCNMamba, TFDPWCMamba

# Baseline single-branch
m1 = TFBaselineMamba(d_model=512, d_conv=3)

# Stacked dilated branches
m2 = TFSTCNMamba(d_model=512, d_conv=3, conv_dilations=(1, 2, 4))

# Depthwise + pointwise
m3 = TFDPWCMamba(d_model=512, d_conv=3, conv_dilations=(1, 2, 4, 8))

API quick reference

Mamba(d_model, d_state=16, d_conv=4, conv_dilations=(1,), expand=2, dt_rank="auto", dt_min=0.001, dt_max=0.1, dt_init="random", dt_scale=1.0, dt_init_floor=1e-4, conv_bias=True, bias=False, use_fast_path=False, layer_idx=None, device=None, dtype=None)

  • d_model (int, required): input/output embedding size.
  • d_state (int, default 16): SSM state dimension per channel. Larger gives longer memory; increases compute.
  • d_conv (int, default 4): depthwise conv kernel size for each branch.
  • conv_dilations (tuple[int], default (1,)): dilation per branch. Multiple values create parallel dilated convs; each branch's effective receptive field is (d_conv-1)*dilation + 1 (see the worked example after this list).
  • expand (float, default 2): inner width multiplier; sets d_inner = expand * d_model.
  • dt_rank (int or "auto", default "auto"): rank of delta projection. "auto" sets ceil(d_model/16).
  • dt_min, dt_max (float, defaults 1e-3 / 1e-1): log-uniform range for delta initialization.
  • dt_init ("random" | "constant", default "random") and dt_scale, dt_init_floor: control delta init magnitude/stability.
  • conv_bias (bool, default True): include bias in depthwise convs.
  • bias (bool, default False): include bias in input/output linear projections.
  • use_fast_path (bool): ignored in this pure-PyTorch build; kept for API compatibility.
  • layer_idx (int | None): identifier for streaming cache registration; required when using allocate_inference_cache + inference_params.
  • device, dtype: standard module factory kwargs.
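
For example, with d_model=512, d_conv=3, expand=2, and conv_dilations=(1, 2, 4, 8), the derived quantities work out as below (plain Python, just re-evaluating the formulas from the list above):

import math

d_model, expand, d_conv = 512, 2, 3
conv_dilations = (1, 2, 4, 8)

d_inner = expand * d_model                                    # 1024 inner channels
dt_rank = math.ceil(d_model / 16)                             # "auto" -> 32
# Widest branch: kernel 3 with dilation 8 sees 17 positions.
receptive_field = max((d_conv - 1) * d for d in conv_dilations) + 1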

Inference / streaming helpers

  • allocate_inference_cache(batch_size, max_seqlen, dtype=None): preallocates conv and SSM state buffers for step-wise decoding.
  • step(hidden_states, conv_state, ssm_state): single-token forward (expects hidden_states with shape (B, 1, d_model)).
  • forward(..., inference_params): if inference_params has cached states (with key_value_memory_dict and seqlen_offset), uses them for streaming.
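
A minimal generation-loop sketch (assuming, as in the upstream state-spaces/mamba layout, that allocate_inference_cache returns a (conv_state, ssm_state) pair and step returns (output, conv_state, ssm_state); check the class docstrings for the exact return shapes):

import torch
from lite_mamba import Mamba

m = Mamba(d_model=512, d_conv=3, conv_dilations=(1, 2, 4), layer_idx=0)
conv_state, ssm_state = m.allocate_inference_cache(batch_size=2, max_seqlen=256)

token = torch.randn(2, 1, 512)  # one step of hidden states, shape (B, 1, d_model)
for _ in range(16):
    # Advance the recurrence one token at a time, carrying the cached states.
    out, conv_state, ssm_state = m.step(token, conv_state, ssm_state)
    token = out  # toy loop: feed the layer output straight back in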

Highlights

  • Dual Framework: PyTorch and TensorFlow implementations with identical core logic
  • Multi-branch Architecture: Parallel or stacked causal dilated convolutions with learned gating
  • Pure Python: No custom C++/CUDA or Triton kernels required
  • Streaming Support: Per-branch conv states and SSM state caching for autoregressive generation
  • Framework Parity: Mathematically equivalent implementations verified across both frameworks

Practical setups

  • Local modeling / small context: d_conv=3, conv_dilations=(1,2,4), d_state=8–16, expand=2.
  • Longer context: widen conv_dilations (e.g., (1,2,4,8,16)) or increase d_state to 32; expect higher memory/compute (see the sketch after this list).
  • Streaming/AR decoding: call allocate_inference_cache once per layer, pass inference_params during forward; use step inside your generation loop.
  • Stability first: keep dt_min >= 1e-4 and dt_init_floor small; leave defaults unless you observe drift or exploding activations.
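
For instance, the longer-context recipe above might be instantiated like this (a sketch using the constructor parameters documented earlier):

from lite_mamba import Mamba

# Wider dilation stack plus a larger state dimension for longer memory.
m = Mamba(d_model=512, d_conv=3, d_state=32, conv_dilations=(1, 2, 4, 8, 16), expand=2)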

Framework Compatibility

PyTorch vs TensorFlow

Both implementations are mathematically equivalent and follow the same core Mamba architecture:

  • Tensor Layout: PyTorch uses (B, D, L) (channels-first), TensorFlow uses (B, L, D) (channels-last)
  • SSM Formulation: Identical selective state space model with zero-order hold discretization
  • Initialization: Same parameter initialization schemes (A matrix, D parameter, dt bias)
  • Variants: All architectural variants available in both frameworks

The implementations have been thoroughly tested to ensure numerical equivalence within floating-point precision.

Notes

  • Set different conv_dilations to adjust receptive field; keep kernels small (e.g., 3–5) to avoid excessive padding.
  • use_fast_path flag is ignored here (kept for API compatibility with reference implementations).
  • Reference selective scan is implemented in pure Python for portability; faster fused kernels are omitted intentionally.
  • TensorFlow implementation uses custom depthwise convolution via tf.nn.depthwise_conv2d for dilated causal convolutions.

License

Apache-2.0

Download files

Download the file for your platform.

Source Distribution

lite_mamba-0.2.5.tar.gz (17.1 kB)

Uploaded Source

Built Distribution


lite_mamba-0.2.5-py3-none-any.whl (14.8 kB)

Uploaded Python 3

File details

Details for the file lite_mamba-0.2.5.tar.gz.

File metadata

  • Download URL: lite_mamba-0.2.5.tar.gz
  • Upload date:
  • Size: 17.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for lite_mamba-0.2.5.tar.gz
Algorithm Hash digest
SHA256 d8cdf7c1ba389ee7ea01bfab1c9aefcf8ee38ed68a6036e83cb6361f4a398a0b
MD5 ffeaad08301557408d4a39ddbd55ce2f
BLAKE2b-256 96368448fd171006ba6b28fce07fb720d9319614427c204606a130d1307500ad


File details

Details for the file lite_mamba-0.2.5-py3-none-any.whl.

File metadata

  • Download URL: lite_mamba-0.2.5-py3-none-any.whl
  • Upload date:
  • Size: 14.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for lite_mamba-0.2.5-py3-none-any.whl
Algorithm Hash digest
SHA256 9099c83e5c73f18a1316d77629a977f474e777f3d7aabbd61991d42b70c2441d
MD5 7dc9646e34fdca8327c8049ca56909bf
BLAKE2b-256 03e435ab4900f0826f59dd8097bf1fb2aa1a4695cf31ff58a5bc6e9420b2768a
