A unified library for autoencoder-family models across deterministic, variational, and quantized latent spaces.
autoencoders
A latent-model toolkit for deterministic, variational, and quantized autoencoders
Build, train, serialize, and export latent models with one consistent API.
autoencoders is a PyTorch-first library for autoencoder-family models across deterministic, variational, and quantized latent spaces.
The project goal is simple: make autoencoders feel composable, serializable, and reusable in the same way transformers did for sequence models.
Why autoencoders
- 🧩 Unified API: One package shape across `AE`, `VAE`, `VQ-VAE`, `PQ-VAE`, `RQ-VAE`, `WAE`, `AAE`, and more.
- 🧠 Latent-first design: Treat reconstruction, posterior statistics, quantized codes, and exported latents as first-class outputs.
- 📦 Reusable checkpoints: Use `save_pretrained()` and `from_pretrained()` for stable, shareable model artifacts.
- 🚀 Real training flow: Ship with trainers, datasets, shell wrappers, and packaging hooks for end-to-end experiments.
What It Covers
Current model families include:
- Deterministic models: `AE`, `DAE`, `CAE`, `SAE`, `TopKSAE`, `KLSAE`, `WAE`, `AAE`
- Variational models: `VAE`, `DVAE`, `BetaVAE`, `BetaTCVAE`, `DIPVAE`, `InfoVAE`, `MMDVAE`, `FactorVAE`, `VampPriorVAE`, `HVAE`
- Quantized models: `VQVAE`, `GumbelVQ`, `FSQ`, `RFSQ`, `PQVAE`, `RQVAE`, `VQVAE2`
Core interfaces include:
- `Config + Model + Output + Export`
- `save_pretrained()` / `from_pretrained()`
- `encode()` / `decode()` / `reconstruct()` / `export()`
- family-specific trainers for deterministic, variational, and quantized models
At a Glance
| Family | Examples | Key outputs |
|---|---|---|
| Deterministic | `AE`, `DAE`, `CAE`, `SAE`, `TopKSAE`, `KLSAE` | `reconstruction`, `latents`, sparse and contractive penalties |
| Variational | `VAE`, `DVAE`, `BetaVAE`, `HVAE` | `posterior_mean`, `posterior_logvar`, `kl_loss`, `free_bits_kl_loss` |
| Quantized | `VQVAE`, `FSQ`, `PQVAE`, `RQVAE` | `quantized_latents`, `codebook_indices`, usage and perplexity metrics |
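The usage and perplexity metrics listed for quantized models can be illustrated with a small standalone sketch. It assumes the common definition of codebook perplexity as the exponential of the entropy of the empirical code distribution. Whether the library computes it exactly this way is an assumption, and `codebook_usage` is a hypothetical helper, not part of the package.

```python
import math
from collections import Counter


def codebook_usage(indices, codebook_size):
    """Return (fraction of codes used, perplexity) for a batch of code indices."""
    counts = Counter(indices)
    total = len(indices)
    # Empirical probability of each code that appears at least once.
    probs = [count / total for count in counts.values()]
    # Perplexity = exp(entropy); it equals codebook_size under uniform usage.
    entropy = -sum(p * math.log(p) for p in probs)
    return len(counts) / codebook_size, math.exp(entropy)


# Four of eight codes are used, each equally often.
usage, perplexity = codebook_usage([0, 1, 2, 3, 0, 1, 2, 3], codebook_size=8)
print(usage, perplexity)
```

A perplexity far below the codebook size suggests that only a few codes carry most of the assignments.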
Installation
Install the package:
pip install autoencoders
Install with PyTorch dependencies:
pip install "autoencoders[torch]"
Install with encoder-backed text dataset support:
pip install "autoencoders[text]"
Install with CLIP-backed multimodal dataset support:
pip install "autoencoders[clip]"
Install everything commonly needed for experiments:
pip install "autoencoders[all]"
If you are working from source and plan to build or publish packages:
pip install "autoencoders[dev]"
Documentation
The repository now ships with an MkDocs site that documents:
- the dataset, backbone, and `DataSpec` surface
- the unified YAML training entrypoint
- tree-structured model parameter references for deterministic, variational, and quantized families
Preview the docs locally:
mkdocs serve
Build the static site:
mkdocs build --strict
Quick Start
Build a basic AE + MLP model explicitly from a sample spec:
import torch

from autoencoders import AutoencoderConfig, AutoencoderModel
from autoencoders.data.base import TensorSpec

model = AutoencoderModel(
    config=AutoencoderConfig(latent_dim=16),
    sample_spec=TensorSpec(shape=(50,)),
    encoder="mlp",
    encoder_config={"hidden_dims": [64, 32], "activation": "relu", "use_bias": True},
    decoder="mlp",
    decoder_config={"hidden_dims": [64, 50], "activation": "relu", "use_bias": True},
)

inputs = torch.randn(32, 50)
outputs = model(inputs)
print(outputs.loss)
print(outputs.latents.shape)
print(outputs.reconstruction.shape)
Save and load checkpoints:
model.save_pretrained("artifacts/ae")
restored = AutoencoderModel.from_pretrained("artifacts/ae")
Inspect the model pipeline and layer-by-layer shape trace:
for step in model.get_pipeline_trace():
    print(step.name, "->", step.output_spec)
Train from YAML:
python examples/trainer.py --config examples/configs/glove/ae.yaml --epoch 5
Product Surface
Use the package at three different layers:
- Model layer: build or load latent models with typed configs
- Training layer: train deterministic, variational, or quantized families with dedicated trainers
- Experiment layer: run reusable YAML configs with one trainer entrypoint on real datasets
Model Loading
Load a model dynamically by name while still keeping backbone selection explicit:
from autoencoders import load_model
from autoencoders.data.base import TensorSpec

model = load_model(
    "vae",
    sample_spec=TensorSpec(shape=(50,)),
    latent_dim=16,
    kl_weight=0.1,
    free_bits=0.02,
    encoder="mlp",
    encoder_config={"hidden_dims": [64, 32], "activation": "relu", "use_bias": True},
    decoder="mlp",
    decoder_config={"hidden_dims": [64, 50], "activation": "relu", "use_bias": True},
)
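To make the `free_bits` argument above concrete, here is a minimal sketch of one common free-bits formulation: the closed-form KL divergence between a diagonal Gaussian posterior and a standard normal prior, with a per-dimension floor. It is an illustration only. The library may apply the floor differently (for example on a batch average), and `free_bits_kl` is a hypothetical name.

```python
import math


def free_bits_kl(mean, logvar, free_bits):
    """KL(q || N(0, I)) for one sample, with a per-dimension floor."""
    total = 0.0
    for mu, lv in zip(mean, logvar):
        # Closed-form KL between N(mu, exp(lv)) and N(0, 1) for one dimension.
        kl_dim = 0.5 * (mu * mu + math.exp(lv) - lv - 1.0)
        # Dimensions already below the floor contribute a constant, so the
        # optimizer gains nothing by pushing them further toward the prior.
        total += max(kl_dim, free_bits)
    return total


# First dimension matches the prior (KL 0), second has KL 0.5.
print(free_bits_kl([0.0, 1.0], [0.0, 0.0], free_bits=0.02))
```

The floor is one way to discourage posterior collapse, because no dimension is rewarded for carrying less than `free_bits` of information.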
Datasets
The library currently ships with embedding-first datasets plus one image dataset for CNN- and ViT-backed experiments:
- `glove`
- `fasttext`
- `numberbatch`
- `snli`
- `multinli`
- `flickr30k`
- `cifar10`
Load a dataset directly:
from autoencoders import load_dataset
dataset = load_dataset("glove", dim=50, max_vectors=50000)
loaders = dataset.get_dataloaders(batch_size=256)
Encoder-backed sentence datasets materialize embeddings during prepare() and cache the result just like static embedding tables:
dataset = load_dataset(
    "snli",
    encoder_name="sentence-transformers/all-MiniLM-L6-v2",
    max_vectors=50000,
)
loaders = dataset.get_dataloaders(batch_size=256)
CLIP-backed multimodal datasets follow the same cached artifact pattern:
dataset = load_dataset(
    "flickr30k",
    encoder_name="ViT-B-32",
    encoder_pretrained="laion2b_s34b_b79k",
    modality="both",
    max_vectors=50000,
)
loaders = dataset.get_dataloaders(batch_size=256)
Image data uses H x W x C specs end to end:
dataset = load_dataset("cifar10", max_examples=10000)
print(dataset.get_sample_spec()) # TensorSpec(shape=(32, 32, 3))
Backbone Semantics
Backbones are configured explicitly and built from the dataset-driven sample_spec.
- `MLPModule` consumes tensor specs whose last dimension is the feature width.
- `CNNModule` consumes image-like `TensorSpec(shape=(H, W, C))` values and handles `HWC <-> NCHW` conversion internally.
- `VisionTransformerModule` also consumes image-like `TensorSpec(shape=(H, W, C))`, patchifies them internally, and exposes sequence-shaped latent specs.
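The shape conventions in the list above can be traced with plain arithmetic. The sketch below is standalone and does not call the library. The helper names and the patch size of 4 are illustrative assumptions, and a real ViT backbone may add a class token or project each patch to a different width.

```python
def hwc_to_nchw_shape(batch, hwc):
    """Shape of a batch after moving channels ahead of the spatial axes."""
    h, w, c = hwc
    return (batch, c, h, w)


def patch_sequence_shape(hwc, patch_size):
    """(num_patches, patch_dim) for non-overlapping square patches."""
    h, w, c = hwc
    if h % patch_size or w % patch_size:
        raise ValueError("image sides must be divisible by patch_size")
    num_patches = (h // patch_size) * (w // patch_size)
    return (num_patches, patch_size * patch_size * c)


cifar_spec = (32, 32, 3)  # H x W x C, as in the CIFAR-10 sample spec
print(hwc_to_nchw_shape(256, cifar_spec))
print(patch_sequence_shape(cifar_spec, patch_size=4))
```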
Auto-inferred decoders are intentionally strict:
- `decoder: null` is supported only when reversing the encoder produces a decoder whose runtime input spec matches the model's decoder input spec.
- Models whose decoder space differs from encoder output space, such as hierarchical or latent-shape-changing variants, must provide an explicit decoder config.
For explicit image decoders, use transpose: true when you want an upsampling transposed-convolution stack:
decoder:
  name: cnn
  config:
    channels: [64, 3]
    kernel_sizes: [4, 4]
    strides: [2, 2]
    paddings: [1, 1]
    activation: relu
    use_bias: true
    transpose: true
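To see why this configuration upsamples, the spatial sizes can be traced with the standard transposed-convolution output formula (the one documented for `torch.nn.ConvTranspose2d`). The sketch assumes the `cnn` backbone maps each entry to such a layer with default dilation and no output padding, and it assumes an 8 x 8 latent grid as the starting point. Neither is confirmed by the text above.

```python
def conv_transpose_out(size, kernel, stride, padding, output_padding=0, dilation=1):
    """Output length along one spatial axis of a transposed convolution."""
    return (size - 1) * stride - 2 * padding + dilation * (kernel - 1) + output_padding + 1


def trace_sizes(size, kernel_sizes, strides, paddings):
    """Spatial size before the stack and after each layer."""
    sizes = [size]
    for kernel, stride, padding in zip(kernel_sizes, strides, paddings):
        size = conv_transpose_out(size, kernel, stride, padding)
        sizes.append(size)
    return sizes


# Values taken from the YAML above; the starting size of 8 is an assumption.
sizes = trace_sizes(8, kernel_sizes=[4, 4], strides=[2, 2], paddings=[1, 1])
print(sizes)  # [8, 16, 32]
```

With kernel 4, stride 2, and padding 1, each layer exactly doubles the spatial size, so two layers take an 8 x 8 grid to 32 x 32.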
Downloaded datasets use a global cache:
- default: `~/.cache/autoencoders`
- override with: `AUTOENCODERS_CACHE=/your/cache/path`
This makes the package useful both as:
- a standalone training library
- a latent-model subsystem inside larger PyTorch projects
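The cache location rule described above (default path, overridable through `AUTOENCODERS_CACHE`) can be sketched as a small lookup. This is an illustration of the described behavior rather than the library's own code, and `resolve_cache_dir` is a hypothetical helper.

```python
import os
from pathlib import Path


def resolve_cache_dir(environ=None):
    """Return the override from AUTOENCODERS_CACHE, else the default path."""
    environ = os.environ if environ is None else environ
    override = environ.get("AUTOENCODERS_CACHE")
    base = Path(override) if override else Path("~/.cache/autoencoders")
    # Expand "~" so the result is usable as a concrete directory.
    return base.expanduser()


print(resolve_cache_dir({}))
print(resolve_cache_dir({"AUTOENCODERS_CACHE": "/your/cache/path"}))
```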
Training API
Deterministic training:
from autoencoders import AETrainer, TrainingConfig

trainer = AETrainer(
    model=model,
    args=TrainingConfig(
        output_dir="artifacts/ae-run",
        epochs=5,
        batch_size=256,
    ),
)
trainer.fit(loaders, metadata={"dataset": "glove", "model": "ae"})
Variational training:
from autoencoders import (
    TrainingConfig,
    VAETrainer,
    VariationalAutoencoderConfig,
    VariationalAutoencoderModel,
)
from autoencoders.data.base import TensorSpec

trainer = VAETrainer(
    model=VariationalAutoencoderModel(
        config=VariationalAutoencoderConfig(
            latent_dim=16,
            kl_weight=0.1,
            free_bits=0.02,
            kl_warmup_epochs=20,
        ),
        sample_spec=TensorSpec(shape=(50,)),
        encoder="mlp",
        encoder_config={"hidden_dims": [64, 32], "activation": "relu", "use_bias": True},
        decoder="mlp",
        decoder_config={"hidden_dims": [64, 50], "activation": "relu", "use_bias": True},
    ),
    args=TrainingConfig(output_dir="artifacts/vae-run", epochs=10),
)
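The `kl_weight` and `kl_warmup_epochs` settings above suggest a KL annealing schedule. A minimal sketch, assuming a linear ramp from zero with zero-based epochs, is shown below. The actual schedule shape used by the trainer is not stated here, and `kl_weight_at` is a hypothetical helper.

```python
def kl_weight_at(epoch, kl_weight, warmup_epochs):
    """Linearly ramp the KL weight from 0 to kl_weight over warmup_epochs."""
    if warmup_epochs <= 0:
        return kl_weight
    return kl_weight * min(1.0, epoch / warmup_epochs)


# Weight at the start, halfway through warm-up, at its end, and afterwards.
schedule = [kl_weight_at(e, kl_weight=0.1, warmup_epochs=20) for e in (0, 10, 20, 30)]
print(schedule)
```

Under this assumption, a run with `epochs=10` and `kl_warmup_epochs=20` would stop at roughly half the target weight, so the two values are worth choosing together.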
Quantized training:
from autoencoders import VQTrainer, TrainingConfig, load_model
from autoencoders.data.base import TensorSpec

trainer = VQTrainer(
    model=load_model(
        "rqvae",
        sample_spec=TensorSpec(shape=(None, 50)),
        latent_dim=16,
        codebook_size=256,
        num_quantizers=4,
        use_ema_codebook=True,
        dead_code_reset=True,
        encoder="mlp",
        encoder_config={"hidden_dims": [64, 32], "activation": "relu", "use_bias": True},
        decoder="mlp",
        decoder_config={"hidden_dims": [64, 50], "activation": "relu", "use_bias": True},
    ),
    args=TrainingConfig(output_dir="artifacts/rqvae-run", epochs=10),
)
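For readers unfamiliar with what `num_quantizers` means for a residual quantizer, here is a small pure-Python sketch of the general idea: each stage quantizes whatever the previous stages left unexplained. It uses two tiny hand-written codebooks instead of four learned codebooks of 256 entries, and it is not the library's implementation. The helper names are hypothetical.

```python
def nearest_code(vector, codebook):
    """Index of the codebook entry with the smallest squared distance."""
    def sq_dist(code):
        return sum((v - c) ** 2 for v, c in zip(vector, code))
    return min(range(len(codebook)), key=lambda i: sq_dist(codebook[i]))


def residual_quantize(vector, codebooks):
    """Quantize `vector` with a stack of codebooks, one residual at a time."""
    residual = list(vector)
    quantized = [0.0] * len(vector)
    indices = []
    for codebook in codebooks:
        idx = nearest_code(residual, codebook)
        code = codebook[idx]
        indices.append(idx)
        # Accumulate the chosen code and pass what is left to the next stage.
        quantized = [q + c for q, c in zip(quantized, code)]
        residual = [r - c for r, c in zip(residual, code)]
    return indices, quantized


codebooks = [
    [[0.0, 0.0], [1.0, 1.0]],     # coarse stage
    [[0.0, 0.0], [0.25, -0.25]],  # fine stage refines the residual
]
indices, quantized = residual_quantize([1.2, 0.8], codebooks)
print(indices, quantized)  # [1, 1] [1.25, 0.75]
```

Each input ends up represented by one index per stage, which is why a residual quantizer emits `num_quantizers` codes per latent vector.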
Training Entry Point
Source checkouts now use one unified YAML-driven entrypoint:
examples/trainer.py
The legacy examples/train_ae.py wrapper still forwards into the same code path for basic AE runs.
Useful examples:
python examples/trainer.py --config examples/configs/glove/ae.yaml --epoch 5
python examples/trainer.py --config examples/configs/glove/vae.yaml --epoch 5
python examples/trainer.py --config examples/configs/glove/vqvae.yaml --epoch 5
python examples/trainer.py --config examples/configs/cifar10/vqvae.yaml --epoch 5
python examples/trainer.py --config examples/configs/cifar10/vqvae_vit.yaml --epoch 5
Each config is organized into five sections:
- `dataset`
- `model`
- `encoder`
- `decoder`
- `trainer`
Each section uses name + config form except trainer, which is a flat config block. Runtime overrides such as --epoch 5, --lr 0.001, or --max_vectors 5000 resolve into ${...:default}$ placeholders inside the YAML files before training starts.
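The override mechanism described above can be sketched as a text substitution pass over the YAML before it is parsed. The exact placeholder grammar is an assumption: the sketch treats a placeholder as `${name:default}` with an optional trailing `$`, and both `resolve_placeholders` and the sample keys are hypothetical.

```python
import re

# Assumed placeholder shape: ${name:default}, optionally followed by "$".
PLACEHOLDER = re.compile(r"\$\{(\w+):([^}]*)\}\$?")


def resolve_placeholders(text, overrides):
    """Replace each placeholder with its override, else with its default."""
    def substitute(match):
        name, default = match.group(1), match.group(2)
        return str(overrides.get(name, default))
    return PLACEHOLDER.sub(substitute, text)


template = "epochs: ${epoch:10}\nlr: ${lr:0.001}"
# Equivalent in spirit to passing --epoch 5 and leaving --lr unset.
resolved = resolve_placeholders(template, {"epoch": 5})
print(resolved)
```

Resolving placeholders on the raw text keeps the YAML files valid as templates while letting one config serve several runs.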
Launch-Ready Features
- 🗃️ Checkpoints: `save_pretrained()` and `from_pretrained()`
- 📤 Exports: standardized latent artifact export across model families
- 📚 Real datasets: static embedding tables, sentence corpora, and CLIP-backed image-text corpora
- 🎛️ Family-specific trainers: deterministic, variational, quantized, and adversarial flows
- 🧪 Packaging: buildable `sdist` and wheel, ready for PyPI publication
Design Direction
The library is organized around latent model families rather than a single monolithic interface:
- `BaseAutoencoderModel`
- `BaseVariationalAutoencoderModel`
- `BaseVectorQuantizedAutoencoderModel`
Matching outputs are also family-specific:
- `BaseAutoencoderOutput`
- `VariationalAutoencoderOutput`
- `QuantizedAutoencoderOutput`
This keeps the shared API stable without flattening away meaningful model differences such as posterior statistics or codebook indices.
Current Scope
autoencoders is intentionally embedding-first, with a growing image path for CNN-backed quantized models. The current core is aimed at:
- representation learning on embedding matrices
- latent compression
- variational latent modeling
- quantized latent tokenization
Future raw-modality frontends and multimodal adapters can be layered on top of this core.
Repository Status
This project is still early, but the current package already supports:
- trainable deterministic, variational, and quantized autoencoder families
- reusable checkpoints
- exportable latent artifacts
- real embedding datasets with download and cache support
- package metadata and distribution artifacts ready for publication workflows
Development
Build the package locally:
python -m build
Check the generated distribution:
twine check dist/*