
A unified library for autoencoder-family models across deterministic, variational, and quantized latent spaces.


autoencoders

A latent-model toolkit for deterministic, variational, and quantized autoencoders

Python 3.10+ | PyTorch | 20+ model families | Datasets | Checkpoint API

Build, train, serialize, and export latent models with one consistent API.

autoencoders is a PyTorch-first library for autoencoder-family models across deterministic, variational, and quantized latent spaces.

The project goal is simple: make autoencoders as composable, serializable, and reusable as Hugging Face `transformers` made sequence models.

Why autoencoders

🧩 Unified API
One package shape across `AE`, `VAE`, `VQ-VAE`, `PQ-VAE`, `RQ-VAE`, `WAE`, `AAE`, and more.
🧠 Latent-first design
Treat reconstruction, posterior statistics, quantized codes, and exported latents as first-class outputs.
📦 Reusable checkpoints
Use `save_pretrained()` and `from_pretrained()` for stable, shareable model artifacts.
🚀 Real training flow
Ship with trainers, datasets, shell wrappers, and packaging hooks for end-to-end experiments.

What It Covers

Current model families include:

  • Deterministic models: AE, DAE, CAE, SAE, TopKSAE, KLSAE, WAE, AAE
  • Variational models: VAE, DVAE, BetaVAE, BetaTCVAE, DIPVAE, InfoVAE, MMDVAE, FactorVAE, VampPriorVAE, HVAE
  • Quantized models: VQVAE, GumbelVQ, FSQ, RFSQ, PQVAE, RQVAE, VQVAE2

Core interfaces include:

  • Config + Model + Output + Export
  • save_pretrained() / from_pretrained()
  • encode() / decode() / reconstruct() / export()
  • family-specific trainers for deterministic, variational, and quantized models

At a Glance

| Family | Examples | Key outputs |
| --- | --- | --- |
| Deterministic | `AE`, `DAE`, `CAE`, `SAE`, `TopKSAE`, `KLSAE` | `reconstruction`, `latents`, sparse and contractive penalties |
| Variational | `VAE`, `DVAE`, `BetaVAE`, `HVAE` | `posterior_mean`, `posterior_logvar`, `kl_loss`, `free_bits_kl_loss` |
| Quantized | `VQVAE`, `FSQ`, `PQVAE`, `RQVAE` | `quantized_latents`, `codebook_indices`, usage and perplexity metrics |
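Codebook usage and perplexity are usually derived from how often each codebook entry gets selected. The sketch below assumes the common definitions (usage as the fraction of codes selected at least once, perplexity as the exponentiated entropy of the code frequencies); `codebook_metrics` is a hypothetical helper, and the library may compute these metrics differently, for example as running averages across batches.

```python
import math
from collections import Counter


def codebook_metrics(indices: list[int], codebook_size: int) -> tuple[float, float]:
    """Return (usage, perplexity) for a batch of codebook indices.

    Sketch only, using the common definitions rather than the library's code.
    usage: fraction of codebook entries selected at least once.
    perplexity: exp(entropy) of the empirical code distribution; it reaches
    codebook_size when all codes are used equally and 1.0 when one code is used.
    """
    if not indices:
        raise ValueError("indices must be non-empty")
    counts = Counter(indices)
    total = len(indices)
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    usage = len(counts) / codebook_size
    return usage, math.exp(entropy)


# Four of eight codes used equally often: usage is 0.5, perplexity is about 4.0.
usage, perplexity = codebook_metrics([0, 1, 2, 3, 0, 1, 2, 3], codebook_size=8)
print(usage, perplexity)
```

A perplexity far below the codebook size is the usual sign of codebook collapse, which is what options such as EMA codebooks and dead-code resets try to counter.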

Installation

Install the package:

pip install autoencoders

Install with PyTorch dependencies:

pip install "autoencoders[torch]"

Install with encoder-backed text dataset support:

pip install "autoencoders[text]"

Install with CLIP-backed multimodal dataset support:

pip install "autoencoders[clip]"

Install everything commonly needed for experiments:

pip install "autoencoders[all]"

If you are working from source and plan to build or publish packages:

pip install "autoencoders[dev]"

Documentation

The repository now ships with an MkDocs site that documents:

  • the dataset, backbone, and DataSpec surface
  • the unified YAML training entrypoint
  • tree-structured model parameter references for deterministic, variational, and quantized families

Preview the docs locally:

mkdocs serve

Build the static site:

mkdocs build --strict

Quick Start

Build a basic AE with MLP encoder and decoder backbones from an explicit sample spec:

import torch

from autoencoders import AutoencoderConfig, AutoencoderModel
from autoencoders.data.base import TensorSpec

model = AutoencoderModel(
    config=AutoencoderConfig(latent_dim=16),
    sample_spec=TensorSpec(shape=(50,)),
    encoder="mlp",
    encoder_config={"hidden_dims": [64, 32], "activation": "relu", "use_bias": True},
    decoder="mlp",
    decoder_config={"hidden_dims": [64, 50], "activation": "relu", "use_bias": True},
)

inputs = torch.randn(32, 50)
outputs = model(inputs)

print(outputs.loss)
print(outputs.latents.shape)
print(outputs.reconstruction.shape)

Save and load checkpoints:

model.save_pretrained("artifacts/ae")
restored = AutoencoderModel.from_pretrained("artifacts/ae")

Inspect the model pipeline and layer-by-layer shape trace:

for step in model.get_pipeline_trace():
    print(step.name, "->", step.output_spec)

Train from YAML:

python examples/trainer.py --config examples/configs/glove/ae.yaml --epoch 5

Product Surface

Use the package at three different layers:

  • Model layer: build or load latent models with typed configs
  • Training layer: train deterministic, variational, or quantized families with dedicated trainers
  • Experiment layer: run reusable YAML configs with one trainer entrypoint on real datasets

Model Loading

Load a model dynamically by name while still keeping backbone selection explicit:

from autoencoders import load_model
from autoencoders.data.base import TensorSpec

model = load_model(
    "vae",
    sample_spec=TensorSpec(shape=(50,)),
    latent_dim=16,
    kl_weight=0.1,
    free_bits=0.02,
    encoder="mlp",
    encoder_config={"hidden_dims": [64, 32], "activation": "relu", "use_bias": True},
    decoder="mlp",
    decoder_config={"hidden_dims": [64, 50], "activation": "relu", "use_bias": True},
)

Datasets

The library currently ships with embedding-first datasets plus one image dataset for CNN- and ViT-backed experiments:

  • glove
  • fasttext
  • numberbatch
  • snli
  • multinli
  • flickr30k
  • cifar10

Load a dataset directly:

from autoencoders import load_dataset

dataset = load_dataset("glove", dim=50, max_vectors=50000)
loaders = dataset.get_dataloaders(batch_size=256)

Encoder-backed sentence datasets materialize embeddings during prepare() and cache the result just like static embedding tables:

dataset = load_dataset(
    "snli",
    encoder_name="sentence-transformers/all-MiniLM-L6-v2",
    max_vectors=50000,
)
loaders = dataset.get_dataloaders(batch_size=256)

CLIP-backed multimodal datasets follow the same cached artifact pattern:

dataset = load_dataset(
    "flickr30k",
    encoder_name="ViT-B-32",
    encoder_pretrained="laion2b_s34b_b79k",
    modality="both",
    max_vectors=50000,
)
loaders = dataset.get_dataloaders(batch_size=256)

Image data uses H x W x C specs end to end:

dataset = load_dataset("cifar10", max_examples=10000)
print(dataset.get_sample_spec())  # TensorSpec(shape=(32, 32, 3))

Backbone Semantics

Backbones are configured explicitly and built from the dataset-driven sample_spec.

  • MLPModule consumes tensor specs whose last dimension is the feature width.
  • CNNModule consumes image-like TensorSpec(shape=(H, W, C)) values and handles HWC <-> NCHW conversion internally.
  • VisionTransformerModule also consumes image-like TensorSpec(shape=(H, W, C)), patchifies them internally, and exposes sequence-shaped latent specs.
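The shape bookkeeping behind these backbones can be sketched with plain tuples. For a PyTorch tensor, the CNN conversion from a batch of HWC samples to NCHW is `x.permute(0, 3, 1, 2)`; the ViT path instead cuts the image into square patches and flattens each one into a sequence element. Both helpers below are hypothetical, and the patch size of 4 is an illustrative assumption, not a library default.

```python
def hwc_to_nchw_shape(batch: int, spec_shape: tuple[int, int, int]) -> tuple[int, int, int, int]:
    """Map a batch of (H, W, C) samples to PyTorch's (N, C, H, W) convolution layout."""
    height, width, channels = spec_shape
    return (batch, channels, height, width)


def patchify_shape(spec_shape: tuple[int, int, int], patch_size: int) -> tuple[int, int]:
    """Sequence shape after cutting an (H, W, C) image into non-overlapping square patches.

    Each patch is flattened to patch_size * patch_size * C values; a real ViT
    would then project every patch to its embedding width.
    """
    height, width, channels = spec_shape
    if height % patch_size or width % patch_size:
        raise ValueError(f"{height}x{width} image cannot be split into {patch_size}x{patch_size} patches")
    num_patches = (height // patch_size) * (width // patch_size)
    return (num_patches, patch_size * patch_size * channels)


print(hwc_to_nchw_shape(256, (32, 32, 3)))        # (256, 3, 32, 32)
print(patchify_shape((32, 32, 3), patch_size=4))  # (64, 48)
```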

Auto-inferred decoders are intentionally strict:

  • decoder: null is supported only when reversing the encoder produces a decoder whose runtime input spec matches the model's decoder input spec.
  • Models whose decoder space differs from encoder output space, such as hierarchical or latent-shape-changing variants, must provide an explicit decoder config.
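The strictness rule reduces to a spec comparison made before any layers are built. A minimal sketch, assuming specs can be compared by shape alone; `Spec` and `check_inferred_decoder` are hypothetical stand-ins, not the library's `TensorSpec` or its validation code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Spec:
    """Hypothetical stand-in for autoencoders.data.base.TensorSpec (shape only)."""

    shape: tuple[int, ...]


def check_inferred_decoder(reversed_decoder_input: Spec, decoder_input: Spec) -> None:
    """Allow `decoder: null` only if the reversed encoder accepts what the model feeds the decoder."""
    if reversed_decoder_input.shape != decoder_input.shape:
        raise ValueError(
            f"cannot infer decoder: reversed encoder expects {reversed_decoder_input.shape}, "
            f"but the decoder receives {decoder_input.shape}; provide an explicit decoder config"
        )


# Plain AE: the decoder consumes exactly the encoder's latent output, so inference is allowed.
check_inferred_decoder(Spec((16,)), Spec((16,)))

# Latent-shape-changing variant (hypothetical shapes): inference is rejected.
try:
    check_inferred_decoder(Spec((16,)), Spec((4, 16)))
except ValueError as err:
    print(err)
```

Failing early with a clear message is preferable to silently building a decoder whose first layer does not match the tensors it will receive.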

For explicit image decoders, use transpose: true when you want an upsampling transposed-convolution stack:

decoder:
  name: cnn
  config:
    channels: [64, 3]
    kernel_sizes: [4, 4]
    strides: [2, 2]
    paddings: [1, 1]
    activation: relu
    use_bias: true
    transpose: true

Downloaded datasets use a global cache:

  • default: ~/.cache/autoencoders
  • override with: AUTOENCODERS_CACHE=/your/cache/path
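The precedence between the default location and the override can be sketched in a few lines. `resolve_cache_dir` is a hypothetical helper that mirrors the documented behavior; the library's own lookup may differ in details such as directory creation.

```python
import os
from pathlib import Path


def resolve_cache_dir() -> Path:
    """Return the dataset cache root, preferring the AUTOENCODERS_CACHE override."""
    # Sketch of the documented precedence, not the library's implementation.
    override = os.environ.get("AUTOENCODERS_CACHE")
    if override:
        return Path(override).expanduser()
    return Path.home() / ".cache" / "autoencoders"


print(resolve_cache_dir())
```

For a one-off run, set the variable on the command line, for example `AUTOENCODERS_CACHE=/your/cache/path python examples/trainer.py --config examples/configs/glove/ae.yaml`.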

Together, these pieces make the package useful both as:

  • a standalone training library
  • a latent-model subsystem inside larger PyTorch projects

Training API

Deterministic training:

from autoencoders import AETrainer, TrainingConfig

trainer = AETrainer(
    model=model,
    args=TrainingConfig(
        output_dir="artifacts/ae-run",
        epochs=5,
        batch_size=256,
    ),
)

trainer.fit(loaders, metadata={"dataset": "glove", "model": "ae"})

Variational training:

from autoencoders import TrainingConfig, VAETrainer, VariationalAutoencoderConfig, VariationalAutoencoderModel
from autoencoders.data.base import TensorSpec

trainer = VAETrainer(
    model=VariationalAutoencoderModel(
        config=VariationalAutoencoderConfig(
            latent_dim=16,
            kl_weight=0.1,
            free_bits=0.02,
            kl_warmup_epochs=20,
        ),
        sample_spec=TensorSpec(shape=(50,)),
        encoder="mlp",
        encoder_config={"hidden_dims": [64, 32], "activation": "relu", "use_bias": True},
        decoder="mlp",
        decoder_config={"hidden_dims": [64, 50], "activation": "relu", "use_bias": True},
    ),
    args=TrainingConfig(output_dir="artifacts/vae-run", epochs=10),
)
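The `kl_weight`, `free_bits`, and `kl_warmup_epochs` settings correspond to three standard ingredients of VAE training. The sketch below uses the common formulations (closed-form KL between a diagonal Gaussian posterior and a standard normal prior, a per-dimension free-bits floor, and a linear warmup of the KL weight). It is not taken from the library, whose batch reduction and schedule may differ, and all helper names are hypothetical.

```python
import math


def kl_per_dim(mean: list[float], logvar: list[float]) -> list[float]:
    """Closed-form KL(N(mean, exp(logvar)) || N(0, 1)) for each latent dimension."""
    return [0.5 * (m * m + math.exp(lv) - 1.0 - lv) for m, lv in zip(mean, logvar)]


def free_bits_kl(kl_dims: list[float], free_bits: float) -> float:
    """Floor each dimension's KL at `free_bits`, so pushing it lower earns no reward."""
    # In practice the per-dimension KL is usually averaged over the batch before the floor.
    return sum(max(kl, free_bits) for kl in kl_dims)


def kl_weight_at(epoch: int, kl_weight: float, warmup_epochs: int) -> float:
    """Linearly ramp the KL weight from 0 to `kl_weight` over `warmup_epochs` epochs."""
    if warmup_epochs <= 0:
        return kl_weight
    return kl_weight * min(1.0, epoch / warmup_epochs)


# Settings mirror the config above (kl_weight=0.1, free_bits=0.02, kl_warmup_epochs=20);
# the posterior statistics are made up for illustration.
kl = kl_per_dim(mean=[0.0, 0.5], logvar=[0.0, -0.1])
kl_term = kl_weight_at(epoch=5, kl_weight=0.1, warmup_epochs=20) * free_bits_kl(kl, free_bits=0.02)
print(kl, kl_term)
```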

Quantized training:

from autoencoders import VQTrainer, TrainingConfig, load_model
from autoencoders.data.base import TensorSpec

trainer = VQTrainer(
    model=load_model(
        "rqvae",
        sample_spec=TensorSpec(shape=(None, 50)),
        latent_dim=16,
        codebook_size=256,
        num_quantizers=4,
        use_ema_codebook=True,
        dead_code_reset=True,
        encoder="mlp",
        encoder_config={"hidden_dims": [64, 32], "activation": "relu", "use_bias": True},
        decoder="mlp",
        decoder_config={"hidden_dims": [64, 50], "activation": "relu", "use_bias": True},
    ),
    args=TrainingConfig(output_dir="artifacts/rqvae-run", epochs=10),
)
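With `num_quantizers=4`, a residual quantizer codes each latent in four stages, and every stage quantizes whatever error the previous stages left behind. The sketch below shows that standard residual-VQ idea with tiny fixed codebooks; it omits the EMA codebook updates, dead-code resets, and straight-through gradients a trainable model needs, and the helper names are hypothetical rather than the library's.

```python
def nearest(codebook: list[list[float]], vector: list[float]) -> int:
    """Index of the codebook entry with the smallest squared Euclidean distance to `vector`."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((c - v) ** 2 for c, v in zip(codebook[i], vector)),
    )


def residual_quantize(vector: list[float], codebooks: list[list[list[float]]]) -> tuple[list[int], list[float]]:
    """Quantize `vector` with one codebook per stage; each stage codes the remaining residual."""
    residual = list(vector)
    quantized = [0.0] * len(vector)
    indices = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        code = codebook[idx]
        indices.append(idx)
        quantized = [q + c for q, c in zip(quantized, code)]   # running sum of chosen codes
        residual = [r - c for r, c in zip(residual, code)]     # what later stages must explain
    return indices, quantized


# Two toy stages: a coarse codebook followed by a finer correction codebook.
codebooks = [
    [[0.0, 0.0], [1.0, 1.0]],
    [[0.0, 0.0], [0.25, -0.25]],
]
indices, quantized = residual_quantize([1.2, 0.8], codebooks)
print(indices, quantized)  # [1, 1] [1.25, 0.75]
```

The per-stage indices form the token sequence for each latent, which is what makes residual quantizers useful for latent tokenization.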

Training Entry Point

Source checkouts now use one unified YAML-driven entrypoint:

  • examples/trainer.py

The legacy examples/train_ae.py wrapper still forwards into the same code path for basic AE runs.

Useful examples:

python examples/trainer.py --config examples/configs/glove/ae.yaml --epoch 5
python examples/trainer.py --config examples/configs/glove/vae.yaml --epoch 5
python examples/trainer.py --config examples/configs/glove/vqvae.yaml --epoch 5
python examples/trainer.py --config examples/configs/cifar10/vqvae.yaml --epoch 5
python examples/trainer.py --config examples/configs/cifar10/vqvae_vit.yaml --epoch 5

Each config is organized into five sections:

  • dataset
  • model
  • encoder
  • decoder
  • trainer

Every section except trainer uses the name + config form; trainer is a flat config block. Runtime overrides such as --epoch 5, --lr 0.001, or --max_vectors 5000 are substituted into the ${...:default}$ placeholders inside the YAML files before training starts.
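Placeholder resolution can be sketched as a recursive walk over the parsed YAML tree. This assumes placeholders of the form `${name:default}` (optionally followed by a trailing `$`) where `name` matches the CLI flag; the pattern, the helper, and the config values below are hypothetical illustrations of the five-section layout, not the library's loader. Substituted values stay strings here, whereas a real loader would convert them to numbers.

```python
import re
from typing import Any

# Hypothetical placeholder syntax: ${name:default}, with an optional trailing "$".
PLACEHOLDER = re.compile(r"\$\{(\w+):([^}]*)\}\$?")


def resolve(value: Any, overrides: dict[str, str]) -> Any:
    """Recursively replace placeholders with CLI overrides, falling back to their defaults."""
    if isinstance(value, dict):
        return {key: resolve(item, overrides) for key, item in value.items()}
    if isinstance(value, list):
        return [resolve(item, overrides) for item in value]
    if isinstance(value, str):
        return PLACEHOLDER.sub(lambda m: overrides.get(m.group(1), m.group(2)), value)
    return value


config = {
    "dataset": {"name": "glove", "config": {"max_vectors": "${max_vectors:50000}"}},
    "model": {"name": "ae", "config": {"latent_dim": 16}},
    "encoder": {"name": "mlp", "config": {"hidden_dims": [64, 32]}},
    "decoder": {"name": "mlp", "config": {"hidden_dims": [64, 50]}},
    "trainer": {"epochs": "${epoch:10}", "lr": "${lr:0.001}"},
}

# Equivalent of passing --epoch 5 and nothing else.
resolved = resolve(config, {"epoch": "5"})
print(resolved["trainer"])  # {'epochs': '5', 'lr': '0.001'}
```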

Launch-Ready Features

  • 🗃️ Checkpoints: save_pretrained() and from_pretrained()
  • 📤 Exports: standardized latent artifact export across model families
  • 📚 Real datasets: static embedding tables, sentence corpora, and CLIP-backed image-text corpora
  • 🎛️ Family-specific trainers: deterministic, variational, quantized, and adversarial flows
  • 🧪 Packaging: buildable sdist and wheel, ready for PyPI publication

Design Direction

The library is organized around latent model families rather than a single monolithic interface:

  • BaseAutoencoderModel
  • BaseVariationalAutoencoderModel
  • BaseVectorQuantizedAutoencoderModel

Matching outputs are also family-specific:

  • BaseAutoencoderOutput
  • VariationalAutoencoderOutput
  • QuantizedAutoencoderOutput

This keeps the shared API stable without flattening away meaningful model differences such as posterior statistics or codebook indices.
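Family-specific outputs let generic code rely on the shared fields while family-aware code reads the extras. The dataclasses below are illustrative stand-ins, not the library's `BaseAutoencoderOutput`, `VariationalAutoencoderOutput`, or `QuantizedAutoencoderOutput`: the field names come from the key outputs listed for each family, and the inheritance between them is an assumption.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class BaseOutputSketch:
    """Hypothetical shared output fields."""

    reconstruction: Any
    latents: Any
    loss: Any = None


@dataclass
class VariationalOutputSketch(BaseOutputSketch):
    """Hypothetical variational extras: posterior statistics and KL terms."""

    posterior_mean: Any = None
    posterior_logvar: Any = None
    kl_loss: Any = None


@dataclass
class QuantizedOutputSketch(BaseOutputSketch):
    """Hypothetical quantized extras: quantized latents and codebook indices."""

    quantized_latents: Any = None
    codebook_indices: Any = None


def summarize(out: BaseOutputSketch) -> dict[str, Any]:
    """Log shared fields for every family, plus whatever the concrete family adds."""
    summary = {"loss": out.loss}
    if isinstance(out, VariationalOutputSketch):
        summary["kl_loss"] = out.kl_loss
    if isinstance(out, QuantizedOutputSketch):
        summary["codebook_indices"] = out.codebook_indices
    return summary


print(summarize(VariationalOutputSketch(reconstruction=[0.0], latents=[0.1], loss=1.0, kl_loss=0.2)))
```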

Current Scope

autoencoders is intentionally embedding-first, with a growing image path for CNN- and ViT-backed quantized models. The current core is aimed at:

  • representation learning on embedding matrices
  • latent compression
  • variational latent modeling
  • quantized latent tokenization

Future raw-modality frontends and multimodal adapters can be layered on top of this core.

Repository Status

This project is still early, but the current package already supports:

  • trainable deterministic, variational, and quantized autoencoder families
  • reusable checkpoints
  • exportable latent artifacts
  • real embedding datasets with download and cache support
  • package metadata and distribution artifacts ready for publication workflows

Development

Build the package locally:

python -m build

Check the generated distribution:

twine check dist/*



Download files


Source Distribution

autoencoders-0.6.4.tar.gz (135.7 kB)

Uploaded Source

Built Distribution


autoencoders-0.6.4-py3-none-any.whl (131.4 kB)

Uploaded Python 3

File details

Details for the file autoencoders-0.6.4.tar.gz.

File metadata

  • Download URL: autoencoders-0.6.4.tar.gz
  • Upload date:
  • Size: 135.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.9

File hashes

Hashes for autoencoders-0.6.4.tar.gz
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | f49b5290340edb8805ae65597e1a218090043f4b93033f48b3dcb8e4bb9cb8f6 |
| MD5 | db4ebe55fa7d6749dec7cd50e9160749 |
| BLAKE2b-256 | 5a857d337481db39d9cfb4d915575a9b84665ff0f7332f2432374dfdf414be71 |


File details

Details for the file autoencoders-0.6.4-py3-none-any.whl.

File metadata

  • Download URL: autoencoders-0.6.4-py3-none-any.whl
  • Upload date:
  • Size: 131.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.9

File hashes

Hashes for autoencoders-0.6.4-py3-none-any.whl
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | e79f66a41b4ec0e7df4d0e8e54fc595eeed97f2dd0d9c63376fc1e606f82c7e3 |
| MD5 | 70714688c63d8536a552b740ee437822 |
| BLAKE2b-256 | fac200b01a2853baa11918752a415918d9c2ce5a9e0fd6b9c989bc4c406de79d |

