
A unified library for autoencoder-family models across deterministic, variational, and quantized latent spaces.


autoencoders

A latent-model toolkit for deterministic, variational, and quantized autoencoders

Python 3.10+ | PyTorch | 20+ model families | Datasets | Checkpoint API

Build, train, serialize, and export latent models with one consistent API.

autoencoders is a PyTorch-first library for autoencoder-family models across deterministic, variational, and quantized latent spaces.

The project goal is simple: make autoencoders as composable, serializable, and reusable as Hugging Face `transformers` made sequence models.

Why autoencoders

🧩 Unified API
One package shape across `AE`, `VAE`, `VQ-VAE`, `PQ-VAE`, `RQ-VAE`, `WAE`, `AAE`, and more.
🧠 Latent-first design
Treat reconstruction, posterior statistics, quantized codes, and exported latents as first-class outputs.
📦 Reusable checkpoints
Use `save_pretrained()` and `from_pretrained()` for stable, shareable model artifacts.
🚀 Real training flow
Comes with trainers, datasets, shell wrappers, and packaging hooks for end-to-end experiments.

What It Covers

Current model families include:

  • Deterministic models: AE, DAE, CAE, SAE, TopKSAE, KLSAE, WAE, AAE
  • Variational models: VAE, DVAE, BetaVAE, BetaTCVAE, DIPVAE, InfoVAE, MMDVAE, FactorVAE, VampPriorVAE, HVAE
  • Quantized models: VQVAE, GumbelVQ, FSQ, RFSQ, PQVAE, RQVAE, VQVAE2
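TopKSAE appears above as a sparse deterministic model. The usual top-k sparsity step keeps only the k largest latent activations of each example and zeros the rest. The sketch below shows that operation with the standard library; `topk_sparsify` is a hypothetical helper, and whether the library ranks raw values or magnitudes, or applies a ReLU first, is an assumption not stated here.

```python
def topk_sparsify(activations: list[float], k: int) -> list[float]:
    """Keep the k largest activations of one example and zero out the rest."""
    if k <= 0:
        return [0.0] * len(activations)
    # Rank positions by activation value, largest first, and keep the top k.
    ranked = sorted(range(len(activations)), key=lambda i: activations[i], reverse=True)
    keep = set(ranked[:k])
    return [a if i in keep else 0.0 for i, a in enumerate(activations)]


codes = topk_sparsify([0.1, -0.4, 2.0, 0.7, 1.5], k=2)
print(codes)  # [0.0, 0.0, 2.0, 0.0, 1.5]
```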

Core interfaces include:

  • Config + Model + Output + Export
  • save_pretrained() / from_pretrained()
  • encode() / decode() / reconstruct() / export()
  • family-specific trainers for deterministic, variational, and quantized models

At a Glance

| Family | Examples | Key outputs |
| --- | --- | --- |
| Deterministic | `AE`, `DAE`, `CAE`, `SAE`, `TopKSAE`, `KLSAE` | `reconstruction`, `latents`, sparse and contractive penalties |
| Variational | `VAE`, `DVAE`, `BetaVAE`, `HVAE` | `posterior_mean`, `posterior_logvar`, `kl_loss`, `free_bits_kl_loss` |
| Quantized | `VQVAE`, `FSQ`, `PQVAE`, `RQVAE` | `quantized_latents`, `codebook_indices`, usage and perplexity metrics |
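The quantized row lists usage and perplexity metrics. A common definition, assumed here because the exact formula is not documented above, is codebook perplexity exp(H), where H is the entropy of how often each code index is selected. It equals the number of codes when usage is uniform and drops to 1 when every input collapses onto a single code. `codebook_usage_stats` is a hypothetical helper.

```python
import math
from collections import Counter


def codebook_usage_stats(indices: list[int], codebook_size: int) -> tuple[float, float]:
    """Return (fraction of codebook entries used, perplexity) for a batch of code indices."""
    if not indices:
        raise ValueError("need at least one code assignment")
    counts = Counter(indices)
    total = len(indices)
    # Entropy of the empirical code-usage distribution; unused codes contribute nothing.
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    usage = len(counts) / codebook_size
    return usage, math.exp(entropy)


usage, perplexity = codebook_usage_stats([0, 1, 2, 3, 0, 1, 2, 3], codebook_size=8)
print(usage, perplexity)  # usage is 0.5, perplexity is approximately 4.0
```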

Installation

Install the package:

pip install autoencoders

Install with PyTorch dependencies:

pip install "autoencoders[torch]"

Install with encoder-backed text dataset support:

pip install "autoencoders[text]"

Install with CLIP-backed multimodal dataset support:

pip install "autoencoders[clip]"

Install everything commonly needed for experiments:

pip install "autoencoders[all]"

If you are working from source and plan to build or publish packages:

pip install "autoencoders[dev]"

Documentation

The repository now ships with an MkDocs site that documents:

  • the dataset, backbone, and DataSpec surface
  • the unified YAML training entrypoint
  • tree-structured model parameter references for deterministic, variational, and quantized families

Preview the docs locally:

mkdocs serve

Build the static site:

mkdocs build --strict

Quick Start

Build a basic AE + MLP model explicitly from a sample spec:

import torch

from autoencoders import AutoencoderConfig, AutoencoderModel
from autoencoders.data.base import TensorSpec

model = AutoencoderModel(
    config=AutoencoderConfig(latent_dim=16),
    sample_spec=TensorSpec(shape=(50,)),
    encoder="mlp",
    encoder_config={"hidden_dims": [64, 32], "activation": "relu", "use_bias": True},
    decoder="mlp",
    decoder_config={"hidden_dims": [64, 50], "activation": "relu", "use_bias": True},
)

inputs = torch.randn(32, 50)
outputs = model(inputs)

print(outputs.loss)
print(outputs.latents.shape)
print(outputs.reconstruction.shape)

Save and load checkpoints:

model.save_pretrained("artifacts/ae")
restored = AutoencoderModel.from_pretrained("artifacts/ae")
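save_pretrained() and from_pretrained() follow the familiar directory-of-artifacts pattern. The sketch below shows only the config half of that idea, using a hypothetical `SketchConfig` dataclass written to a `config.json` file; the file name, the fields, and the absence of weights are assumptions, not the library's on-disk format.

```python
import json
import tempfile
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass
class SketchConfig:
    # Hypothetical fields loosely mirroring the Quick Start arguments.
    latent_dim: int = 16
    model_type: str = "ae"

    def save_pretrained(self, directory: str) -> None:
        path = Path(directory)
        path.mkdir(parents=True, exist_ok=True)
        # Plain JSON keeps the artifact human-readable and easy to diff.
        (path / "config.json").write_text(json.dumps(asdict(self), indent=2))

    @classmethod
    def from_pretrained(cls, directory: str) -> "SketchConfig":
        data = json.loads((Path(directory) / "config.json").read_text())
        return cls(**data)


with tempfile.TemporaryDirectory() as tmp:
    SketchConfig(latent_dim=32).save_pretrained(f"{tmp}/ae")
    restored = SketchConfig.from_pretrained(f"{tmp}/ae")

print(restored)  # SketchConfig(latent_dim=32, model_type='ae')
```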

Inspect the model pipeline and layer-by-layer shape trace:

for step in model.get_pipeline_trace():
    print(step.name, "->", step.output_spec)
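For the Quick Start model, a trace like this can be cross-checked by hand. The sketch assumes each MLP is a chain of biased Linear layers over `hidden_dims`, with the encoder followed by a projection to `latent_dim` and the decoder's last hidden width (50) serving as the output width. That wiring is inferred from the configs above and is an assumption, as is the hypothetical `mlp_trace` helper.

```python
def mlp_trace(in_dim: int, widths: list[int], use_bias: bool = True) -> list[tuple[int, int, int]]:
    """Return (in_features, out_features, parameter_count) for each Linear layer."""
    trace = []
    for out_dim in widths:
        params = in_dim * out_dim + (out_dim if use_bias else 0)
        trace.append((in_dim, out_dim, params))
        in_dim = out_dim
    return trace


# Assumed wiring: 50 -> 64 -> 32 -> latent 16, then 16 -> 64 -> 50.
encoder = mlp_trace(50, [64, 32, 16])
decoder = mlp_trace(16, [64, 50])
for layer in encoder + decoder:
    print(layer)
print(sum(p for _, _, p in encoder + decoder))  # 10210
```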

Train from YAML:

python examples/trainer.py --config examples/configs/glove/ae.yaml --epoch 5

Product Surface

Use the package at three different layers:

  • Model layer: build or load latent models with typed configs
  • Training layer: train deterministic, variational, or quantized families with dedicated trainers
  • Experiment layer: run reusable YAML configs with one trainer entrypoint on real datasets

Model Loading

Load a model dynamically by name while still keeping backbone selection explicit:

from autoencoders import load_model
from autoencoders.data.base import TensorSpec

model = load_model(
    "vae",
    sample_spec=TensorSpec(shape=(50,)),
    latent_dim=16,
    kl_weight=0.1,
    free_bits=0.02,
    encoder="mlp",
    encoder_config={"hidden_dims": [64, 32], "activation": "relu", "use_bias": True},
    decoder="mlp",
    decoder_config={"hidden_dims": [64, 50], "activation": "relu", "use_bias": True},
)
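The `kl_weight` and `free_bits` arguments above control the variational penalty. For a diagonal Gaussian posterior against a standard normal prior, the per-dimension KL is 0.5 * (mean^2 + exp(logvar) - logvar - 1), and free bits give each dimension a floor so that dimensions already close to the prior stop being penalized further. The sketch applies the floor per example and then scales by `kl_weight`; whether the library averages over the batch first, and how it combines the two arguments, are assumptions.

```python
import math


def gaussian_kl_per_dim(mean: list[float], logvar: list[float]) -> list[float]:
    """KL(N(mean, exp(logvar)) || N(0, 1)) for each latent dimension."""
    return [0.5 * (m * m + math.exp(lv) - lv - 1.0) for m, lv in zip(mean, logvar)]


def free_bits_kl(mean: list[float], logvar: list[float], free_bits: float, kl_weight: float) -> float:
    """Weighted KL where each dimension contributes at least `free_bits` nats."""
    per_dim = gaussian_kl_per_dim(mean, logvar)
    # A dimension below the floor contributes a constant, so it gets no gradient toward the prior.
    return kl_weight * sum(max(kl, free_bits) for kl in per_dim)


# One example with a 3-dim latent; the first two dimensions match the prior exactly.
loss = free_bits_kl([0.0, 0.0, 1.0], [0.0, 0.0, 0.0], free_bits=0.02, kl_weight=0.1)
print(loss)  # approximately 0.054
```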

Datasets

The library currently ships with embedding-first datasets plus one image dataset for CNN- and ViT-backed experiments:

  • glove
  • fasttext
  • numberbatch
  • snli
  • multinli
  • flickr30k
  • cifar10

Load a dataset directly:

from autoencoders import load_dataset

dataset = load_dataset("glove", dim=50, max_vectors=50000)
loaders = dataset.get_dataloaders(batch_size=256)

Encoder-backed sentence datasets materialize embeddings during prepare() and cache the result just like static embedding tables:

dataset = load_dataset(
    "snli",
    encoder_name="sentence-transformers/all-MiniLM-L6-v2",
    max_vectors=50000,
)
loaders = dataset.get_dataloaders(batch_size=256)

CLIP-backed multimodal datasets follow the same cached artifact pattern:

dataset = load_dataset(
    "flickr30k",
    encoder_name="ViT-B-32",
    encoder_pretrained="laion2b_s34b_b79k",
    modality="both",
    max_vectors=50000,
)
loaders = dataset.get_dataloaders(batch_size=256)

Image data uses H x W x C specs end to end:

dataset = load_dataset("cifar10", max_examples=10000)
print(dataset.get_sample_spec())  # TensorSpec(shape=(32, 32, 3))

Backbone Semantics

Backbones are configured explicitly and built from the dataset-driven sample_spec.

  • MLPModule consumes tensor specs whose last dimension is the feature width.
  • CNNModule consumes image-like TensorSpec(shape=(H, W, C)) values and handles HWC <-> NCHW conversion internally.
  • VisionTransformerModule also consumes image-like TensorSpec(shape=(H, W, C)), patchifies them internally, and exposes sequence-shaped latent specs.
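Both image layouts above can be illustrated without a tensor library. The sketch reorders a nested-list image from HWC to CHW (the per-example half of an HWC <-> NCHW conversion) and splits it into non-overlapping flattened patches, as a ViT patch embedding typically does before projecting each patch. `hwc_to_chw`, `patchify`, and the 4 x 4 patch size are hypothetical; the library's internal patch ordering may differ.

```python
Image = list[list[list[float]]]  # nested lists indexed as image[h][w][c]


def hwc_to_chw(image: Image) -> Image:
    """Reorder an H x W x C image into C x H x W."""
    height, width, channels = len(image), len(image[0]), len(image[0][0])
    return [[[image[h][w][c] for w in range(width)] for h in range(height)] for c in range(channels)]


def patchify(image: Image, patch: int) -> list[list[float]]:
    """Split an H x W x C image into flattened patch x patch x C vectors, row-major."""
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image size must be divisible by the patch size")
    patches = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            patches.append([value
                            for h in range(top, top + patch)
                            for w in range(left, left + patch)
                            for value in image[h][w]])
    return patches


# A CIFAR-10-sized image: 32 x 32 x 3, split into 4 x 4 patches.
image = [[[0.0, 0.0, 0.0] for _ in range(32)] for _ in range(32)]
chw = hwc_to_chw(image)
tokens = patchify(image, patch=4)
print(len(chw), len(chw[0]), len(chw[0][0]))  # 3 32 32
print(len(tokens), len(tokens[0]))  # 64 48
```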

Auto-inferred decoders are intentionally strict:

  • decoder: null is supported only when reversing the encoder produces a decoder whose runtime input spec matches the model's decoder input spec.
  • Models whose decoder space differs from encoder output space, such as hierarchical or latent-shape-changing variants, must provide an explicit decoder config.

For explicit image decoders, use transpose: true when you want an upsampling transposed-convolution stack:

decoder:
  name: cnn
  config:
    channels: [64, 3]
    kernel_sizes: [4, 4]
    strides: [2, 2]
    paddings: [1, 1]
    activation: relu
    use_bias: true
    transpose: true
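Each layer in this YAML uses kernel 4, stride 2, and padding 1. Assuming the `cnn` decoder maps these to standard PyTorch `ConvTranspose2d` layers with default dilation and output padding, each layer exactly doubles height and width. The helper below applies the output-size formula from the `ConvTranspose2d` documentation; the starting 8 x 8 grid is a hypothetical example.

```python
def conv_transpose_out(size: int, kernel: int, stride: int, padding: int,
                       dilation: int = 1, output_padding: int = 0) -> int:
    """Output size along one spatial dimension for torch.nn.ConvTranspose2d."""
    return (size - 1) * stride - 2 * padding + dilation * (kernel - 1) + output_padding + 1


size = 8  # hypothetical input grid height/width
for kernel, stride, padding in zip([4, 4], [2, 2], [1, 1]):
    size = conv_transpose_out(size, kernel, stride, padding)
    print(size)  # 16, then 32
```

With these settings the formula reduces to 2 * size, which is why kernel 4 / stride 2 / padding 1 is a common choice for clean 2x upsampling.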

Downloaded datasets use a global cache:

  • default: ~/.cache/autoencoders
  • override with: AUTOENCODERS_CACHE=/your/cache/path
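The cache rule above is an environment-variable override with a home-directory fallback. The sketch mirrors that documented behavior; `resolve_cache_dir` is hypothetical, and details such as ignoring an empty variable or expanding `~` are assumptions.

```python
import os
from pathlib import Path


def resolve_cache_dir() -> Path:
    """AUTOENCODERS_CACHE wins when set; otherwise fall back to ~/.cache/autoencoders."""
    override = os.environ.get("AUTOENCODERS_CACHE")
    if override:
        return Path(override).expanduser()
    return Path.home() / ".cache" / "autoencoders"


print(resolve_cache_dir())
```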

Together, these pieces make the package useful both as:

  • a standalone training library
  • a latent-model subsystem inside larger PyTorch projects

Training API

Deterministic training:

from autoencoders import AETrainer, TrainingConfig

trainer = AETrainer(
    model=model,
    args=TrainingConfig(
        output_dir="artifacts/ae-run",
        epochs=5,
        batch_size=256,
    ),
)

trainer.fit(loaders, metadata={"dataset": "glove", "model": "ae"})

Variational training:

from autoencoders import VAETrainer, TrainingConfig, VariationalAutoencoderConfig, VariationalAutoencoderModel
from autoencoders.data.base import TensorSpec

trainer = VAETrainer(
    model=VariationalAutoencoderModel(
        config=VariationalAutoencoderConfig(
            latent_dim=16,
            kl_weight=0.1,
            free_bits=0.02,
            kl_warmup_epochs=20,
        ),
        sample_spec=TensorSpec(shape=(50,)),
        encoder="mlp",
        encoder_config={"hidden_dims": [64, 32], "activation": "relu", "use_bias": True},
        decoder="mlp",
        decoder_config={"hidden_dims": [64, 50], "activation": "relu", "use_bias": True},
    ),
    args=TrainingConfig(output_dir="artifacts/vae-run", epochs=10),
)
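`kl_warmup_epochs=20` suggests the KL term is ramped in over the first 20 epochs. A linear ramp is the most common choice and is assumed in this sketch; `kl_weight_at` is a hypothetical helper, and the library may use a different schedule or count epochs from 1 rather than 0.

```python
def kl_weight_at(epoch: int, kl_weight: float, warmup_epochs: int) -> float:
    """Linearly ramp the KL weight from 0 to kl_weight over warmup_epochs (0-indexed epochs)."""
    if warmup_epochs <= 0:
        return kl_weight
    return kl_weight * min(1.0, epoch / warmup_epochs)


for epoch in (0, 5, 10, 20, 30):
    print(epoch, kl_weight_at(epoch, kl_weight=0.1, warmup_epochs=20))
```

Under this assumption, the `epochs=10` run above would end before warmup completes, with the KL weight still below the configured 0.1.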

Quantized training:

from autoencoders import VQTrainer, TrainingConfig, load_model
from autoencoders.data.base import TensorSpec

trainer = VQTrainer(
    model=load_model(
        "rqvae",
        sample_spec=TensorSpec(shape=(None, 50)),
        latent_dim=16,
        codebook_size=256,
        num_quantizers=4,
        use_ema_codebook=True,
        dead_code_reset=True,
        encoder="mlp",
        encoder_config={"hidden_dims": [64, 32], "activation": "relu", "use_bias": True},
        decoder="mlp",
        decoder_config={"hidden_dims": [64, 50], "activation": "relu", "use_bias": True},
    ),
    args=TrainingConfig(output_dir="artifacts/rqvae-run", epochs=10),
)
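With `num_quantizers=4`, an RQ-VAE quantizes each latent in stages: every stage picks the codeword nearest to whatever the previous stages left unexplained, and the quantized latent is the sum of the chosen codewords. The sketch below uses two tiny hand-written codebooks for illustration; EMA codebook updates and dead-code resets are omitted, and `residual_quantize` is hypothetical.

```python
def nearest(codebook: list[list[float]], vector: list[float]) -> int:
    """Index of the codeword with the smallest squared Euclidean distance to vector."""
    return min(range(len(codebook)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(codebook[i], vector)))


def residual_quantize(vector: list[float],
                      codebooks: list[list[list[float]]]) -> tuple[list[int], list[float]]:
    """Quantize vector with one codebook per stage; return the codes and the summed codewords."""
    residual = list(vector)
    quantized = [0.0] * len(vector)
    codes = []
    for codebook in codebooks:
        index = nearest(codebook, residual)
        codes.append(index)
        # Accumulate the chosen codeword, then pass the remaining error to the next stage.
        quantized = [q + c for q, c in zip(quantized, codebook[index])]
        residual = [r - c for r, c in zip(residual, codebook[index])]
    return codes, quantized


# Two stages with 2-D toy codebooks: coarse first, then fine.
codebooks = [
    [[0.0, 0.0], [1.0, 1.0], [-1.0, -1.0]],
    [[0.0, 0.0], [0.25, 0.0], [0.0, 0.25]],
]
codes, quantized = residual_quantize([1.2, 1.05], codebooks)
print(codes, quantized)  # [1, 1] [1.25, 1.0]
```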

Training Entry Point

Source checkouts now use one unified YAML-driven entrypoint:

  • examples/trainer.py

The legacy examples/train_ae.py wrapper still forwards into the same code path for basic AE runs.

Useful examples:

python examples/trainer.py --config examples/configs/glove/ae.yaml --epoch 5
python examples/trainer.py --config examples/configs/glove/vae.yaml --epoch 5
python examples/trainer.py --config examples/configs/glove/vqvae.yaml --epoch 5
python examples/trainer.py --config examples/configs/cifar10/vqvae.yaml --epoch 5
python examples/trainer.py --config examples/configs/cifar10/vqvae_vit.yaml --epoch 5

Each config is organized into five sections:

  • dataset
  • model
  • encoder
  • decoder
  • trainer

Each section except trainer uses the name + config form; trainer is a flat config block. Runtime overrides such as --epoch 5, --lr 0.001, or --max_vectors 5000 are substituted into ${...:default}$ placeholders inside the YAML files before training starts.
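The override step amounts to text substitution with fallbacks. The sketch handles placeholders of the form `${name:default}`; the exact delimiter syntax in the repository's YAML files, the key names, and the default values shown are assumptions, and `resolve_placeholders` is hypothetical.

```python
import re

# Matches ${name:default}, capturing the override key and its default text.
PLACEHOLDER = re.compile(r"\$\{(\w+):([^}]*)\}")


def resolve_placeholders(text: str, overrides: dict[str, str]) -> str:
    """Replace each ${name:default} with overrides[name] when given, else with the default."""
    return PLACEHOLDER.sub(lambda m: overrides.get(m.group(1), m.group(2)), text)


template = "trainer:\n  epochs: ${epoch:10}\n  lr: ${lr:0.0005}\n"
# Overrides as they might be parsed from `--epoch 5` on the command line.
print(resolve_placeholders(template, {"epoch": "5"}))
```

Substituting before YAML parsing lets the parser type the resulting values (`5` becomes an integer) exactly as if they had been written into the file.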

Launch-Ready Features

  • 🗃️ Checkpoints: save_pretrained() and from_pretrained()
  • 📤 Exports: standardized latent artifact export across model families
  • 📚 Real datasets: static embedding tables, sentence corpora, and CLIP-backed image-text corpora
  • 🎛️ Family-specific trainers: deterministic, variational, quantized, and adversarial flows
  • 🧪 Packaging: buildable sdist and wheel, ready for PyPI publication

Design Direction

The library is organized around latent model families rather than a single monolithic interface:

  • BaseAutoencoderModel
  • BaseVariationalAutoencoderModel
  • BaseVectorQuantizedAutoencoderModel

Matching outputs are also family-specific:

  • BaseAutoencoderOutput
  • VariationalAutoencoderOutput
  • QuantizedAutoencoderOutput

This keeps the shared API stable without flattening away meaningful model differences such as posterior statistics or codebook indices.
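Family-specific outputs let shared code depend on common fields while family-aware code inspects the rest. The dataclasses below are a hypothetical sketch of that layering, with field names borrowed from the At a Glance table and the Quick Start; they are not the library's output classes, and which fields each real class carries is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class OutputSketch:
    # Hypothetical fields assumed to be shared by every family.
    reconstruction: list[float]
    latents: list[float]
    loss: float


@dataclass
class VariationalOutputSketch(OutputSketch):
    posterior_mean: list[float] = field(default_factory=list)
    posterior_logvar: list[float] = field(default_factory=list)
    kl_loss: float = 0.0


@dataclass
class QuantizedOutputSketch(OutputSketch):
    quantized_latents: list[float] = field(default_factory=list)
    codebook_indices: list[int] = field(default_factory=list)


def summarize(output: OutputSketch) -> str:
    # Shared code relies only on the common fields...
    summary = f"loss={output.loss:.3f}"
    # ...and opts into family-specific fields when they are present.
    if isinstance(output, QuantizedOutputSketch):
        summary += f" codes={output.codebook_indices}"
    return summary


out = QuantizedOutputSketch(reconstruction=[0.1], latents=[0.2], loss=0.5, codebook_indices=[3])
print(summarize(out))  # loss=0.500 codes=[3]
```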

Current Scope

autoencoders is intentionally embedding-first, with a growing image path for CNN- and ViT-backed quantized models. The current core is aimed at:

  • representation learning on embedding matrices
  • latent compression
  • variational latent modeling
  • quantized latent tokenization

Future raw-modality frontends and multimodal adapters can be layered on top of this core.

Repository Status

This project is still early, but the current package already supports:

  • trainable deterministic, variational, and quantized autoencoder families
  • reusable checkpoints
  • exportable latent artifacts
  • real embedding datasets with download and cache support
  • package metadata and distribution artifacts ready for publication workflows

Development

Build the package locally:

python -m build

Check the generated distribution:

twine check dist/*



Download files


Source Distribution

autoencoders-0.4.1.tar.gz (127.6 kB)

Uploaded Source

Built Distribution


autoencoders-0.4.1-py3-none-any.whl (127.4 kB)

Uploaded Python 3

File details

Details for the file autoencoders-0.4.1.tar.gz.

File metadata

  • Download URL: autoencoders-0.4.1.tar.gz
  • Size: 127.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.9

File hashes

Hashes for autoencoders-0.4.1.tar.gz
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | dec018dd107e249ac3d4a9260cdc2ca64e5ec1ff19fef214917bf39d2ac77228 |
| MD5 | ff4044b99d04b6406a5b1520f889b0f6 |
| BLAKE2b-256 | 8662b80ba1589bfc90426c23ff6967f0832d58ce745089447386b74bdf0bc997 |


File details

Details for the file autoencoders-0.4.1-py3-none-any.whl.

File metadata

  • Download URL: autoencoders-0.4.1-py3-none-any.whl
  • Size: 127.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.9

File hashes

Hashes for autoencoders-0.4.1-py3-none-any.whl
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | 7584dbff5549585c16396206f8da96b9aea00ecffae80e8fb4dd7af66dd3a6af |
| MD5 | 3d59dd012098af50ac87221db5967776 |
| BLAKE2b-256 | cd1393ec6dc910dbaf4aa2a8f13b17b065d02390c9f210050e89ddb10899479a |

