A unified library for autoencoder-family models across deterministic, variational, and quantized latent spaces.

autoencoders

autoencoders is a PyTorch-first library for autoencoder-family models across deterministic, variational, and quantized latent spaces.

The project goal is simple: make autoencoders feel composable, serializable, and reusable, much as the transformers library did for sequence models.

What It Covers

Current model families include:

Deterministic models: AE, DAE, CAE, SAE, TopKSAE, KLSAE, WAE, AAE
Variational models: VAE, DVAE, BetaVAE, HVAE
Quantized models: VQVAE, FSQ, PQVAE, RQVAE

Core interfaces include:

Config + Model + Output + Export
save_pretrained() / from_pretrained()
encode() / decode() / reconstruct() / export()
family-specific trainers for deterministic, variational, and quantized models

Installation

Install the package:

pip install autoencoders

Install with PyTorch dependencies:

pip install "autoencoders[torch]"

If you are working from source and plan to build or publish packages:

pip install "autoencoders[dev]"

Quick Start

Create a basic autoencoder:

from autoencoders import AutoencoderConfig, AutoencoderModel

config = AutoencoderConfig(
    input_dim=50,
    latent_dim=16,
    hidden_dims=[128, 64],
)

model = AutoencoderModel(config)

Run a forward pass:

import torch

inputs = torch.randn(32, 50)
outputs = model(inputs)

print(outputs.loss)
print(outputs.latents.shape)
print(outputs.reconstruction.shape)

Save and load checkpoints:

model.save_pretrained("artifacts/ae")
restored = AutoencoderModel.from_pretrained("artifacts/ae")
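
To make the checkpoint round trip more concrete, here is a minimal sketch of the config half of a save/load cycle using only the standard library. `SketchConfig`, `save_config`, `load_config`, and the `config.json` file name are hypothetical illustrations, not the library's classes or on-disk format; a real checkpoint would presumably also store model weights.

```python
import json
import tempfile
from dataclasses import asdict, dataclass, field
from pathlib import Path
from typing import List


@dataclass
class SketchConfig:
    """Hypothetical stand-in for a model config; not the library's class."""
    input_dim: int
    latent_dim: int
    hidden_dims: List[int] = field(default_factory=list)


def save_config(config: SketchConfig, directory: str) -> Path:
    """Write the config as JSON into `directory` (file name is an assumption)."""
    path = Path(directory)
    path.mkdir(parents=True, exist_ok=True)
    target = path / "config.json"
    target.write_text(json.dumps(asdict(config), indent=2))
    return target


def load_config(directory: str) -> SketchConfig:
    """Rebuild the config from the JSON file written by save_config."""
    data = json.loads((Path(directory) / "config.json").read_text())
    return SketchConfig(**data)


def roundtrip_demo() -> bool:
    """Save a config to a temporary directory and check it restores unchanged."""
    original = SketchConfig(input_dim=50, latent_dim=16, hidden_dims=[128, 64])
    with tempfile.TemporaryDirectory() as tmp:
        save_config(original, tmp)
        return load_config(tmp) == original
```

Keeping the config in a plain, human-readable file is one common way to let a checkpoint directory be inspected and reloaded without the code that created it.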

Export model artifacts for downstream use:

artifact = model.export(inputs)

print(artifact.latents.shape)
print(artifact.reconstruction.shape)

Model Loading

Load a model dynamically by name:

from autoencoders import load_model

model = load_model(
    "vae",
    input_dim=50,
    latent_dim=16,
    hidden_dims=[128, 64],
    kl_weight=0.1,
    free_bits=0.02,
    kl_warmup_epochs=20,
)
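
Name-based loading of this kind is often built on a registry that maps string names to factories. The sketch below shows that pattern under stated assumptions: `register`, `load_by_name`, and the factories are hypothetical, and plain dictionaries stand in for model objects. It is not a description of how `load_model` is implemented.

```python
from typing import Callable, Dict

# Hypothetical registry mapping lowercase names to factory callables.
_REGISTRY: Dict[str, Callable[..., dict]] = {}


def register(name: str):
    """Decorator that records a factory under a case-insensitive name."""
    def wrap(factory):
        _REGISTRY[name.lower()] = factory
        return factory
    return wrap


@register("ae")
def build_ae(**kwargs) -> dict:
    # A dict stands in for a real model object in this sketch.
    return {"family": "deterministic", **kwargs}


@register("vae")
def build_vae(kl_weight: float = 1.0, **kwargs) -> dict:
    # Family-specific options are accepted only by the factories that use them.
    return {"family": "variational", "kl_weight": kl_weight, **kwargs}


def load_by_name(name: str, **kwargs) -> dict:
    """Look up a factory by name and forward keyword arguments to it."""
    try:
        factory = _REGISTRY[name.lower()]
    except KeyError:
        known = ", ".join(sorted(_REGISTRY))
        raise ValueError(f"unknown model {name!r}; known: {known}") from None
    return factory(**kwargs)
```

With this layout, adding a new model family only requires registering one more factory; callers keep using the same entry point.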

Datasets

The library currently ships with embedding-first datasets:

glove
fasttext
numberbatch

Load a dataset directly:

from autoencoders import load_dataset

dataset = load_dataset("glove", dim=50, max_vectors=50000)
loaders = dataset.get_dataloaders(batch_size=256)

Downloaded datasets use a global cache:

default: ~/.cache/autoencoders
override with: AUTOENCODERS_CACHE=/your/cache/path
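
The lookup order described above can be sketched as a small helper. `resolve_cache_dir` is a hypothetical name, and treating an empty `AUTOENCODERS_CACHE` value as unset is an assumption of this sketch rather than documented behavior.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_cache_dir(env: Optional[Mapping[str, str]] = None) -> Path:
    """Return the override from AUTOENCODERS_CACHE if set, else the default."""
    env = os.environ if env is None else env
    override = env.get("AUTOENCODERS_CACHE")
    if override:  # assumption: an empty string counts as "not set"
        return Path(override).expanduser()
    # Default location stated in the README: ~/.cache/autoencoders
    return Path.home() / ".cache" / "autoencoders"
```

Passing the environment as an argument keeps the helper easy to test without mutating `os.environ`.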

Training API

Deterministic training:

from autoencoders import AETrainer, TrainingArguments

trainer = AETrainer(
    model=model,
    args=TrainingArguments(
        output_dir="artifacts/ae-run",
        epochs=5,
        batch_size=256,
    ),
)

trainer.fit(loaders, metadata={"dataset": "glove", "model": "ae"})

Variational training:

from autoencoders import TrainingArguments, VAETrainer, load_model

trainer = VAETrainer(
    model=load_model(
        "vae",
        input_dim=50,
        latent_dim=16,
        hidden_dims=[128, 64],
        kl_weight=0.1,
        free_bits=0.02,
        kl_warmup_epochs=20,
    ),
    args=TrainingArguments(output_dir="artifacts/vae-run", epochs=10),
)
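
The README does not define `kl_weight`, `free_bits`, or `kl_warmup_epochs`, so the following is only a sketch of one common reading: a linear ramp of the KL weight over the warmup epochs, and a per-dimension floor on the KL term. The function names are hypothetical, and the library may schedule or clamp these quantities differently.

```python
import math
from typing import Sequence


def kl_weight_at(epoch: int, kl_weight: float, warmup_epochs: int) -> float:
    """Linearly ramp the KL weight from 0 to kl_weight over warmup_epochs."""
    if warmup_epochs <= 0:
        return kl_weight
    return kl_weight * min(1.0, epoch / warmup_epochs)


def gaussian_kl_per_dim(mu: float, logvar: float) -> float:
    """KL(N(mu, exp(logvar)) || N(0, 1)) for a single latent dimension."""
    return 0.5 * (math.exp(logvar) + mu * mu - 1.0 - logvar)


def free_bits_kl(mus: Sequence[float], logvars: Sequence[float], free_bits: float) -> float:
    """Sum per-dimension KL terms, flooring each one at free_bits."""
    return sum(
        max(free_bits, gaussian_kl_per_dim(mu, logvar))
        for mu, logvar in zip(mus, logvars)
    )
```

Under this reading, the floor removes the incentive to push any single dimension's KL below `free_bits`, which is one way to discourage posterior collapse early in training.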

Quantized training:

from autoencoders import TrainingArguments, VQTrainer, load_model

trainer = VQTrainer(
    model=load_model(
        "rqvae",
        input_dim=50,
        latent_dim=16,
        hidden_dims=[128, 64],
        codebook_size=256,
        num_quantizers=4,
        use_ema_codebook=True,
        dead_code_reset=True,
    ),
    args=TrainingArguments(output_dir="artifacts/rqvae-run", epochs=10),
)
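
For readers new to residual quantization, the idea behind a `num_quantizers` setting can be sketched in plain Python: each stage quantizes what the previous stages left over. `nearest_code` and `residual_quantize` are hypothetical helpers, and the sketch omits EMA codebook updates and dead-code resets; it is not the library's RQVAE implementation.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
Codebook = Sequence[Sequence[float]]


def nearest_code(vector: Sequence[float], codebook: Codebook) -> int:
    """Return the index of the codebook entry closest in squared distance."""
    def sq_dist(code: Sequence[float]) -> float:
        return sum((v - c) ** 2 for v, c in zip(vector, code))
    return min(range(len(codebook)), key=lambda i: sq_dist(codebook[i]))


def residual_quantize(
    vector: Sequence[float], codebooks: Sequence[Codebook]
) -> Tuple[List[int], Vector]:
    """Quantize `vector` stage by stage, each stage encoding the remaining residual."""
    residual = list(vector)
    approximation = [0.0] * len(residual)
    indices: List[int] = []
    for codebook in codebooks:
        index = nearest_code(residual, codebook)
        code = codebook[index]
        indices.append(index)
        # Accumulate the chosen code and subtract it from what is left to encode.
        approximation = [a + c for a, c in zip(approximation, code)]
        residual = [r - c for r, c in zip(residual, code)]
    return indices, approximation
```

The returned indices, one per stage, are the discrete tokens; summing the selected codes gives the quantized latent.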

Training Scripts

From a source checkout, there are family-specific entrypoints:

examples/train_ae.py
examples/train_vae.py
examples/train_vq.py

There are also convenience shell wrappers for common dataset/model combinations:

scripts/train_glove_*.sh
scripts/train_fasttext_ae.sh
scripts/train_numberbatch_ae.sh

Examples:

bash scripts/train_glove_ae.sh
bash scripts/train_glove_vae.sh
bash scripts/train_glove_rqvae.sh
bash scripts/train_fasttext_ae.sh

Each wrapper includes model-specific defaults and still accepts extra CLI overrides.
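
One simple way a wrapper can supply defaults while still accepting overrides is to place its own flags before the user's flags and let the parser keep the last value. The sketch below uses `argparse`; the flag names (`--dataset`, `--epochs`, `--batch-size`) and the helper functions are hypothetical and are not taken from the project's scripts.

```python
import argparse
from typing import Sequence


def build_parser() -> argparse.ArgumentParser:
    """Parser for a hypothetical training entrypoint."""
    parser = argparse.ArgumentParser(description="Hypothetical training entrypoint")
    parser.add_argument("--dataset", default="glove")
    parser.add_argument("--epochs", type=int, default=5)
    parser.add_argument("--batch-size", type=int, default=256)
    return parser


def parse_with_defaults(
    wrapper_defaults: Sequence[str], extra_args: Sequence[str]
) -> argparse.Namespace:
    """Parse wrapper defaults first, then user overrides.

    argparse keeps the last occurrence of a repeated option, so anything in
    extra_args wins over the same flag in wrapper_defaults.
    """
    return build_parser().parse_args(list(wrapper_defaults) + list(extra_args))
```

A shell wrapper gets the same effect by passing its defaults followed by `"$@"` to the Python entrypoint.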

Design Direction

The library is organized around latent model families rather than a single monolithic interface:

BaseAutoencoderModel
BaseVariationalAutoencoderModel
BaseVectorQuantizedAutoencoderModel

Matching outputs are also family-specific:

BaseAutoencoderOutput
VariationalAutoencoderOutput
QuantizedAutoencoderOutput

This keeps the shared API stable without flattening away meaningful model differences such as posterior statistics or codebook indices.
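
As an illustration of this design choice, the sketch below uses dataclass inheritance so that shared fields live on a base output while each family adds its own. The `Sketch*` classes and their fields (`mu`, `logvar`, `codebook_indices`) are hypothetical stand-ins, not the definitions of the output classes listed above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SketchOutput:
    """Fields every family shares in this sketch."""
    loss: float
    latents: List[float]
    reconstruction: List[float]


@dataclass
class SketchVariationalOutput(SketchOutput):
    """Adds posterior statistics on top of the shared fields."""
    mu: Optional[List[float]] = None
    logvar: Optional[List[float]] = None


@dataclass
class SketchQuantizedOutput(SketchOutput):
    """Adds discrete codebook indices on top of the shared fields."""
    codebook_indices: Optional[List[int]] = None


def summarize(output: SketchOutput) -> str:
    """Use shared fields for any output and family fields only when present."""
    parts = [f"latent_dim={len(output.latents)}"]
    if isinstance(output, SketchVariationalOutput) and output.mu is not None:
        parts.append("has posterior statistics")
    if isinstance(output, SketchQuantizedOutput) and output.codebook_indices is not None:
        parts.append(f"codes={output.codebook_indices}")
    return ", ".join(parts)
```

Code that only needs `loss`, `latents`, or `reconstruction` can accept any of the three types, while family-aware code can branch on the subclass.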

Current Scope

autoencoders is intentionally embedding-first right now. The current core is aimed at:

representation learning on embedding matrices
latent compression
variational latent modeling
quantized latent tokenization

Future raw-modality frontends and multimodal adapters can be layered on top of this core.

Repository Status

This project is still early, but the current package already supports:

trainable deterministic, variational, and quantized autoencoder families
reusable checkpoints
exportable latent artifacts
real embedding datasets with download and cache support

Development

Build the package locally:

python -m build

Check the generated distribution:

twine check dist/*

Release history

0.6.2: May 19, 2026
0.6.1: May 19, 2026
0.6.0: May 19, 2026
0.5.2: May 19, 2026
0.5.1: May 19, 2026
0.5.0: May 19, 2026
0.4.3: May 19, 2026
0.4.2: May 19, 2026
0.4.1: May 18, 2026
0.4.0: May 18, 2026
0.3.0: May 18, 2026
0.2.0: May 17, 2026
0.1.0: May 14, 2026 (this version)
0.0.1: Oct 26, 2023

Download files

Source Distribution

autoencoders-0.1.0.tar.gz (51.1 kB)

Uploaded May 14, 2026 Source

Built Distribution

autoencoders-0.1.0-py3-none-any.whl (69.3 kB)

Uploaded May 14, 2026 Python 3

File details

Details for the file autoencoders-0.1.0.tar.gz.

File metadata

Download URL: autoencoders-0.1.0.tar.gz
Upload date: May 14, 2026
Size: 51.1 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.9

File hashes

Hashes for autoencoders-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`acb2f5047ef2cd82663325407bcdf28cae51ab02c80f298573c6c2d1f75ffb76`
MD5	`2e35aed998a271612ee79fbc7b431c98`
BLAKE2b-256	`ad848e74c73a18c4e16350dfdc5830c352085241f19a735f8e0de4a614706733`

File details

Details for the file autoencoders-0.1.0-py3-none-any.whl.

File metadata

Download URL: autoencoders-0.1.0-py3-none-any.whl
Upload date: May 14, 2026
Size: 69.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.9

File hashes

Hashes for autoencoders-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`712e21fde9ce0a941586b725937b6a78fc978e59478d778083f724c9b39c16e3`
MD5	`685bef3ba8ef7c047248d7bf234133db`
BLAKE2b-256	`1cf1481830b33eb038126e99d7d983cba3b2671b4259e2f133d543e3eafb8d7e`
