VAE disentanglement framework built with PyTorch Lightning.

🧶 Disent

A modular disentangled representation learning framework for PyTorch

Visit the docs for more info, or browse the releases.

Contributions are welcome!

Overview
Getting Started
Features
Examples
- Python Example
- Hydra Config Example
Why?

Overview

Disent is a modular disentangled representation learning framework for auto-encoders, built upon pytorch-lightning. This framework consists of various composable components that can be used to build and benchmark disentanglement pipelines.

The name of the framework is derived from both disentanglement and scientific dissent.

Goals

Disent aims to meet the following criteria:

- Provide high-quality, readable, consistent and easily comparable implementations of frameworks
- Highlight differences between framework implementations by overriding hooks and minimising duplicate code
- Use best practices, e.g. torch.distributions
- Be extremely flexible & configurable

Citing Disent

Please use the following citation if you use Disent in your research:

@Misc{Michlo2021Disent,
  author =       {Nathan Juraj Michlo},
  title =        {Disent - A modular disentangled representation learning framework for pytorch},
  howpublished = {Github},
  year =         {2021},
  url =          {https://github.com/nmichlo/disent}
}

Warning ⚠️

Disent is still under active development. Features and APIs are not considered stable and should be expected to change! Only a very limited set of tests currently exists; these will be expanded over time.

Getting Started

The easiest way to use disent is by running experiment/run.py and changing the root config in experiment/config/config.yaml. Configurations are managed with Hydra Config. This mode is only available if you clone the repo directly.

PyPI:

1. Make sure pip3 is upgraded: pip3 install --upgrade pip
2. Install disent with: pip3 install disent (for the most up-to-date version, clone the dev branch instead)
3. Visit the docs & examples!

Source:

1. Clone with: git clone --branch dev https://github.com/nmichlo/disent.git
2. Change your working directory to the root of the repo: cd disent
3. Install the requirements for Python 3.8 with: pip3 install -r requirements.txt
4. Configure experiment/config/config.yaml, then run the default experiment with: PYTHONPATH=. python3 experiment/run.py

Features

Disent includes implementations of modules, metrics and datasets from various papers. Modules marked with a "🧵" were introduced in disent for my MSc research.

Frameworks

Unsupervised:
- VAE
- Beta-VAE
- DFC-VAE
- DIP-VAE
- InfoVAE
- BetaTCVAE
Weakly Supervised:
- Ada-GVAE: AdaVae(..., average_mode='gvae'). Usually better than the Ada-ML-VAE.
- Ada-ML-VAE: AdaVae(..., average_mode='ml-vae')
Supervised:
- TVAE
Experimental:
- 🧵 Ada-TVAE
  - Adaptive Triplet VAE
- 🧵 DO-TVAE (DO-Ada-TVAE)
  - Data Overlap Adaptive Triplet VAE
- various others not worth mentioning

Many popular disentanglement frameworks still need to be added. Please submit an issue if you have a request for an additional framework.

todo

FactorVAE
GroupVAE
MLVAE

Metrics

Disentanglement:
- FactorVAE Score
- DCI
- MIG
- SAP
- Unsupervised Scores
- 🧵 Flatness Score
  - Measures the max width (distance between the two furthest points) over the path length (sum of distances between consecutive points) of factor traversal embeddings. A combined measure of linearity and ordering (weighted towards axis alignment if l2 width over l1 path length is used).
- 🧵 Flatness Components - Linearity & Axis Alignment
  - Linearity: how much the largest eigenvector explains a factor traversal, i.e. the largest singular value of the latent variables over the sum of singular values.
  - Axis alignment: how much the largest standard basis vector explains a factor traversal, i.e. the largest standard deviation of the latent variables over the sum of standard deviations.
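
As a rough illustration of the two ratio definitions above, the following is a minimal sketch in plain Python. It is not disent's implementation: the function names are hypothetical, each traversal is assumed to be an ordered list of latent vectors, and the l2 distance is used for both width and path length for simplicity.

```python
import math
from statistics import pstdev


def flatness(points):
    """Max width (distance between the two furthest points) over path length.

    Assumption: l2 distance is used for both the width and the path length.
    """
    width = max(math.dist(a, b) for a in points for b in points)
    path = sum(math.dist(a, b) for a, b in zip(points, points[1:]))
    return width / path if path > 0 else 0.0


def axis_alignment(points):
    """Largest per-latent standard deviation over the sum of standard deviations."""
    stds = [pstdev(dim) for dim in zip(*points)]
    total = sum(stds)
    return max(stds) / total if total > 0 else 0.0


# Ordered traversal along a single latent: straight and axis-aligned.
straight = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
# Same points visited out of order: the path doubles back on itself.
zigzag = [(0.0, 0.0), (3.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
# Ordered traversal along the diagonal: linear but not axis-aligned.
diagonal = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]

print(flatness(straight))        # 1.0
print(flatness(zigzag))          # 0.5
print(axis_alignment(straight))  # 1.0
print(axis_alignment(diagonal))  # 0.5
```

The zigzag case shows why the score also captures ordering: the width is unchanged, but the longer path halves the ratio.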

Some popular metrics still need to be added. Please submit an issue if you wish to add your own or have a request.

todo

Datasets

Various common datasets used in disentanglement research are implemented, as well as new synthetic datasets that are generated programmatically on the fly. These are convenient and lightweight, not requiring storage space.

Ground Truth:
- Cars3D
- dSprites
- MPI3D
- SmallNORB
- Shapes3D
Ground Truth Synthetic:
- 🧵 XYSquares: (non-overlapping) 3 squares (R, G, B) that move across a non-overlapping grid. Observations have no channel-wise loss overlap.
- 🧵 XYObject: A simplistic version of dSprites with a single square.
- 🧵 XYBlocks: 3 blocks of decreasing size that move across a grid. Blocks can be one of three colors: R, G or B. If a smaller block overlaps a larger one and is the same color, the block is xor'd to black.
Input Transforms + Input/Target Augmentations
- Input based transforms are supported.
- Input and Target CPU and GPU based augmentations are supported.
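
To illustrate how a synthetic ground-truth dataset can be generated on the fly without any stored files, here is a minimal sketch loosely modelled on the single-square idea. The class TinySquareData and its methods are hypothetical and do not reflect disent's actual dataset classes; images are plain nested lists to keep the example dependency-free.

```python
class TinySquareData:
    """Hypothetical on-the-fly dataset: one square moving over a grid.

    The ground-truth factors are the (x, y) grid positions. Each observation
    is rendered on demand from its index, so nothing is stored on disk.
    """

    def __init__(self, grid_size=4, square_size=2):
        self.grid_size = grid_size
        self.square_size = square_size
        self.factor_sizes = (grid_size, grid_size)

    def __len__(self):
        # one observation per combination of factor values
        return self.grid_size * self.grid_size

    def idx_to_factors(self, idx):
        # row-major indexing: the first factor (x) varies slowest
        return divmod(idx, self.grid_size)

    def __getitem__(self, idx):
        if not 0 <= idx < len(self):
            raise IndexError(idx)
        x, y = self.idx_to_factors(idx)
        s = self.square_size
        width = self.grid_size * s
        image = [[0] * width for _ in range(width)]
        # fill the grid cell selected by the factors
        for row in range(y * s, (y + 1) * s):
            for col in range(x * s, (x + 1) * s):
                image[row][col] = 1
        return image


data = TinySquareData(grid_size=4, square_size=2)
print(len(data))               # 16
print(data.idx_to_factors(5))  # (1, 1)
```

Because each position maps to its own grid cell, no two observations in this sketch share a lit pixel, which mirrors the non-overlapping grid described for XYSquares.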

Schedules & Annealing

Hyper-parameter annealing is supported through the use of schedules. The currently implemented schedules include:

Linear Schedule
Cyclic Schedule
Cosine Wave Schedule
Various other wrapper schedules
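
As a sketch of how such schedules can work, the functions below each map a training step to a ratio, which is then multiplied by the hyper-parameter's initial value (as the Python example in this README describes for beta). The function names and the exact curve shapes are assumptions for illustration, not disent's implementation; in particular, the cyclic shape (linear ramp over the first half of each period, then hold) is only one common choice.

```python
import math


def linear_ratio(step, start_step, end_step):
    """Ratio rising linearly from 0 to 1 between start_step and end_step."""
    if end_step <= start_step:
        raise ValueError("end_step must be greater than start_step")
    t = (step - start_step) / (end_step - start_step)
    return min(max(t, 0.0), 1.0)


def cyclic_ratio(step, period):
    """Ratio that ramps 0 -> 1 over the first half of each period, then holds at 1."""
    t = (step % period) / period
    return min(2.0 * t, 1.0)


def cosine_ratio(step, period):
    """Ratio following a cosine wave between 0 and 1, starting at 0."""
    return 0.5 * (1.0 - math.cos(2.0 * math.pi * step / period))


def scheduled_value(initial_value, ratio):
    """Scale the saved initial hyper-parameter value by the schedule ratio."""
    return initial_value * ratio


# beta annealed with a cyclic schedule that repeats every 1024 steps
for step in (0, 256, 512, 1024):
    print(step, scheduled_value(0.004, cyclic_ratio(step, period=1024)))
```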

Architecture

disent

disent/data: raw groundtruth datasets
disent/dataset: dataset wrappers & sampling strategies
disent/framework: frameworks, including Auto-Encoders and VAEs
disent/metrics: metrics for evaluating disentanglement using ground truth datasets
disent/model: common encoder and decoder models used for VAE research
disent/schedule: annealing schedules that can be registered to a framework
disent/transform: transform operations for processing & augmenting input and target data from datasets

experiment

experiment/run.py: entrypoint for running basic experiments with hydra config
experiment/config: root folder for hydra config files
experiment/util: various helper code, pytorch lightning callbacks & visualisation tools for experiments

Examples

Python Example

The following is a basic working example of disent that trains a BetaVAE with a cyclic beta schedule and evaluates the trained model with various metrics.

Basic Example

import pytorch_lightning as pl
from torch.optim import Adam
from torch.utils.data import DataLoader
from disent.data.groundtruth import XYObjectData
from disent.dataset.groundtruth import GroundTruthDataset
from disent.frameworks.vae import BetaVae
from disent.metrics import metric_dci, metric_mig
from disent.model.ae import EncoderConv64, DecoderConv64
from disent.model import AutoEncoder
from disent.nn.transform import ToStandardisedTensor
from disent.schedule import CyclicSchedule

# We use this internally to test this script.
# You can remove all references to this in your own code.
from disent.util import is_test_run

# create the dataset & dataloaders
# - ToStandardisedTensor transforms images from numpy arrays to tensors and performs checks
data = XYObjectData()
dataset = GroundTruthDataset(data, transform=ToStandardisedTensor())
dataloader = DataLoader(dataset=dataset, batch_size=4, shuffle=True)

# create the BetaVAE model
# - adjusting the beta, learning rate, and representation size.
module = BetaVae(
    make_optimizer_fn=lambda params: Adam(params, lr=5e-4),
    make_model_fn=lambda: AutoEncoder(
        # z_multiplier is needed to output mu & logvar when parameterising normal distribution
        encoder=EncoderConv64(x_shape=dataset.x_shape, z_size=6, z_multiplier=2),
        decoder=DecoderConv64(x_shape=dataset.x_shape, z_size=6),
    ),
    cfg=BetaVae.cfg(beta=0.004)
)

# cyclic schedule for target 'beta' in the config/cfg. The initial value from the
# config is saved and multiplied by the ratio from the schedule on each step.
# - based on: https://arxiv.org/abs/1903.10145
module.register_schedule('beta', CyclicSchedule(
    period=1024,  # repeat every: trainer.global_step % period
))

# train model
# - for 65536 batches/steps
trainer = pl.Trainer(logger=False, checkpoint_callback=False, max_steps=65536, fast_dev_run=is_test_run())
trainer.fit(module, dataloader)

# compute disentanglement metrics
# - we cannot guarantee which device the representation is on
# - this will take a while to run
get_repr = lambda x: module.encode(x.to(module.device))

metrics = {
    **metric_dci(dataset, get_repr, num_train=10 if is_test_run() else 1000, num_test=5 if is_test_run() else 500, show_progress=True),
    **metric_mig(dataset, get_repr, num_train=20 if is_test_run() else 2000),
}

# evaluate
print('metrics:', metrics)

Visit the docs for more examples!
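
The example above sets z_multiplier=2 so that the encoder can output both mu and logvar. The following dependency-free sketch shows what that split and the beta-weighted KL term of a Beta-VAE look like for a single sample. The function names are hypothetical, and the layout of the encoder output (mu first, then logvar) is an assumption rather than something disent guarantees.

```python
import math


def split_mu_logvar(encoder_output):
    """Split a flat encoder output of size 2 * z_size into (mu, logvar).

    Assumption: the first half holds mu and the second half holds logvar.
    """
    if len(encoder_output) % 2 != 0:
        raise ValueError("expected an even number of outputs")
    z_size = len(encoder_output) // 2
    return encoder_output[:z_size], encoder_output[z_size:]


def kl_to_standard_normal(mu, logvar):
    """Closed-form KL(N(mu, exp(logvar)) || N(0, 1)), summed over latents."""
    return sum(0.5 * (m * m + math.exp(lv) - lv - 1.0) for m, lv in zip(mu, logvar))


def beta_vae_loss(recon_loss, mu, logvar, beta):
    """Reconstruction loss plus the beta-weighted KL regulariser."""
    return recon_loss + beta * kl_to_standard_normal(mu, logvar)


# z_size=2, so the encoder emits 4 values for this sample
mu, logvar = split_mu_logvar([1.0, 0.0, 0.0, 0.0])
print(kl_to_standard_normal(mu, logvar))               # 0.5
print(beta_vae_loss(1.0, mu, logvar, beta=0.004))
```

A schedule registered on beta would simply change the beta argument from step to step.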

Hydra Config Example

The entrypoint for basic experiments is experiment/run.py.

Some configuration will be required, but basic experiments can be adjusted by modifying the Hydra Config 1.0 files in experiment/config.

Modifying the main experiment/config/config.yaml is all you need for most basic experiments. The main config file contains a defaults list with entries corresponding to yaml configuration files (config options) in the subfolders (config groups) in experiment/config/<config_group>/<option>.yaml.

defaults:
  # experiment
  - framework: adavae
  - model: conv64alt
  - optimizer: adam
  - dataset: xysquares
  - augment: none
  - sampling: full_bb
  - metrics: fast
  - schedule: beta_cyclic
  # runtime
  - run_length: long
  - run_location: local
  - run_callbacks: vis
  - run_logging: none

Easily modify any of these values to adjust how the basic experiment will be run. For example, change framework: adavae to framework: betavae, or change the dataset from xysquares to shapes3d.
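
To make the naming convention concrete, the sketch below maps entries of a defaults list to the yaml files they select under experiment/config/<config_group>/<option>.yaml. It only illustrates the path layout described above; it is not how Hydra loads configs internally, and the helper names are hypothetical.

```python
from pathlib import PurePosixPath

# root folder for the config files, as described in this README
CONFIG_ROOT = PurePosixPath("experiment/config")


def option_path(group, option):
    """Path of the yaml file for one config option within a config group."""
    return CONFIG_ROOT / group / f"{option}.yaml"


def resolve_defaults(defaults):
    """Map each {group: option} entry of a defaults list to its yaml file."""
    paths = {}
    for entry in defaults:
        for group, option in entry.items():
            paths[group] = option_path(group, option)
    return paths


# a few entries from the defaults list shown above
defaults = [{"framework": "adavae"}, {"dataset": "xysquares"}, {"run_logging": "none"}]
paths = resolve_defaults(defaults)
print(paths["framework"])  # experiment/config/framework/adavae.yaml

# switching the framework option selects a different file in the same group
print(option_path("framework", "betavae"))  # experiment/config/framework/betavae.yaml
```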

Weights and Biases is supported by changing run_logging: none to run_logging: wandb. However, you will need to log in from the command line first.

Why?

- Created as part of my Computer Science MSc, scheduled for completion in 2021.
- I needed custom, high-quality implementations of various VAEs.
- I wanted a PyTorch version of disentanglement_lib.
- I didn't have time to wait for the authors of Weakly-Supervised Disentanglement Without Compromises to release their code as part of disentanglement_lib. (As of September 2020 it has been released, but it has unresolved discrepancies.)
- disentanglement_lib still uses the outdated TensorFlow 1.0, and its flow of data is unintuitive because of its use of Gin Config.

Release history

0.8.0: Mar 21, 2023
0.7.2: Mar 21, 2023
0.7.1: Mar 21, 2023
0.7.0: Nov 7, 2022
0.6.3: Nov 3, 2022
0.6.2: Aug 5, 2022
0.6.1: Aug 5, 2022
0.6.0: Jun 9, 2022
0.5.1: Jun 9, 2022
0.5.0: Apr 1, 2022
0.4.0: Mar 31, 2022
0.3.4: Feb 6, 2022
0.3.3: Nov 28, 2021
0.3.2: Nov 22, 2021
0.3.1: Nov 11, 2021
0.3.0: Nov 11, 2021
0.2.1: Oct 4, 2021
0.2.0: Oct 4, 2021
0.1.0: Jul 28, 2021
0.0.1.dev14 (pre-release, this version): Jun 4, 2021
0.0.1.dev13 (pre-release): May 26, 2021
0.0.1.dev12 (pre-release): May 9, 2021
0.0.1.dev11 (pre-release): Apr 7, 2021
0.0.1.dev10 (pre-release): Apr 7, 2021
0.0.1.dev9 (pre-release): Mar 22, 2021
0.0.1.dev8 (pre-release): Mar 19, 2021
0.0.1.dev7 (pre-release): Mar 19, 2021
0.0.1.dev6 (pre-release): Mar 14, 2021
0.0.1.dev5 (pre-release): Mar 14, 2021
0.0.1.dev4 (pre-release): Feb 27, 2021
0.0.1.dev3 (pre-release): Feb 27, 2021
0.0.1.dev2 (pre-release): Feb 27, 2021
0.0.1.dev1 (pre-release): Feb 5, 2021
0.0.0: Mar 21, 2023

Download files

Source Distribution

disent-0.0.1.dev14.tar.gz (132.7 kB)

Uploaded Jun 4, 2021 Source

Built Distribution

disent-0.0.1.dev14-py3-none-any.whl (250.0 kB)

Uploaded Jun 4, 2021 Python 3

Hashes for disent-0.0.1.dev14.tar.gz

Algorithm	Hash digest
SHA256	`7ba2c64af2301e946958f4d4be5db8ba42e897af0be8a602a5ce41818d75254e`
MD5	`4983ffee2bfac9185de99fcc6858fcb8`
BLAKE2b-256	`989a159218831f32e92bdd92703dce6698ae727dc2ed138687442b1ebb782f00`

Hashes for disent-0.0.1.dev14-py3-none-any.whl

Algorithm	Hash digest
SHA256	`42a2a6a13e9c73083e818e19cdb1f6e8f1a02bcdfa3bdf93f01459a652ad50e6`
MD5	`f15575432e32e63247a47445b3915b85`
BLAKE2b-256	`fd6338eaecda049b8d148fa31c433044abae7136d08ec2a3639be0869745b4cb`