Framework for multi-omics data integration by autoencoders.

AUTOENCODIX

Autoencoders are deep-learning networks for dimension reduction and embedding. They combine a compressing encoder with a decoder, which makes them suitable for non-linear and multi-modal data integration, with promising applications to complex biological data from large-scale omics measurements. Ongoing research and publications provide many exciting architectures and implementations of autoencoders. However, easy-to-use, unified implementations that cover the whole pipeline of autoencoder applications are lacking. We therefore present AUTOENCODIX with the following features:

  • Multi-modal data integration for any numerical or categorical data
  • Different autoencoder architectures:
    • vanilla: vanillix
    • variational: varix
    • disentangled variational: disentanglix
    • hierarchical/stacked: stackix
    • ontology-based: ontix
    • masking: maskix
    • image VAE (2D): imagix
    • cross-modal autoencoder for translation between different data modalities: x-modalix (works for multiple modalities, paired and unpaired)
  • A Python package with a scikit-learn-like interface

Requirements

  • Python >=3.8, <3.13
  • uv or another package manager (we recommend uv)
  • git or gh

Installation

  • gh repo clone jan-forest/autoencodix_package
  • cd autoencodix_package
  • uv venv --python 3.10
  • source .venv/bin/activate
  • uv sync

Sample Usage

import autoencodix as acx
from autoencodix.data.datapackage import DataPackage
from autoencodix.configs.vanillix_config import VanillixConfig
from autoencodix.configs.default_config import DataCase

# raw_rna, raw_protein, and annotation below are your own pandas DataFrames; they are not defined in this snippet.
# If your data is stored in pandas DataFrames, you can pass them directly to our custom DataPackage.
# For any tabular data that is not single-cell, provide it as a dictionary to the "multi_bulk" attribute of DataPackage.
# Note: "multi" might be misleading. It is valid to provide just one modality (1 to n data modalities).
# Here, we assume paired metadata. If you have separate metadata for each modality, use the same dict keys as in multi_bulk, e.g.:
# annotation = {"rna": rna_annotation, "protein": protein_annotation}
my_datapackage: DataPackage = DataPackage(
    multi_bulk={"rna": raw_rna, "protein": raw_protein},
    annotation={"paired": annotation},
)

myconfig: VanillixConfig = VanillixConfig(data_case=DataCase.MULTI_BULK, epochs=30, device="cpu")
vanillix = acx.Vanillix(data=my_datapackage, config=myconfig)
result = vanillix.run()
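
The snippet above expects raw_rna, raw_protein, and annotation to exist already. To try the call without real data, placeholder tables can be built as in the following sketch. It makes two assumptions that the README does not confirm: samples are rows and features are columns, and the paired annotation shares the sample index with both modalities. All names, sizes, and values are hypothetical, and whether such small random tables are enough for a full training run is not stated here.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; replace with your own measurements.
rng = np.random.default_rng(seed=0)
n_samples = 20
sample_ids = [f"sample_{i}" for i in range(n_samples)]

# Assumption: rows are samples, columns are features.
raw_rna = pd.DataFrame(
    rng.normal(size=(n_samples, 50)),
    index=sample_ids,
    columns=[f"gene_{j}" for j in range(50)],
)
raw_protein = pd.DataFrame(
    rng.normal(size=(n_samples, 10)),
    index=sample_ids,
    columns=[f"protein_{j}" for j in range(10)],
)

# Assumption: one paired annotation table indexed by the same sample IDs.
annotation = pd.DataFrame(
    {"condition": rng.choice(["control", "treated"], size=n_samples)},
    index=sample_ids,
)
```

Keeping one shared list of sample IDs for all three tables makes it easy to check that the modalities and the metadata stay aligned.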

Getting Started

We provide extensive tutorials for all of our use cases. The best place to start is the Vanillix Tutorial, which explains the design and features of our pipeline; these general concepts carry over to the other pipelines. From there, you can explore the tutorials for the more specialized architectures (Varix, Ontix, etc.), which cover the specifics of the corresponding pipeline. For extra functionality, such as visualization or customization, we provide deep-dive tutorials. You can find the tutorials here:

Contributing

Whether you have a feature request, have found a bug, or have any other idea, we are always happy to hear from you. For more details, refer to our Guide.

Read The Docs

You can find our documentation here.

FAQ

Reproducibility and CUBLAS_WORKSPACE_CONFIG

If you enable full reproducibility with FIX_RANDOMNESS: "all" and receive the following error:

RuntimeError: Deterministic behavior was enabled with either `torch.use_deterministic_algorithms(True)` or `at::Context::setDeterministicAlgorithms(true)`, but this operation is not deterministic because it uses CuBLAS and you have CUDA >= 10.2. 

You need to run the following in your terminal before running our pipeline:

export CUBLAS_WORKSPACE_CONFIG=:16:8
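
If you start the pipeline from a Python script or notebook instead of a shell, the same variable can be set from Python. The following is a minimal sketch, assuming the variable has to be in place before the first CUDA operation, which is why it belongs at the very top of the script, before the pipeline is imported or run. The helper name set_cublas_workspace_config is hypothetical and not part of AUTOENCODIX.

```python
import os


def set_cublas_workspace_config(value: str = ":16:8") -> str:
    """Set CUBLAS_WORKSPACE_CONFIG unless it is already set.

    Returns the value that is in effect afterwards. An existing value
    (for example one exported in the shell) is left untouched.
    """
    return os.environ.setdefault("CUBLAS_WORKSPACE_CONFIG", value)


# Call this before any CUDA work happens in the process.
effective = set_cublas_workspace_config()
print(f"CUBLAS_WORKSPACE_CONFIG={effective}")
```

Using setdefault means a value you exported in the terminal takes precedence over the default in the script.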

Cite

While we work on a new publication for the Python package version, please refer to our previous publication in Nature Computational Science.

Please use the following entry to cite our work when using our framework:

@article{joas2025autoencodix,
  title={AUTOENCODIX: a generalized and versatile framework to train and evaluate autoencoders for biological representation learning and beyond},
  author={Joas, Maximilian Josef and Jurenaite, Neringa and Pra{\v{s}}{\v{c}}evi{\'c}, Du{\v{s}}an and Scherf, Nico and Ewald, Jan},
  journal={Nature Computational Science},
  pages={1--13},
  year={2025},
  doi={},
  publisher={Nature Publishing Group US New York}
}

License

Copyright [2026] [Maximilian Josef Joas & Jan Ewald, ScaDS.AI, Leipzig University]

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0
