
Calculate dysbiosis scores using Python

Dyspyosis

A Python package for computing dysbiosis scores. The package uses autoencoder-based anomaly detection. Further details on this method are available here.

Installation

Before installing dyspyosis, ensure you have the CUDA Toolkit v11.x and a matching cuDNN installed; these are required for TensorFlow. Which version you need depends on your hardware: for a GTX 10XX, for example, you'll need CUDA Toolkit 11.2 and the matching cuDNN (8.1.1), while more recent cards can use more recent versions.

Next, install dyspyosis using the command below.

pip install dyspyosis

Usage

Below is an example of how to use the dyspyosis package. Note that this is for testing purposes, and the parameters have been set so that the script completes quickly. For real data, increase rarefication_count (the number of times samples will be rarefied) so that the number of samples x rarefication_count exceeds 10,000, and increase the number of epochs to 4000.
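The rule of thumb above (number of samples x rarefication_count > 10k) can be turned into a small helper. This is only a sketch: min_rarefication_count is a hypothetical name and is not part of dyspyosis.

```python
def min_rarefication_count(n_samples: int, target: int = 10_000) -> int:
    """Smallest count such that n_samples * count is strictly greater than target.

    Hypothetical helper for illustration; not part of the dyspyosis API.
    """
    if n_samples <= 0:
        raise ValueError("n_samples must be a positive integer")
    # Floor division plus one guarantees the product exceeds the target.
    return target // n_samples + 1


if __name__ == "__main__":
    for n in (50, 300, 2_000):
        count = min_rarefication_count(n)
        print(f"{n} samples -> rarefication_count >= {count} ({n * count} rarefied samples)")
```

The result is a lower bound; larger values are fine if training time allows.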

The encode_dim parameter is the size of the latent space. It has been found to work best when set between 4 and 8, depending on the number of genera in the input data; lower encode_dim values work better with fewer genera.

Note: Depending on your system, you might need to set the environment variable CUDA_VISIBLE_DEVICES to "0" before loading dyspyosis to use the GPU. Try this if CUDA is installed but you get an error that no CUDA device was found.

Note: The neural network dyspyosis is based on is relatively small. Depending on the complexity of your dataset and the size of the latent space, running dyspyosis on the CPU might outperform the GPU (see benchmarks)! To run on the CPU, set CUDA_VISIBLE_DEVICES to "-1" and CUDA_DEVICE_ORDER to "PCI_BUS_ID" in your environment before launching dyspyosis.
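If you prefer to set these variables from within Python rather than in the shell, one option is to do so at the very top of your script, before dyspyosis (and therefore TensorFlow) is imported. The sketch below assumes this ordering is sufficient on your system; select_device is a hypothetical helper, not part of dyspyosis.

```python
import os


def select_device(use_gpu: bool) -> None:
    """Set the CUDA environment variables described in the notes above.

    Hypothetical helper for illustration; not part of the dyspyosis API.
    """
    if use_gpu:
        # Expose only the first GPU to TensorFlow.
        os.environ["CUDA_VISIBLE_DEVICES"] = "0"
    else:
        # Hide all GPUs so that TensorFlow falls back to the CPU.
        os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
        os.environ["CUDA_VISIBLE_DEVICES"] = "-1"


# Call this before the first import of dyspyosis.
select_device(use_gpu=False)

# from dyspyosis import Dyspyosis  # import only after the variables are set
```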

import pandas as pd
from dyspyosis import Dyspyosis

if __name__ == "__main__":
    df = pd.read_table("./data/test.tsv", index_col=0)

    dyspyosis = Dyspyosis(
        df.values,
        labels=df.index.tolist(),
        rarefication_depth=5000,
        rarefication_count=10,
        encode_dim=4
    )

    dyspyosis.run_training(epochs=5)

    loss = dyspyosis.compute_loss()
    # Write tab-separated output to match the .tsv extension
    loss.to_csv("./data/loss_out.tsv", sep="\t", index=False)
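
To give an idea of what rarefication_depth and rarefication_count control, here is a minimal NumPy sketch of rarefying a single sample: reads are drawn without replacement until a fixed depth is reached, and the draw is repeated several times. This illustrates the general technique only; it is an assumption that dyspyosis does something comparable internally, and the rarefy function and the counts below are made up for the example.

```python
import numpy as np


def rarefy(counts: np.ndarray, depth: int, rng: np.random.Generator) -> np.ndarray:
    """Subsample one sample's genus counts to `depth` reads without replacement.

    Illustrative only; not taken from the dyspyosis source code.
    """
    counts = np.asarray(counts, dtype=np.int64)
    if counts.sum() < depth:
        raise ValueError("sample has fewer reads than the requested depth")
    # Drawing reads without replacement from the pooled counts is a
    # multivariate hypergeometric draw over the genera.
    return rng.multivariate_hypergeometric(counts, depth)


rng = np.random.default_rng(42)
sample = np.array([6000, 2500, 1200, 300, 0])  # made-up counts for five genera

# Repeat the draw ten times, analogous to rarefication_count=10.
replicates = np.stack([rarefy(sample, 5000, rng) for _ in range(10)])
print(replicates.shape)        # (10, 5)
print(replicates.sum(axis=1))  # every row sums to the depth of 5000
```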

Benchmarks

There are two benchmark scripts included in the repository: benchmark_cpu.py and benchmark_gpu.py. Before running the CPU benchmark, it is important to set two environment variables: CUDA_VISIBLE_DEVICES needs to be "-1" and CUDA_DEVICE_ORDER needs to be "PCI_BUS_ID". This ensures that the CPU benchmark actually runs on the CPU even if a GPU is available.
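
One way to make sure the variables are in place is to launch the CPU benchmark from a small wrapper that passes a modified environment to a child process. This is a sketch, not part of the repository: cpu_only_env and run_cpu_benchmark are hypothetical names, and the script path assumes you run the wrapper from the directory that contains benchmark_cpu.py.

```python
import os
import subprocess
import sys


def cpu_only_env() -> dict:
    """Return a copy of the current environment with all CUDA devices hidden."""
    env = os.environ.copy()
    env["CUDA_VISIBLE_DEVICES"] = "-1"
    env["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
    return env


def run_cpu_benchmark(script: str = "benchmark_cpu.py") -> int:
    """Run the benchmark script in a child process and return its exit code."""
    # The script path is an assumption; adjust it to where the file lives.
    completed = subprocess.run([sys.executable, script], env=cpu_only_env())
    return completed.returncode
```

Because only the child process receives the modified environment, the parent session keeps its GPU settings for a subsequent GPU benchmark.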

Here are some results running dyspyosis on hardware we have access to.

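| Type | Hardware                     | Epochs | Time (s) |
|------|------------------------------|--------|----------|
| CPU  | Intel i5-7500 @ 3.4 GHz      | 100    | 185.0017 |
| CPU  | AMD Ryzen 7 3700X            | 100    | 115.1882 |
| GPU  | NVIDIA GeForce GTX 1060 6GB  | 100    | 691.4091 |
| GPU  | NVIDIA GeForce RTX 4080 16GB | 100    | 340.6128 |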

For developers

To create the same environment the main devs are using, use requirements.txt to install the exact versions of all packages.

Clone the repository, create a virtual environment and install all requirements first. Additionally, ensure you have the CUDA Toolkit v11.x and a matching cuDNN installed; these are required for TensorFlow. Which version you need depends on your hardware: for a GTX 10XX, for example, you'll need CUDA Toolkit 11.2 and the matching cuDNN (8.1.1), while more recent cards can use more recent versions.

git clone https://github.com/raeslab/dyspyosis
cd dyspyosis
python -m venv venv
source venv/bin/activate  # on Windows: venv\Scripts\activate
pip install -r requirements.txt

To run the tests, use the command below. TensorFlow triggers a number of deprecation warnings, which can be suppressed with --disable-warnings.

pytest tests/ --disable-warnings --cov=src --cov-report=term-missing --cov-report=xml

Contributing

Any contributions you make are greatly appreciated.

  • Found a bug or have some suggestions? Open an issue.
  • Pull requests are welcome! Please open an issue first to discuss which features/changes you wish to implement.

Contact and License

dyspyosis was developed by Sebastian Proost at the RaesLab (part of VIB and KULeuven). dyspyosis is available under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license.

For commercial access inquiries, please contact Jeroen Raes.
