MyeVAE: a multi-modal variational autoencoder for risk profiling of newly diagnosed multiple myeloma

These details have not been verified by PyPI

Project links

Project description

MyeVAE

A variational autoencoder leveraging multi-omics for risk prediction in newly diagnosed multiple myeloma patients.

Illustration of MyeVAE architecture, hosted on github

This Python package contains the code to train and run SHAP on MyeVAE. No GPU is required.

Install

Install through PyPI (requires Python 3.9 or later)

pip install myevae

Step-by-step guide

1. Download data

Download the raw multi-omics dataset and example outputs from the following link:

Input datasets are stored at https://myevae.s3.us-east-1.amazonaws.com/example_inputs.tar.gz
Example output files are stored at https://myevae.s3.us-east-1.amazonaws.com/example_outputs.tar.gz

Please use the download links sparingly.

If you use the AWS S3 CLI, the bucket ARN is arn:aws:s3:::myevae - you can list its contents and download specific files.

2. (Optional) Feature preprocessing

As there are over 50,000 raw features and contains missing values, this step performs supervised-feature selection, scaling, and imputation.

myevae preprocess \
    -e [endpoint: os | pfs] \
    -i [/inputs/folder]

Processed files will be stored in train_features_[os|pfs]_processed.csv and valid_features_[os|pfs]_processed.csv.

Preprocessing of features is resource-intensive. Alternatively, proceed to step 3 using the processed features in the example input data (https://myevae.s3.us-east-1.amazonaws.com/example_inputs.tar.gz).

train_features_os_processed.csv
train_features_pfs_processed.csv
valid_features_os_processed.csv
valid_features_pfs_processed.csv

3. Fit model

Place the hyperparameter grid (param_grid.py) in /params/folder.

myevae train \
    -e [endpoint: os | pfs] \
    -n [model_name] \
    -i [/inputs/folder] \
    -p [/params/folder] \
    -t [nthreads] \
    -o [/output/folder]

4. Score model

myevae score \
    -e [endpoint: os | pfs] \
    -n [model_name] \
    -i [/inputs/folder] \
    -o [/output/folder]

5. Generate SHAP plots

myevae shap \
    -e [endpoint: os | pfs] \
    -n [model_name] \
    -i [/inputs/folder] \
    -o [/output/folder]

Requirements

Files

Illustration of folder structure, hosted on github

Input csv files

Place the following .csv files inside your inputs folder (/inputs/folder), and the param_grid.py inside your preferred folder. Also create an empty folder for the outputs.

|-----/inputs
|     |----train_features.csv [ feature matrix of training samples ]
|     |----valid_features.csv [ feature matrix of validation samples ]
|     |----train_labels.csv [ survival files of training samples ]
|     |____valid_labels.csv [ survival files of validation samples ]
|-----/params
|     |____param_grid.py
|_____/output
      |____[output files created here]

For features*.csv and labels*.csv, column 0 is read in as the index, which should be the patient IDs.

Example input csv files can be downloaded from AWS S3 link above.

Hyperparameter grid python file

Place a python file named param_grid.py containing the hyperparameter grid dictionary in the params folder (/params/folder). This contains the set of hyperparameters that grid search will be performed on. Otherwise, use the default provided in src/param_grid.py.

The example parameter grid provides 3 options for z_dim (dimension of latent space): 8, 16, or 32:

from torch.nn import LeakyReLU, Tanh

# the default hyperparameter grid for debugging uses
# this is not meant to be used for real training, as the search space is only on z_dim
param_grid = {
    'z_dim': [8,16,32],
    'lr': [5e-4], 
    'batch_size': [1024],
    'input_types': [['exp','cna','gistic','fish','sbs','ig']],
    'input_types_subtask': [['clin']],
    'layer_dims': [[[32, 4],[16,4],[4,1],[4,1],[4,1],[4,1]]],
    'layer_dims_subtask' : [[4,1]],
    'kl_weight': [1],
    'activation': [LeakyReLU()],
    'subtask_activation': [Tanh()],
    'epochs': [100],
    'burn_in': [20],
    'patience': [5],
    'dropout': [0.3],
    'dropout_subtask': [0.3]
}

Software

These dependencies will be automatically installed.

python >= 3.9
torch >= 1.9.0
scikit-learn >= 0.24.1
scikit-survival >= 0.23.1
importlib_resources
matplotlib
shap-0.47.3.dev8-offline-fork-for-myevae==0.0.1

The last dependency is a offline fork of shap (https://pypi.org/project/shap/) which has been modified to work MyeVAE.

Recommended hardware

Minimum 4 CPU cores (8 cores is recommended)
Minimum 16 GB RAM (64GB is required for feature preprocessing)

No GPU is required.

Total CPU time for feature preprocessing: ~24 hours

Total CPU time for training: ~2 minutes per model (no GPU required)

Citation

If you use MyeVAE in your research, please cite the authors:

[Manuscript under review. Contact author at changjiageng@u.nus.edu for citation.]

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

This version

0.0.2

May 7, 2025

0.0.1

May 7, 2025

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

myevae-0.0.2.tar.gz (892.6 kB view details)

Uploaded May 7, 2025 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

myevae-0.0.2-py3-none-any.whl (913.5 kB view details)

Uploaded May 7, 2025 Python 3

File details

Details for the file myevae-0.0.2.tar.gz.

File metadata

Download URL: myevae-0.0.2.tar.gz
Upload date: May 7, 2025
Size: 892.6 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.12.3

File hashes

Hashes for myevae-0.0.2.tar.gz
Algorithm	Hash digest
SHA256	`f1871493835216238d4da61e237c97b7d40d6d08546139614622db45d7f3a549`
MD5	`0484fbd5f6d0186e9581ede297e2f488`
BLAKE2b-256	`10ac001fb4c59c9c8d05ab702d8e1b4be99fc1816fe3e513726991a75c29b30e`

See more details on using hashes here.

File details

Details for the file myevae-0.0.2-py3-none-any.whl.

File metadata

Download URL: myevae-0.0.2-py3-none-any.whl
Upload date: May 7, 2025
Size: 913.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.12.3

File hashes

Hashes for myevae-0.0.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`424c7e3c64dfed6e1acfe74e8e9a6f58ea3f09beed474e86f38bb46d8b4c8759`
MD5	`9e10fee4cee61a7db0c7c131a82e558a`
BLAKE2b-256	`2c79e69b27ee6f501f41bc9249e96040e538ea1ba5fc3fec6b94dc898cd2da2f`

See more details on using hashes here.

myevae 0.0.2

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

MyeVAE

Install

Step-by-step guide

1. Download data

2. (Optional) Feature preprocessing

3. Fit model

4. Score model

5. Generate SHAP plots

Requirements

Files

Input csv files

Hyperparameter grid python file

Software

Recommended hardware

Citation

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes