
A novel method for unsupervised patient stratification.

Project description

UnPaSt


UnPaSt is a novel method for identification of differentially expressed biclusters.

Cite

UnPaSt preprint: https://arxiv.org/abs/2408.00200.

Code: https://github.com/ozolotareva/unpast_paper/

Quick Start

Using UnPaSt online

Run UnPaSt on the CoSy.Bio server.

Local installation

UnPaSt is available on PyPI and can be installed using pip. The commands below install it and run it on a small example dataset:

pip install unpast

wget https://github.com/ozolotareva/unpast/raw/refs/heads/main/unpast/tests/test_input/synthetic_clear_biclusters.tsv
unpast --exprs synthetic_clear_biclusters.tsv

To use WGCNA for feature clustering (--clustering WGCNA) instead of the default method, you also need to install the required R packages (see Requirements below).

Running in Docker

UnPaSt is also available as a Docker image with the required R packages preinstalled. To pull the image and run it on example data:

# load image and example data
docker pull freddsle/unpast
wget https://github.com/ozolotareva/unpast/raw/refs/heads/main/unpast/tests/test_input/synthetic_clear_biclusters.tsv

# run UnPaSt in a Docker environment with current directory and user
docker run --rm -it -u $(id -u):$(id -g) -v "$(pwd)":/data \
  freddsle/unpast \
    --exprs /data/synthetic_clear_biclusters.tsv \
    --out_dir /data/results/synthetic_clear_biclusters 

To use an earlier version of the image, replace freddsle/unpast with freddsle/unpast:<version>, where <version> is a specific version tag; see available tags here.

Development setup

Developer mode allows you to run modified UnPaSt code. This is useful for local updates or contributing to the project.

Docker development environment

To run UnPaSt in a Docker container with the latest code from the repository, you can use the following command:

# Clone the repository to get code
git clone https://github.com/ozolotareva/unpast.git
cd unpast

# Define the command to run UnPaSt,
# using unpast.run_unpast to override the pre-installed version from the Docker image
command="python -m unpast.run_unpast --exprs unpast/tests/scenario_B500.exprs.tsv.gz --basename results/scenario_B500 --verbose"

# Run UnPaSt using Docker
docker run --rm -it -u $(id -u):$(id -g) -v "$(pwd)":/data --entrypoint bash freddsle/unpast -c "cd /data && $command"

Requirements

UnPaSt requires Python 3.9-3.11 and certain Python and R packages.
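Before installing, it can help to confirm that the interpreter falls within the supported range. The following is a minimal sketch of such a check; the helper name python_supported is hypothetical and is not part of UnPaSt.

```python
import sys

# Supported range taken from the requirement above: Python 3.9 to 3.11 inclusive.
MIN_SUPPORTED = (3, 9)
MAX_SUPPORTED = (3, 11)


def python_supported(version=None):
    """Return True if (major, minor) lies within the supported range.

    Hypothetical helper for illustration; defaults to the running interpreter.
    """
    if version is None:
        version = sys.version_info[:2]
    return MIN_SUPPORTED <= tuple(version[:2]) <= MAX_SUPPORTED


if __name__ == "__main__":
    major, minor = sys.version_info[:2]
    status = "supported" if python_supported() else "outside the supported range"
    print(f"Python {major}.{minor} is {status}")
```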

Python and R dependencies

Python Dependencies

The Python dependencies are installed automatically when installing via pip (see pyproject.toml).

They include (with recommended versions):

fisher = ">=0.1.9,<=0.1.14"
pandas = "1.3.5"
python-louvain = "0.15"
matplotlib = "3.7.1"
seaborn = "0.11.1"
numba = ">=0.51.2,<=0.55.2"
numpy = "1.22.3"
scikit-learn = "1.2.2"
scikit-network = ">=0.24.0,<0.26.0"
scipy = ">=1.7.1,<=1.7.3"
statsmodels = "0.13.2"
kneed = "0.8.1"

R Dependencies

The WGCNA clustering method requires R with the following packages installed:

  • WGCNA (version 1.70-3 or higher)
  • limma (version 3.42.2 or higher)

Installing R

Ensure that R (version 4.3.1 or higher) is installed on your system. You can download R from CRAN.

It is recommended to use BiocManager for installing R packages:

install.packages("BiocManager")
BiocManager::install("WGCNA")
BiocManager::install("limma")

API Reference

Input

UnPaSt requires a tab-separated file with features (e.g. genes) in rows, and samples in columns.

  • Feature and sample names must be unique.
  • At least 2 features and 5 samples are required.
  • Data must be between-sample normalized.

Recommendations:

  • It is recommended that UnPaSt be applied to datasets with 20+ samples.
  • If the cohort is small (<20 samples), reducing the minimal number of samples in a bicluster (min_n_samples) to 2 is recommended.
  • If the number of features is small, using the Louvain method for feature clustering instead of WGCNA and/or disabling feature selection by setting the binarization p-value (p-val) to 1 might be helpful.
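The structural requirements above can be checked before a run. The sketch below is one possible pre-flight check, not part of UnPaSt: the function name check_expression_table is hypothetical, it assumes the first row holds sample names and the first column holds feature names, and it cannot verify that the data are between-sample normalized.

```python
import csv

MIN_FEATURES = 2          # minimum stated in the requirements above
MIN_SAMPLES = 5           # minimum stated in the requirements above
RECOMMENDED_SAMPLES = 20  # threshold from the recommendations above


def _duplicates(names):
    """Return the sorted names that occur more than once."""
    seen, repeated = set(), set()
    for name in names:
        (repeated if name in seen else seen).add(name)
    return sorted(repeated)


def check_expression_table(lines):
    """Check a tab-separated matrix given as an iterable of text lines.

    Hypothetical helper. Returns (n_features, n_samples, warnings) and
    raises ValueError when a hard requirement is violated.
    """
    rows = [row for row in csv.reader(lines, delimiter="\t") if row]
    if not rows:
        raise ValueError("the table is empty")
    header, data = rows[0], rows[1:]
    # Tables written by R may omit the corner cell, so the header
    # can be one field shorter than the data rows.
    width = len(data[0]) if data else len(header)
    samples = header if len(header) == width - 1 else header[1:]
    features = [row[0] for row in data]

    for kind, names in (("sample", samples), ("feature", features)):
        repeated = _duplicates(names)
        if repeated:
            raise ValueError(f"duplicate {kind} names, e.g. {repeated[:5]}")
    if len(features) < MIN_FEATURES:
        raise ValueError(f"need at least {MIN_FEATURES} features, found {len(features)}")
    if len(samples) < MIN_SAMPLES:
        raise ValueError(f"need at least {MIN_SAMPLES} samples, found {len(samples)}")

    warnings = []
    if len(samples) < RECOMMENDED_SAMPLES:
        warnings.append("fewer than 20 samples: consider setting min_n_samples to 2")
    return len(features), len(samples), warnings
```

An open file handle can be passed directly, for example the handle returned by open("synthetic_clear_biclusters.tsv", newline="").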

Examples

  • Simulated data example: Biclustering of a matrix with 10 000 rows (features) and 200 columns (samples) with four implanted biclusters consisting of 500 features and 10-100 samples each. For more details, see Figure 3 and Methods here.

mkdir -p results

# running UnPaSt with default parameters and example data
unpast --exprs unpast/tests/scenario_B500.exprs.tsv.gz --basename results/scenario_B500

# with different binarization and clustering methods
unpast --exprs unpast/tests/scenario_B500.exprs.tsv.gz --basename results/scenario_B500 --binarization ward --clustering Louvain

# help
unpast -h

  • Real data example: Analysis of a subset of 200 samples randomly chosen from TCGA-BRCA dataset, including consensus biclustering and visualization: jupyter-notebook.
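When several parameter combinations are to be compared, the command-line calls shown in the examples above can be assembled from a script. The sketch below only builds the argument list from flags that appear in those examples; the function name build_unpast_command is hypothetical, and no other UnPaSt options are assumed.

```python
import shlex


def build_unpast_command(exprs, basename, binarization=None, clustering=None):
    """Assemble an unpast command line as a list of arguments.

    Hypothetical helper; it uses only the flags shown in the examples above.
    """
    command = ["unpast", "--exprs", exprs, "--basename", basename]
    if binarization is not None:
        command += ["--binarization", binarization]
    if clustering is not None:
        command += ["--clustering", clustering]
    return command


if __name__ == "__main__":
    cmd = build_unpast_command(
        "unpast/tests/scenario_B500.exprs.tsv.gz",
        "results/scenario_B500",
        binarization="ward",
        clustering="Louvain",
    )
    # Print the shell-quoted command; pass cmd to subprocess.run(cmd, check=True)
    # to execute it once unpast is installed.
    print(shlex.join(cmd))
```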

Outputs

The program creates a folder runs/run_<timestamp>/ with the results of the UnPaSt run, where <timestamp> is the date and time of the run in the format YYYYMMDDTHHMMSS.

The folder contains the following files:

run_YYYYMMDDTHHMMSS
├── args.tsv
├── biclusters.tsv 
└── unpast.log
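Because each run folder name encodes its timestamp, the most recent run can be located programmatically. The following is a minimal sketch that assumes folder names follow the run_YYYYMMDDTHHMMSS pattern described above; the function names are hypothetical and not part of UnPaSt.

```python
from datetime import datetime

RUN_PREFIX = "run_"
TIMESTAMP_FORMAT = "%Y%m%dT%H%M%S"  # corresponds to YYYYMMDDTHHMMSS


def parse_run_timestamp(folder_name):
    """Return the datetime encoded in a run folder name, or None if it does not match."""
    if not folder_name.startswith(RUN_PREFIX):
        return None
    try:
        return datetime.strptime(folder_name[len(RUN_PREFIX):], TIMESTAMP_FORMAT)
    except ValueError:
        return None


def latest_run(folder_names):
    """Return the name of the most recent run folder, or None if none match."""
    dated = [(parse_run_timestamp(name), name) for name in folder_names]
    dated = [(stamp, name) for stamp, name in dated if stamp is not None]
    return max(dated)[1] if dated else None
```

For example, latest_run(os.listdir("runs")) would select the newest folder inside runs/, provided that directory exists.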

The file biclusters.tsv contains the identified biclusters, with one bicluster per line. The format of this file is as follows:

    • The first line starts with # and stores the parameters of UnPaSt.
    • The second line contains the column headers.
    • Each subsequent line represents a bicluster with the following columns:
        • SNR: Signal-to-noise ratio of the bicluster, calculated as the average SNR of its features.
        • n_genes: Number of genes in the bicluster.
        • n_samples: Number of samples in the bicluster.
        • genes: Space-separated list of gene names.
        • samples: Space-separated list of sample names.
        • direction: Indicates whether the bicluster consists of up-regulated ("UP"), down-regulated ("DOWN"), or both types of genes ("BOTH").
        • genes_up, genes_down: Space-separated lists of up- and down-regulated genes respectively.
        • gene_indexes: 0-based indexes of the genes in the input matrix.
        • sample_indexes: 0-based indexes of the samples in the input matrix.

The files args.tsv and unpast.log contain the parameters used for the run and the log of the run respectively.
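The biclusters.tsv layout described above can be read with the standard library. The sketch below is an illustration under stated assumptions rather than UnPaSt's own reader: the function name read_biclusters is hypothetical, only the columns documented as space-separated are split into lists, and the index columns are left as raw strings because their separator is not specified here.

```python
import csv

# Columns documented above as space-separated lists.
SPACE_SEPARATED = ("genes", "samples", "genes_up", "genes_down")
# Columns converted to numbers.
NUMERIC = {"SNR": float, "n_genes": int, "n_samples": int}


def read_biclusters(lines):
    """Parse biclusters.tsv content given as an iterable of text lines.

    Hypothetical helper. Returns (parameter_line, biclusters), where
    biclusters is a list of dicts keyed by the column headers.
    """
    lines = iter(lines)
    parameter_line = next(lines).rstrip("\r\n")
    if not parameter_line.startswith("#"):
        raise ValueError("expected the first line to start with '#'")

    biclusters = []
    # The remaining lines are a regular table: header row, then one bicluster per row.
    for row in csv.DictReader(lines, delimiter="\t"):
        for column in SPACE_SEPARATED:
            if row.get(column) is not None:
                row[column] = row[column].split()
        for column, cast in NUMERIC.items():
            if row.get(column):
                row[column] = cast(row[column])
        biclusters.append(row)
    return parameter_line, biclusters
```

An open file handle can be passed as lines, so the whole file does not need to be loaded into a string first.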

If save mode is used, UnPaSt also saves the intermediate results of feature binarization alongside the biclustering result. The files are stored in the binarization/ subfolder and include:

binarization
├── bin_args.tsv
├── bin_background.tsv
├── bin_res.tsv
└── bin_stats.tsv

Each file serves the following purpose:

  • bin_args.tsv contains the subset of parameters used for binarization.
  • bin_background.tsv contains background distributions of SNR values for each evaluated bicluster size.
  • bin_res.tsv contains binarized input data.
  • bin_stats.tsv provides binarization statistics for each processed feature.

The binarization files can be used to restart UnPaSt from the feature clustering step with the same input and seed, skipping the time-consuming feature binarization.

Versions

UnPaSt version used in PathoPlex paper: UnPaSt_PathoPlex.zip
