A novel method for unsupervised patient stratification.

UnPaSt

UnPaSt is a novel method for identifying differentially expressed biclusters.

Cite

UnPaSt preprint: https://arxiv.org/abs/2408.00200

Code: https://github.com/ozolotareva/unpast_paper/

Web server

Run UnPaSt on the CoSy.Bio server.

Install

Install via pip

UnPaSt is available on PyPI and can be installed using pip:

pip install unpast

Do not forget to install the necessary R packages (see R Dependencies below).

Once installed, you can run UnPaSt from the command line using the unpast command:

unpast --exprs unpast/tests/scenario_B500.exprs.tsv.gz --basename results/scenario_B500

Docker Environment

UnPaSt is also available as a Docker image. To pull the Docker Image:

docker pull freddsle/unpast:latest

Replace latest with a specific version tag if desired (for the version from before October 2024, use v0.1.8).

Run UnPaSt using Docker

# Clone the repository to get example data
git clone https://github.com/ozolotareva/unpast.git
cd unpast
mkdir -p results

# Define the command to run UnPaSt
command="unpast --exprs unpast/tests/scenario_B500.exprs.tsv.gz --basename results/scenario_B500 --verbose"

# Run UnPaSt using Docker
docker run --rm -u $(id -u):$(id -g) -v "$(pwd)":/data --entrypoint bash freddsle/unpast -c "cd /data && PYTHONPATH=/data $command"

Requirements

UnPaSt requires Python 3.8 or higher, but below 3.11, along with certain Python and R packages.

Python Dependencies

The Python dependencies are installed automatically when installing via pip (or you can use requirements.txt). They include (with recommended versions):

fisher = ">=0.1.9,<=0.1.14"
pandas = "1.3.5"
python-louvain = "0.15"
matplotlib = "3.7.1"
seaborn = "0.11.1"
numba = ">=0.51.2,<=0.55.2"
numpy = "1.22.3"
scikit-learn = "1.2.2"
scikit-network = ">=0.24.0,<0.26.0"
scipy = ">=1.7.1,<=1.7.3"
statsmodels = "0.13.2"
kneed = "0.8.1"

R Dependencies

UnPaSt utilizes R packages for certain analyses. Ensure that you have R installed with the following packages:

  • WGCNA (version 1.70-3 or higher)
  • limma (version 3.42.2 or higher)

Installation Tips

Installing R Dependencies

It is recommended to use BiocManager for installing R packages:

install.packages("BiocManager")
BiocManager::install("WGCNA")
BiocManager::install("limma")

Installing R

Ensure that R (version 4.3.1 or higher) is installed on your system. You can download R from CRAN.

Input

UnPaSt requires a tab-separated file with features (e.g. genes) in rows, and samples in columns.

  • Feature and sample names must be unique.
  • At least 2 features and 5 samples are required.
  • Data must be between-sample normalized.

Recommendations:

  • UnPaSt works best on datasets with 20 or more samples.
  • If the cohort is small (<20 samples), reducing the minimal number of samples in a bicluster (min_n_samples) to 2 is recommended.
  • If the number of features is small, using the Louvain method for feature clustering instead of WGCNA and/or disabling feature selection by setting the binarization p-value (p-val) to 1 might be helpful.
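
The input requirements above can be checked before a run. The following is a minimal sketch, not part of UnPaSt: the function names are hypothetical, only the Python standard library is used, and it assumes the header row starts with a cell for the feature-name column (possibly empty) followed by the sample names.

```python
import csv
import gzip
import io


def check_expression_table(text, min_features=2, min_samples=5):
    """Return a list of problems found in a tab-separated expression table.

    `text` is the decoded file content: a header row with sample names,
    then one row per feature with the feature name in the first column.
    Assumption: the first header cell belongs to the feature-name column.
    """
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    if not rows:
        return ["table is empty"]
    samples = rows[0][1:]                            # skip the feature-name cell
    features = [row[0] for row in rows[1:] if row]   # ignore blank lines

    problems = []
    if len(set(samples)) != len(samples):
        problems.append("sample names are not unique")
    if len(set(features)) != len(features):
        problems.append("feature names are not unique")
    if len(features) < min_features:
        problems.append(f"fewer than {min_features} features")
    if len(samples) < min_samples:
        problems.append(f"fewer than {min_samples} samples")
    return problems


def read_text(path):
    """Read a plain or gzip-compressed (.gz) text file into a string."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        return handle.read()
```

For example, check_expression_table(read_text("unpast/tests/scenario_B500.exprs.tsv.gz")) should return an empty list for a well-formed file. The sketch does not check normalization, which depends on how the data were produced.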

Examples

  • Simulated data example. Biclustering of a matrix with 10000 rows (features) and 200 columns (samples) with four implanted biclusters consisting of 500 features and 10-100 samples each. For more details, see figure 3 and Methods here.
mkdir -p results;

# running UnPaSt with default parameters and example data
python -m unpast.run_unpast --exprs unpast/tests/scenario_B500.exprs.tsv.gz --basename results/scenario_B500

# with different binarization and clustering methods
python -m unpast.run_unpast --exprs unpast/tests/scenario_B500.exprs.tsv.gz --basename results/scenario_B500 --binarization ward --clustering Louvain

# help
python -m unpast.run_unpast -h
  • Real data example. Analysis of a subset of 200 samples randomly chosen from TCGA-BRCA dataset, including consensus biclustering and visualization: jupyter-notebook.
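
When UnPaSt is called from a script or notebook, the command lines from the simulated-data example can be assembled in Python. This is a minimal sketch: build_unpast_command is a hypothetical helper, not part of the package, and it uses only the options that appear in this document (--exprs, --basename, --binarization, --clustering, --verbose).

```python
import shlex


def build_unpast_command(exprs, basename, binarization=None,
                         clustering=None, verbose=False):
    """Assemble an `unpast` command line as a list of arguments.

    Hypothetical helper: it only covers the options shown in this document.
    """
    cmd = ["unpast", "--exprs", exprs, "--basename", basename]
    if binarization:
        cmd += ["--binarization", binarization]
    if clustering:
        cmd += ["--clustering", clustering]
    if verbose:
        cmd.append("--verbose")
    return cmd


cmd = build_unpast_command(
    "unpast/tests/scenario_B500.exprs.tsv.gz",
    "results/scenario_B500",
    binarization="ward",
    clustering="Louvain",
)
# Print the command in a form that can be pasted into a shell.
print(shlex.join(cmd))
```

With UnPaSt installed, the list can be passed to subprocess.run(cmd, check=True). Passing arguments as a list avoids shell quoting problems with paths that contain spaces.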

Outputs

<basename>.[parameters].biclusters.tsv - A .tsv file containing the identified biclusters with the following structure:

    • the first line starts with # and stores the parameters of UnPaSt.
    • the second line contains the column headers.
    • each subsequent line represents a bicluster with the following columns:
        • SNR: Signal-to-noise ratio of the bicluster, calculated as the average SNR of its features.
        • n_genes: Number of genes in the bicluster.
        • n_samples: Number of samples in the bicluster.
        • genes: Space-separated list of gene names.
        • samples: Space-separated list of sample names.
        • direction: Indicates whether the bicluster consists of up-regulated ("UP"), down-regulated ("DOWN"), or both types of genes ("BOTH").
        • genes_up, genes_down: Space-separated lists of up- and down-regulated genes, respectively.
        • gene_indexes: 0-based indexes of the genes in the input matrix.
        • sample_indexes: 0-based indexes of the samples in the input matrix.
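
The file layout described above can be read with a few lines of Python. The sketch below is illustrative only: parse_biclusters is a hypothetical helper, it relies on the documented layout (a parameter line starting with #, a header line, then one bicluster per line), and it makes no assumption about the exact text of the parameter line.

```python
# Columns documented as space-separated lists.
LIST_COLUMNS = ("genes", "samples", "genes_up", "genes_down",
                "gene_indexes", "sample_indexes")


def parse_biclusters(text):
    """Split the content of a biclusters .tsv file into (parameters, rows).

    `parameters` is the first line without the leading '#'.
    `rows` is a list of dicts keyed by the column headers; list-like
    columns are split on whitespace, all other values stay strings.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    parameters = lines[0].lstrip("#").strip()
    header = lines[1].split("\t")
    rows = []
    for line in lines[2:]:
        row = dict(zip(header, line.split("\t")))
        for column in LIST_COLUMNS:
            if column in row:
                row[column] = row[column].split()
        rows.append(row)
    return parameters, rows


# Made-up miniature file content, for illustration only.
example = (
    "#illustrative parameter line\n"
    "\tSNR\tn_genes\tn_samples\tgenes\tsamples\tdirection\n"
    "0\t2.5\t2\t3\tg1 g2\ts1 s2 s3\tUP\n"
)
parameters, rows = parse_biclusters(example)
print(parameters, rows[0]["genes"], rows[0]["direction"])
```

Numeric columns such as SNR and n_genes are left as strings here; convert them with float() or int() as needed.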

Along with the biclustering result, UnPaSt creates three files with intermediate results in the output folder out_dir:

  • <basename>.[parameters].binarized.tsv with binarized input data.
  • <basename>.[parameters].binarization_stats.tsv provides binarization statistics for each processed feature.
  • <basename>.[parameters].background.tsv stores background distributions of SNR values for each evaluated bicluster size.

These files can be used to restart UnPaSt with the same input and seed from the feature clustering step and skip time-consuming feature binarization.

Versions

UnPaSt version used in the PathoPlex paper: UnPaSt_PathoPlex.zip
