A novel method for unsupervised patient stratification.

UnPaSt

UnPaSt is a novel method for identification of differentially expressed biclusters.

Cite

UnPaSt preprint: https://arxiv.org/abs/2408.00200

Code: https://github.com/ozolotareva/unpast_paper/

Web server

Run UnPaSt on the CoSy.Bio server.

Install

Docker environment [to be updated]

The UnPaSt environment is also available as a Docker image.

docker pull freddsle/unpast
git clone https://github.com/ozolotareva/unpast.git
cd unpast
mkdir -p results

# running UnPaSt with default parameters and example data
command="python unpast/run_unpast.py --exprs unpast/tests/scenario_B500.exprs.tsv.gz --basename results/scenario_B500"
docker run --rm -u $(id -u):$(id -g) -v "$(pwd)":/data --entrypoint bash freddsle/unpast -c "cd /data && PYTHONPATH=/data $command"

Requirements: [to be updated]

Python (version 3.8.16):
    fisher==0.1.9
    pandas==1.3.5
    python-louvain==0.15
    matplotlib==3.7.1
    seaborn==0.11.1
    numba==0.51.2
    numpy==1.22.3
    scikit-learn==1.2.2
    scikit-network==0.24.0
    scipy==1.7.1
    statsmodels==0.13.2
    kneed==0.8.1

R (version 4.3.1):
    WGCNA==1.70-3
    limma==3.42.2

Installation tips [to be updated]

It is recommended to use "BiocManager" for the installation of WGCNA:

install.packages("BiocManager")
library(BiocManager)
BiocManager::install("WGCNA")

Input

UnPaSt requires a tab-separated file with features (e.g. genes) in rows, and samples in columns.

  • Feature and sample names must be unique.
  • At least 2 features and 5 samples are required.
  • Data must be between-sample normalized.
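The structural requirements above can be checked before a run. The following is a minimal sketch, not part of UnPaSt: the function name `check_unpast_input` is hypothetical, and it assumes the first row holds sample names (after a corner cell) and the first column holds feature names. It cannot tell whether the data are between-sample normalized, so that requirement still has to be verified separately.

```python
import csv
import io


def check_unpast_input(handle, min_features=2, min_samples=5):
    """Check a tab-separated table against the input requirements.

    Assumption: row 1 is a header (corner cell, then sample names) and
    column 1 holds feature names. Returns a list of problems found;
    an empty list means none of the checked requirements is violated.
    """
    rows = list(csv.reader(handle, delimiter="\t"))
    if not rows:
        return ["file is empty"]
    samples = rows[0][1:]  # skip the corner cell
    features = [row[0] for row in rows[1:] if row]
    problems = []
    if len(set(samples)) != len(samples):
        problems.append("sample names are not unique")
    if len(set(features)) != len(features):
        problems.append("feature names are not unique")
    if len(features) < min_features:
        problems.append(
            f"only {len(features)} feature(s), need at least {min_features}"
        )
    if len(samples) < min_samples:
        problems.append(
            f"only {len(samples)} sample(s), need at least {min_samples}"
        )
    return problems


# Toy table: 2 features x 5 samples, the smallest shape that passes.
toy = (
    "\ts1\ts2\ts3\ts4\ts5\n"
    "g1\t0.1\t0.2\t0.3\t0.4\t0.5\n"
    "g2\t1.0\t0.9\t0.8\t0.7\t0.6\n"
)
print(check_unpast_input(io.StringIO(toy)))  # prints []
```

For a gzipped file such as the example data, the handle can be opened with `gzip.open(path, "rt")` from the standard library.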

Recommendations:

  • We recommend applying UnPaSt to datasets with at least 20 samples.
  • For smaller cohorts (<20 samples), consider reducing the minimal number of samples in a bicluster (min_n_samples) to 2.
  • If the number of features is small, using the Louvain method for feature clustering instead of WGCNA and/or disabling feature selection by setting the binarization p-value (p-val) to 1 may help.

Examples

  • Simulated data example. Biclustering of a matrix with 10000 rows (features) and 200 columns (samples) with four implanted biclusters consisting of 500 features and 10-100 samples each. For more details, see figure 3 and Methods here.
mkdir -p results;

# running UnPaSt with default parameters and example data
python -m unpast.run_unpast --exprs unpast/tests/scenario_B500.exprs.tsv.gz --basename results/scenario_B500

# with different binarization and clustering methods
python -m unpast.run_unpast --exprs unpast/tests/scenario_B500.exprs.tsv.gz --basename results/scenario_B500 --binarization ward --clustering Louvain

# help
python -m unpast.run_unpast -h
  • Real data example. Analysis of a subset of 200 samples randomly chosen from TCGA-BRCA dataset, including consensus biclustering and visualization: jupyter-notebook.

Outputs

<basename>.[parameters].biclusters.tsv - A .tsv file containing the identified biclusters with the following structure:

    • The first line starts with # and stores the parameters of the UnPaSt run.
    • The second line contains the column headers.
    • Each subsequent line represents one bicluster, with the following columns:
        • SNR: Signal-to-noise ratio of the bicluster, calculated as the average SNR of its features.
        • n_genes: Number of genes in the bicluster.
        • n_samples: Number of samples in the bicluster.
        • genes: Space-separated list of gene names.
        • samples: Space-separated list of sample names.
        • direction: Indicates whether the bicluster consists of up-regulated ("UP"), down-regulated ("DOWN"), or both types of genes ("BOTH").
        • genes_up, genes_down: Space-separated lists of up- and down-regulated genes, respectively.
        • gene_indexes: 0-based indexes of the genes in the input matrix.
        • sample_indexes: 0-based indexes of the samples in the input matrix.
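The layout described above can be read with the Python standard library alone. This is a minimal sketch rather than an official UnPaSt reader: the function name `read_biclusters` and the toy file contents (including the parameter line) are invented for illustration, and only the columns named above are interpreted. The index columns are left as raw strings because their exact formatting is not specified here.

```python
import csv
import io


def read_biclusters(handle):
    """Parse a biclusters table: '#' parameter line, header line, data rows.

    Returns (params, biclusters), where params is the raw text after '#'
    and biclusters is a list of dicts keyed by column name.
    """
    first = handle.readline().rstrip("\n")
    if not first.startswith("#"):
        raise ValueError("expected a '#' parameter line first")
    params = first.lstrip("#").strip()
    biclusters = []
    # DictReader takes the next line (the second line) as column headers.
    for row in csv.DictReader(handle, delimiter="\t"):
        # These columns hold space-separated lists of names.
        for col in ("genes", "samples", "genes_up", "genes_down"):
            if col in row:
                row[col] = (row[col] or "").split()
        for col in ("n_genes", "n_samples"):
            if col in row:
                row[col] = int(row[col])
        if "SNR" in row:
            row["SNR"] = float(row["SNR"])
        biclusters.append(row)
    return params, biclusters


# Toy file with one up-regulated bicluster (all values invented).
toy = (
    "# toy parameter line, invented for this example\n"
    "SNR\tn_genes\tn_samples\tgenes\tsamples\tdirection\tgenes_up\tgenes_down\n"
    "2.5\t2\t3\tg1 g2\ts1 s2 s3\tUP\tg1 g2\t\n"
)
params, bics = read_biclusters(io.StringIO(toy))
print(bics[0]["genes"], bics[0]["genes_down"])  # prints ['g1', 'g2'] []
```

Keeping the parameter line as an unparsed string avoids guessing at its internal format.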

Along with the biclustering result, UnPaSt creates three files with intermediate results in the output folder out_dir:

  • <basename>.[parameters].binarized.tsv with binarized input data.
  • <basename>.[parameters].binarization_stats.tsv provides binarization statistics for each processed feature.
  • <basename>.[parameters].background.tsv stores background distributions of SNR values for each evaluated bicluster size.

These files can be used to restart UnPaSt with the same input and seed from the feature clustering step, skipping the time-consuming feature binarization.

Versions

UnPaSt version used in the PathoPlex paper: UnPaSt_PathoPlex.zip
