A novel method for unsupervised patient stratification.
Project description
UnPaSt
UnPaSt is a novel method for identification of differentially expressed biclusters.
Cite
UnPaSt preprint https://arxiv.org/abs/2408.00200.
Code: https://github.com/ozolotareva/unpast_paper/
Web server
Install
Docker environment [to be updated]
The UnPaSt environment is also available as a Docker image.
docker pull freddsle/unpast
git clone https://github.com/ozolotareva/unpast.git
cd unpast
mkdir -p results
# running UnPaSt with default parameters and example data
command="python unpast/run_unpast.py --exprs unpast/tests/scenario_B500.exprs.tsv.gz --basename results/scenario_B500"
docker run --rm -u $(id -u):$(id -g) -v "$(pwd)":/data --entrypoint bash freddsle/unpast -c "cd /data && PYTHONPATH=/data $command"
Requirements: [to be updated]
Python (version 3.8.16):
fisher==0.1.9
pandas==1.3.5
python-louvain==0.15
matplotlib==3.7.1
seaborn==0.11.1
numba==0.51.2
numpy==1.22.3
scikit-learn==1.2.2
scikit-network==0.24.0
scipy==1.7.1
statsmodels==0.13.2
kneed==0.8.1
R (version 4.3.1):
WGCNA==1.70-3
limma==3.42.2
Installation tips [to be updated]
It is recommended to use "BiocManager" for the installation of WGCNA:
install.packages("BiocManager")
library(BiocManager)
BiocManager::install("WGCNA")
Input
UnPaSt requires a tab-separated file with features (e.g. genes) in rows, and samples in columns.
- Feature and sample names must be unique.
- At least 2 features and 5 samples are required.
- Data must be between-sample normalized.
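The structural requirements above can be checked before a run. The following is a minimal sketch, not part of UnPaSt: the helper name check_unpast_input is hypothetical, and it assumes the first column of the file holds the feature names. Between-sample normalization cannot be verified this way and is not checked.

```python
import io

import pandas as pd


def check_unpast_input(tsv_text):
    """Return a list of problems; an empty list means the basic checks pass.

    Hypothetical helper (not part of UnPaSt). Assumes the first column
    of the tab-separated text holds the feature names.
    """
    problems = []

    # pandas renames duplicated column names on read (s1 -> s1.1),
    # so sample names are checked on the raw header line instead.
    samples = tsv_text.splitlines()[0].split("\t")[1:]
    if len(samples) != len(set(samples)):
        problems.append("sample names are not unique")

    exprs = pd.read_csv(io.StringIO(tsv_text), sep="\t", index_col=0)
    if not exprs.index.is_unique:
        problems.append("feature names are not unique")

    n_features, n_samples = exprs.shape
    if n_features < 2:
        problems.append("fewer than 2 features")
    if n_samples < 5:
        problems.append("fewer than 5 samples")
    return problems


# Toy matrix: 2 features (rows) x 5 samples (columns).
toy_tsv = (
    "\ts1\ts2\ts3\ts4\ts5\n"
    "geneA\t0.1\t-0.3\t1.2\t0.0\t0.7\n"
    "geneB\t-1.1\t0.4\t0.2\t0.9\t-0.5\n"
)
print(check_unpast_input(toy_tsv))
```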
Recommendations:
- It is recommended that UnPaSt be applied to datasets with 20+ samples.
- If the cohort is not large (<20 samples), reducing the minimal number of samples in a bicluster (min_n_samples) to 2 is recommended.
- If the number of features is small, using the Louvain method for feature clustering instead of WGCNA and/or disabling feature selection by setting the binarization p-value (p-val) to 1 might be helpful.
Examples
- Simulated data example. Biclustering of a matrix with 10000 rows (features) and 200 columns (samples) with four implanted biclusters consisting of 500 features and 10-100 samples each. For more details, see figure 3 and Methods here.
mkdir -p results;
# running UnPaSt with default parameters and example data
python -m unpast.run_unpast --exprs unpast/tests/scenario_B500.exprs.tsv.gz --basename results/scenario_B500
# with different binarization and clustering methods
python -m unpast.run_unpast --exprs unpast/tests/scenario_B500.exprs.tsv.gz --basename results/scenario_B500 --binarization ward --clustering Louvain
# help
python -m unpast.run_unpast -h
- Real data example. Analysis of a subset of 200 samples randomly chosen from TCGA-BRCA dataset, including consensus biclustering and visualization: jupyter-notebook.
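To experiment with input of the same shape as the simulated example, a toy matrix can be generated along the following lines. This is a rough sketch and does not reproduce the generator behind scenario_B500: the function name simulate_exprs, the standard-normal background, the mean shift of 2.0, the non-overlapping feature blocks and the sample counts (10, 20, 50, 100) are all assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd


def simulate_exprs(n_features=10000, n_samples=200, bic_features=500,
                   bic_sample_counts=(10, 20, 50, 100), shift=2.0, seed=0):
    """Background noise plus implanted up-shifted biclusters.

    Illustrative only: the distribution, shift and sample counts are
    assumptions, not the settings used for the shipped test data.
    """
    rng = np.random.default_rng(seed)
    data = rng.normal(0.0, 1.0, size=(n_features, n_samples))

    truth = []
    for i, n_bic_samples in enumerate(bic_sample_counts):
        # Each bicluster gets its own disjoint block of features.
        rows = np.arange(i * bic_features, (i + 1) * bic_features)
        # Samples are drawn at random, so biclusters may share samples.
        cols = np.sort(rng.choice(n_samples, size=n_bic_samples, replace=False))
        data[np.ix_(rows, cols)] += shift
        truth.append((rows, cols))

    exprs = pd.DataFrame(
        data,
        index=[f"f{i}" for i in range(n_features)],
        columns=[f"s{j}" for j in range(n_samples)],
    )
    return exprs, truth


exprs, truth = simulate_exprs()
print(exprs.shape, [len(cols) for _, cols in truth])
```

The resulting frame can be written with exprs.to_csv(path, sep="\t") to a path of your choice and passed to --exprs.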
Outputs
<basename>.[parameters].biclusters.tsv
- A .tsv file containing the identified biclusters with the following structure:
  - the first line starts with # and stores the parameters of UnPaSt;
  - the second line contains the column headers;
  - each subsequent line represents a bicluster with the following columns:
    - SNR: Signal-to-noise ratio of the bicluster, calculated as the average SNR of its features.
    - n_genes: Number of genes in the bicluster.
    - n_samples: Number of samples in the bicluster.
    - genes: Space-separated list of gene names.
    - samples: Space-separated list of sample names.
    - direction: Indicates whether the bicluster consists of up-regulated ("UP"), down-regulated ("DOWN"), or both types of genes ("BOTH").
    - genes_up, genes_down: Space-separated lists of up- and down-regulated genes, respectively.
    - gene_indexes: 0-based indexes of the genes in the input matrix.
    - sample_indexes: 0-based indexes of the samples in the input matrix.
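The layout described above can be read with pandas roughly as follows. This is a sketch under assumptions rather than an official loader: read_biclusters is a hypothetical helper, the toy text stands in for a real output file, and the index columns are assumed to be space-separated like the name columns. Any additional columns in a real file are left untouched.

```python
import io

import pandas as pd

NAME_COLUMNS = ["genes", "samples", "genes_up", "genes_down"]
INDEX_COLUMNS = ["gene_indexes", "sample_indexes"]


def read_biclusters(text):
    """Split a biclusters .tsv into (parameter line, table).

    Hypothetical helper: assumes line 1 starts with '#', line 2 holds
    the column headers, and list-like cells are space-separated.
    """
    first_line, rest = text.split("\n", 1)
    params = first_line.lstrip("#").strip()

    # keep_default_na=False keeps empty cells as "" instead of NaN,
    # so an empty genes_up/genes_down cell becomes an empty list below.
    table = pd.read_csv(io.StringIO(rest), sep="\t", keep_default_na=False)

    for col in NAME_COLUMNS:
        if col in table.columns:
            table[col] = table[col].astype(str).str.split()
    for col in INDEX_COLUMNS:
        if col in table.columns:
            table[col] = table[col].astype(str).str.split().apply(
                lambda items: [int(x) for x in items]
            )
    return params, table


# Toy content; the parameter line is a placeholder, not real UnPaSt output.
toy = (
    "#placeholder for the UnPaSt parameters line\n"
    "SNR\tn_genes\tn_samples\tgenes\tsamples\tdirection\t"
    "genes_up\tgenes_down\tgene_indexes\tsample_indexes\n"
    "2.5\t2\t3\tg1 g2\ts1 s2 s3\tUP\tg1 g2\t\t0 1\t0 1 2\n"
)
params, biclusters = read_biclusters(toy)
print(params)
print(biclusters[["SNR", "n_genes", "n_samples", "direction"]])
```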
Along with the biclustering result, UnPaSt creates three files with intermediate results in the output folder out_dir:
- <basename>.[parameters].binarized.tsv with binarized input data.
- <basename>.[parameters].binarization_stats.tsv provides binarization statistics for each processed feature.
- <basename>.[parameters].background.tsv stores background distributions of SNR values for each evaluated bicluster size.
These files can be used to restart UnPaSt with the same input and seed from the feature clustering step and skip time-consuming feature binarization.
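Both the SNR column and the background distributions rest on a per-feature signal-to-noise ratio. This page does not spell out the formula, so the sketch below assumes a common definition: the difference between the mean inside and outside the bicluster samples, divided by the sum of the two standard deviations. Taking absolute values before averaging is a further assumption, made so that up- and down-regulated features do not cancel out. The function names are illustrative and do not refer to UnPaSt's internal code.

```python
import numpy as np


def feature_snr(values, in_bicluster):
    """SNR of one feature under the assumed definition:
    (mean inside - mean outside) / (std inside + std outside)."""
    values = np.asarray(values, dtype=float)
    mask = np.asarray(in_bicluster, dtype=bool)
    inside, outside = values[mask], values[~mask]
    return (inside.mean() - outside.mean()) / (inside.std() + outside.std())


def bicluster_snr(exprs, in_bicluster):
    """Average absolute per-feature SNR over the rows of a 2-D array."""
    rows = np.asarray(exprs, dtype=float)
    return float(np.mean([abs(feature_snr(row, in_bicluster)) for row in rows]))


# Two features over 8 samples; the last 4 samples form the bicluster.
mask = [False, False, False, False, True, True, True, True]
up = [0, 2, 0, 2, 10, 12, 10, 12]    # up-regulated inside the bicluster
down = [-v for v in up]              # mirrored, down-regulated feature
print(feature_snr(up, mask), feature_snr(down, mask))
print(bicluster_snr([up, down], mask))
```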
Versions
UnPaSt version used in PathoPlex paper: UnPaSt_PathoPlex.zip
File details
Details for the file unpast-0.1.9.6.3.tar.gz.
File metadata
- Download URL: unpast-0.1.9.6.3.tar.gz
- Upload date:
- Size: 18.3 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.3.2 CPython/3.10.12 Linux/6.8.0-45-generic
File hashes
Algorithm | Hash digest
---|---
SHA256 | 34be42b4c5c5de9a823ca4b80ea5aeaf8f0ca54069c67f3e38b15d872b7bc5c8
MD5 | 42181a95cab22348af74aaed3d27a28f
BLAKE2b-256 | a97930520cb0f2498eb7f3e2f400cef2fd4f9aaa7963e7a5751f6691ad1b0426
File details
Details for the file unpast-0.1.9.6.3-py3-none-any.whl.
File metadata
- Download URL: unpast-0.1.9.6.3-py3-none-any.whl
- Upload date:
- Size: 18.3 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.3.2 CPython/3.10.12 Linux/6.8.0-45-generic
File hashes
Algorithm | Hash digest
---|---
SHA256 | ea154dc774254b4d9b64a2c9a72c94182b9c5b2b4d570d3d6017512692e6c879
MD5 | ccfdf485f34dd6c23b808a3a225edeb4
BLAKE2b-256 | eb6aa4fe61fd8dc2c09875fc74644ade58d16e6f61d416bc73d28fa405ba174c