A novel method for unsupervised patient stratification.
UnPaSt
UnPaSt is a novel method for identification of differentially expressed biclusters.
Cite
UnPaSt preprint https://arxiv.org/abs/2408.00200.
Code: https://github.com/ozolotareva/unpast_paper/
Web server
Install
Install via pip
UnPaSt is available on PyPI and can be installed using pip:
pip install unpast
Do not forget to install the necessary R packages (see below).
You can then run UnPaSt from the command line using the unpast command:
unpast --exprs unpast/tests/scenario_B500.exprs.tsv.gz --basename results/scenario_B500
Docker Environment
UnPaSt is also available as a Docker image. To pull the Docker Image:
docker pull freddsle/unpast:latest
Replace latest with a specific version tag if desired (for versions released before October 2024, use v0.1.8).
Run UnPaSt using Docker
# Clone the repository to get example data
git clone https://github.com/ozolotareva/unpast.git
cd unpast
mkdir -p results
# Define the command to run UnPaSt
command="unpast --exprs unpast/tests/scenario_B500.exprs.tsv.gz --basename results/scenario_B500 --verbose"
# Run UnPaSt using Docker
docker run --rm -u $(id -u):$(id -g) -v "$(pwd)":/data --entrypoint bash freddsle/unpast -c "cd /data && PYTHONPATH=/data $command"
Requirements
UnPaSt requires Python 3.8 or higher, but below 3.11, as well as certain Python and R packages.
Python Dependencies
The Python dependencies are installed automatically when installing via pip (or you can use requirements.txt). They include (with recommended versions):
fisher = ">=0.1.9,<=0.1.14"
pandas = "1.3.5"
python-louvain = "0.15"
matplotlib = "3.7.1"
seaborn = "0.11.1"
numba = ">=0.51.2,<=0.55.2"
numpy = "1.22.3"
scikit-learn = "1.2.2"
scikit-network = ">=0.24.0,<0.26.0"
scipy = ">=1.7.1,<=1.7.3"
statsmodels = "0.13.2"
kneed = "0.8.1"
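The constraints above can be checked before running UnPaSt. The following is a minimal sketch and not part of UnPaSt: the helper names python_version_supported and installed_versions are hypothetical, and the script only reports what is installed instead of enforcing the recommended versions.

```python
import sys
from importlib import metadata

# Distribution names taken from the dependency list above.
PACKAGES = [
    "fisher", "pandas", "python-louvain", "matplotlib", "seaborn", "numba",
    "numpy", "scikit-learn", "scikit-network", "scipy", "statsmodels", "kneed",
]


def python_version_supported(version_info=sys.version_info):
    """Return True if the interpreter is at least 3.8 and below 3.11."""
    return (3, 8) <= tuple(version_info[:2]) < (3, 11)


def installed_versions(packages=PACKAGES):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    print("Python version supported:", python_version_supported())
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

Compare the printed versions against the list above by hand; the R packages have to be checked separately from within R.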
R Dependencies
UnPaSt utilizes R packages for certain analyses. Ensure that you have R installed with the following packages:
- WGCNA (version 1.70-3 or higher)
- limma (version 3.42.2 or higher)
Installation Tips
Installing R Dependencies
It is recommended to use BiocManager for installing R packages:
install.packages("BiocManager")
BiocManager::install("WGCNA")
BiocManager::install("limma")
Installing R
Ensure that R (version 4.3.1 or higher) is installed on your system. You can download R from CRAN.
Input
UnPaSt requires a tab-separated file with features (e.g. genes) in rows, and samples in columns.
- Feature and sample names must be unique.
- At least 2 features and 5 samples are required.
- Data must be between-sample normalized.
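The structural requirements above can be verified before a run. The sketch below is an illustration, not UnPaSt code: the function names are hypothetical, and it assumes that the first row holds the sample names after a leading index cell and that the first column holds the feature names. It uses only the standard library, and it cannot tell whether the data are between-sample normalized.

```python
import csv
import gzip
import io


def check_expression_table(handle):
    """Check a tab-separated expression table against the stated input rules.

    Assumes sample names are in the first row (after a leading index cell)
    and feature names are in the first column.
    Returns a list of problem descriptions; an empty list means none were found.
    """
    rows = [row for row in csv.reader(handle, delimiter="\t") if row]
    if not rows:
        return ["file is empty"]
    samples = rows[0][1:]
    features = [row[0] for row in rows[1:]]
    problems = []
    if len(set(samples)) != len(samples):
        problems.append("sample names are not unique")
    if len(set(features)) != len(features):
        problems.append("feature names are not unique")
    if len(features) < 2:
        problems.append("fewer than 2 features")
    if len(samples) < 5:
        problems.append("fewer than 5 samples")
    if any(len(row) != len(rows[0]) for row in rows[1:]):
        problems.append("rows differ in length from the header")
    return problems


def check_expression_file(path):
    """Open a plain or gzip-compressed .tsv file and check its contents."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt", newline="") as handle:
        return check_expression_table(handle)


# Small in-memory example: 2 features, 5 samples.
example = "\ts1\ts2\ts3\ts4\ts5\ng1\t1\t2\t3\t4\t5\ng2\t5\t4\t3\t2\t1\n"
print(check_expression_table(io.StringIO(example)))
```

For a file on disk, call check_expression_file with the path that would be passed to --exprs.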
Recommendations:
- It is recommended that UnPaSt be applied to datasets with 20+ samples.
- If the cohort is not large (<20 samples), reducing the minimal number of samples in a bicluster (min_n_samples) to 2 is recommended.
- If the number of features is small, using Louvain method for feature clustering instead of WGCNA and/or disabling feature selection by setting the binarization p-value (p-val) to 1 might be helpful.
Examples
- Simulated data example. Biclustering of a matrix with 10000 rows (features) and 200 columns (samples) with four implanted biclusters consisting of 500 features and 10-100 samples each. For more details, see figure 3 and Methods here.
mkdir -p results;
# running UnPaSt with default parameters and example data
python -m unpast.run_unpast --exprs unpast/tests/scenario_B500.exprs.tsv.gz --basename results/scenario_B500
# with different binarization and clustering methods
python -m unpast.run_unpast --exprs unpast/tests/scenario_B500.exprs.tsv.gz --basename results/scenario_B500 --binarization ward --clustering Louvain
# help
python -m unpast.run_unpast -h
- Real data example. Analysis of a subset of 200 samples randomly chosen from TCGA-BRCA dataset, including consensus biclustering and visualization: jupyter-notebook.
Outputs
<basename>.[parameters].biclusters.tsv
- A .tsv file containing the identified biclusters with the following structure:
  - the first line starts with # and stores the parameters of UnPaSt;
  - the second line contains the column headers;
  - each subsequent line represents a bicluster with the following columns:
    - SNR: Signal-to-noise ratio of the bicluster, calculated as the average SNR of its features.
    - n_genes: Number of genes in the bicluster.
    - n_samples: Number of samples in the bicluster.
    - genes: Space-separated list of gene names.
    - samples: Space-separated list of sample names.
    - direction: Indicates whether the bicluster consists of up-regulated ("UP"), down-regulated ("DOWN"), or both types of genes ("BOTH").
    - genes_up, genes_down: Space-separated lists of up- and down-regulated genes, respectively.
    - gene_indexes: 0-based indexes of the genes in the input matrix.
    - sample_indexes: 0-based indexes of the samples in the input matrix.
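A file with this layout can be read with the standard library alone. The following is a sketch under stated assumptions, not an official UnPaSt reader: the function name read_biclusters is hypothetical, the header is assumed to use exactly the column names listed above, and the parameter line in the example is invented for illustration.

```python
import csv
import io


def read_biclusters(handle):
    """Parse a biclusters table laid out as described above.

    Returns (parameter_line, biclusters). Each bicluster is a dict keyed by
    the column headers; space-separated name columns are split into lists.
    """
    first = handle.readline().rstrip("\n")
    if not first.startswith("#"):
        raise ValueError("expected the first line to start with '#'")
    list_columns = ("genes", "samples", "genes_up", "genes_down")
    biclusters = []
    # The remaining lines are a regular tab-separated table with a header row.
    for row in csv.DictReader(handle, delimiter="\t"):
        for column in list_columns:
            if row.get(column) is not None:
                row[column] = row[column].split()
        if row.get("SNR"):
            row["SNR"] = float(row["SNR"])
        biclusters.append(row)
    return first.lstrip("#").strip(), biclusters


# Made-up miniature file; real parameter lines and columns will differ.
text = (
    "#example parameter line\n"
    "\tSNR\tn_genes\tn_samples\tgenes\tsamples\tdirection\n"
    "0\t2.5\t2\t3\tg1 g2\ts1 s2 s3\tUP\n"
)
params, biclusters = read_biclusters(io.StringIO(text))
for bicluster in biclusters:
    print(bicluster["SNR"], bicluster["genes"], bicluster["samples"])
```

Columns that are absent from the header are simply skipped, so the reader does not fail if a file carries fewer columns than listed.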
Along with the biclustering result, UnPaSt creates three files with intermediate results in the output folder out_dir:
- <basename>.[parameters].binarized.tsv with binarized input data.
- <basename>.[parameters].binarization_stats.tsv provides binarization statistics for each processed feature.
- <basename>.[parameters].background.tsv stores background distributions of SNR values for each evaluated bicluster size.
These files can be used to restart UnPaSt with the same input and seed from the feature clustering step and skip time-consuming feature binarization.
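Before restarting a run, it can help to confirm that all three intermediate files are present. The sketch below is only an illustration: the function name find_intermediate_files is hypothetical, and because the exact format of the [parameters] part of the file name is not specified here, it is matched with a wildcard.

```python
import glob
import os

# File-name suffixes of the three intermediate files described above.
SUFFIXES = ("binarized", "binarization_stats", "background")


def find_intermediate_files(out_dir, basename):
    """Return {suffix: sorted list of matching paths} for the intermediate files.

    The '[parameters]' segment is matched with '*', so several parameter
    combinations for the same basename may show up under one suffix.
    """
    found = {}
    for suffix in SUFFIXES:
        pattern = os.path.join(
            glob.escape(out_dir),
            glob.escape(basename) + ".*." + suffix + ".tsv",
        )
        found[suffix] = sorted(glob.glob(pattern))
    return found


if __name__ == "__main__":
    # 'results' and 'scenario_B500' mirror the example commands in this document.
    for suffix, paths in find_intermediate_files("results", "scenario_B500").items():
        print(suffix, paths or "missing")
```

If any suffix maps to an empty list, the binarization step presumably has to be run again for that parameter set.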
Versions
UnPaSt version used in PathoPlex paper: UnPaSt_PathoPlex.zip