Tumor genomic subtyping using mutational signatures

GS-PRACTICE (Genomic Subtyping and Predictive Response Analysis for Cancer Tumor ICi Efficacy)

Tumor genomic subtyping tool based on mutational signatures for cancer samples.
See the corresponding paper for details.


Overview of the pipeline


Requirements

Python >= 3.7

  • numpy
  • numba
  • scikit_learn
  • joblib
  • umap_learn
  • pandas
  • matplotlib
  • rpy2

Please see requirements.txt for detailed versions.

R >= 3.6

  • MutationalPatterns
  • BSgenome.Hsapiens.UCSC.hg38
  • BSgenome.Hsapiens.UCSC.hg19

Please see r_requirements.txt for detailed versions.
These have been tested on macOS and Linux.


Installation

To avoid package dependency issues, installing in a virtual environment (created with Anaconda, Miniconda, Miniforge, pyenv, etc.) is recommended.

cd GS-PRACTICE
python setup.py install

Or, from PyPI,

pip install GS-PRACTICE

Then,

pip install -r requirements.txt

After setting up the R environment,

pip install rpy2

Preparations

Make classifiers in your environment

After installing the requirements, build the classifiers and the UMAP projector in your environment and save them as joblib files.
By default, joblib files named KNN, SVC, RFC, LRC, and UMAP_projector are generated in the gspractice/data directory.

gs-makeclfs

Check that R works

Before using rpy2, you should check the behavior of R in your environment.
For example,

cd tests/R_script_for_test
Rscript MutationalPatterns_from_single_vcf.R
Rscript MutationalPatterns_from_list.R
Rscript MutationalPatterns_from_maf.R
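
Before running the R test scripts, it can help to confirm that Rscript is reachable from the same environment that Python (and therefore rpy2) runs in. The following is a minimal sketch using only the Python standard library; the helper name rscript_version is hypothetical and is not part of GS-PRACTICE.

```python
import shutil
import subprocess


def rscript_version():
    """Return the version banner printed by Rscript, or None if Rscript is not on PATH.

    Hypothetical helper for illustration; not part of GS-PRACTICE.
    """
    rscript = shutil.which("Rscript")
    if rscript is None:
        return None
    # Some R releases print the banner to stderr, so merge both streams.
    result = subprocess.run(
        [rscript, "--version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
        timeout=30,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    banner = rscript_version()
    if banner is None:
        print("Rscript was not found on PATH; install R >= 3.6 first.")
    else:
        print(banner)
```

If this reports that Rscript is missing, the R test scripts above and rpy2 are unlikely to work until R is installed and on PATH.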

Usage

gs-practice -i {input_file} -o {output_prefix} 

Input files

A VCF file (version >= 4.0) or a MAF file produced by somatic mutation calling is accepted.
A VCF file must contain a header line starting with "##fileformat=VCFv**".
A MAF file must contain columns named "Hugo_Symbol", "Chromosome", "Start_Position", "End_Position", "Variant_Type", "Reference_Allele", "Tumor_Seq_Allele1", "Tumor_Seq_Allele2" and "Tumor_Sample_Barcode".
The default genome version is GRCh38 (hg38); GRCh37 (hg19) is also accepted if you specify the option -gv hg19.

  • single VCF file (ending with ***.vcf)
  • list of paths to multiple VCF files (ending with ***.list)
  • MAF file (ending with ***.maf)
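
The input requirements above can be checked before launching the tool. The following is a minimal sketch, not the check that GS-PRACTICE itself performs: the function names are hypothetical, and it assumes the MAF file is tab-separated with optional leading comment lines starting with "#".

```python
import csv
import io

# Column names that this README requires in a MAF input file.
REQUIRED_MAF_COLUMNS = [
    "Hugo_Symbol", "Chromosome", "Start_Position", "End_Position",
    "Variant_Type", "Reference_Allele", "Tumor_Seq_Allele1",
    "Tumor_Seq_Allele2", "Tumor_Sample_Barcode",
]


def missing_maf_columns(handle):
    """Return the required MAF columns that are absent from the header row.

    Assumption: tab-separated text; comment lines start with '#'.
    """
    for line in handle:
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines before the header
        header = next(csv.reader([line], delimiter="\t"))
        return [col for col in REQUIRED_MAF_COLUMNS if col not in header]
    return list(REQUIRED_MAF_COLUMNS)  # no header row was found


def has_vcf_fileformat_header(handle):
    """Return True if the first line starts with '##fileformat=VCFv'."""
    return handle.readline().startswith("##fileformat=VCFv")


if __name__ == "__main__":
    demo = io.StringIO("Hugo_Symbol\tChromosome\tStart_Position\n")
    print("Missing MAF columns:", missing_maf_columns(demo))
```

With a real file, pass an open text handle, for example open("input.maf", encoding="utf-8").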

Example

gs-practice -i input.vcf -o output_prefix [-sn samplename]
gs-practice -i vcffiles.list -o output_prefix [-snl samplenames.list]
gs-practice -i input.maf -o output_prefix
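
When the tool is driven from a script, the command lines above can be assembled programmatically. The following is a sketch that uses only the options mentioned in this README (-i, -o, -gv, -sn, -snl); the function name build_gs_practice_command is hypothetical, and the code only builds the argument list without running anything.

```python
import shlex


def build_gs_practice_command(input_file, output_prefix, genome_version=None,
                              sample_name=None, sample_name_list=None):
    """Build an argument list for gs-practice from options named in this README.

    Hypothetical helper for illustration; not part of GS-PRACTICE.
    """
    cmd = ["gs-practice", "-i", str(input_file), "-o", str(output_prefix)]
    if genome_version is not None:
        cmd += ["-gv", genome_version]  # e.g. "hg19"; hg38 is the default
    if sample_name is not None:
        cmd += ["-sn", sample_name]  # shown above with a single VCF file
    if sample_name_list is not None:
        cmd += ["-snl", str(sample_name_list)]  # shown above with a .list input
    return cmd


if __name__ == "__main__":
    cmd = build_gs_practice_command("input.vcf", "output_prefix",
                                    genome_version="hg19")
    # Print a shell-safe rendering of the command for inspection.
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

The returned list can be passed to subprocess.run(cmd, check=True) once gs-practice is installed and the classifiers have been generated.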

Output files

{output_prefix}_decomposed.tsv
{output_prefix}_prediction.txt
{output_prefix}_umap.png
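
In a batch workflow it may be useful to confirm that all three output files were written for a given prefix. The following is a small sketch based only on the file names listed above; the helper names are hypothetical, and nothing is assumed about the contents of the files.

```python
from pathlib import Path

# Suffixes appended to the output prefix, as listed in this README.
OUTPUT_SUFFIXES = ("_decomposed.tsv", "_prediction.txt", "_umap.png")


def expected_outputs(output_prefix):
    """Return the paths gs-practice is expected to write for this prefix."""
    return [Path(str(output_prefix) + suffix) for suffix in OUTPUT_SUFFIXES]


def missing_outputs(output_prefix):
    """Return the expected output paths that do not exist on disk."""
    return [path for path in expected_outputs(output_prefix) if not path.is_file()]


if __name__ == "__main__":
    for path in missing_outputs("output_prefix"):
        print("Missing:", path)
```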

Test

cd tests
bash test_run.sh

This will generate the output files in the "tests/output" directory.


Getting Help

gs-practice -h

Notes

  • You can change the hyperparameters of the classifiers by directly editing the script "gspractice/makeclfs.py".
  • You can run the tool without installing it, provided the requirements are satisfied:
python src/gspractice/makeclfs.py
python src/gspractice/run_gspractice.py -h
python src/gspractice/run_gspractice.py -i {input_file} -o {output_prefix}

Citation

Currently, the corresponding paper is available as a preprint on medRxiv:
Mutation burden-orthogonal tumor genomic subtypes delineate responses to immune checkpoint therapy
