Tumor genomic subtyping using mutational signatures
GS-PRACTICE (Genomic Subtyping and Predictive Response Analysis for Cancer Tumor ICi Efficacy)
Tumor genomic subtyping tool based on mutational signatures for cancer samples.
See the corresponding paper for details.
Overview of the pipeline
Requirements
Python >= 3.7
- numpy
- numba
- scikit_learn
- joblib
- umap_learn
- pandas
- matplotlib
- rpy2
Please see requirements.txt for detailed versions.
R >= 3.6
- MutationalPatterns
- BSgenome.Hsapiens.UCSC.hg38
- BSgenome.Hsapiens.UCSC.hg19
Please see r_requirements.txt for detailed versions.
These have been tested on macOS and Linux.
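A quick way to confirm that the Python requirements are importable is sketched below. The import names are assumptions based on common usage (scikit_learn is imported as sklearn and umap_learn as umap); the helper name missing_modules is hypothetical and not part of GS-PRACTICE.

```python
import importlib.util

# Import names assumed for the distributions listed above
# (scikit_learn -> sklearn, umap_learn -> umap).
REQUIRED_MODULES = [
    "numpy", "numba", "sklearn", "joblib",
    "umap", "pandas", "matplotlib", "rpy2",
]


def missing_modules(names=REQUIRED_MODULES):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    print("missing:", ", ".join(missing) if missing else "none")
```

This only checks that the modules can be located; it does not compare versions against requirements.txt.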
Installation
To avoid package dependency issues, I recommend installing in a virtual environment created with Anaconda, Miniconda, Miniforge, pyenv, etc.
cd GS-PRACTICE
python setup.py install
Or, from PyPI,
pip install GS-PRACTICE
Then,
pip install -r requirements.txt
After setting up the R environment,
pip install rpy2
Preparations
Make classifiers in your environment
After installing the requirements, configure the classifiers and the UMAP projector in your environment and save them as joblib files. By default, joblib files named KNN, SVC, RFC, LRC, and UMAP_projector are generated in the gspractice/data directory.
gs-makeclfs
Check that R works
Before using rpy2, check that R works correctly in your environment.
For example,
cd tests/R_script_for_test
Rscript MutationalPatterns_from_single_vcf.R
Rscript MutationalPatterns_from_list.R
Rscript MutationalPatterns_from_maf.R
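In addition to the test scripts above, the presence of the required R packages can be checked from Python. This is a minimal sketch that assumes Rscript is on the PATH; the function name r_package_available is hypothetical and not part of GS-PRACTICE. It relies only on R's requireNamespace and the Rscript -e option.

```python
import shutil
import subprocess

# R packages listed in the Requirements section.
R_PACKAGES = [
    "MutationalPatterns",
    "BSgenome.Hsapiens.UCSC.hg38",
    "BSgenome.Hsapiens.UCSC.hg19",
]


def r_package_available(package, rscript="Rscript"):
    """Return True if `package` can be loaded by the R installation behind `rscript`."""
    exe = shutil.which(rscript)
    if exe is None:
        return False  # Rscript itself was not found on the PATH
    # Exit status 0 when the namespace loads, 1 otherwise.
    expr = f'quit(status = if (requireNamespace("{package}", quietly = TRUE)) 0 else 1)'
    result = subprocess.run([exe, "-e", expr], capture_output=True)
    return result.returncode == 0


if __name__ == "__main__":
    for pkg in R_PACKAGES:
        print(pkg, "OK" if r_package_available(pkg) else "not available")
```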
Usage
gs-practice -i {input_file} -o {output_prefix}
Input files
A VCF file (version >= 4.0) or a MAF file produced by somatic mutation calling is accepted.
A VCF file must contain a header starting with "##fileformat=VCFv**".
A MAF file must contain columns named "Hugo_Symbol", "Chromosome", "Start_Position", "End_Position", "Variant_Type", "Reference_Allele", "Tumor_Seq_Allele1", "Tumor_Seq_Allele2" and "Tumor_Sample_Barcode".
The default genome version is GRCh38 (hg38), but GRCh37 (hg19) is also accepted if you specify the option -gv hg19.
Accepted input types:
- single VCF file (file name ending in .vcf)
- list of paths to multiple VCF files (file name ending in .list)
- MAF file (file name ending in .maf)
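The input rules above can be checked before running the tool. The following is a minimal sketch, not the tool's own validation: the function names are hypothetical, and skipping MAF lines that start with "#" is an assumption about comment lines that the text above does not state.

```python
import re

# Column names required in a MAF file, as listed above.
REQUIRED_MAF_COLUMNS = [
    "Hugo_Symbol", "Chromosome", "Start_Position", "End_Position",
    "Variant_Type", "Reference_Allele", "Tumor_Seq_Allele1",
    "Tumor_Seq_Allele2", "Tumor_Sample_Barcode",
]


def input_kind(path):
    """Classify an input path by its extension: 'vcf', 'list', or 'maf'."""
    for suffix in ("vcf", "list", "maf"):
        if path.endswith("." + suffix):
            return suffix
    raise ValueError(f"unsupported input file: {path}")


def vcf_version(text):
    """Return (major, minor) from the ##fileformat header line, or None if absent."""
    for line in text.splitlines():
        if not line.startswith("##"):
            break  # stop at the first line that is not a "##" header line
        match = re.match(r"##fileformat=VCFv(\d+)\.(\d+)", line)
        if match:
            return int(match.group(1)), int(match.group(2))
    return None


def missing_maf_columns(text):
    """Return the required columns absent from the tab-separated MAF header line."""
    for line in text.splitlines():
        if line and not line.startswith("#"):  # assumption: '#' lines are comments
            header = line.split("\t")
            return [c for c in REQUIRED_MAF_COLUMNS if c not in header]
    return list(REQUIRED_MAF_COLUMNS)
```

A VCF would pass the version requirement when vcf_version(text) is not None and compares >= (4, 0).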
Example
gs-practice -i input.vcf -o output_prefix [-sn samplename ]
gs-practice -i vcffiles.list -o output_prefix [-snl samplenames.list]
gs-practice -i input.maf -o output_prefix
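When the tool is driven from a script, the command lines above can be assembled programmatically. This sketch uses only the options shown in this document (-i, -o, -sn, -snl, -gv); the helper name build_command is hypothetical, and the resulting list is meant to be passed to subprocess.run.

```python
import shlex


def build_command(input_file, output_prefix, sample_name=None,
                  sample_name_list=None, genome_version=None):
    """Build the gs-practice argument list from the documented options."""
    cmd = ["gs-practice", "-i", input_file, "-o", output_prefix]
    if sample_name is not None:
        cmd += ["-sn", sample_name]          # sample name for a single VCF
    if sample_name_list is not None:
        cmd += ["-snl", sample_name_list]    # sample name list for a VCF list
    if genome_version is not None:
        cmd += ["-gv", genome_version]       # e.g. "hg19"; the default is hg38
    return cmd


if __name__ == "__main__":
    cmd = build_command("input.vcf", "output_prefix", sample_name="samplename")
    # Print a shell-safe rendering; use subprocess.run(cmd, check=True) to execute.
    print(" ".join(shlex.quote(arg) for arg in cmd))
```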
Output files
{output_prefix}_decomposed.tsv
{output_prefix}_prediction.txt
{output_prefix}_umap.png
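After a run, a script can confirm that all three output files exist. This is a small sketch based on the file names listed above; the helper names are hypothetical.

```python
from pathlib import Path

# Suffixes appended to the output prefix, as listed above.
OUTPUT_SUFFIXES = ["_decomposed.tsv", "_prediction.txt", "_umap.png"]


def expected_outputs(output_prefix):
    """Return the output paths expected for the given prefix."""
    return [output_prefix + suffix for suffix in OUTPUT_SUFFIXES]


def missing_outputs(output_prefix):
    """Return the expected output paths that do not exist as files."""
    return [p for p in expected_outputs(output_prefix) if not Path(p).is_file()]
```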
Test
cd tests
bash test_run.sh
This will generate the following files in the "tests/output" directory:
- TCGA_mutect2_random100_1_decomposed.tsv
- TCGA_mutect2_random100_1_prediction.tsv
- TCGA_mutect2_random100_1_umap.png
Getting Help
gs-practice -h
Notes
- If you want, you can change the hyperparameters of the classifiers by directly editing the script "gspractice/makeclfs.py".
- You may use the tool without installation if you satisfy the requirements.
python src/gspractice/makeclfs.py
python src/gspractice/run_gspractice.py -h
python src/gspractice/run_gspractice.py -i {input_file} -o {output_prefix}
Citation
Currently, the corresponding paper is published as a preprint on medRxiv.
Mutation burden-orthogonal tumor genomic subtypes delineate responses to immune checkpoint therapy