Utilities for analyzing short tandem repeats (STRs)
str-analysis
This repo contains scripts and utilities for analyzing tandem repeats (TRs).
Tools:
- call_non_ref_motifs - takes a bam/cram file and, optionally, an ExpansionHunter variant catalog. For each locus, it determines which STR motifs are supported by reads overlapping that locus, then runs ExpansionHunter on the motif(s) it detected.
- filter_vcf_to_STR_variants - takes a single-sample VCF file and filters it to the INS/DEL variants that represent tandem repeat expansions or contractions by performing a brute-force k-mer search on each variant's inserted or deleted bases. This tool was a core part of Weisburd, B., Tiao, G. & Rehm, H. L., "Insights from a genome-wide truth set of tandem repeat variation" (2023).
- add_adjacent_loci_to_expansion_hunter_catalog - takes an ExpansionHunter variant catalog and a bed file containing all simple repeats in the reference genome. Outputs a new catalog with updated LocusStructures and ReferenceRegions that include any adjacent repeats found near each locus in the input catalog.
- check_trios_for_mendelian_violations - takes a table of combined ExpansionHunter calls generated by the combine_str_json_to_tsv script (see below) as well as a FAM file. Outputs a new table indicating which calls were transmitted without expansion or contraction, and which were Mendelian violations.
- simulate_str_expansions - uses wgsim to generate .bam files with simulated read data containing STR expansions at a given locus, with a given number of repeats, motif, zygosity, etc.
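To illustrate the idea behind the brute-force k-mer search mentioned for filter_vcf_to_STR_variants, here is a minimal sketch. It is not the tool's actual implementation. The function name `find_repeat_motif` and its parameters are hypothetical, and the sketch only recognizes sequences made up of a whole number of copies of one motif. A real tool would likely also handle partial repeats and the flanking reference sequence.

```python
def find_repeat_motif(seq, max_motif_size=6, min_repeats=2):
    """Return (motif, repeat_count) if seq is a pure tandem repeat, else None.

    Hypothetical helper for illustration only: tries every motif size k from
    1 up to max_motif_size and checks whether repeating the first k bases
    reproduces the entire sequence.
    """
    seq = seq.upper()
    # The motif can be at most len(seq) // min_repeats bases long.
    largest_k = min(max_motif_size, len(seq) // min_repeats)
    for k in range(1, largest_k + 1):
        if len(seq) % k != 0:
            continue  # seq cannot be a whole number of copies of a k-mer
        motif = seq[:k]
        repeat_count = len(seq) // k
        if motif * repeat_count == seq:
            return motif, repeat_count  # smallest matching motif wins
    return None


# Example: the inserted bases of an INS variant
print(find_repeat_motif("CAGCAGCAG"))
print(find_repeat_motif("ACGT"))
```

Because motif sizes are tried in increasing order, a homopolymer such as AAAA is reported with motif A rather than AA.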
ExpansionHunterDenovo output post-processing:
- annotate_EHdn_locus_outliers - takes an ExpansionHunterDenovo outlier result table (locus outliers or case-control) as well as a bed file containing all simple repeats in the reference genome and, optionally, a gene models GTF file, a variant catalog of known disease-associated loci, and/or other bed files with genomic regions of interest. Outputs a new table where each EHdn outlier is annotated with multiple columns related to the provided reference data.
- convert_annotated_EHdn_locus_outliers_to_expansion_hunter_catalog - takes the output table from annotate_EHdn_locus_outliers and lets the user apply a range of filters before writing out the passing loci to an ExpansionHunter variant catalog.
gnomAD STR calls:
- generate_gnomad_json - was used to combine the gnomAD STR calls into the files available for download on the gnomAD website.
Combine ExpansionHunter or other .json results files:
- combine_str_json_to_tsv - takes a set of ExpansionHunter json output files and combines them into a single tsv table.
- combine_json_to_tsv - takes a set of arbitrary json files that share the same schema and combines their top-level fields into a single tsv file.
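The general approach of combining top-level json fields into a tsv table can be sketched as follows. This is an illustration under stated assumptions, not the code of combine_json_to_tsv: the function name `combine_top_level_fields` is hypothetical, each input file is assumed to hold a single json object, and nested objects and lists are simply skipped.

```python
import csv
import json


def combine_top_level_fields(json_paths, output_path):
    """Write one tsv row per json file, using its top-level scalar fields.

    Hypothetical helper for illustration only. Returns the column names.
    """
    rows = []
    for path in json_paths:
        with open(path) as f:
            record = json.load(f)  # assumed to be a single json object
        # Keep scalar values; skip nested dicts and lists.
        rows.append({key: value for key, value in record.items()
                     if not isinstance(value, (dict, list))})

    # Collect column names in first-seen order across all files.
    columns = []
    for row in rows:
        for key in row:
            if key not in columns:
                columns.append(key)

    with open(output_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=columns, delimiter="\t",
                                restval="", lineterminator="\n")
        writer.writeheader()
        writer.writerows(rows)  # fields missing from a file are left empty
    return columns
```

Files that lack a given field get an empty cell in that column, so the inputs do not need to have identical keys for the sketch to run.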
Format converters:
- convert_bed_to_expansion_hunter_variant_catalog
- convert_expansion_hunter_variant_catalog_to_gangstr_spec
- convert_gangstr_spec_to_expansion_hunter_variant_catalog
- convert_expansion_hunter_denovo_locus_tsv_to_bed
- convert_gangstr_vcf_to_expansion_hunter_json
- convert_hipstr_vcf_to_expansion_hunter_json
- convert_strling_calls_to_expansion_hunter_json
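As a rough picture of what a bed-to-catalog conversion involves, the sketch below turns bed lines into ExpansionHunter-style catalog records. It does not reproduce convert_bed_to_expansion_hunter_variant_catalog. The function name `bed_to_variant_catalog` and the LocusId naming scheme are hypothetical, the repeat motif is assumed to be in the 4th bed column, and coordinates are copied unchanged on the assumption that both formats use 0-based half-open intervals.

```python
import json


def bed_to_variant_catalog(bed_lines):
    """Convert tab-separated bed lines to a list of catalog records.

    Hypothetical helper for illustration only. Expects at least 4 columns:
    chrom, start, end, motif (the motif column is an assumption).
    """
    catalog = []
    for line in bed_lines:
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue  # skip blank lines and bed header lines
        chrom, start, end, motif = line.split("\t")[:4]
        catalog.append({
            "LocusId": f"{chrom}-{start}-{end}-{motif}",  # hypothetical naming scheme
            "LocusStructure": f"({motif})*",
            "ReferenceRegion": f"{chrom}:{start}-{end}",
            "VariantType": "Repeat",
        })
    return catalog


# Example: one CAG repeat locus, printed as catalog json
records = bed_to_variant_catalog(["chr4\t3074876\t3074933\tCAG\n"])
print(json.dumps(records, indent=2))
```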
Installation
To install using pip, run:
python3 -m pip install --upgrade str_analysis
or use the docker image:
docker run -it weisburd/str-analysis:latest