Utilities for analyzing short tandem repeats (STRs)
str-analysis
This package contains scripts and utilities related to analyzing short tandem repeats (STRs) from short read data.
call_rfc1_canvas_alleles
This script takes a WGS bam or cram file and outputs a .json file containing details about alleles it detected at the RFC1/CANVAS STR locus. The main fields in the output dictionary are:
sample_id: If this value is not specified as a command line arg, it is parsed from the input bam/cram file header.
call: describes the alleles detected at the RFC1/CANVAS locus. Its format is analogous to a VCF genotype. Possible values are:
- "PATHOGENIC MOTIF / PATHOGENIC MOTIF": only pathogenic allele(s) detected
- "BENIGN MOTIF / BENIGN MOTIF": only benign allele(s) detected
- "MOTIF OF UNCERTAIN SIGNIFICANCE / MOTIF OF UNCERTAIN SIGNIFICANCE": non-canonical allele(s) detected with unknown pathogenicity
- "BENIGN MOTIF / PATHOGENIC MOTIF": heterozygous for a benign allele and a pathogenic allele, implying carrier status
- "PATHOGENIC MOTIF / MOTIF OF UNCERTAIN SIGNIFICANCE": heterozygous for a pathogenic allele and a non-canonical allele with unknown pathogenicity
- "BENIGN MOTIF / MOTIF OF UNCERTAIN SIGNIFICANCE": heterozygous for a benign allele and a non-canonical allele with unknown pathogenicity
- null: not enough evidence in the read data to support any of the above options
allele1_repeat_unit: the repeat unit that is supported by the most reads.
allele1_read_count: the number of reads supporting allele1.
allele1_n_occurrences: the total number of times allele1 occurs in the reads at the RFC1 locus.
allele2_repeat_unit: the repeat unit that is supported by the next most reads, or null if all reads support allele1.
allele2_read_count: see "allele1_read_count" description.
allele2_n_occurrences: see "allele1_n_occurrences" description.
left_flank_coverage: average read depth within a 2kb window immediately to the left of the RFC1 locus.
right_flank_coverage: average read depth within a 2kb window immediately to the right of the RFC1 locus.
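The relationship between the allele1 and allele2 fields can be illustrated with a minimal Python sketch. It is not the script's actual implementation: the function name rank_repeat_units is hypothetical, the input is simplified to one repeat unit per supporting read, and reporting a count of 0 when allele2 is null is an assumption.

```python
from collections import Counter

def rank_repeat_units(read_repeat_units):
    """Rank repeat units by the number of reads supporting each one.

    read_repeat_units: list with one repeat unit string per supporting read
    (a simplification made for this sketch).
    """
    counts = Counter(read_repeat_units)
    ranked = counts.most_common(2)  # up to two (unit, count) pairs, most reads first

    # allele1 is the repeat unit supported by the most reads.
    allele1_unit, allele1_count = ranked[0] if ranked else (None, 0)
    # allele2 is the next most supported unit, or None if all reads support allele1.
    # Assumption: its read count is reported as 0 in that case.
    allele2_unit, allele2_count = ranked[1] if len(ranked) > 1 else (None, 0)

    return {
        "allele1_repeat_unit": allele1_unit,
        "allele1_read_count": allele1_count,
        "allele2_repeat_unit": allele2_unit,
        "allele2_read_count": allele2_count,
    }

# Made-up input for illustration only.
print(rank_repeat_units(["AAGGG", "AAAAG", "AAGGG", "AAGGG"]))
```

When every read supports the same repeat unit, the sketch returns None for allele2_repeat_unit, which corresponds to null in the .json output.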
The script can also optionally take an ExpansionHunterDenovo profile for the sample and copy relevant fields from it to the output. The profile is not used in any calculations.
Example command line:
call_rfc1_canvas_alleles -e sample1.str_profile.json -g 38 sample1.cram
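Once the script has written its .json output, the call field can be interpreted downstream. The following Python sketch maps each documented call value to its description from the list above. The helper name describe_call and the example record are hypothetical, and the read counts are made up for illustration.

```python
import json

# Descriptions copied from the documented "call" values.
CALL_DESCRIPTIONS = {
    "PATHOGENIC MOTIF / PATHOGENIC MOTIF":
        "only pathogenic allele(s) detected",
    "BENIGN MOTIF / BENIGN MOTIF":
        "only benign allele(s) detected",
    "MOTIF OF UNCERTAIN SIGNIFICANCE / MOTIF OF UNCERTAIN SIGNIFICANCE":
        "non-canonical allele(s) detected with unknown pathogenicity",
    "BENIGN MOTIF / PATHOGENIC MOTIF":
        "heterozygous for a benign allele and a pathogenic allele, implying carrier status",
    "PATHOGENIC MOTIF / MOTIF OF UNCERTAIN SIGNIFICANCE":
        "heterozygous for a pathogenic allele and a non-canonical allele with unknown pathogenicity",
    "BENIGN MOTIF / MOTIF OF UNCERTAIN SIGNIFICANCE":
        "heterozygous for a benign allele and a non-canonical allele with unknown pathogenicity",
}

def describe_call(result):
    """Return a human-readable description of one output record's call."""
    call = result.get("call")
    if call is None:
        # A JSON null is loaded as None by the json module.
        return "not enough evidence in the read data to support a call"
    return CALL_DESCRIPTIONS.get(call, "unrecognized call value: " + call)

# Hypothetical record; in practice it would come from json.load() on an
# output file such as sample1.rfc1_canvas_alleles.json.
record = json.loads("""
{
  "sample_id": "sample1",
  "call": "BENIGN MOTIF / PATHOGENIC MOTIF",
  "allele1_repeat_unit": "AAAAG",
  "allele1_read_count": 20,
  "allele2_repeat_unit": "AAGGG",
  "allele2_read_count": 15
}
""")
print(record["sample_id"] + ": " + describe_call(record))
```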
combine_json_to_tsv
This script combines the call_rfc1_canvas_alleles output .json files for multiple samples into a single .tsv. The script takes the paths of the .json files as input or, if none are provided, searches for .json files in the current directory and its subdirectories.
Example command line:
combine_json_to_tsv sample1.rfc1_canvas_alleles.json sample2.rfc1_canvas_alleles.json
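For readers who want to do the same thing from Python, the behavior described above can be approximated as follows. This is a rough sketch and not the script's actual implementation: the function name combine_json_files, the output path argument, and the column ordering are assumptions, and each .json file is assumed to contain a single flat dictionary.

```python
import csv
import json
from pathlib import Path

def combine_json_files(json_paths, output_path):
    """Write one .tsv row per input .json file and return the row count."""
    # If no paths are given, search the current directory and subdirectories.
    if not json_paths:
        json_paths = sorted(Path(".").rglob("*.json"))

    rows = []
    for path in json_paths:
        with open(path) as f:
            rows.append(json.load(f))  # assumed to be a dict of field -> value

    # Collect column names in first-seen order across all records.
    columns = []
    for row in rows:
        for key in row:
            if key not in columns:
                columns.append(key)

    with open(output_path, "w", newline="") as f:
        # restval="" leaves a blank cell when a record lacks a column;
        # None values (JSON null) are also written as empty cells.
        writer = csv.DictWriter(f, fieldnames=columns, delimiter="\t", restval="")
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

A call such as combine_json_files(["sample1.rfc1_canvas_alleles.json", "sample2.rfc1_canvas_alleles.json"], "combined.tsv") would then produce one row per sample, where combined.tsv is a placeholder output name.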
Hashes for str_analysis-0.2-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | e378bed0c275def12968a2b77a464638854b69d9cf4f091bb69aeda5f893b6a5
MD5 | c7ad2d42d8cd161c5f12922a5082d903
BLAKE2b-256 | 07f661323a361cc57cc28bec53a56aa9f21cfc31232f13a52fe9bab789e3dbd4