Utilities for analyzing short tandem repeats (STRs)

Project description

str-analysis

This package contains scripts and utilities related to analyzing short tandem repeats (STRs) from short read data.


call_rfc1_canvas_alleles

This script takes a WGS bam or cram file and outputs a .json file containing details about alleles it detected at the RFC1/CANVAS STR locus. The main fields in the output dictionary are:

sample_id: If this value is not specified as a command line arg, it is parsed from the input bam/cram file header.
call: describes the alleles detected at the RFC1/CANVAS locus. Its format is analogous to a VCF genotype. Possible values are:

  • "PATHOGENIC MOTIF / PATHOGENIC MOTIF": only pathogenic allele(s) detected
  • "BENIGN MOTIF / BENIGN MOTIF": only benign allele(s) detected
  • "MOTIF OF UNCERTAIN SIGNIFICANCE / MOTIF OF UNCERTAIN SIGNIFICANCE": only non-canonical allele(s) of unknown pathogenicity detected
  • "BENIGN MOTIF / PATHOGENIC MOTIF": heterozygous for a benign allele and a pathogenic allele, implying carrier status
  • "PATHOGENIC MOTIF / MOTIF OF UNCERTAIN SIGNIFICANCE": heterozygous for a pathogenic allele and a non-canonical allele of unknown pathogenicity
  • "BENIGN MOTIF / MOTIF OF UNCERTAIN SIGNIFICANCE": heterozygous for a benign allele and a non-canonical allele of unknown pathogenicity
  • null: not enough evidence in the read data to support any of the above options

allele1_repeat_unit: the repeat unit that is supported by the most reads.
allele1_read_count: the number of reads supporting allele1.
allele1_n_occurrences: the total number of times the allele1 repeat unit occurs in the reads at the RFC1 locus.

allele2_repeat_unit: the repeat unit that is supported by the next most reads, or null if all reads support allele1.
allele2_read_count: see "allele1_read_count" description.
allele2_n_occurrences: see "allele1_n_occurrences" description.

left_flank_coverage: average read depth within a 2 kb window immediately to the left of the RFC1 locus.
right_flank_coverage: average read depth within a 2 kb window immediately to the right of the RFC1 locus.
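
The fields above can be read back with a few lines of Python. The following is a minimal sketch rather than part of the package: the record values are invented for illustration, `summarize` is a hypothetical helper, and it assumes the output is a flat dictionary using exactly the field names listed above.

```python
import json

PATHOGENIC = "PATHOGENIC MOTIF"
BENIGN = "BENIGN MOTIF"

# Hypothetical record: field names follow the description above, values are invented.
record_text = """
{
  "sample_id": "sample1",
  "call": "BENIGN MOTIF / PATHOGENIC MOTIF",
  "allele1_repeat_unit": "AAAAG",
  "allele1_read_count": 30,
  "allele1_n_occurrences": 150,
  "allele2_repeat_unit": "AAGGG",
  "allele2_read_count": 10,
  "allele2_n_occurrences": 40,
  "left_flank_coverage": 32.0,
  "right_flank_coverage": 28.0
}
"""


def summarize(record):
    """Derive a few convenience values from one output record (illustrative helper)."""
    call = record.get("call")
    # "call" is null (None in Python) when the reads do not support any genotype.
    motifs = set() if call is None else {part.strip() for part in call.split("/")}
    n1 = record.get("allele1_read_count") or 0
    # Assumption: the allele2 count may be null or 0 when all reads support allele1.
    n2 = record.get("allele2_read_count") or 0
    total = n1 + n2
    return {
        "sample_id": record.get("sample_id"),
        "has_pathogenic_motif": PATHOGENIC in motifs,
        "carrier": motifs == {BENIGN, PATHOGENIC},
        "allele1_read_fraction": n1 / total if total else None,
        "mean_flank_coverage": (
            record["left_flank_coverage"] + record["right_flank_coverage"]
        ) / 2,
    }


summary = summarize(json.loads(record_text))
print(summary)
```

For a real sample, replace `record_text` with the contents of the .json file written by the script.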

The script can optionally take an ExpansionHunterDenovo profile for the sample and copy its relevant fields to the output. The profile is not used in any calculations.

Example command line:

call_rfc1_canvas_alleles -e sample1.str_profile.json -g 38 sample1.cram
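
When many samples need processing, the same command can be assembled from Python. This sketch only builds the argument list shown in the example above. `build_command` is a hypothetical helper, and the meaning of the flags (`-e` for the ExpansionHunterDenovo profile, `-g` for the genome build) is inferred from the example, so check the script's help output before relying on it.

```python
import shlex


def build_command(alignment_path, genome_version="38", ehdn_profile=None):
    """Assemble the argument list for one sample (illustrative helper, not part of the package)."""
    cmd = ["call_rfc1_canvas_alleles"]
    if ehdn_profile is not None:
        # Assumption: -e passes the optional ExpansionHunterDenovo profile.
        cmd += ["-e", ehdn_profile]
    # Assumption: -g selects the reference genome build.
    cmd += ["-g", str(genome_version), alignment_path]
    return cmd


cmd = build_command("sample1.cram", ehdn_profile="sample1.str_profile.json")
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once the package is installed and the script is on the PATH.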

combine_json_to_tsv

This script can combine the call_rfc1_canvas_alleles output json files for multiple samples into a single .tsv. The script takes the paths of the .json files as input, or, if none are provided, it searches for .json files in the current directory and subdirectories.

Example command line:

combine_json_to_tsv  sample1.rfc1_canvas_alleles.json  sample2.rfc1_canvas_alleles.json
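
The behavior described above can be approximated in a few lines. This is a rough sketch under stated assumptions, not the script's actual implementation: `combine_json_to_tsv_sketch` and its parameters are hypothetical, each .json file is assumed to hold one flat dictionary, and the column order is simply the order in which keys are first seen.

```python
import csv
import json
from pathlib import Path


def combine_json_to_tsv_sketch(json_paths, output_path, search_dir="."):
    """Write one TSV row per .json file and return the number of rows written."""
    # With no explicit paths, fall back to a recursive search of search_dir.
    paths = [Path(p) for p in json_paths] or sorted(Path(search_dir).rglob("*.json"))

    records = []
    for path in paths:
        with open(path) as f:
            records.append(json.load(f))

    # Use the union of all keys as columns, in first-seen order.
    columns = []
    for record in records:
        for key in record:
            if key not in columns:
                columns.append(key)

    with open(output_path, "w", newline="") as out:
        # restval fills in columns missing from a record; null values are written as empty cells.
        writer = csv.DictWriter(out, fieldnames=columns, delimiter="\t", restval="")
        writer.writeheader()
        writer.writerows(records)
    return len(records)
```

Calling `combine_json_to_tsv_sketch([], "combined.tsv")` would search the current directory and its subdirectories, mirroring the default described above.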

Download files

Source Distribution

str_analysis-0.2.tar.gz (18.7 kB)

Uploaded Source

Built Distribution

str_analysis-0.2-py3-none-any.whl (28.4 kB)

Uploaded Python 3
