Skip to main content

Detect coverage drops in LRS STR expansion calls

Project description

strdrop

STRdrop logo - a shooting star with a STRawberry

PyPI Version GitHub Release Date GitHub commits latest GitHub commit rate GitHub issues-closed Average time to resolve an issue Percentage of issues still open Ruff uv Woke

Flag STR coverage drops in LRS data

A simple tool to leverage a collection of reference samples to calculate normal sequencing depths over loci in a locus catalog, and flagging sites with drops in new, N+1, samples.

Call coverage drop if alleles for a repeat in the test file

  • are identical, or fairly similar, defined as within EDIT_RATIO_CUTOFF (default 0.9) Levenshtein edit distance ratio
  • are covered (have a total sequencing depth SD) at a fraction of CASE_COVERAGE_RATIO_CUTOFF (default 0.55) or below of case average locus coverage
  • are among the lowest alpha/N_loci (default 0.05/N_loci) of total sequencing depth values for that locus compared to the test set

Coverage drop calls are marked in the output VCF with FILTER tag LowDepth and INFO tag STRDROP. Edit ratio STRDROP_EDR, case coverage ratio STRDROP_SDR and coverage marginal P value STRDROP_P are output as INFO keys on the resulting VCF.

Usage

 Usage: strdrop [OPTIONS] COMMAND [ARGS]...

 Call coverage drops over alleles in STR VCFs

╭─ Options ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ --version  -v        Show version and exit.                                                                                               │
│ --help               Show this message and exit.                                                                                          │
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Commands ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ call    STRdrop: Detect drops in STR coverage 🧬                                                                                          │
│ build   STRdrop: Build reference json from sequencing coverage in STR VCFs                                                                │
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯



 Usage: strdrop call [OPTIONS] INPUT_FILE OUTPUT_FILE

 STRdrop: Detect drops in STR coverage 🧬

╭─ Arguments ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ *    input_file       PATH  Input STR call VCF file [required]                                                                            │
│ *    output_file      PATH  Output annotated VCF file [required]                                                                          │
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Options ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ *  --training-set        PATH       Training VCF directory or json with reference data [required]                                         │
│    --xy                  SAMPLE ID  Sample in VCF to treat as karyotype XY. May be specified multiple times.                              │
│    --alpha               FLOAT      Unadjusted probability confidence level for coverage test [default: 0.05]                             │
│    --fraction            FLOAT      Case average adjusted sequencing depth ratio cutoff [default: 0.55]                                   │
│    --edit                FLOAT      Allele similarity Levenshtein edit distance ratio cutoff [default: 0.9]                               │
│    --help                           Show this message and exit.                                                                           │
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯



 Usage: strdrop build [OPTIONS] REFERENCE_FILE

 STRdrop: Build reference json from sequencing coverage in STR VCFs

╭─ Arguments ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ *    reference_file      PATH  Output reference archive [required]                                                                        │
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Options ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ *  --training-set        PATH  Input directory with reference data [required]                                                             │
│    --help                      Show this message and exit.                                                                                │
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

You can test the functionality with one of the test files with relatively low coverage:

uv run strdrop call --fraction 0.55 --training-set test-data test-data/GV-42.trgt.vcf GV-42.strdrop.vcf

To avoid spurious calls on X for XY karyotype, pass the --xy <sample_name> option to lower the coverage ratio expectation with a 0.5 shift.

uv run strdrop call --xy GV-42 --fraction 0.55 --training-set test-data test-data/GV-42.trgt.vcf GV-42.strdrop.vcf

Note the tags in the output VCF.

➜  strdrop git:(main) ✗ grep HTT GV-42.strdrop.vcf
chr4	3074876	.	CCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAACAGCCGCCACCGCCGCCGCCGCCGCCGCCGCCT	CCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAACAGCCGCCACCGCCGCCGCCGCCGCCGCCGCCT,CCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAACAGCCGCCACCGCCGCCGCCGCCGCCGCCGCCT	.	LowDepth	TRID=HD_HTT;END=3074969;MOTIFS=CAG,CCG;STRUC=<TR>;STRDROP_P=0;STRDROP_EDR=0.967033;STRDROP_SDR=0.470972;STRDROP	GT:AL:ALLR:SD:MC:MS:AP:AM	1/2:84,87:83-84,87-93:25,32:17_8,18_8:0(0-48)_0(51-54)_1(54-57)_1(60-81),0(0-51)_0(54-57)_1(57-60)_1(63-84):0.964286,0.965517:0.02,0.01```
➜  strdrop git:(main) ✗ grep Strdrop GV-42.strdrop.vcf |head -9
##INFO=<ID=STRDROP_P,Number=1,Type=Float,Description="Strdrop coverage sequencing depth level probability">
##INFO=<ID=STRDROP_EDR,Number=1,Type=Float,Description="Strdrop allele similarity Levenshtein edit distance ratio">
##INFO=<ID=STRDROP_SDR,Number=1,Type=Float,Description="Strdrop case average adjusted sequencing depth ratio">
##INFO=<ID=STRDROP,Number=0,Type=Flag,Description="Strdrop coverage drop detected">
##FILTER=<ID=LowDepth,Description="Strdrop coverage drop detected">
##FORMAT=<ID=SDP,Number=1,Type=Float,Description="Strdrop coverage sequencing depth level probability">
##FORMAT=<ID=EDR,Number=1,Type=Float,Description="Strdrop allele similarity Levenshtein edit distance ratio">
##FORMAT=<ID=SDR,Number=1,Type=Float,Description="Strdrop case average adjusted sequencing depth ratio">
##FORMAT=<ID=DROP,Number=1,Type=String,Description="Strdrop coverage drop detected, 1 for LowDepth">

Output files can be written as compressed VCF or BCF by simply giving an appropriate outfile name suffix, thanks to CyVCF2.

strdrop call --training-set /home/daniel.nilsson/proj/strdrop/reference/ mycase.trgt.vcf.gz mycase.trgt.strdrop.vcf.gz

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

strdrop-0.3.2.tar.gz (1.3 MB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

strdrop-0.3.2-py3-none-any.whl (10.4 kB view details)

Uploaded Python 3

File details

Details for the file strdrop-0.3.2.tar.gz.

File metadata

  • Download URL: strdrop-0.3.2.tar.gz
  • Upload date:
  • Size: 1.3 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.9.5

File hashes

Hashes for strdrop-0.3.2.tar.gz
Algorithm Hash digest
SHA256 e606f397306a2cf5d8f3cda7f643f566fc0907bdde33fd266bf17dd5479e50dd
MD5 75de9ae03398907a32f33fc05e07fd46
BLAKE2b-256 96b7afd219f2c212f0d35bd0039dc10c769cb6975fd43f5c28d709f7594bcf82

See more details on using hashes here.

File details

Details for the file strdrop-0.3.2-py3-none-any.whl.

File metadata

  • Download URL: strdrop-0.3.2-py3-none-any.whl
  • Upload date:
  • Size: 10.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.9.5

File hashes

Hashes for strdrop-0.3.2-py3-none-any.whl
Algorithm Hash digest
SHA256 93497d84fe3e0f7f296798084a82a294dbd554d7b54263d1e12867929ab7e683
MD5 826aacbc82ca89df2f2d18cccfee6069
BLAKE2b-256 36abdad05e38d346dcf85d112dd8a3d8637923bf69b8c356fb6de53b2a5962a0

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page