Detect coverage drops in LRS STR expansion calls

strdrop

Flag STR coverage drops in LRS data

A simple tool that uses a collection of reference samples to calculate normal sequencing depths over the loci in a locus catalog, and flags sites where a new (N+1) sample shows a drop in coverage.

A coverage drop is called if the alleles for a repeat in the test file

  • are identical or fairly similar, defined as a Levenshtein edit distance ratio of at least EDIT_RATIO_CUTOFF (default 0.9)
  • are covered (total sequencing depth, SD) at a fraction of CASE_COVERAGE_RATIO_CUTOFF (default 0.55) or below of the case average locus coverage
  • are among the lowest alpha/N_loci (default 0.05/N_loci) of total sequencing depth values for that locus compared to the test set
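
The three criteria above can be sketched in plain Python. This is a minimal illustration, not strdrop's implementation: the function names, the normalisation used in `edit_ratio`, and the empirical tail probability computed from a list of reference depths are all assumptions made for the example.

```python
from typing import Sequence


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance with unit costs."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # deletion
                current[j - 1] + 1,                    # insertion
                previous[j - 1] + (char_a != char_b),  # substitution
            ))
        previous = current
    return previous[-1]


def edit_ratio(a: str, b: str) -> float:
    """Similarity in [0, 1]. Assumed normalisation: 1 - distance / longest."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def is_coverage_drop(
    allele_a: str,
    allele_b: str,
    locus_depth: float,
    case_mean_depth: float,
    reference_depths: Sequence[float],
    n_loci: int,
    edit_cutoff: float = 0.9,
    fraction_cutoff: float = 0.55,
    alpha: float = 0.05,
) -> bool:
    """Return True only when all three criteria hold (hypothetical helper)."""
    if not reference_depths or case_mean_depth <= 0 or n_loci <= 0:
        raise ValueError("need reference depths, positive mean depth and n_loci")
    # 1. The two alleles are identical or nearly so.
    similar = edit_ratio(allele_a, allele_b) >= edit_cutoff
    # 2. The locus depth is a small fraction of the case's average depth.
    low_fraction = locus_depth / case_mean_depth <= fraction_cutoff
    # 3. The depth is unusually low relative to the reference depths for this
    #    locus, at a level corrected for the number of loci tested.
    #    Assumption: an empirical tail probability stands in for the real test.
    p_value = sum(d <= locus_depth for d in reference_depths) / len(reference_depths)
    rare = p_value <= alpha / n_loci
    return similar and low_fraction and rare
```

For example, `is_coverage_drop("CAG" * 10, "CAG" * 10, 10, 40, [35, 38, 40, 42, 45], n_loci=1)` satisfies all three conditions, while raising the locus depth to 39 fails the coverage-fraction test.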

Coverage drop calls are marked in the output VCF with the FILTER tag LowDepth and the INFO tag STRDROP. The edit ratio (STRDROP_EDR), case coverage ratio (STRDROP_SDR) and coverage marginal P value (STRDROP_P) are output as INFO keys on the resulting VCF.

Usage

 Usage: strdrop [OPTIONS] COMMAND [ARGS]...

 Call coverage drops over alleles in STR VCFs

╭─ Options ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ --version  -v        Show version and exit.                                                                                                                                                                                                                                                                                │
│ --help               Show this message and exit.                                                                                                                                                                                                                                                                           │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Commands ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ call    STRdrop: Detect drops in STR coverage 🧬                                                                                                                                                                                                                                                                           │
│ build   STRdrop: Build reference json from sequencing coverage in STR VCFs                                                                                                                                                                                                                                                 │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

 Usage: strdrop build [OPTIONS] REFERENCE_FILE

 STRdrop: Build reference json from sequencing coverage in STR VCFs

╭─ Arguments ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ *    reference_file      PATH  Output reference archive [required]                                                                                                                                                                                                                                                         │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Options ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ *  --training-set        PATH  Input directory with reference data [required]                                                                                                                                                                                                                                              │
│    --help                      Show this message and exit.                                                                                                                                                                                                                                                                 │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯



 Usage: strdrop call [OPTIONS] INPUT_FILE OUTPUT_FILE

 STRdrop: Detect drops in STR coverage 🧬

╭─ Arguments ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ *    input_file       PATH  Input STR call VCF file [required]                                                                                                                                                                                                                                                             │
│ *    output_file      PATH  Output annotated VCF file [required]                                                                                                                                                                                                                                                           │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Options ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ *  --training-set               PATH   Training VCF directory or json with reference data [required]                                                                                                                                                                                                                       │
│    --xy              --no-xy           Treat as karyotype XY. Give one xy option per sample.                                                                                                                                                                                                                               │
│    --alpha                      FLOAT  Unadjusted probability confidence level for coverage test [default: 0.05]                                                                                                                                                                                                           │
│    --fraction                   FLOAT  Case average adjusted sequencing depth ratio cutoff [default: 0.55]                                                                                                                                                                                                                 │
│    --edit                       FLOAT  Allele similarity Levenshtein edit distance ratio cutoff [default: 0.9]                                                                                                                                                                                                             │
│    --help                              Show this message and exit.                                                                                                                                                                                                                                                         │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
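
Going by the help text above, `build` writes a reference json from a directory of training VCFs, and `call` accepts either a directory or a json for `--training-set`, so the two commands can presumably be chained. The sketch below only assembles the two command lines; all file and directory names in it are hypothetical placeholders.

```python
import shlex
from typing import List, Tuple


def strdrop_commands(
    training_dir: str,
    reference_json: str,
    case_vcf: str,
    output_vcf: str,
) -> Tuple[List[str], List[str]]:
    """Assemble argument lists for a build step followed by a call step."""
    # strdrop build [OPTIONS] REFERENCE_FILE, with --training-set as input dir
    build = ["strdrop", "build", "--training-set", training_dir, reference_json]
    # strdrop call [OPTIONS] INPUT_FILE OUTPUT_FILE, reusing the built json
    call = ["strdrop", "call", "--training-set", reference_json, case_vcf, output_vcf]
    return build, call


if __name__ == "__main__":
    # Placeholder paths, chosen only for illustration.
    for command in strdrop_commands(
        "training_vcfs", "reference.json", "case.trgt.vcf", "case.strdrop.vcf"
    ):
        print(shlex.join(command))
```

Each list can be passed to `subprocess.run(command, check=True)` to execute the step once strdrop is installed.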

You can test the functionality with one of the test files with relatively low coverage:

uv run strdrop call --fraction 0.55 --training-set test-data test-data/GV-42.trgt.vcf GV-42.strdrop.vcf

To avoid spurious calls on chromosome X for samples with an XY karyotype, pass the --xy flag, which lowers the expected coverage ratio with a 0.5 shift.

uv run strdrop call --xy --fraction 0.55 --training-set test-data test-data/GV-42.trgt.vcf GV-42.strdrop.vcf
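
The reason for the --xy flag can be seen with a little arithmetic: a single X chromosome is expected to be sequenced at roughly half the autosomal depth, which already falls below the default 0.55 cutoff. The sketch below assumes the adjustment amounts to halving the cutoff for X loci; that is one possible reading of the 0.5 shift and not necessarily how strdrop implements it.

```python
def x_locus_flagged(
    case_mean_depth: float,
    x_locus_depth: float,
    fraction_cutoff: float = 0.55,
    xy: bool = False,
) -> bool:
    """Check the coverage-fraction criterion for a locus on chromosome X."""
    # Assumption: with one copy of X, the expected depth ratio is 0.5 instead
    # of 1.0, so the cutoff is scaled by 0.5 for XY samples.
    cutoff = fraction_cutoff * 0.5 if xy else fraction_cutoff
    return x_locus_depth / case_mean_depth <= cutoff


# A normally covered X locus in an XY sample: depth 20 against a mean of 40.
print(x_locus_flagged(40, 20))           # unadjusted cutoff
print(x_locus_flagged(40, 20, xy=True))  # cutoff adjusted for XY
```

Without the adjustment the ratio of 0.5 passes the 0.55 cutoff and the locus would satisfy this criterion despite being normally covered; with the halved cutoff it does not.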

Note the tags in the output VCF.

$ grep HTT GV-42.strdrop.vcf
chr4	3074876	.	CCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAACAGCCGCCACCGCCGCCGCCGCCGCCGCCGCCT	CCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAACAGCCGCCACCGCCGCCGCCGCCGCCGCCGCCT,CCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAACAGCCGCCACCGCCGCCGCCGCCGCCGCCGCCT	.	LowDepth	TRID=HD_HTT;END=3074969;MOTIFS=CAG,CCG;STRUC=<TR>;STRDROP_P=0;STRDROP_EDR=0.967033;STRDROP_SDR=0.470972;STRDROP	GT:AL:ALLR:SD:MC:MS:AP:AM	1/2:84,87:83-84,87-93:25,32:17_8,18_8:0(0-48)_0(51-54)_1(54-57)_1(60-81),0(0-51)_0(54-57)_1(57-60)_1(63-84):0.964286,0.965517:0.02,0.01
$ grep Strdrop GV-42.strdrop.vcf | head -9
##INFO=<ID=STRDROP_P,Number=1,Type=Float,Description="Strdrop coverage sequencing depth level probability">
##INFO=<ID=STRDROP_EDR,Number=1,Type=Float,Description="Strdrop allele similarity Levenshtein edit distance ratio">
##INFO=<ID=STRDROP_SDR,Number=1,Type=Float,Description="Strdrop case average adjusted sequencing depth ratio">
##INFO=<ID=STRDROP,Number=0,Type=Flag,Description="Strdrop coverage drop detected">
##FILTER=<ID=LowDepth,Description="Strdrop coverage drop detected">
##FORMAT=<ID=SDP,Number=1,Type=Float,Description="Strdrop coverage sequencing depth level probability">
##FORMAT=<ID=EDR,Number=1,Type=Float,Description="Strdrop allele similarity Levenshtein edit distance ratio">
##FORMAT=<ID=SDR,Number=1,Type=Float,Description="Strdrop case average adjusted sequencing depth ratio">
##FORMAT=<ID=DROP,Number=0,Type=String,Description="Strdrop coverage drop detected, 1 for LowDepth">
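
To pick the strdrop annotations out of a record like the one above, the FILTER and INFO columns can be parsed with a few lines of standard-library Python. This is a minimal sketch with a hypothetical helper name; for real work a VCF library such as cyvcf2 is more robust than splitting strings.

```python
from typing import Any, Dict, Optional


def parse_strdrop_record(line: str) -> Dict[str, Any]:
    """Extract strdrop annotations from one tab-separated VCF data line."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, filter_field, info_field = fields[0], int(fields[1]), fields[6], fields[7]

    info: Dict[str, Any] = {}
    for entry in info_field.split(";"):
        key, _, value = entry.partition("=")
        # Flag-type keys such as STRDROP carry no value.
        info[key] = value if value else True

    def as_float(key: str) -> Optional[float]:
        return float(info[key]) if key in info else None

    return {
        "chrom": chrom,
        "pos": pos,
        "trid": info.get("TRID"),
        "low_depth": "LowDepth" in filter_field.split(";"),
        "drop_flag": info.get("STRDROP") is True,
        "p": as_float("STRDROP_P"),
        "edr": as_float("STRDROP_EDR"),
        "sdr": as_float("STRDROP_SDR"),
    }


if __name__ == "__main__":
    # Shortened stand-in for the HTT record shown above (alleles abbreviated).
    example = (
        "chr4\t3074876\t.\tC\tCA,CAG\t.\tLowDepth\t"
        "TRID=HD_HTT;END=3074969;STRDROP_P=0;STRDROP_EDR=0.967033;"
        "STRDROP_SDR=0.470972;STRDROP\tGT\t1/2"
    )
    print(parse_strdrop_record(example))
```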

Output files can be written as compressed VCF or BCF simply by giving the output file an appropriate name suffix (for example .vcf.gz); the format handling comes from cyvcf2.

strdrop call --training-set /home/daniel.nilsson/proj/strdrop/reference/ mycase.trgt.vcf.gz mycase.trgt.strdrop.vcf.gz
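
The suffix-based choice of output format can be pictured as a small dispatch from file name to an htslib-style write mode ("w" for plain VCF, "wz" for compressed VCF, "wb" for BCF). The helper below is a hypothetical illustration of that idea, not code taken from strdrop or cyvcf2, and the set of recognised suffixes is an assumption.

```python
def htslib_write_mode(output_path: str) -> str:
    """Map an output file name to an htslib-style write mode string."""
    name = output_path.lower()
    if name.endswith(".bcf"):
        return "wb"  # compressed BCF
    if name.endswith((".vcf.gz", ".vcf.bgz")):
        return "wz"  # bgzip-compressed VCF
    return "w"       # uncompressed VCF


for path in ("mycase.trgt.strdrop.vcf", "mycase.trgt.strdrop.vcf.gz", "mycase.trgt.strdrop.bcf"):
    print(path, htslib_write_mode(path))
```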



