Detect coverage drops in LRS STR expansion calls
Project description
strdrop
Flag STR coverage drops in LRS data
A simple tool to leverage a collection of reference samples to calculate normal sequencing depths over loci in a locus catalog, and flagging sites with drops in new, N+1, samples.
Call coverage drop if alleles for a repeat in the test file
- are identical, or fairly similar, defined as within EDIT_RATIO_CUTOFF (default 0.9) Levenshtein edit distance ratio
- are covered (have a total sequencing depth SD) at a fraction of CASE_COVERAGE_RATIO_CUTOFF (default 0.55) or below of case average locus coverage
- are among the lowest alpha/N_loci (default 0.05/N_loci) of total sequencing depth values for that locus compared to the test set
Coverage drop calls are marked in the output VCF with FILTER tag LowDepth and INFO tag STRDROP. Edit ratio STRDROP_EDR, case coverage ratio STRDROP_SDR and coverage marginal P value STRDROP_P are output as INFO keys on the resulting VCF.
Usage
Usage: strdrop [OPTIONS] COMMAND [ARGS]...
Call coverage drops over alleles in STR VCFs
╭─ Options ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ --version -v Show version and exit. │
│ --help Show this message and exit. │
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Commands ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ call STRdrop: Detect drops in STR coverage 🧬 │
│ build STRdrop: Build reference json from sequencing coverage in STR VCFs │
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
Usage: strdrop call [OPTIONS] INPUT_FILE OUTPUT_FILE
STRdrop: Detect drops in STR coverage 🧬
╭─ Arguments ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ * input_file PATH Input STR call VCF file [required] │
│ * output_file PATH Output annotated VCF file [required] │
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Options ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ * --training-set PATH Training VCF directory or json with reference data [required] │
│ --xy SAMPLE ID Sample in VCF to treat as karyotype XY. May be specified multiple times. │
│ --alpha FLOAT Unadjusted probability confidence level for coverage test [default: 0.05] │
│ --fraction FLOAT Case average adjusted sequencing depth ratio cutoff [default: 0.55] │
│ --edit FLOAT Allele similarity Levenshtein edit distance ratio cutoff [default: 0.9] │
│ --help Show this message and exit. │
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
Usage: strdrop build [OPTIONS] REFERENCE_FILE
STRdrop: Build reference json from sequencing coverage in STR VCFs
╭─ Arguments ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ * reference_file PATH Output reference archive [required] │
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Options ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ * --training-set PATH Input directory with reference data [required] │
│ --help Show this message and exit. │
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
You can test the functionality with one of the test files with relatively low coverage:
uv run strdrop call --fraction 0.55 --training-set test-data test-data/GV-42.trgt.vcf GV-42.strdrop.vcf
To avoid spurious calls on X for XY karyotype, pass the --xy <sample_name> option to lower the coverage ratio expectation with a 0.5 shift.
uv run strdrop call --xy GV-42 --fraction 0.55 --training-set test-data test-data/GV-42.trgt.vcf GV-42.strdrop.vcf
Note the tags in the output VCF.
➜ strdrop git:(main) ✗ grep HTT GV-42.strdrop.vcf
chr4 3074876 . CCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAACAGCCGCCACCGCCGCCGCCGCCGCCGCCGCCT CCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAACAGCCGCCACCGCCGCCGCCGCCGCCGCCGCCT,CCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAACAGCCGCCACCGCCGCCGCCGCCGCCGCCGCCT . LowDepth TRID=HD_HTT;END=3074969;MOTIFS=CAG,CCG;STRUC=<TR>;STRDROP_P=0;STRDROP_EDR=0.967033;STRDROP_SDR=0.470972;STRDROP GT:AL:ALLR:SD:MC:MS:AP:AM 1/2:84,87:83-84,87-93:25,32:17_8,18_8:0(0-48)_0(51-54)_1(54-57)_1(60-81),0(0-51)_0(54-57)_1(57-60)_1(63-84):0.964286,0.965517:0.02,0.01```
➜ strdrop git:(main) ✗ grep Strdrop GV-42.strdrop.vcf |head -9
##INFO=<ID=STRDROP_P,Number=1,Type=Float,Description="Strdrop coverage sequencing depth level probability">
##INFO=<ID=STRDROP_EDR,Number=1,Type=Float,Description="Strdrop allele similarity Levenshtein edit distance ratio">
##INFO=<ID=STRDROP_SDR,Number=1,Type=Float,Description="Strdrop case average adjusted sequencing depth ratio">
##INFO=<ID=STRDROP,Number=0,Type=Flag,Description="Strdrop coverage drop detected">
##FILTER=<ID=LowDepth,Description="Strdrop coverage drop detected">
##FORMAT=<ID=SDP,Number=1,Type=Float,Description="Strdrop coverage sequencing depth level probability">
##FORMAT=<ID=EDR,Number=1,Type=Float,Description="Strdrop allele similarity Levenshtein edit distance ratio">
##FORMAT=<ID=SDR,Number=1,Type=Float,Description="Strdrop case average adjusted sequencing depth ratio">
##FORMAT=<ID=DROP,Number=1,Type=String,Description="Strdrop coverage drop detected, 1 for LowDepth">
Output files can be written as compressed VCF or BCF by simply giving an appropriate outfile name suffix, thanks to CyVCF2.
strdrop call --training-set /home/daniel.nilsson/proj/strdrop/reference/ mycase.trgt.vcf.gz mycase.trgt.strdrop.vcf.gz
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file strdrop-0.3.2.tar.gz.
File metadata
- Download URL: strdrop-0.3.2.tar.gz
- Upload date:
- Size: 1.3 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.9.5
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
e606f397306a2cf5d8f3cda7f643f566fc0907bdde33fd266bf17dd5479e50dd
|
|
| MD5 |
75de9ae03398907a32f33fc05e07fd46
|
|
| BLAKE2b-256 |
96b7afd219f2c212f0d35bd0039dc10c769cb6975fd43f5c28d709f7594bcf82
|
File details
Details for the file strdrop-0.3.2-py3-none-any.whl.
File metadata
- Download URL: strdrop-0.3.2-py3-none-any.whl
- Upload date:
- Size: 10.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.9.5
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
93497d84fe3e0f7f296798084a82a294dbd554d7b54263d1e12867929ab7e683
|
|
| MD5 |
826aacbc82ca89df2f2d18cccfee6069
|
|
| BLAKE2b-256 |
36abdad05e38d346dcf85d112dd8a3d8637923bf69b8c356fb6de53b2a5962a0
|