STRkit

A toolkit for analyzing variation in short(ish) tandem repeats.

Warning

Bootstrapping performance may be hindered on systems with OpenMP without additional configuration. See below for details on how to fix this.

Installation

STRkit can be installed from PyPI via pip with the following command:

python -m pip install strkit

Commands

`strkit call`: Genotype caller with bootstrapped confidence intervals

A Gaussian mixture model tandem repeat genotype caller for long read data. STRkit is tuned specifically for high-fidelity long reads, although other long read data should still work.

Features:

- Performant, vectorized (thanks to parasail) estimates of repeat counts from high-fidelity long reads and a supplied catalog of TR loci and motifs.
- Re-weighting of longer reads, to compensate for their lower likelihood of observation.
  - Whole-genome and targeted genotyping modes to adjust this re-weighting.
- Parallelized for faster computing on clusters and for ad-hoc fast analysis of single samples.
- 95% confidence intervals on calls via a user-configurable optional parametric bootstrapping process.
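
To make the last feature concrete, here is a minimal sketch of a parametric bootstrap for a single allele, using only the Python standard library. It is not STRkit's implementation: it assumes one Gaussian component rather than a mixture, and the function name `bootstrap_ci`, its parameters, and the example read values are all hypothetical.

```python
import random
import statistics


def bootstrap_ci(counts, num_bootstrap=100, level=0.95, seed=0):
    """Parametric bootstrap CI for the mean repeat count of one allele.

    Hypothetical helper for illustration; assumes a single Gaussian
    component, whereas a real caller would fit a mixture.
    """
    rng = random.Random(seed)
    mu = statistics.fmean(counts)
    sigma = statistics.stdev(counts)  # needs at least two reads
    n = len(counts)

    # Re-sample n synthetic reads from the fitted Gaussian and re-estimate
    # the mean each time.
    means = sorted(
        statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        for _ in range(num_bootstrap)
    )

    # Percentile interval over the sorted bootstrap estimates.
    tail = (1.0 - level) / 2.0
    lo = means[int(tail * (num_bootstrap - 1))]
    hi = means[int(round((1.0 - tail) * (num_bootstrap - 1)))]
    return lo, hi


# Made-up per-read repeat count estimates for one allele.
reads = [19.8, 20.1, 20.0, 19.9, 20.3, 20.0, 19.7, 20.2]
ci_low, ci_high = bootstrap_ci(reads)
print(f"95% CI: {ci_low:.2f}-{ci_high:.2f}")
```

More re-samplings give a smoother interval at the cost of run time, which is presumably why the number of bootstrap iterations is user-configurable.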

Usage:

# Positional argument: [REQUIRED] At least one indexed read file (BAM/CRAM)
# --ref: [REQUIRED] Indexed FASTA-formatted reference genome
# --loci: [REQUIRED] TRF-formatted (or 4-col, with motif as last column) list of loci to genotype
# --min-reads: Minimum number of supporting reads needed to make a call
# --min-allele-reads: Minimum number of supporting reads needed to call a specific allele size
# --flank-size: Size of the flanking region to use on either side of a region to properly anchor reads
strkit call \
  path/to/read/file.bam \
  --ref path/to/reference.fa.gz \
  --loci path/to/loci.bed \
  --min-reads 4 \
  --min-allele-reads 2 \
  --flank-size 70
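
For the 4-column form of the loci file, a reader can be sketched in a few lines of Python. This is an illustration, not STRkit's parser: it assumes the first three columns are BED-style contig/start/end and takes the last column as the motif. The function name `read_loci` and the example locus are made up.

```python
import csv
import io


def read_loci(handle):
    """Yield (contig, start, end, motif) from a tab-separated loci file.

    Hypothetical helper. Assumes BED-style contig/start/end in the first
    three columns and the motif in the last column (the 4-column form).
    """
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comments
        yield row[0], int(row[1]), int(row[2]), row[-1]


# Made-up locus for illustration, not taken from a real catalog.
example = io.StringIO("chr1\t1000\t1030\tCAG\n")
loci = list(read_loci(example))
print(loci)
```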

If more than one read file is specified, the reads will be pooled. This can come in handy if you have e.g. multiple flow cells of the same sample split into different BAM files, or the reads are split by chromosome.
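
For scripted pipelines, the same command can be assembled in Python. The sketch below only builds the argument list, using the flags shown above and nothing else; `build_call_args` is a hypothetical helper and the paths are placeholders.

```python
import shlex


def build_call_args(read_files, ref, loci, min_reads=4, min_allele_reads=2,
                    flank_size=70):
    """Build an argument list for `strkit call` (hypothetical helper).

    Only flags documented above are used; defaults mirror the documented ones.
    """
    if not read_files:
        raise ValueError("at least one indexed read file (BAM/CRAM) is required")
    return [
        "strkit", "call",
        *read_files,  # more than one read file means the reads are pooled
        "--ref", ref,
        "--loci", loci,
        "--min-reads", str(min_reads),
        "--min-allele-reads", str(min_allele_reads),
        "--flank-size", str(flank_size),
    ]


args = build_call_args(
    ["path/to/read/file.bam"],
    ref="path/to/reference.fa.gz",
    loci="path/to/loci.bed",
)
print(shlex.join(args))
# To actually run it: subprocess.run(args, check=True)
```

Passing a list rather than a single shell string avoids quoting problems with paths that contain spaces.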

To output a full call report, pass --json output-file.json to write a more detailed JSON document to that path. This document contains 99% CIs, peak labels, and some other information that isn't included in the normal TSV file.

Note on OpenMP Threading

Slow performance can result from running strkit call or strkit re-call on a system with OpenMP, due to a misguided attempt at multithreading under the hood somewhere in NumPy/SciPy (which doesn't work here due to repeated initializations of the Gaussian mixture model). To fix this, set the following environment variable before running:

export OMP_NUM_THREADS=1
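
When STRkit is launched from a Python script rather than an interactive shell, the variable can be set for the child process only. A minimal sketch, where `single_threaded_env` is a hypothetical helper name:

```python
import os


def single_threaded_env(base_env=None):
    """Return a copy of the environment with OMP_NUM_THREADS=1 set."""
    env = dict(os.environ if base_env is None else base_env)
    env["OMP_NUM_THREADS"] = "1"
    return env


env = single_threaded_env()
print(env["OMP_NUM_THREADS"])
# Pass it to the child process, where `args` is the command as a list:
# subprocess.run(args, env=env, check=True)
```

Copying the environment instead of mutating `os.environ` keeps the setting from leaking into other code in the same Python process.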

All optional flags:

- --min-reads ##: Minimum number of supporting reads needed to make a call. Default: 4
- --min-allele-reads ##: Minimum number of supporting reads needed to call a specific allele size. Default: 2
- --flank-size ##: Size of the flanking region to use on either side of a region to properly anchor reads. Default: 70
- --targeted: Turn on targeted genotyping mode, which re-weights longer reads differently. Use this option if the alignment file contains targeted reads, e.g. from PacBio No-Amp Targeted Sequencing. Default: off
- --num-bootstrap ###: How many bootstrap re-samplings to perform. Default: 100
- --sex-chr ??: Sex chromosome configuration. Without this, loci in sex chromosomes will not be genotyped. Can be any configuration of Xs and Ys; only count matters. Default: none
- --json [path]: Path to output JSON call data to. JSON call data is more detailed than the stdout TSV output. Default: none
- --no-tsv: Suppresses TSV output to stdout. Without --json, no output will be generated, which isn't very helpful. Default: TSV output on
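
The "only count matters" rule for --sex-chr can be read as: the order of the letters is irrelevant, and the value just says how many copies of X and Y are present. A small sketch under that assumption (`sex_chr_copies` is a hypothetical helper, not STRkit's own parser):

```python
from collections import Counter


def sex_chr_copies(config):
    """Count X and Y copies in a --sex-chr style string (hypothetical helper)."""
    config = config.upper()
    if not config or set(config) - {"X", "Y"}:
        raise ValueError(f"expected a string of Xs and Ys, got {config!r}")
    counts = Counter(config)
    return {"X": counts["X"], "Y": counts["Y"]}


for value in ("XX", "XY", "YX", "XXY"):
    print(value, sex_chr_copies(value))
```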

`strkit visualize`: Call visualizer

STRkit bundles a call visualization tool which takes as input a BAM file and a JSON call file produced by running strkit call with the --json flag.

It starts a web server on your local machine; the visualizations can be interacted with in a web browser.

To use the tool, run the following command:

# --ref: hg38 or hg19
strkit visualize path/to/my-alignment.bam \
  --ref hg38 \
  --json path/to/my-calls.json \
  -i 1  # 1-indexed offset in JSON file for locus of interest. Default is 1 if left out.

This will output something like the following:

 * Serving Flask app 'strkit.viz.server' (lazy loading)
 * Environment: production
   WARNING: This is a development server. Do not use it in a production deployment.
   Use a production WSGI server instead.
 * Debug mode: on
 * Running on http://localhost:5011 (Press CTRL+C to quit)
...

You can then go to the URL listed, http://localhost:5011, on your local machine to see the visualization tool:

[Figure: STRkit browser histogram, showing an expansion in the HTT gene.]

[Figure: igv.js genome browser showing the same expansion. Note the insertions on the left-hand side in most reads, and the heterozygous copy number pattern.]

To exit the tool, press Ctrl-C in your command line window as mentioned in the start-up instructions.

`strkit re-call`: Genotype re-caller

This command has a feature set similar to that of strkit call, but is designed to be used with the output of other long-read STR genotyping methods to refine the genotype estimates when calling from HiFi reads.

Features:

- Support for re-calling output from tandem-genotypes, RepeatHMM, and Straglr
- Strand resampling / bias correction (for use with the tandem-genotypes program)
- 95% confidence intervals on calls via user-configurable bootstrapping

Notes:

--min-allele-reads will affect the confidence intervals given by the bootstrap process, especially in low-coverage loci. This should be set depending on the read technology being used; something like a single PacBio HiFi read generally contains higher-quality information than a single PacBio CLR read, for example.

`strkit mi`: Mendelian inheritance analysis

This tool is currently in development and in a very unfinished state. However, the following features will be in the final release:

- Mendelian inheritance % (MI) calculations for many common TR genotyping tools for both long/short reads
- Confidence-interval MI calculations for the genotyping tools which report CIs
- Reports of loci (potentially of interest) which do not respect MI
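
As an illustration of what "respecting MI" means for a trio, the sketch below checks whether one child allele can be explained by each parent, first with exact copy numbers and then with confidence intervals. It is not the strkit mi implementation, which is unfinished; the function names, the tuple representation of genotypes, and the example values are all assumptions.

```python
def respects_mi(child, mother, father):
    """True if one child allele can come from each parent (exact copy numbers)."""
    c1, c2 = child
    return (c1 in mother and c2 in father) or (c2 in mother and c1 in father)


def overlaps(a, b):
    """True if two closed intervals (low, high) overlap."""
    return a[0] <= b[1] and b[0] <= a[1]


def respects_mi_ci(child, mother, father):
    """Same check, but alleles are CIs and 'equal' means 'intervals overlap'."""
    c1, c2 = child

    def from_parent(allele, parent):
        return any(overlaps(allele, p) for p in parent)

    return (from_parent(c1, mother) and from_parent(c2, father)) or (
        from_parent(c2, mother) and from_parent(c1, father)
    )


# Made-up (child, mother, father) genotypes at two loci.
trios = [
    ((10, 12), (10, 11), (12, 15)),  # consistent with inheritance
    ((10, 13), (10, 11), (12, 15)),  # 13 is found in neither parent
]
consistent = sum(respects_mi(*t) for t in trios)
print(f"MI: {100 * consistent / len(trios):.0f}%")
```

The interval version is more forgiving by design: a child allele only needs to be statistically compatible with a parental allele, not identical to it.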

Copyright and License

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see https://www.gnu.org/licenses/.

strkit 0.3.0rc1
