FastSTR: A high-performance tool for short tandem repeat (STR) detection and analysis from genome assemblies.


🧬 FastSTR

FastSTR: ultra-fast and accurate identification of short tandem repeats (STRs) in long-read DNA sequences, developed for genome-wide STR detection, consensus construction, and comparative STR analysis.

🌍 Overview

FastSTR is a novel and efficient tool for de novo detection of short tandem repeats (STRs) in genomic sequences. It combines fast motif recognition with accurate sequence alignment to achieve both high precision and completeness in STR identification. FastSTR is optimized for large-scale genomic datasets and enables rapid detection of repetitive elements without relying on predefined motif libraries or fixed repeat-length thresholds.

Compared with classical tools such as TRF, T-REKS, and TRASH, FastSTR offers:

⚡ High-speed parallel processing — Processes genomic fragments in parallel, achieving up to 10× faster runtime.
🧠 Context-aware motif recognition — Uses an N-gram + Markov model to identify representative motifs without predefined motif libraries.
🧩 Segmented global alignment — Efficiently handles ultra-long or complex STRs while maintaining base-level precision.
🔍 Smart interval merging — Applies an interval-gain decision strategy to accurately resolve overlapping STRs.
🧬 Enhanced detection in complex regions — Identifies confounding or nested repeat regions (e.g., centromeric satellites) with a novel density-based concentration test.
💾 Lightweight & scalable — Requires few dependencies, easy to install and run, and supports multiple operating systems.
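The README names an N-gram + Markov model for motif recognition but does not describe it. As a much simpler illustration of finding a representative motif without a predefined library, the hypothetical sketch below scans k = 1 to 6 and returns the shortest primitive k-mer whose cyclic rotations account for most k-mer windows in a candidate region. The function names, the maximum k of 6, and the 0.8 dominance threshold are all assumptions; this is a plain k-mer frequency heuristic, not FastSTR's actual algorithm.

```python
from collections import Counter


def canonical_rotation(motif: str) -> str:
    """Smallest lexicographic rotation, so ATG, TGA and GAT count as one motif."""
    return min(motif[i:] + motif[:i] for i in range(len(motif)))


def is_primitive(motif: str) -> bool:
    """True if the motif is not itself a repeat of a shorter unit (e.g. ATAT -> AT)."""
    n = len(motif)
    return all(motif != motif[:p] * (n // p) for p in range(1, n) if n % p == 0)


def guess_motif(seq: str, max_k: int = 6, min_fraction: float = 0.8):
    """Hypothetical heuristic: shortest primitive k-mer dominating the region, else None."""
    seq = seq.upper()
    for k in range(1, max_k + 1):
        windows = [seq[i:i + k] for i in range(len(seq) - k + 1)]
        if not windows:  # region shorter than k; longer k cannot fit either
            break
        # Count windows by canonical rotation so phase shifts of one repeat pool together.
        counts = Counter(canonical_rotation(w) for w in windows)
        motif, count = counts.most_common(1)[0]
        if count / len(windows) >= min_fraction and is_primitive(motif):
            return motif
    return None


if __name__ == "__main__":
    print(guess_motif("ATGATGATGATG"))
```

Pooling rotations matters because the same repeat read from a different offset (TGATGATGA instead of ATGATGATG) would otherwise split its counts across ATG, TGA, and GAT; all three collapse to the canonical form ATG.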

⚙️ Installation

Option 1: Install via `pip`

pip install faststr

Option 2: Install via `conda`

(coming soon)

conda install -c bioconda faststr

Option 3: Local installation (development)

git clone https://github.com/yourname/faststr.git
cd faststr
pip install -e .

🚀 Quick Start

Basic Command

faststr [--strict | --normal | --loose] [--default] genome.fa

Example

faststr --strict --default genome.fa

This runs FastSTR in strict mode using the default model to identify STRs in the genome.fa file.
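For batch runs it can help to assemble the command programmatically. The sketch below is a hypothetical Python wrapper (`build_faststr_cmd` and `run_faststr` are not part of FastSTR) that uses only flags documented in this README (`--strict`/`--normal`/`--loose`, `--default`, `-p`, `-f`). It assumes options may follow the FASTA path, as in the `-p 8` example under Usage.

```python
import shlex
import subprocess

# Mode flags documented in the README; exactly one is passed per run.
MODES = ("strict", "normal", "loose")


def build_faststr_cmd(fasta, mode="normal", cores=1, out_dir=None):
    """Return a FastSTR argument list (hypothetical helper, not part of FastSTR)."""
    if mode not in MODES:
        raise ValueError(f"mode must be one of {MODES}, got {mode!r}")
    if cores < 1:
        raise ValueError("cores must be >= 1")
    cmd = ["faststr", f"--{mode}", "--default", fasta]
    if cores != 1:  # -p defaults to 1, so only pass it when it changes
        cmd += ["-p", str(cores)]
    if out_dir is not None:
        cmd += ["-f", out_dir]  # -f: output directory
    return cmd


def run_faststr(fasta, **options):
    """Run FastSTR; check=True raises CalledProcessError on a non-zero exit."""
    return subprocess.run(build_faststr_cmd(fasta, **options), check=True)


if __name__ == "__main__":
    # Print the command instead of running it (shlex.join needs Python 3.8+).
    print(shlex.join(build_faststr_cmd("genome.fa", mode="strict", cores=8)))
```

Building a list rather than a shell string avoids quoting problems with unusual file names, and validating the mode up front keeps the three mode flags mutually exclusive.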

⚡ Command Line Options

Argument	Type	Default	Description
`match`	int	2	Match score
`mismatch`	int	5	Mismatch penalty
`gap_open`	int	7	Gap opening penalty
`gap_extend`	int	3	Gap extension penalty
`p_indel`	int	15	Indel percentage threshold
`p_match`	int	80	Match percentage threshold
`score`	int	50	Alignment score threshold
`quality_control`	bool	False	Enable read-level quality control
`DNA_file`	str	—	Path to DNA FASTA input
`-f`	str	—	Output directory
`-s`	int	1	Start index
`-e`	int	0	End index
`-l`	int	15000	Sub-read length (bp)
`-o`	int	1000	Overlap between adjacent sub-reads (bp)
`-p`	int	1	Number of CPU cores
`-b`	float	0.045	Motif coverage threshold
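The `-l` and `-o` options suggest that long sequences are cut into overlapping sub-reads that can be processed in parallel. The sketch below shows one way such windows could be computed; the half-open coordinate convention and the function names are assumptions, not FastSTR internals. With an overlap of `o` bp, any repeat no longer than `o` bp falls entirely inside at least one window, so a window boundary cannot split it.

```python
def split_into_subreads(seq_len: int, sub_len: int = 15000, overlap: int = 1000):
    """Return half-open (start, end) windows of at most sub_len bp covering [0, seq_len).

    Consecutive windows share `overlap` bp; the defaults mirror -l and -o.
    """
    if sub_len <= 0 or not 0 <= overlap < sub_len:
        raise ValueError("require sub_len > 0 and 0 <= overlap < sub_len")
    if seq_len <= 0:
        return []
    step = sub_len - overlap
    windows = []
    start = 0
    while True:
        end = min(start + sub_len, seq_len)  # last window is clipped to the sequence end
        windows.append((start, end))
        if end >= seq_len:
            break
        start += step
    return windows


def to_global(window, local_start, local_end):
    """Shift a hit found inside a window back to whole-sequence coordinates."""
    return window[0] + local_start, window[0] + local_end


if __name__ == "__main__":
    print(split_into_subreads(40000))
```

For a 40 kb sequence with the defaults this yields `(0, 15000)`, `(14000, 29000)`, and `(28000, 40000)`. A repeat lying in an overlap can be reported by two windows, which is one reason overlapping intervals must be merged afterwards.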

🧠 Alignment Modes

Mode	Description
`--strict`	High precision, recommended for curated assemblies
`--normal`	Balanced mode, suitable for most datasets
`--loose`	High sensitivity, tolerant of mismatches
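The `match`, `mismatch`, `gap_open`, and `gap_extend` options point to an affine-gap alignment model. The README does not say how the modes use these values or how a gap's cost is counted, so the sketch below assumes a gap of length L costs `gap_open + (L - 1) * gap_extend` and computes a Gotoh-style global alignment score with the default parameters. It illustrates the scoring scheme only; it is not FastSTR's segmented aligner, and `affine_global_score` is a hypothetical name.

```python
NEG_INF = float("-inf")


def affine_global_score(a, b, match=2, mismatch=5, gap_open=7, gap_extend=3):
    """Gotoh global alignment score; assumed gap cost = gap_open + (L - 1) * gap_extend."""
    n, m = len(a), len(b)
    # M: a[i-1] aligned to b[j-1]; X: a[i-1] against a gap; Y: b[j-1] against a gap.
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    X = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0
    for i in range(1, n + 1):  # leading gap in b
        X[i][0] = -gap_open - (i - 1) * gap_extend
    for j in range(1, m + 1):  # leading gap in a
        Y[0][j] = -gap_open - (j - 1) * gap_extend
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else -mismatch
            M[i][j] = s + max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            # Opening a gap costs gap_open; continuing the same gap costs gap_extend.
            X[i][j] = max(M[i - 1][j] - gap_open, X[i - 1][j] - gap_extend, Y[i - 1][j] - gap_open)
            Y[i][j] = max(M[i][j - 1] - gap_open, Y[i][j - 1] - gap_extend, X[i][j - 1] - gap_open)
    return max(M[n][m], X[n][m], Y[n][m])


if __name__ == "__main__":
    print(affine_global_score("ACGT", "ACT"))
```

Under these assumptions, aligning ACGT to ACT scores 3 × 2 - 7 = -1: three matches and one length-1 gap beat any alignment that adds a mismatch. Because extending a gap (3) is cheaper than opening one (7), a single long indel is preferred over several short ones.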

🧬 Model Presets

Preset	Description
`--default`	Standard scoring model
(future) `--sensitive`	Optimized for noisy long reads
(future) `--speed`	Optimized for large-scale detection

📥 Input & Output

Input

DNA sequences in FASTA format

Output

File Pattern	Description
`*detail.dat`	Contains all STR positions and motifs, quality statistics for each STR, and STR counts per chromosome.
`*align.dat`	Detailed alignment of all STRs against reference STRs, including mismatches and indels.
`*.csv`	Merged STR intervals with representative motifs and summary statistics for each interval.
`*.log`	Processing logs.

🧪 Usage

1️⃣ Identify STRs in a genome

faststr --normal --default human_genome.fa

2️⃣ Use multiple cores

faststr --strict --default genome.fa -p 8

📈 Performance

Dataset	Genome size	Tool	Runtime	Recall	Precision
Human (T2T)	2.94 Gb	TRF	18 h 31 min	-	-
		FastSTR	1 h 13 min	0.950	0.994
Mouse (GRCm39)	2.57 Gb	TRF	1 h 41 min	-	-
		FastSTR	38 min	0.966	0.997
Zebrafish (GRCz11)	1.58 Gb	TRF	2 h 51 min	-	-
		FastSTR	25 min	0.945	0.998

Note: TRF results are used as the ground truth, so recall and precision are not reported for TRF. FastSTR was run on 72 CPUs.
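The runtime columns can be turned into speedups directly. The short sketch below converts the table's hour/minute values to minutes and divides TRF's time by FastSTR's. It also computes F1, the harmonic mean of recall and precision, which the table does not list. No numbers beyond those in the table are used.

```python
# Runtimes from the Performance table as (hours, minutes): (TRF, FastSTR)
RUNTIMES = {
    "Human (T2T)": ((18, 31), (1, 13)),
    "Mouse (GRCm39)": ((1, 41), (0, 38)),
    "Zebrafish (GRCz11)": ((2, 51), (0, 25)),
}
# FastSTR (recall, precision) with TRF as ground truth
ACCURACY = {
    "Human (T2T)": (0.950, 0.994),
    "Mouse (GRCm39)": (0.966, 0.997),
    "Zebrafish (GRCz11)": (0.945, 0.998),
}


def minutes(hm):
    hours, mins = hm
    return 60 * hours + mins


def speedups():
    """TRF runtime divided by FastSTR runtime for each dataset."""
    return {name: minutes(trf) / minutes(fast) for name, (trf, fast) in RUNTIMES.items()}


def f1(recall, precision):
    """Harmonic mean of recall and precision."""
    return 2 * recall * precision / (recall + precision)


if __name__ == "__main__":
    for name, ratio in speedups().items():
        r, p = ACCURACY[name]
        print(f"{name}: {ratio:.1f}x faster, F1 = {f1(r, p):.3f}")
```

The ratios come out to about 15.2× (human), 2.7× (mouse), and 6.8× (zebrafish). Keep in mind that FastSTR used 72 CPUs, so these are wall-clock comparisons rather than per-core ones.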

📚 Citation

If you use FastSTR in your research, please cite:

Xingyu Liao et al.,
Efficient Identification of Short Tandem Repeats via Context-Aware Motif Discovery and Ultra-Fast Sequence Alignment,
Nat. Methods, 2025.

📄 License

This project is licensed under the MIT License.
See LICENSE for more details.

🧾 Changelog

v1.0.0 (2025)

Initial release of FastSTR
Three alignment modes and one default model preset
Parallel computation
.csv, .dat, and .log outputs


This version: 1.0.0 (Jan 21, 2026)

Download files


Source Distribution

faststr-1.0.0.tar.gz (27.7 kB)

Uploaded Jan 21, 2026 Source

Built Distribution


faststr-1.0.0-py3-none-any.whl (29.6 kB)

Uploaded Jan 21, 2026 Python 3

File details

Details for the file faststr-1.0.0.tar.gz.

File metadata

Download URL: faststr-1.0.0.tar.gz
Upload date: Jan 21, 2026
Size: 27.7 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.9.23

File hashes

Hashes for faststr-1.0.0.tar.gz
Algorithm	Hash digest
SHA256	`f985076b0c44b85e2b6bf076c29d0e7e275a81c019ff25f63a71022b00c23d1e`
MD5	`193d6e9e115d36390e204f62c11fa075`
BLAKE2b-256	`ff5543a38a82922eba3c416002b607bed9a31cb06a13fc5eddef2604f91e2eb7`


File details

Details for the file faststr-1.0.0-py3-none-any.whl.

File metadata

Download URL: faststr-1.0.0-py3-none-any.whl
Upload date: Jan 21, 2026
Size: 29.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.9.23

File hashes

Hashes for faststr-1.0.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`38e828942579483172429001ef7e3b164f22d70f57de4ba34af62fecd3f743cd`
MD5	`41f0acf692a973bedb6a04b5fc844f99`
BLAKE2b-256	`3ab52f1e5b5316c031acf02bf077cb7f1c41172d03988d7241e4e9cd81b4ea87`
