This is a sequence processing tool written in Rust for manipulating FASTA/FASTQ files. It is a pure-Rust counterpart to seqtk.

Project description

seqtk-rs

This is a sequence processing tool written in Rust for manipulating FASTA/FASTQ files. I built this tool out of my passion for Rust. Its functionality and subcommand names are similar to those in seqtk, but I’ve made some changes based on my own design logic.

Installation

cargo install seqtk-rs
seqtk_rs -h

Current Features

  • seq Common transformation of FASTA/Q

  • sample Random sampling with a given seed and fraction

  • size Report sequence length statistics

    (Output: #seq, #bases, avg_size, min_size, med_size, max_size, N50)

  • fqchk Report stats for sequence and quality by position

    (Output: POS, #bases, %A, %C, %G, %T, %N, avgQ, errQ, ...)

    • avgQ: Average quality score, (Q₁ + Q₂ + ... + Qₙ) / n
    • errQ: Phred-scaled mean error probability, -10 * log₁₀((P₁ + P₂ + ... + Pₙ) / n), where Pᵢ = 10^(-Qᵢ / 10)

    Note: Some tools raise quality scores below 3 (Q < 3) to 3 to avoid instability in downstream metrics. For example, Q = 0 yields an error probability P = 1.0, Q = 1 gives P ≈ 0.794, and Q = 2 gives P ≈ 0.631. Error probabilities this large can heavily skew error rate calculations such as errQ, which is why such scores are often floored to 3. However, flooring produces results that are inconsistent with the original definition, so this tool preserves the original quality scores as-is.

  • comp Report the nucleotide composition of FASTA/Q

    (Output: #A, #C, #G, #T, #2, #3, #4, #CG, #GC)

    • CG or GC: Number of CG/GC on the template strand

  • qctrim Trim low-quality bases from FASTQ data based on a quality threshold Q
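
The avgQ and errQ definitions listed under fqchk can be illustrated with a small standalone sketch. This is not the tool's own code: the function names (phred_to_prob, avg_q, err_q, err_q_floored) are hypothetical, and the only assumption is the standard Phred relation P = 10^(-Q / 10). The last function shows the flooring that some other tools apply, purely for comparison.

```rust
/// Convert a Phred quality score Q to an error probability, P = 10^(-Q / 10).
fn phred_to_prob(q: u8) -> f64 {
    10f64.powf(-(q as f64) / 10.0)
}

/// avgQ: arithmetic mean of the quality scores at one position.
/// Returns NaN for an empty slice.
fn avg_q(quals: &[u8]) -> f64 {
    let sum: f64 = quals.iter().map(|&q| q as f64).sum();
    sum / quals.len() as f64
}

/// errQ: -10 * log10 of the mean error probability at one position.
/// Returns NaN for an empty slice.
fn err_q(quals: &[u8]) -> f64 {
    let sum_p: f64 = quals.iter().map(|&q| phred_to_prob(q)).sum();
    -10.0 * (sum_p / quals.len() as f64).log10()
}

/// errQ after raising every score below `floor` to `floor`.
/// Hypothetical helper, shown only to illustrate the effect of flooring.
fn err_q_floored(quals: &[u8], floor: u8) -> f64 {
    let floored: Vec<u8> = quals.iter().map(|&q| q.max(floor)).collect();
    err_q(&floored)
}

fn main() {
    // One Q = 0 base among three Q = 30 bases at the same position.
    let quals = [0u8, 30, 30, 30];
    println!("avgQ           = {:.2}", avg_q(&quals));
    println!("errQ (as-is)   = {:.2}", err_q(&quals));
    println!("errQ (floor 3) = {:.2}", err_q_floored(&quals, 3));
}
```

For this input avgQ is 22.5, while errQ is about 6.0 with the scores kept as-is and about 9.0 with a floor of 3. A single Q = 0 base dominates the mean error probability, and flooring changes the reported value noticeably.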

TODO

  • trimAdapter Trim adapters from FASTQ files

Acknowledgements

Download files

Source Distributions

No source distribution files available for this release.

Built Distribution

seqtk_rs-0.2.0-py3-none-manylinux_2_34_x86_64.whl (613.1 kB)

Uploaded for: Python 3; manylinux (glibc 2.34+), x86-64

Hashes for seqtk_rs-0.2.0-py3-none-manylinux_2_34_x86_64.whl

Algorithm    Hash digest
SHA256       06c81231bf33f4bd4c568500773b8e6ce89e58b382cee2c8e8f93be7a2c7e13c
MD5          9905cc028e679279bd612827b7de96ce
BLAKE2b-256  7d06d444869bcc25019e3b959112472001609d56a9821bf9f060e0f1d047c812
