A command-line tool for processing and analyzing biological sequences.

seqx

seqx is an agent-friendly CLI for FASTA/FASTQ sequence processing.

It is designed around streaming I/O, predictable command behavior, and low-memory execution for large files.

Installation

PyPI

pip install seqx

cargo

cargo install seqx

prebuilt binaries

Prebuilt binaries for Linux and macOS are available on the releases page.

Quick Start

# Show help
seqx --help

# Show guide (agent-friendly help)
seqx guide
seqx guide filter

# Basic stats
seqx stats -i input.fa

# Convert FASTA -> FASTQ
seqx convert -i input.fa -T fastq -o output.fq

# Filter short sequences
seqx filter -i input.fa --min-len 100 -o filtered.fa

Commands

stats

seqx stats -i input.fa
seqx stats -i input.fa --gc
seqx stats -i input.fq --qual --min-len 50

convert

seqx convert -i input.fa -T fastq -Q 30 -o output.fq
seqx convert -i input.fq -T fasta -o output.fa

filter

seqx filter -i input.fa --min-len 100 --max-len 2000
seqx filter -i input.fa --pattern "ATG.*TAA"
seqx filter -i input.fa --exclude-pattern "N{10,}"
seqx filter -i input.fa --id-file ids.txt
seqx filter -i input.fq --min-qual 30

extract

seqx extract -i input.fa --id seq1
seqx extract -i input.fa --id-file ids.txt
seqx extract -i input.fa --range 1:100
seqx extract -i input.fa --bed regions.bed -F 20

search

seqx search -i input.fa "ATG"
seqx search -i input.fa "ATG.*TAA" --regex
seqx search -i input.fa "ATG" --mismatches 1 --threads 8
seqx search -i input.fa "ATG" --bed --strand

modify

seqx modify -i input.fa --upper
seqx modify -i input.fa --lower
seqx modify -i input.fa --slice 10:200
seqx modify -i input.fa --remove-gaps
seqx modify -i input.fa --reverse-complement

sample

seqx sample -i input.fa --count 1000 --seed 42
seqx sample -i input.fa --fraction 0.1

sort

seqx sort -i input.fa --by-name
seqx sort -i input.fa --by-len --desc
seqx sort -i input.fa --by-gc --max-memory 256 --threads 8

dedup

seqx dedup -i input.fa
seqx dedup -i input.fa --by-id
seqx dedup -i input.fa --prefix 12 --ignore-case
seqx dedup -i input.fa --buckets 256 --threads 8

merge

seqx merge a.fa b.fa c.fa -o merged.fa
seqx merge a.fa b.fa c.fa --add-prefix --sep ":" -o merged_with_source.fa

split

seqx split -i input.fa --parts 10 -o out_dir
seqx split -i input.fa --chunk-size 1000 -o out_dir
seqx split -i input.fa --by-id -o out_dir --prefix seq

compress

# Compress with pigz if available, otherwise with the built-in implementation
seqx compress -i input.fa
seqx compress -i input.fa -o output.fa.gz -l 9

# Decompress
seqx compress -d -i input.fa.gz
seqx compress -d -i input.fa.gz -o output.fa

# Use stdin/stdout
cat input.fa | seqx compress > output.fa.gz
cat input.fa.gz | seqx compress -d > output.fa

# Force built-in implementation
seqx compress -i input.fa --no-pigz
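The pigz-or-built-in selection can be approximated in plain shell. This is a minimal sketch, not seqx's code: `compress_file` is a hypothetical name, and ordinary `gzip` stands in for seqx's built-in encoder (gzp), which is an assumption for illustration only.

```shell
# Hypothetical helper: gzip a file with pigz when it is on PATH, otherwise fall
# back to single-threaded gzip (standing in for seqx's built-in encoder).
# Arguments: source file, optional output path (default: source + .gz),
# optional compression level (default: 6).
compress_file() {
  local src=$1 out=${2:-$1.gz} level=${3:-6}
  if command -v pigz >/dev/null 2>&1; then
    pigz -c "-$level" "$src" > "$out"
  else
    gzip -c "-$level" "$src" > "$out"
  fi
}
```

For example, `compress_file input.fa` writes `input.fa.gz` next to the input, using pigz only when it is found on `PATH`.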

guide

# List all commands
seqx guide

# Show detailed help for a specific command
seqx guide filter
seqx guide compress

# Output in JSON format (for programmatic use)
seqx guide --format json
seqx guide filter --format json

# Output in Markdown format
seqx guide --format markdown

Behavior Notes

  • Input defaults to stdin where supported.
  • Output defaults to stdout where supported.
  • Format detection is extension-based (.fa/.fasta/.fq/.fastq, optional .gz).
  • FASTA/FASTQ parsing uses noodles.
  • extract currently supports FASTA input only.
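The extension rule can be expressed as a small shell function. This is a sketch of the rule as stated, assuming case-sensitive matching; `detect_format` is a hypothetical helper, not part of seqx.

```shell
# Hypothetical helper: map a file name to "fasta" or "fastq" using the
# extension rule above (.fa/.fasta/.fq/.fastq, optionally followed by .gz).
detect_format() {
  local name=$1
  name=${name%.gz}            # strip an optional .gz suffix first
  case $name in
    *.fa|*.fasta)  echo fasta ;;
    *.fq|*.fastq)  echo fastq ;;
    *)             echo unknown; return 1 ;;
  esac
}

detect_format reads.fq.gz    # prints fastq
detect_format genome.fasta   # prints fasta
```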

Nucleotide vs Protein Behavior

  • Protein FASTA records are supported by all commands.
  • Nucleotide-only operations are explicitly guarded against protein input:
    • filter --gc-min/--gc-max
    • modify --reverse-complement
    • reverse-complement matching in search (enabled only when both record and pattern are nucleotide)
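A guard of this kind can be sketched by classifying a sequence as nucleotide only when every character belongs to a nucleotide alphabet. The sketch assumes that alphabet is A, C, G, T, N (either case); seqx's actual classification rules are not documented here, and `is_nucleotide`, `revcomp`, and `gc_fraction` are hypothetical names. A short protein made only of A, C, G, T, and N residues would be misclassified by this simple check.

```shell
# Hypothetical guard: succeed only if the sequence is non-empty and uses the
# assumed nucleotide alphabet A, C, G, T, N (either case).
is_nucleotide() {
  case $1 in
    ''|*[!ACGTNacgtn]*) return 1 ;;
    *) return 0 ;;
  esac
}

# Reverse complement, refusing input that fails the guard.
revcomp() {
  is_nucleotide "$1" || { echo "revcomp: not a nucleotide sequence" >&2; return 1; }
  printf '%s\n' "$1" |
    awk '{ out = ""; for (i = length($0); i > 0; i--) out = out substr($0, i, 1); print out }' |
    tr 'ACGTNacgtn' 'TGCANtgcan'
}

# GC fraction (G+C count over sequence length), also guarded; two decimals.
gc_fraction() {
  is_nucleotide "$1" || { echo "gc_fraction: not a nucleotide sequence" >&2; return 1; }
  printf '%s\n' "$1" |
    awk '{ s = toupper($0); n = gsub(/[GC]/, "", s); printf "%.2f\n", n / length($0) }'
}

revcomp AACG        # prints CGTT
gc_fraction ATGC    # prints 0.50
```

Calling `revcomp MKVL` prints an error to stderr and returns a non-zero status instead of producing output.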

Performance Model

  • sort: external chunk sort + mmap merge, configurable with --max-memory and --threads.
  • dedup: disk bucket partitioning + per-bucket dedup + stable merge, configurable with --buckets and --threads.
  • split --parts: two-pass streaming split (stdin may be materialized to a temp file).
  • compress: uses pigz if available, otherwise uses gzp (parallel gzip in Rust) with automatic thread detection.
  • Code paths that write temporary binary records use packed_seq_io (2-bit packing for A/C/G/T when applicable).
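The external sort idea (sort bounded chunks independently, then k-way merge the sorted runs) can be illustrated with standard tools. This sketch sorts by sequence length (descending), uses coreutils `split` and `sort -m` in place of seqx's mmap-based merge, and assumes FASTA headers contain no tab characters; `sort_fasta_by_len` and its chunk-size parameter are hypothetical.

```shell
# Hypothetical sketch: sort FASTA records by sequence length (descending) by
# sorting fixed-size chunks independently, then merging the sorted runs.
sort_fasta_by_len() {
  local src=$1 chunk_lines=${2:-100000} tmp tab
  tmp=$(mktemp -d)
  tab=$(printf '\t')
  # 1. Linearize: one "length<TAB>header<TAB>sequence" line per record.
  awk '/^>/ { if (h != "") print length(s) "\t" h "\t" s; h = $0; s = ""; next }
       { s = s $0 }
       END { if (h != "") print length(s) "\t" h "\t" s }' "$src" > "$tmp/lin"
  # 2. Split into chunks of at most chunk_lines records; sort each run on its own.
  split -l "$chunk_lines" "$tmp/lin" "$tmp/chunk."
  for c in "$tmp"/chunk.*; do
    [ -e "$c" ] || continue
    sort -t "$tab" -k1,1nr "$c" > "$c.sorted"
  done
  # 3. k-way merge of the sorted runs, then convert back to two-line FASTA.
  sort -m -t "$tab" -k1,1nr "$tmp"/chunk.*.sorted | awk -F '\t' '{ print $2; print $3 }'
  rm -rf "$tmp"
}
```

Each chunk is sorted on its own, so the in-memory working set is bounded by the chunk size; `sort -m` then only streams through the already-sorted runs. The chunk sorts in step 2 are independent and could run in parallel.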
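Disk bucket partitioning works because exact duplicates always land in the same bucket, so each bucket can be deduplicated independently (here, as parallel background jobs). The sketch keeps the first occurrence of each sequence and restores input order by sorting on the original record index. For brevity it buckets by sequence length modulo the bucket count instead of a real hash function (an assumption: duplicates still share a bucket, but the distribution is less even), assumes headers contain no tabs, and `dedup_fasta` is a hypothetical name.

```shell
# Hypothetical sketch: exact-sequence dedup via disk buckets, per-bucket dedup,
# and a stable merge that restores the original record order.
dedup_fasta() {
  local src=$1 buckets=${2:-16} tmp tab
  tmp=$(mktemp -d)
  tab=$(printf '\t')
  # 1. Partition: append "index<TAB>header<TAB>sequence" to bucket (length % buckets).
  #    Length stands in for a hash: identical sequences always share a bucket.
  awk -v dir="$tmp" -v nb="$buckets" '
    function flush() { if (h != "") print n "\t" h "\t" s > (dir "/b." (length(s) % nb)) }
    /^>/ { flush(); n++; h = $0; s = ""; next }
    { s = s $0 }
    END { flush() }' "$src"
  # 2. Dedup each bucket independently (in parallel), keeping the first occurrence.
  for b in "$tmp"/b.*; do
    [ -e "$b" ] || continue
    awk -F '\t' '!seen[$3]++' "$b" > "$b.dedup" &
  done
  wait
  # 3. Stable merge: sort by original record index, then convert back to FASTA.
  sort -t "$tab" -k1,1n "$tmp"/b.*.dedup | awk -F '\t' '{ print $2; print $3 }'
  rm -rf "$tmp"
}
```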
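The two-pass `--parts` split can be sketched as one pass that counts records and a second pass that routes each record to its part. Because the input is read twice, it has to be a regular file, which is consistent with the note that stdin may be materialized to a temp file. `split_fasta_parts` and the `part_N.fa` output names are hypothetical.

```shell
# Hypothetical sketch: split a FASTA file into N near-equal parts in two passes.
split_fasta_parts() {
  local src=$1 parts=$2 outdir=$3 total
  mkdir -p "$outdir"
  # Pass 1: count records (the input must be re-readable, so not a pipe).
  total=$(awk '/^>/ { n++ } END { print n + 0 }' "$src")
  # Pass 2: record i (0-based) goes to part floor(i * parts / total) + 1.
  awk -v parts="$parts" -v total="$total" -v dir="$outdir" '
    /^>/ { p = int(i * parts / total) + 1; i++; out = dir "/part_" p ".fa" }
    out != "" { print > out }' "$src"
}
```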
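2-bit packing stores each A/C/G/T base in two bits, so four bases fit in one byte. The sketch below assumes the encoding A=0, C=1, G=2, T=3 and left-aligns a trailing partial group; packed_seq_io's real layout and bit order are not documented here, and `pack_2bit` is a hypothetical bash function that prints decimal byte values rather than writing binary.

```shell
# Hypothetical sketch: pack an A/C/G/T sequence 4 bases per byte using the
# assumed codes A=0, C=1, G=2, T=3. Prints one decimal byte value per line.
pack_2bit() {
  local seq=$1 i code byte=0 n=0
  for ((i = 0; i < ${#seq}; i++)); do
    case ${seq:i:1} in
      A) code=0 ;; C) code=1 ;; G) code=2 ;; T) code=3 ;;
      *) echo "pack_2bit: non-ACGT base, cannot pack" >&2; return 1 ;;
    esac
    byte=$(( (byte << 2) | code ))
    n=$((n + 1))
    if (( n == 4 )); then printf '%d\n' "$byte"; byte=0; n=0; fi
  done
  # Left-align a trailing partial group inside its byte.
  if (( n > 0 )); then printf '%d\n' $(( byte << (2 * (4 - n)) )); fi
}

pack_2bit ACGT     # prints 27
pack_2bit TTTTG    # prints 255, then 128
```

Any other character (for example N, or lowercase soft-masked bases) makes the function fail, which mirrors the "when applicable" caveat: such records would presumably be stored unpacked.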

Bench Script

./scripts/bench_packed_io.sh

# Custom workload
N_RECORDS=1000000 SEQ_LEN=200 DUP_RATE=40 ./scripts/bench_packed_io.sh

Developer Docs

License

MIT

Download files

Download the file for your platform.

Source Distributions

No source distribution files are available for this release.

Built Distributions

seqx-0.1.3-py3-none-win_amd64.whl (1.5 MB)

Uploaded: Python 3, Windows x86-64

seqx-0.1.3-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.6 MB)

Uploaded: Python 3, manylinux: glibc 2.17+ x86-64

seqx-0.1.3-py3-none-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (1.5 MB)

Uploaded: Python 3, manylinux: glibc 2.17+ ARM64

seqx-0.1.3-py3-none-macosx_11_0_arm64.whl (1.4 MB)

Uploaded: Python 3, macOS 11.0+ ARM64

seqx-0.1.3-py3-none-macosx_10_12_x86_64.whl (1.5 MB)

Uploaded: Python 3, macOS 10.12+ x86-64

File details

Details for the file seqx-0.1.3-py3-none-win_amd64.whl.

File metadata

  • Download URL: seqx-0.1.3-py3-none-win_amd64.whl
  • Size: 1.5 MB
  • Tags: Python 3, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: maturin/1.12.6

File hashes

Hashes for seqx-0.1.3-py3-none-win_amd64.whl
Algorithm Hash digest
SHA256 07067ead0537b2466d3c67ab293aa234976acfc4bef08112f2f544cff0ab228b
MD5 4df30552d040e6e97b9fa3e6109e31a4
BLAKE2b-256 78a6a69ca37eb323135258031a97881c321c465aac130e0f67abb1853905efd7

File details

Details for the file seqx-0.1.3-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for seqx-0.1.3-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 f40c94d86e7ab6378b957199d302dda1ce0b75fabc5c3bb6c0fe1b57da48d177
MD5 8cd4c2df6933ee44e2f61fc951ec4c1d
BLAKE2b-256 8f4317fcf7a34d68d026babfc865f770907e83a824a7cf38b8e03ac0c3667bc5

File details

Details for the file seqx-0.1.3-py3-none-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File hashes

Hashes for seqx-0.1.3-py3-none-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 ef99030a626b402e273ba0ad00c80f29ced1fffc5fe03a41e8389554dc797057
MD5 9b0636659d93992276355da1268e1b5e
BLAKE2b-256 ba7cc41dbee15c7de97678c8f0bad37f32a80a2e189b86abdc35da0877a4f755

File details

Details for the file seqx-0.1.3-py3-none-macosx_11_0_arm64.whl.

File hashes

Hashes for seqx-0.1.3-py3-none-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 80431e529b63a50b153165bfd83ad9a244f85920b39c82b62ca6a1459604c21b
MD5 4dd189ffb4ef7e4e57ed7b972c713657
BLAKE2b-256 c7978c6c6a56885061b97988b55d30dd2fa0af6e6bdbb00de9fe7b911e2432a8

File details

Details for the file seqx-0.1.3-py3-none-macosx_10_12_x86_64.whl.

File hashes

Hashes for seqx-0.1.3-py3-none-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 2f8563a561c091ff0f83f7edd280c2eebd3c964fffa8b0652a9b77d0bff3c993
MD5 3cd2506982735e90306c390e562d1a08
BLAKE2b-256 9659afcec65c4198f1afbf16e65b5b9857e468438992f3393ee1c8fade9cda2d
