A command-line tool for processing and analyzing biological sequences.

seqx

seqx is an agent-friendly CLI for FASTA/FASTQ sequence processing.

It is designed around streaming I/O, predictable command behavior, and low-memory execution for large files.

Repository Layout

seqx/
├── .github/
│   └── workflows/
│       └── release.yml
├── scripts/
│   ├── bench_packed_io.sh
│   └── gen_random_fasta.py
├── src/
│   ├── main.rs
│   ├── lib.rs
│   ├── cmd/
│   │   ├── mod.rs
│   │   ├── compress.rs
│   │   ├── convert.rs
│   │   ├── dedup.rs
│   │   ├── extract.rs
│   │   ├── filter.rs
│   │   ├── merge.rs
│   │   ├── modify.rs
│   │   ├── sample.rs
│   │   ├── search.rs
│   │   ├── sort.rs
│   │   ├── split.rs
│   │   ├── stats.rs
│   │   └── guide.rs
│   └── common/
│       ├── mod.rs
│       ├── parser.rs
│       ├── packed_seq_io.rs
│       ├── record.rs
│       ├── writer.rs
│       └── README.md
├── Cargo.toml
├── Cargo.lock
├── README.md
├── QUICKREF.md
├── DEVELOPMENT.md
├── SKILL.md
├── rustfmt.toml
└── target/                # build artifacts (generated)

Build

cargo build --release

Binary path:

target/release/seqx

Quick Start

# Show help
seqx --help

# Show guide (agent-friendly help)
seqx guide
seqx guide filter

# Basic stats
seqx stats -i input.fa

# Convert FASTA -> FASTQ
seqx convert -i input.fa -T fastq -o output.fq

# Filter short sequences
seqx filter -i input.fa --min-len 100 -o filtered.fa

Commands

stats

seqx stats -i input.fa
seqx stats -i input.fa --gc
seqx stats -i input.fq --qual --min-len 50

convert

seqx convert -i input.fa -T fastq -Q 30 -o output.fq
seqx convert -i input.fq -T fasta -o output.fa

filter

seqx filter -i input.fa --min-len 100 --max-len 2000
seqx filter -i input.fa --pattern "ATG.*TAA"
seqx filter -i input.fa --exclude-pattern "N{10,}"
seqx filter -i input.fa --id-file ids.txt
seqx filter -i input.fq --min-qual 30
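
The two regular expressions in the filter examples can be tried outside seqx to see which sequences they select. The sketch below is a rough Python model, not seqx's implementation. It assumes that --pattern keeps records whose sequence contains a match, that --exclude-pattern drops them, and that matching is an unanchored search. `keep_record` is a hypothetical helper name.

```python
import re


def keep_record(seq, pattern=None, exclude_pattern=None):
    """Decide whether a sequence survives the two pattern filters.

    Assumption: both patterns are applied as unanchored searches
    over the whole sequence string.
    """
    if pattern is not None and re.search(pattern, seq) is None:
        return False  # required pattern is absent
    if exclude_pattern is not None and re.search(exclude_pattern, seq) is not None:
        return False  # forbidden pattern is present
    return True


# "ATG.*TAA": an ATG followed, anywhere later, by TAA
print(keep_record("CCATGAAATAAGG", pattern="ATG.*TAA"))
# "N{10,}": a run of ten or more N characters
print(keep_record("ACGT" + "N" * 10, exclude_pattern="N{10,}"))
```

Both patterns use only syntax that Python's `re` and common Rust regex engines share, so the sketch should be a fair preview of what the patterns select.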

extract

seqx extract -i input.fa --id seq1
seqx extract -i input.fa --id-file ids.txt
seqx extract -i input.fa --range 1:100
seqx extract -i input.fa --bed regions.bed -F 20

search

seqx search -i input.fa "ATG"
seqx search -i input.fa "ATG.*TAA" --regex
seqx search -i input.fa "ATG" --mismatches 1 --threads 8
seqx search -i input.fa "ATG" --bed --strand

modify

seqx modify -i input.fa --upper
seqx modify -i input.fa --lower
seqx modify -i input.fa --slice 10:200
seqx modify -i input.fa --remove-gaps
seqx modify -i input.fa --reverse-complement

sample

seqx sample -i input.fa --count 1000 --seed 42
seqx sample -i input.fa --fraction 0.1

sort

seqx sort -i input.fa --by-name
seqx sort -i input.fa --by-len --desc
seqx sort -i input.fa --by-gc --max-memory 256 --threads 8

dedup

seqx dedup -i input.fa
seqx dedup -i input.fa --by-id
seqx dedup -i input.fa --prefix 12 --ignore-case
seqx dedup -i input.fa --buckets 256 --threads 8

merge

seqx merge a.fa b.fa c.fa -o merged.fa
seqx merge a.fa b.fa c.fa --add-prefix --sep ":" -o merged_with_source.fa

split

seqx split -i input.fa --parts 10 -o out_dir
seqx split -i input.fa --chunk-size 1000 -o out_dir
seqx split -i input.fa --by-id -o out_dir --prefix seq

compress

# Compress using pigz if available, otherwise fall back to the built-in implementation
seqx compress -i input.fa
seqx compress -i input.fa -o output.fa.gz -l 9

# Decompress
seqx compress -d -i input.fa.gz
seqx compress -d -i input.fa.gz -o output.fa

# Use stdin/stdout
cat input.fa | seqx compress > output.fa.gz
cat input.fa.gz | seqx compress -d > output.fa

# Force built-in implementation
seqx compress -i input.fa --no-pigz

guide

# List all commands
seqx guide

# Show detailed help for a specific command
seqx guide filter
seqx guide compress

# Output in JSON format (for programmatic use)
seqx guide --format json
seqx guide filter --format json

# Output in Markdown format
seqx guide --format markdown
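
For programmatic use, the JSON output can be read from a script. The following is a minimal sketch, assuming seqx is on PATH and that `seqx guide --format json` prints a single JSON document to stdout. The schema of that document is not described here, so the sketch only parses it. `guide_argv` and `load_guide` are hypothetical helper names.

```python
import json
import subprocess


def guide_argv(command=None):
    """Build the argument list for `seqx guide [command] --format json`."""
    argv = ["seqx", "guide"]
    if command is not None:
        argv.append(command)
    argv += ["--format", "json"]
    return argv


def load_guide(command=None, run=subprocess.run):
    """Run seqx and parse its stdout as JSON.

    `run` is injectable so the function can be exercised without seqx installed.
    """
    result = run(guide_argv(command), capture_output=True, text=True, check=True)
    return json.loads(result.stdout)
```

Calling `load_guide("filter")` mirrors `seqx guide filter --format json`. With `check=True`, a non-zero exit status raises `subprocess.CalledProcessError`.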

Behavior Notes

  • Input defaults to stdin where supported.
  • Output defaults to stdout where supported.
  • Format detection is extension-based (.fa/.fasta/.fq/.fastq, optional .gz).
  • FASTA/FASTQ parsing uses noodles.
  • extract currently supports FASTA extraction only.
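
Extension-based detection, as described in the notes above, can be sketched in a few lines. This is an illustration in Python, not seqx's Rust code. The extension list comes from the note. The function name, the return shape, and case-sensitive matching are assumptions.

```python
from pathlib import Path

# Extensions listed in the behavior notes
FASTA_EXTS = {".fa", ".fasta"}
FASTQ_EXTS = {".fq", ".fastq"}


def detect_format(path):
    """Guess (format, gzipped) from the file name alone; format is None if unknown."""
    suffixes = Path(path).suffixes
    gzipped = bool(suffixes) and suffixes[-1] == ".gz"
    if gzipped:
        suffixes = suffixes[:-1]  # look at the extension underneath .gz
    ext = suffixes[-1] if suffixes else ""
    if ext in FASTA_EXTS:
        return "fasta", gzipped
    if ext in FASTQ_EXTS:
        return "fastq", gzipped
    return None, gzipped


print(detect_format("reads.fq.gz"))
print(detect_format("genome.fasta"))
```

Because the file content is never inspected, a misnamed file would be handled according to its name.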

Nucleotide vs Protein Behavior

  • All commands accept protein FASTA records.
  • Operations that are only meaningful for nucleotide sequences are explicitly guarded:
    • filter --gc-min/--gc-max
    • modify --reverse-complement
    • reverse-complement matching in search (enabled only when both record and pattern are nucleotide)
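
The guard on nucleotide-only operations can be pictured as an alphabet check before the operation runs. The sketch below is hypothetical: how seqx classifies a record is not stated here, so the alphabet (A/C/G/T/U/N, either case) and the choice to raise an error are assumptions, as are all function names.

```python
# Assumed nucleotide alphabet; seqx may use a different rule.
NUCLEOTIDES = frozenset("ACGTUNacgtun")
COMPLEMENT = str.maketrans("ACGTUNacgtun", "TGCAANtgcaan")


def is_nucleotide(seq):
    """True if the sequence is non-empty and uses only the assumed alphabet."""
    return len(seq) > 0 and all(base in NUCLEOTIDES for base in seq)


def reverse_complement(seq):
    """Reverse-complement a sequence, refusing anything that is not nucleotide."""
    if not is_nucleotide(seq):
        raise ValueError("reverse complement is only defined for nucleotide sequences")
    return seq.translate(COMPLEMENT)[::-1]


def can_search_reverse_strand(record_seq, pattern):
    """Reverse-strand matching applies only when both sides are nucleotide."""
    return is_nucleotide(record_seq) and is_nucleotide(pattern)
```

An alphabet check is a heuristic: a short protein made only of alanine, cysteine, glycine, and threonine (A, C, G, T) would pass as nucleotide.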

Performance Model

  • sort: external chunk sort + mmap merge, configurable with --max-memory and --threads.
  • dedup: disk bucket partitioning + per-bucket dedup + stable merge, configurable with --buckets and --threads.
  • split --parts: two-pass streaming split; because the input is read twice, stdin may be materialized to a temp file.
  • compress: uses pigz if available, otherwise uses gzp (parallel gzip in Rust) with automatic thread detection.
  • Temp binary record paths use packed_seq_io (2-bit packing for A/C/G/T when applicable).
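
The 2-bit packing mentioned in the last item stores four bases per byte. The sketch below shows the general technique in Python. It does not reproduce the actual on-disk layout of packed_seq_io, which is not described here. The code assignment (A=0, C=1, G=2, T=3), the bit order, and the reading of "when applicable" as "the sequence contains only uppercase A/C/G/T" are all assumptions.

```python
CODES = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def pack_2bit(seq):
    """Pack an A/C/G/T string into bytes, four bases per byte.

    Returns None when the sequence contains any other character, in which
    case a caller would have to store the sequence unpacked.
    """
    if any(base not in CODES for base in seq):
        return None
    packed = bytearray((len(seq) + 3) // 4)
    for i, base in enumerate(seq):
        packed[i // 4] |= CODES[base] << (2 * (i % 4))
    return bytes(packed)


def unpack_2bit(packed, length):
    """Recover the first `length` bases; the length must be stored separately."""
    return "".join(
        BASES[(packed[i // 4] >> (2 * (i % 4))) & 0b11] for i in range(length)
    )


seq = "ACGTACGTA"
packed = pack_2bit(seq)
print(len(seq), "bases ->", len(packed), "bytes")
print(unpack_2bit(packed, len(seq)) == seq)
```

The sequence length has to travel with the packed bytes, since the final byte may be only partly used.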

Bench Script

./scripts/bench_packed_io.sh

# Custom workload
N_RECORDS=1000000 SEQ_LEN=200 DUP_RATE=40 ./scripts/bench_packed_io.sh

Developer Docs

License

MIT
