A command-line tool for processing and analyzing biological sequences.
seqx
seqx is an agent-friendly CLI for FASTA/FASTQ sequence processing.
It is designed around streaming I/O, predictable command behavior, and low-memory execution for large files.
Repository Layout
seqx/
├── .github/
│   └── workflows/
│       └── release.yml
├── scripts/
│   ├── bench_packed_io.sh
│   └── gen_random_fasta.py
├── src/
│   ├── main.rs
│   ├── lib.rs
│   ├── cmd/
│   │   ├── mod.rs
│   │   ├── compress.rs
│   │   ├── convert.rs
│   │   ├── dedup.rs
│   │   ├── extract.rs
│   │   ├── filter.rs
│   │   ├── merge.rs
│   │   ├── modify.rs
│   │   ├── sample.rs
│   │   ├── search.rs
│   │   ├── sort.rs
│   │   ├── split.rs
│   │   ├── stats.rs
│   │   └── guide.rs
│   └── common/
│       ├── mod.rs
│       ├── parser.rs
│       ├── packed_seq_io.rs
│       ├── record.rs
│       ├── writer.rs
│       └── README.md
├── Cargo.toml
├── Cargo.lock
├── README.md
├── QUICKREF.md
├── DEVELOPMENT.md
├── SKILL.md
├── rustfmt.toml
└── target/ # build artifacts (generated)
Build
cargo build --release
Binary path:
target/release/seqx
Quick Start
# Show help
seqx --help
# Show guide (agent-friendly help)
seqx guide
seqx guide filter
# Basic stats
seqx stats -i input.fa
# Convert FASTA -> FASTQ
seqx convert -i input.fa -T fastq -o output.fq
# Filter short sequences
seqx filter -i input.fa --min-len 100 -o filtered.fa
Commands
stats
seqx stats -i input.fa
seqx stats -i input.fa --gc
seqx stats -i input.fq --qual --min-len 50
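The --gc flag reports GC content. As a rough illustration of the arithmetic involved, the sketch below computes the fraction of G and C characters over the full sequence length. The function name is hypothetical, and the choice of denominator (all characters, including N) is an assumption; seqx may count differently.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G/C characters in seq (case-insensitive).

    Assumption: the denominator is the full sequence length, so
    ambiguous bases such as N lower the reported fraction.
    """
    if not seq:
        return 0.0
    upper = seq.upper()
    return (upper.count("G") + upper.count("C")) / len(upper)


print(gc_fraction("ATGC"))
```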
convert
seqx convert -i input.fa -T fastq -Q 30 -o output.fq
seqx convert -i input.fq -T fasta -o output.fa
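FASTA carries no quality line, so a FASTA -> FASTQ conversion has to supply a placeholder score, which is presumably the role of -Q 30 above. The sketch below shows what such a record could look like, assuming Phred+33 encoding and one constant score for every base. The function name is hypothetical and is not part of seqx.

```python
def fasta_record_to_fastq(name: str, seq: str, phred: int = 30) -> str:
    """Render one FASTA record as a four-line FASTQ record.

    Assumption: qualities use Phred+33, so score 30 becomes '?'.
    """
    if not 0 <= phred <= 93:
        raise ValueError("Phred+33 can only encode scores 0..93")
    qual_char = chr(phred + 33)
    return f"@{name}\n{seq}\n+\n{qual_char * len(seq)}\n"


print(fasta_record_to_fastq("seq1", "ACGT"), end="")
```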
filter
seqx filter -i input.fa --min-len 100 --max-len 2000
seqx filter -i input.fa --pattern "ATG.*TAA"
seqx filter -i input.fa --exclude-pattern "N{10,}"
seqx filter -i input.fa --id-file ids.txt
seqx filter -i input.fq --min-qual 30
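The length and pattern filters above compose naturally as a streaming predicate. The following is a minimal sketch of that idea, not seqx's implementation: it assumes length bounds are inclusive and that patterns are searched anywhere in the sequence rather than anchored. All names are hypothetical.

```python
import re
from typing import Iterable, Iterator, Optional, Tuple

Record = Tuple[str, str]  # (id, sequence)


def filter_records(
    records: Iterable[Record],
    min_len: int = 0,
    max_len: Optional[int] = None,
    pattern: Optional[str] = None,
    exclude_pattern: Optional[str] = None,
) -> Iterator[Record]:
    """Yield records that pass every active filter, one at a time."""
    keep = re.compile(pattern) if pattern else None
    drop = re.compile(exclude_pattern) if exclude_pattern else None
    for name, seq in records:
        if len(seq) < min_len:
            continue
        if max_len is not None and len(seq) > max_len:
            continue
        if keep is not None and not keep.search(seq):
            continue
        if drop is not None and drop.search(seq):
            continue
        yield name, seq
```

Because the function is a generator, it never holds more than one record at a time, which matches the streaming design described at the top of this README.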
extract
seqx extract -i input.fa --id seq1
seqx extract -i input.fa --id-file ids.txt
seqx extract -i input.fa --range 1:100
seqx extract -i input.fa --bed regions.bed -F 20
search
seqx search -i input.fa "ATG"
seqx search -i input.fa "ATG.*TAA" --regex
seqx search -i input.fa "ATG" --mismatches 1 --threads 8
seqx search -i input.fa "ATG" --bed --strand
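To make the --mismatches option concrete, here is a small sketch of approximate matching by Hamming distance over a sliding window. It assumes mismatches mean substitutions only (no insertions or deletions) and reports 0-based start positions; both are assumptions, and seqx's actual algorithm and coordinate convention may differ.

```python
from typing import List, Tuple


def find_with_mismatches(
    seq: str, pattern: str, max_mismatches: int = 0
) -> List[Tuple[int, int]]:
    """Return (start, mismatches) for every window within the allowed distance."""
    if not pattern:
        raise ValueError("pattern must not be empty")
    hits = []
    width = len(pattern)
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        mismatches = sum(1 for a, b in zip(window, pattern) if a != b)
        if mismatches <= max_mismatches:
            hits.append((start, mismatches))
    return hits


print(find_with_mismatches("ATGATCATG", "ATG", max_mismatches=1))
```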
modify
seqx modify -i input.fa --upper
seqx modify -i input.fa --lower
seqx modify -i input.fa --slice 10:200
seqx modify -i input.fa --remove-gaps
seqx modify -i input.fa --reverse-complement
sample
seqx sample -i input.fa --count 1000 --seed 42
seqx sample -i input.fa --fraction 0.1
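Drawing a fixed --count from a stream of unknown length is commonly done with reservoir sampling, and a fixed --seed makes the draw reproducible. The sketch below illustrates that general technique; this README does not state which algorithm seqx uses, so treat it as an illustration only. The function name is hypothetical.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(records: Iterable[T], count: int, seed: Optional[int] = None) -> List[T]:
    """Uniformly sample up to `count` items in one pass, using O(count) memory."""
    rng = random.Random(seed)  # private RNG so the seed fully determines the result
    reservoir: List[T] = []
    for i, rec in enumerate(records):
        if i < count:
            reservoir.append(rec)
        else:
            # Item i replaces a random slot with probability count / (i + 1).
            j = rng.randint(0, i)
            if j < count:
                reservoir[j] = rec
    return reservoir
```

If the input holds fewer than `count` records, every record is returned.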
sort
seqx sort -i input.fa --by-name
seqx sort -i input.fa --by-len --desc
seqx sort -i input.fa --by-gc --max-memory 256 --threads 8
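The Performance Model section describes sort as an external chunk sort followed by a merge, bounded by --max-memory. The sketch below shows the general shape of that technique: sort fixed-size chunks in memory, spill each to a temporary file, then k-way merge the sorted runs. It uses a record count in place of a memory budget and plain file reads in place of mmap, so it is a simplification under those assumptions rather than seqx's implementation.

```python
import heapq
import pickle
import tempfile
from itertools import islice
from typing import Any, Callable, Iterable, Iterator


def external_sort(records: Iterable[Any], key: Callable[[Any], Any], chunk_size: int) -> Iterator[Any]:
    """Yield records in key order while holding at most chunk_size in memory per run."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    runs = []
    it = iter(records)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            break
        chunk.sort(key=key)
        run = tempfile.TemporaryFile()
        for rec in chunk:
            pickle.dump(rec, run)  # spill the sorted run to disk
        run.seek(0)
        runs.append(run)

    def read_run(f):
        # Stream one sorted run back from disk, record by record.
        while True:
            try:
                yield pickle.load(f)
            except EOFError:
                return

    try:
        yield from heapq.merge(*(read_run(f) for f in runs), key=key)
    finally:
        for f in runs:
            f.close()
```

For example, `external_sort(records, key=lambda r: len(r[1]), chunk_size=100_000)` would order `(id, sequence)` tuples by length, roughly analogous to --by-len.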
dedup
seqx dedup -i input.fa
seqx dedup -i input.fa --by-id
seqx dedup -i input.fa --prefix 12 --ignore-case
seqx dedup -i input.fa --buckets 256 --threads 8
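The Performance Model section describes dedup as disk bucket partitioning, per-bucket dedup, and a stable merge. The sketch below illustrates that idea under simplifying assumptions: records are routed to temporary files by a hash of the sequence, so identical sequences always land in the same bucket, and only one bucket's set of seen sequences is in memory at a time. For brevity the survivors are collected in a list and re-sorted by input index, whereas a real implementation would merge per-bucket streams. All names are hypothetical.

```python
import hashlib
import pickle
import tempfile
from typing import Iterable, List, Tuple

Record = Tuple[str, str]  # (id, sequence)


def bucket_dedup(records: Iterable[Record], n_buckets: int = 4) -> List[Record]:
    """Keep the first record for each distinct sequence, preserving input order."""
    buckets = [tempfile.TemporaryFile() for _ in range(n_buckets)]
    kept = []
    try:
        # Pass 1: partition by sequence hash.
        for index, (name, seq) in enumerate(records):
            digest = hashlib.blake2b(seq.encode(), digest_size=8).digest()
            bucket = int.from_bytes(digest, "big") % n_buckets
            pickle.dump((index, name, seq), buckets[bucket])
        # Pass 2: dedup each bucket independently.
        for f in buckets:
            f.seek(0)
            seen = set()
            while True:
                try:
                    index, name, seq = pickle.load(f)
                except EOFError:
                    break
                if seq not in seen:
                    seen.add(seq)
                    kept.append((index, name, seq))
    finally:
        for f in buckets:
            f.close()
    kept.sort()  # indices are unique, so this restores input order
    return [(name, seq) for _, name, seq in kept]
```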
merge
seqx merge a.fa b.fa c.fa -o merged.fa
seqx merge a.fa b.fa c.fa --add-prefix --sep ":" -o merged_with_source.fa
split
seqx split -i input.fa --parts 10 -o out_dir
seqx split -i input.fa --chunk-size 1000 -o out_dir
seqx split -i input.fa --by-id -o out_dir --prefix seq
compress
# Compress using pigz if available, otherwise built-in
seqx compress -i input.fa
seqx compress -i input.fa -o output.fa.gz -l 9
# Decompress
seqx compress -d -i input.fa.gz
seqx compress -d -i input.fa.gz -o output.fa
# Use stdin/stdout
cat input.fa | seqx compress > output.fa.gz
cat input.fa.gz | seqx compress -d > output.fa
# Force built-in implementation
seqx compress -i input.fa --no-pigz
guide
# List all commands
seqx guide
# Show detailed help for a specific command
seqx guide filter
seqx guide compress
# Output in JSON format (for programmatic use)
seqx guide --format json
seqx guide filter --format json
# Output in Markdown format
seqx guide --format markdown
Behavior Notes
- Input defaults to stdin where supported.
- Output defaults to stdout where supported.
- Format detection is extension-based (.fa/.fasta/.fq/.fastq, optional .gz).
- FASTA/FASTQ parsing uses noodles.
- extract currently supports FASTA extraction only.
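Extension-based detection can be pictured as a small lookup on the file name. The sketch below maps the extensions listed above to a format and a gzip flag. Matching case-insensitively and raising on unknown extensions are assumptions of this sketch; the function name is hypothetical and seqx's own handling may differ.

```python
from pathlib import Path
from typing import Tuple

FORMAT_BY_EXT = {".fa": "fasta", ".fasta": "fasta", ".fq": "fastq", ".fastq": "fastq"}


def detect_format(path: str) -> Tuple[str, bool]:
    """Return (format, is_gzipped) based only on the file name."""
    suffixes = [s.lower() for s in Path(path).suffixes]
    gzipped = bool(suffixes) and suffixes[-1] == ".gz"
    if gzipped:
        suffixes = suffixes[:-1]  # look at the extension under .gz
    if not suffixes or suffixes[-1] not in FORMAT_BY_EXT:
        raise ValueError(f"cannot detect sequence format from {path!r}")
    return FORMAT_BY_EXT[suffixes[-1]], gzipped


print(detect_format("input.fa"))
print(detect_format("reads.fq.gz"))
```

Since detection never inspects file contents, data arriving on stdin has no extension to examine, which is one reason a format may need to be stated explicitly in pipelines.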
Nucleotide vs Protein Behavior
- Protein FASTA records are supported by all commands.
- Nucleotide-only operations are explicitly guarded:
  - filter --gc-min/--gc-max
  - modify --reverse-complement
  - reverse-complement matching in search (enabled only when both record and pattern are nucleotide)
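A guard of this kind can be sketched as an alphabet check performed before the nucleotide-only operation runs. The example below assumes a DNA alphabet of A, C, G, T and N; the alphabet seqx actually accepts (for instance IUPAC ambiguity codes or U) is not stated here, and the function names are hypothetical. Note that any alphabet check is a heuristic: a short protein made only of these letters would pass.

```python
DNA_ALPHABET = set("ACGTNacgtn")  # assumption: DNA plus N only
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def is_nucleotide(seq: str) -> bool:
    """True if seq is non-empty and uses only the assumed DNA alphabet."""
    return bool(seq) and set(seq) <= DNA_ALPHABET


def reverse_complement(seq: str) -> str:
    """Reverse-complement seq, refusing input that does not look like DNA."""
    if not is_nucleotide(seq):
        raise ValueError("reverse complement is only defined for nucleotide sequences")
    return seq.translate(_COMPLEMENT)[::-1]


print(reverse_complement("ATGC"))
```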
Performance Model
- sort: external chunk sort + mmap merge, configurable with --max-memory and --threads.
- dedup: disk bucket partitioning + per-bucket dedup + stable merge, configurable with --buckets and --threads.
- split --parts: two-pass streaming split (stdin may be materialized to a temp file).
- compress: uses pigz if available, otherwise uses gzp (parallel gzip in Rust) with automatic thread detection.
- Temp binary record paths use packed_seq_io (2-bit packing for A/C/G/T when applicable).
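To illustrate what 2-bit packing buys, the sketch below stores four bases per byte and falls back to "not packable" when a sequence contains anything other than uppercase A/C/G/T. The bit layout (first base in the low bits) and the function names are assumptions made for illustration; the on-disk format used by packed_seq_io is not documented here.

```python
from typing import Tuple

_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
_BASE = "ACGT"


def can_pack(seq: str) -> bool:
    """2-bit packing only applies when every character is A, C, G or T."""
    return all(base in _CODE for base in seq)


def pack_2bit(seq: str) -> Tuple[int, bytes]:
    """Pack seq into 2 bits per base; returns (base_count, packed_bytes)."""
    if not can_pack(seq):
        raise ValueError("sequence contains characters other than A/C/G/T")
    out = bytearray((len(seq) + 3) // 4)
    for i, base in enumerate(seq):
        out[i // 4] |= _CODE[base] << (2 * (i % 4))
    return len(seq), bytes(out)


def unpack_2bit(length: int, data: bytes) -> str:
    """Inverse of pack_2bit; length is needed because the last byte may be partial."""
    return "".join(_BASE[(data[i // 4] >> (2 * (i % 4))) & 3] for i in range(length))
```

A 200-base read packs into 50 bytes instead of 200, which is where the savings for temporary sort and dedup files would come from.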
Bench Script
./scripts/bench_packed_io.sh
# Custom workload
N_RECORDS=1000000 SEQ_LEN=200 DUP_RATE=40 ./scripts/bench_packed_io.sh
Developer Docs
License
MIT