Fast sequence statistics for FASTA/FASTQ files — N50, GC%, length distributions and more

seqstats

Fast sequence statistics for FASTA and FASTQ files — works on plain or gzipped inputs, no dependencies.

file                            seqs      total_bp       gc%    mean_len    min_len  max_len      N50          N90
-------------------------------------------------------------------------------------------------------------------------
GRCh38.primary_assembly.fa      194       3,088,286,401  40.93  15,918,992  970      248,956,422  153,373,213  40,103,529
SRR10045678_1.fastq.gz          10000000  1,510,000,000  50.21  151.0       151      151          151          151

Install

pip install seqstatx

Or for development:

git clone https://github.com/perhapsstrawberries/seqstats.git
cd seqstats
pip install -e .

Usage

# single file
seqstatx genome.fa

# multiple files, gzipped FASTQ
seqstatx sample1.fastq.gz sample2.fastq.gz

# TSV output for downstream parsing
seqstatx --tsv *.fa > stats.tsv

# pipe to column for alignment
seqstatx --tsv *.fastq.gz | column -t
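
The TSV output can also be read back in Python for downstream analysis. The sketch below is illustrative only: it assumes the TSV has a header row whose column names match the table above (file, seqs, total_bp, ...) and that numbers may carry thousands separators. Neither assumption is confirmed here, and the helper names read_stats and to_int are hypothetical.

```python
import csv
import io


def read_stats(handle):
    """Read tab-separated stats into a list of dicts keyed by column name.

    Assumes the first row is a header (an assumption about the --tsv output).
    """
    return list(csv.DictReader(handle, delimiter="\t"))


def to_int(text):
    """Parse an integer that may contain thousands separators, e.g. '1,200'."""
    return int(text.replace(",", ""))


# Made-up sample standing in for the contents of stats.tsv
sample = "file\tseqs\ttotal_bp\tN50\na.fa\t3\t1,200\t500\n"
rows = read_stats(io.StringIO(sample))

for row in rows:
    print(row["file"], to_int(row["total_bp"]), to_int(row["N50"]))
```

For a real file, pass an open file handle instead of the StringIO object, for example open("stats.tsv", newline="").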

Metrics

Column             Description
seqs               Number of sequences / reads
total_bp           Total number of bases across all sequences
gc%                GC content (%)
mean_len           Mean sequence length
min_len / max_len  Shortest / longest sequence
N50                Largest length L such that sequences of length ≥ L contain at least 50% of total_bp
N90                Largest length L such that sequences of length ≥ L contain at least 90% of total_bp
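
To make the N50 / N90 and GC definitions concrete, here is a minimal sketch of how such metrics are commonly computed. It is not taken from the seqstats source: the function names nx and gc_percent are hypothetical, and counting every character (including N) in the GC denominator is an assumption.

```python
def nx(lengths, percent):
    """Return the Nx value for a list of sequence lengths.

    Sequences are scanned from longest to shortest; the result is the length
    at which the running total first reaches `percent` of the summed lengths.
    """
    if not lengths:
        return 0
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Integer arithmetic avoids floating-point rounding at the threshold
        if running * 100 >= total * percent:
            return length
    return min(lengths)


def gc_percent(seq):
    """GC content as a percentage of all characters in seq (case-insensitive).

    Assumption: ambiguous bases such as N count toward the denominator.
    """
    if not seq:
        return 0.0
    upper = seq.upper()
    return 100.0 * (upper.count("G") + upper.count("C")) / len(upper)


lengths = [80, 70, 50, 40, 30, 20, 10]  # total = 300
print(nx(lengths, 50))         # 80 + 70 = 150 reaches 50% of 300 -> 70
print(nx(lengths, 90))         # 80 + 70 + 50 + 40 + 30 = 270 reaches 90% -> 30
print(gc_percent("GGCCAATT"))  # 4 of 8 bases are G or C -> 50.0
```

When every read has the same length, as in the FASTQ example at the top, min_len, max_len, N50 and N90 all equal that length.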

Supported formats

Extension              Format
.fa .fna .fasta        FASTA
.fq .fastq             FASTQ
.fa.gz .fastq.gz etc.  gzipped variants
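
One way to handle this extension table with only the standard library is sketched below. It is an illustration, not the seqstats implementation: detect_format and open_text are hypothetical names, and choosing the format purely from the file extension is an assumption.

```python
import gzip
from pathlib import Path

FASTA_EXTS = {".fa", ".fna", ".fasta"}
FASTQ_EXTS = {".fq", ".fastq"}


def detect_format(path):
    """Return (format, gzipped) based on the file extension alone."""
    suffixes = [s.lower() for s in Path(path).suffixes]
    gzipped = bool(suffixes) and suffixes[-1] == ".gz"
    if gzipped:
        suffixes = suffixes[:-1]  # look at the extension under .gz
    ext = suffixes[-1] if suffixes else ""
    if ext in FASTA_EXTS:
        return "fasta", gzipped
    if ext in FASTQ_EXTS:
        return "fastq", gzipped
    raise ValueError(f"unsupported extension: {path}")


def open_text(path):
    """Open a plain or gzipped sequence file in text mode."""
    _, gzipped = detect_format(path)
    if gzipped:
        return gzip.open(path, "rt")  # "rt" decompresses and decodes to str
    return open(path, "r")


print(detect_format("GRCh38.primary_assembly.fa"))  # ('fasta', False)
print(detect_format("SRR10045678_1.fastq.gz"))      # ('fastq', True)
```

Opening gzipped input through gzip.open in text mode lets the same line-by-line parser work on both compressed and uncompressed files.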

Why

Existing tools such as seqkit and seqtk are great, but they require installing compiled binaries.
seqstats is pure Python (3.10+) with zero dependencies, so it can be pip-installed in any HPC or Conda environment.

License

MIT
