Calculates both scaffold and contig statistics (N50, L50, etc.) from a scaffold FASTA file.

Project description


A Python library that calculates both scaffold and contig statistics (N50, L50, etc.) from a scaffold FASTA file. It breaks each scaffold wherever there is a run of more than one N, then calculates statistics for both the scaffolds and the resulting contigs.
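
The splitting and statistics steps described above can be sketched roughly as follows. This is a minimal illustration, not the package's actual implementation: the helper names `split_into_contigs` and `n50_l50` are hypothetical, and it assumes "more than one N" means a run of two or more consecutive `N` characters (upper or lower case). N50 is taken here as the length of the shortest sequence among the longest sequences that together cover at least half of the total length, and L50 as the number of those sequences.

```python
import re

# Assumption: a gap is a run of two or more consecutive N/n characters.
GAP = re.compile(r"[Nn]{2,}")


def split_into_contigs(scaffold):
    """Break one scaffold sequence into contigs at every gap run."""
    return [piece for piece in GAP.split(scaffold) if piece]


def n50_l50(lengths):
    """Return (N50, L50) for a list of sequence lengths."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            return length, count
    return 0, 0  # empty input


# Toy sequences for illustration only.
scaffolds = ["ACGTACGTNNNNACGT", "ACGTAC", "ACNGT"]
contigs = [c for s in scaffolds for c in split_into_contigs(s)]

scaffold_stats = n50_l50([len(s) for s in scaffolds])
contig_stats = n50_l50([len(c) for c in contigs])
print(scaffold_stats, contig_stats)
```

Note that the single `N` in the third toy scaffold does not trigger a split under this assumption, so that scaffold is counted as one contig.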

This is a re-write of fasta_metadata_parser to speed up the old implementation, and -- most importantly -- to learn how to install Python scripts onto the Smithsonian HPC.


pip install assembly_stats


  $ assembly_stats -h

    usage: assembly_stats [-h] filename

    Calculate statistics about genome assemblies.

    positional arguments:
      filename    Genome file in FASTA format.

    optional arguments:
      -h, --help  show this help message and exit

Once the statistics for the genome assembly have been calculated, they are printed in JSON format.
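
As a rough idea of what JSON output of this kind looks like and how it can be read back, here is a small sketch using Python's standard `json` module. The key names and numbers are hypothetical placeholders; the fields actually emitted by `assembly_stats` are not documented here and may differ.

```python
import json

# Hypothetical key names and toy values; the real tool's output may differ.
stats = {
    "scaffold": {"N50": 16, "L50": 1},
    "contig": {"N50": 6, "L50": 2},
}

# Serialize to a JSON string, as a command-line tool might print it.
report = json.dumps(stats, indent=2, sort_keys=True)
print(report)

# A downstream script can parse the same text back into a dictionary.
parsed = json.loads(report)
```

Because the output is plain JSON, it can be captured from the command line and loaded by any JSON parser for further processing.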

Next steps

  • Add the ability to save the NumPy sequence-length arrays for further visualization, since generating them is what takes the most time.

Download files

Files for assembly-stats, version 0.1.2
Filename (size) | File type | Python version
assembly_stats-0.1.2-py3-none-any.whl (5.0 kB) | Wheel | py3
assembly_stats-0.1.2.tar.gz (3.1 kB) | Source | None
