Calculates both scaffold and contig statistics (N50, L50, etc.) from a scaffold FASTA file.
assembly_stats
A Python library that takes a scaffold FASTA file as input and calculates statistics (N50, L50, etc.) for both the scaffolds and their contigs. Contigs are obtained by breaking each scaffold wherever there is a run of more than one N; statistics are then calculated separately for the scaffolds and for the contigs.
This is a rewrite of fasta_metadata_parser, written to speed up the old implementation and, most importantly, to learn how to install Python scripts on the Smithsonian HPC.
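The scaffold-splitting step and the N50/L50 calculation described above can be sketched in plain Python. This is an illustrative sketch, not assembly_stats' own code: the function names `split_scaffold` and `n50_l50` are hypothetical, and treating a run of two or more N characters as a gap (`min_gap=2`) is an assumption based on the wording "more than one N".

```python
import re
from typing import Iterable, List, Tuple


def split_scaffold(seq: str, min_gap: int = 2) -> List[str]:
    """Break one scaffold into contigs at runs of at least `min_gap` N bases.

    min_gap=2 mirrors the README's "more than one N"; the threshold the
    real tool uses is an assumption here.
    """
    if min_gap < 1:
        raise ValueError("min_gap must be at least 1")
    gap = re.compile(f"[Nn]{{{min_gap},}}")
    # Gaps at either end of the scaffold leave empty strings; drop them.
    return [piece for piece in gap.split(seq) if piece]


def n50_l50(lengths: Iterable[int]) -> Tuple[int, int]:
    """Return (N50, L50).

    With lengths sorted longest first, N50 is the length at which the running
    total first reaches half of the total length; L50 is how many sequences
    that takes.
    """
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            return length, count
    raise ValueError("no sequence lengths given")


# Toy example with three made-up scaffolds.
scaffolds = ["ACGTACGTNNNNACGT", "ACGTNACGTAC", "AAAAAAAAAA"]
contigs = [c for s in scaffolds for c in split_scaffold(s)]
print([len(c) for c in contigs])            # [8, 4, 11, 10]
print(n50_l50(len(s) for s in scaffolds))   # (11, 2)
print(n50_l50(len(c) for c in contigs))     # (10, 2)
```

Scaffold lengths include their N gaps while contig lengths do not, which is one reason contig statistics usually come out lower than scaffold statistics for the same assembly.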
Installation
pip install assembly_stats
Usage
$ assembly_stats -h
usage: assembly_stats [-h] filename

Calculate statistics about genome assemblies.

positional arguments:
  filename    Genome file in FASTA format.

optional arguments:
  -h, --help  show this help message and exit
Once the statistics for the genome assembly have been calculated, they are printed in JSON format.
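A minimal sketch of a command-line entry point with the same interface as the help text above: one positional `filename` argument, with results printed as JSON. It is not assembly_stats' own code; `read_fasta`, `summarize`, and the JSON key names (`scaffold_count`, `total_length`, `longest`) are hypothetical, and the real tool reports more statistics (N50, L50, etc.) than these three.

```python
import argparse
import json
import sys
from typing import Dict, Iterator, List, Optional, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs, joining wrapped sequence lines."""
    header: Optional[str] = None
    chunks: List[str] = []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def summarize(handle: TextIO) -> Dict[str, int]:
    """Collect a few simple scaffold statistics (hypothetical key names)."""
    lengths = [len(seq) for _, seq in read_fasta(handle)]
    return {
        "scaffold_count": len(lengths),
        "total_length": sum(lengths),
        "longest": max(lengths, default=0),
    }


def main(argv: Optional[List[str]] = None) -> None:
    """Parse the same single positional argument shown in the help output."""
    parser = argparse.ArgumentParser(
        prog="assembly_stats",
        description="Calculate statistics about genome assemblies.",
    )
    parser.add_argument("filename", help="Genome file in FASTA format.")
    args = parser.parse_args(argv)
    with open(args.filename) as handle:
        json.dump(summarize(handle), sys.stdout, indent=2)
    sys.stdout.write("\n")
```

For example, `main(["genome.fasta"])` (a hypothetical file name) would print one JSON object; a packaged tool would normally register `main` as a console-script entry point so that running `assembly_stats genome.fasta` calls it.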
Next steps
- Add the ability to save the NumPy arrays of sequence lengths for further visualization, since generating them is what takes the most time.
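One way the planned feature could look: cache the scaffold and contig length arrays with NumPy's `savez_compressed`, so that later visualization can reload them with `np.load` instead of re-parsing the FASTA file. This is a hypothetical sketch, not part of assembly_stats; `save_lengths`, `load_lengths`, the array names, and the cache filename are all assumptions.

```python
import os
import tempfile
from typing import Sequence, Tuple

import numpy as np


def save_lengths(path: str, scaffold_lengths: Sequence[int],
                 contig_lengths: Sequence[int]) -> None:
    """Cache scaffold and contig lengths as int64 arrays in one .npz file.

    np.savez_compressed appends ".npz" to a string path that lacks it, so use
    a path ending in ".npz" to keep the save and load paths identical.
    """
    np.savez_compressed(
        path,
        scaffold_lengths=np.asarray(scaffold_lengths, dtype=np.int64),
        contig_lengths=np.asarray(contig_lengths, dtype=np.int64),
    )


def load_lengths(path: str) -> Tuple[np.ndarray, np.ndarray]:
    """Reload the cached arrays without re-parsing the FASTA file."""
    with np.load(path) as data:
        # Indexing an NpzFile reads the array fully into memory, so the
        # returned arrays remain valid after the file is closed.
        return data["scaffold_lengths"], data["contig_lengths"]


# Round trip through a temporary directory (toy lengths, not real data).
with tempfile.TemporaryDirectory() as tmp:
    cache = os.path.join(tmp, "lengths.npz")
    save_lengths(cache, [16, 11, 10], [8, 4, 11, 10])
    scaffolds, contigs = load_lengths(cache)

print(scaffolds.tolist(), contigs.tolist())  # [16, 11, 10] [8, 4, 11, 10]
```

A single compressed `.npz` archive keeps both arrays together; two separate `.npy` files written with `np.save` would work just as well if the arrays are usually loaded independently.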
Hashes for assembly_stats-0.1.4-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 77ad05de78ce4a37c69db6e178fec7a11ffc1c3fc3a2a94072e64e11d6e5f0f7
MD5 | 4a16bc0117698de2d07e5dc1383ab70e
BLAKE2b-256 | d7dd7025ead6f8cf24ad0580af1193bcfd7176f7393df207f82be3ffbaed55e0