assemblyStatistics

License: GPL v3

A script to evaluate the assembly of a given genome.

💙 If you like this project, give it a ⭐ and share it with friends!

assemblyStatistics reports statistics for a FASTA file containing multiple sequences, so you can inspect the scaffold and contig statistics of an assembly: sequence count, sequence name, N50, N90, GC content, N rate, and so on. It summarizes all scaffolds, large scaffolds (longer than 1000 bp by default), and the contigs across all sequences.
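For readers new to N50 and N90, here is a minimal sketch of the usual definition: sort the lengths from longest to shortest and walk down the list until the running total reaches 50% (or 90%) of the overall length. The function name nx is hypothetical and this is not the package's own code. The example lengths are chosen to match the figures in the sample txt output further down.

```python
from typing import Iterable, Tuple


def nx(lengths: Iterable[int], x: int = 50) -> Tuple[int, int]:
    """Return (Nx, count).

    Nx is the length of the sequence at which the running total of lengths,
    sorted longest first, first reaches x% of the overall total. count is
    the number of sequences needed to get there.
    """
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    if total == 0:
        return 0, 0
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        # Integer comparison avoids floating-point rounding at the boundary.
        if running * 100 >= total * x:
            return length, count
    return ordered[-1], len(ordered)


scaffolds = [210, 210, 70]  # illustrative scaffold lengths, 490 bp in total
print(nx(scaffolds, 50))  # (210, 2)
print(nx(scaffolds, 90))  # (70, 3)
```

The count returned alongside Nx corresponds to what the report calls "Counts of N50" and "Counts of N90".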

⚙ USAGE

assemblyStatistics -f test.fasta -l 100
assemblyStatistics test.fasta

⚡ Quick install

pip install assemblyStatistics

🔧 Options

Run assemblyStatistics -h or assemblyStatistics --help to list the options:

Usage: assemblyStatistics [options] -f INPUT.fasta

A script to evaluate the assembly of a given fasta.

Options:
  --version             show program's version number and exit
  -h, --help            show this help message and exit
  -f FILE, --fasta=FILE
                        input fasta file
  -t FORMAT, --outfmt=FORMAT
                        format of the output, choices = [txt, json]
  -l LENGTH, -L LENGTH, --length=LENGTH
                        Threshold of length defined a sequence as Large
                        [default = 1000]
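To illustrate what the -l threshold does, the sketch below reads FASTA records and keeps only those longer than the threshold. The helpers read_fasta and large_sequences are hypothetical names, not part of the package, and the strict "longer than" comparison is an assumption based on the "LARGE (> 100 bp)" heading in the sample output.

```python
from typing import Dict, Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from the lines of a FASTA file."""
    name = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the identifier, dropping any description after it.
            header = line[1:].split()
            name = header[0] if header else ""
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def large_sequences(
    records: Iterable[Tuple[str, str]], threshold: int = 1000
) -> Dict[str, str]:
    """Keep records strictly longer than threshold (assumed comparison)."""
    return {name: seq for name, seq in records if len(seq) > threshold}


example = [">seq1 demo", "ACGT" * 30, ">seq2", "ACGT" * 10]
large = large_sequences(read_fasta(example), threshold=100)
print(sorted(large))  # ['seq1']
```

With a real file, pass an open file handle to read_fasta instead of the example list.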

Sample output

Default txt output (human-readable text format):

All scaffold sequences summary:
--------------------------------------------------
Counts of scaffold sequences            3                                       
Length of scaffold sequences            490                                     
Largest scaffold name                   seq1                                    
Largest scaffold length                 210                                     
Scaffold N50                            210                                     
Counts of N50                           2                                       
Scaffold N90                            70                                      
Counts of N90                           3                                       
GC content(%)                           42.86                                   
N Length                                141                                     
N content (%)                           28.78                                   

LARGE (> 100 bp) sequences summary:
--------------------------------------------------
Counts of LARGE sequences               2                                       
Length of LARGE sequences               420                                     
LARGE scaffold N50                      210                                     
Counts of LARGE N50                     1                                       
LARGE scaffold N90                      210                                     
Counts of LARGE N90                     2                                       
GC content(%)                           33.33                                   
N Length                                141                                     
N content (%)                           33.57                                   

contigs summary:
--------------------------------------------------
Counts of contigs                       7                                       
Maximum length of contigs               70                                      
contig N50                              70                                      
Counts of contig N50                    3                                       
contig N90                              36                                      
Counts of contig N90                    5     
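The percentages above are consistent with dividing by the full sequence length, N included: 141 N out of 490 bp gives 28.78%, and 42.86% GC corresponds to 210 of 490 bp. The sketch below computes both values that way and also shows one common way to derive contigs, by splitting each scaffold at runs of N. The splitting rule is an assumption, since the tool's exact rule is not documented here, and the function names are hypothetical.

```python
import re
from typing import List, Tuple


def composition(seq: str) -> Tuple[float, float]:
    """Return (GC %, N %) of a sequence, both relative to its full length."""
    if not seq:
        return 0.0, 0.0
    upper = seq.upper()
    gc = upper.count("G") + upper.count("C")
    n = upper.count("N")
    return 100 * gc / len(upper), 100 * n / len(upper)


def split_contigs(seq: str) -> List[str]:
    """Split a scaffold into contigs at every run of N (assumed convention)."""
    return [part for part in re.split(r"[Nn]+", seq) if part]


scaffold = "ACGT" + "N" * 4 + "GGCC"
print(tuple(round(v, 2) for v in composition(scaffold)))  # (50.0, 33.33)
print(split_contigs(scaffold))  # ['ACGT', 'GGCC']
```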

JSON output, produced with assemblyStatistics -f sample.fasta -t json:

{
    "All scaffolds": {
        "Counts of scaffold sequences": 4,
        "Length of scaffold sequences": 605,
        "Largest scaffold name": "seq2",
        "Largest scaffold length": 271,
        "Scaffold N50": 269,
        "Counts of N50": 2,
        "Scaffold N90": 45,
        "Counts of N90": 3,
        "GC content(%)": 49.09090909090909,
        "N Length": 59,
        "N content (%)": 9.75206611570248
    },
    "LARGE sequences": {
        "threshold": 1000,
        "Counts of LARGE sequences": 0,
        "Length of LARGE sequences": 0
    },
    "Contigs": {
        "Counts of contigs": 8,
        "Maximum length": 225,
        "contig N50": 121,
        "Counts of contig N50": 2,
        "contig N90": 21,
        "Counts of contig N90": 6
    }
}
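The JSON format is convenient for scripting. The sketch below calls the command from Python and reads a few fields from the report. It assumes that assemblyStatistics is on the PATH and that the JSON report is written to standard output. The helper names are hypothetical, and the keys are taken from the sample above.

```python
import json
import subprocess
from typing import Any, Dict


def run_assembly_statistics(fasta_path: str) -> Dict[str, Any]:
    """Run the CLI with -t json and parse what it prints.

    Assumes the JSON report is written to standard output.
    """
    result = subprocess.run(
        ["assemblyStatistics", "-f", fasta_path, "-t", "json"],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return json.loads(result.stdout)


def headline(report: Dict[str, Any]) -> str:
    """Build a one-line summary from the keys shown in the sample report."""
    scaffolds = report["All scaffolds"]
    contigs = report["Contigs"]
    return (
        f"{scaffolds['Counts of scaffold sequences']} scaffolds, "
        f"N50 {scaffolds['Scaffold N50']}; "
        f"{contigs['Counts of contigs']} contigs, "
        f"N50 {contigs['contig N50']}"
    )
```

For the sample report above, headline(run_assembly_statistics("sample.fasta")) would give "4 scaffolds, N50 269; 8 contigs, N50 121".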

Feedback/Issues

Please report any issues on the issues page or by email to linwenchao@yeah.net

Todo

  • Get statistics from a list of files
  • Compressed format (.gz, .bz2 or .xz) support
  • Multiple output format support

