assemblyStatistics
A script to evaluate the assembly of a given genome.
💙 If you like this project, give it a ⭐ and share it with friends!
It reports statistics for a FASTA file containing multiple sequences, such as sequence names, sequence counts, N50, N90, GC content, and N rate. Statistics are given for all scaffolds, for large scaffolds (longer than 1000 bp by default), and for the contigs across all sequences.
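The README does not spell out how N50 and N90 are computed. The following is a minimal sketch, assuming the common definition: sort the lengths in descending order and walk down until the running total reaches the requested fraction of the total length. The function name nx_statistic and the example lengths are hypothetical; the lengths were chosen because they add up to the 490 bp total in the sample txt output.

```python
def nx_statistic(lengths, fraction):
    """Return (Nx length, number of sequences needed to reach it).

    Assumption: Nx is the length of the sequence at which the running
    total of descending-sorted lengths first reaches `fraction` of the
    total. This may differ in detail from what assemblyStatistics does.
    """
    ordered = sorted(lengths, reverse=True)
    target = sum(ordered) * fraction
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= target:
            return length, count
    return 0, 0  # empty input


# Hypothetical scaffold lengths that sum to 490 bp.
scaffolds = [210, 210, 70]
n50, n50_count = nx_statistic(scaffolds, 0.5)
n90, n90_count = nx_statistic(scaffolds, 0.9)
print(n50, n50_count, n90, n90_count)  # 210 2 70 3
```

Under this definition the toy lengths give the same N50, N90, and counts as the "All scaffold sequences" block of the sample output.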
⚙ USAGE
assemblyStatistics -f test.fasta -l 100
assemblyStatistics test.fasta
⚡ Quick install
pip install assemblyStatistics
🔧 Options
Run assemblyStatistics -h or assemblyStatistics --help to list the options:
Usage: assemblyStatistics [options] -f INPUT.fasta

A script to evaluate the assembly of a given fasta.

Options:
  --version             show program's version number and exit
  -h, --help            show this help message and exit
  -f FILE, --fasta=FILE
                        input fasta file
  -t FORMAT, --outfmt=FORMAT
                        format of the output, choices = [txt, json]
  -l LENGTH, -L LENGTH, --length=LENGTH
                        Threshold of length defined a sequence as Large
                        [default = 1000]
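To illustrate what the -l threshold selects, here is a small sketch of reading FASTA records and keeping the "large" ones. The helper names read_fasta and large_sequences are hypothetical, and the strict greater-than comparison is an assumption based on the "LARGE (> 100 bp)" header in the sample output; the script's own parsing may differ.

```python
def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            header = line[1:].split()
            # Use the first word of the header as the sequence name.
            name, chunks = (header[0] if header else ""), []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def large_sequences(records, threshold=1000):
    """Keep records strictly longer than `threshold` (assumed semantics)."""
    return [(name, seq) for name, seq in records if len(seq) > threshold]


# Toy input with a tiny threshold so the filter is visible.
toy = [">a first record", "ACGT", "ACGT", ">b", "AC"]
records = list(read_fasta(toy))
print(large_sequences(records, threshold=5))  # [('a', 'ACGTACGT')]
```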
Sample output
Default txt output (human-readable text format)
All scaffold sequences summary:
--------------------------------------------------
Counts of scaffold sequences 3
Length of scaffold sequences 490
Largest scaffold name seq1
Largest scaffold length 210
Scaffold N50 210
Counts of N50 2
Scaffold N90 70
Counts of N90 3
GC content(%) 42.86
N Length 141
N content (%) 28.78
LARGE (> 100 bp) sequences summary:
--------------------------------------------------
Counts of LARGE sequences 2
Length of LARGE sequences 420
LARGE scaffold N50 210
Counts of LARGE N50 1
LARGE scaffold N90 210
Counts of LARGE N90 2
GC content(%) 33.33
N Length 141
N content (%) 33.57
contigs summary:
--------------------------------------------------
Counts of contigs 7
Maximum length of contigs 70
contig N50 70
Counts of contig N50 3
contig N90 36
Counts of contig N90 5
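The contig and base-composition figures above can be approximated with a short sketch. It assumes that contigs are obtained by splitting each scaffold at runs of N, and that GC and N percentages are taken over the full sequence length; the sample N content (141 / 490 = 28.78%) is consistent with that denominator, but both points are assumptions rather than documented behaviour. The function names and the toy scaffold are hypothetical.

```python
import re


def split_contigs(sequence):
    """Split a scaffold into contigs at runs of N (assumed gap marker)."""
    return [part for part in re.split(r"[Nn]+", sequence) if part]


def base_composition(sequence):
    """Return (GC %, N count, N %) relative to the full sequence length."""
    seq = sequence.upper()
    total = len(seq)
    if total == 0:
        return 0.0, 0, 0.0
    gc = seq.count("G") + seq.count("C")
    n = seq.count("N")
    return 100.0 * gc / total, n, 100.0 * n / total


scaffold = "ACGTNNNNGGCCNNAT"  # hypothetical 16 bp toy scaffold
contigs = split_contigs(scaffold)
print([len(c) for c in contigs])   # [4, 4, 2]
print(base_composition(scaffold))  # (37.5, 6, 37.5)
```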
JSON output, produced with the command assemblyStatistics -f sample.fasta -t json
{
    "All scaffolds": {
        "Counts of scaffold sequences": 4,
        "Length of scaffold sequences": 605,
        "Largest scaffold name": "seq2",
        "Largest scaffold length": 271,
        "Scaffold N50": 269,
        "Counts of N50": 2,
        "Scaffold N90": 45,
        "Counts of N90": 3,
        "GC content(%)": 49.09090909090909,
        "N Length": 59,
        "N content (%)": 9.75206611570248
    },
    "LARGE sequences": {
        "threshold": 1000,
        "Counts of LARGE sequences": 0,
        "Length of LARGE sequences": 0
    },
    "Contigs": {
        "Counts of contigs": 8,
        "Maximum length": 225,
        "contig N50": 121,
        "Counts of contig N50": 2,
        "contig N90": 21,
        "Counts of contig N90": 6
    }
}
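The JSON format is convenient for scripting. The sketch below shows one way to call the command from Python and pick out a few values. It assumes assemblyStatistics is on PATH and writes only the JSON report to standard output, which the README does not state explicitly. The helper names run_assembly_statistics and summarize are hypothetical; the keys are copied from the sample report above.

```python
import json
import subprocess


def run_assembly_statistics(fasta_path):
    """Run the CLI with -t json and return the parsed report.

    Assumption: the command prints nothing but the JSON report to stdout.
    """
    result = subprocess.run(
        ["assemblyStatistics", "-f", fasta_path, "-t", "json"],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)


def summarize(report):
    """Pick a few headline values using the keys shown in the sample."""
    scaffolds = report["All scaffolds"]
    contigs = report["Contigs"]
    return {
        "scaffold_n50": scaffolds["Scaffold N50"],
        "contig_n50": contigs["contig N50"],
        "n_percent": round(scaffolds["N content (%)"], 2),
    }


# Trimmed copy of the sample report, so the sketch runs without the CLI.
sample = json.loads("""
{
    "All scaffolds": {"Scaffold N50": 269, "N content (%)": 9.75206611570248},
    "Contigs": {"contig N50": 121}
}
""")
print(summarize(sample))
# {'scaffold_n50': 269, 'contig_n50': 121, 'n_percent': 9.75}
```

With the tool installed, summarize(run_assembly_statistics("sample.fasta")) would apply the same extraction to a real report.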
Feedback/Issues
Please report any issues on the issues page or by email to linwenchao@yeah.net.
Todo
- Get statistics from a list of files
- Compressed format (.gz, .bz2 or .xz) support
- Multiple output format support
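Until compressed input is supported, one possible workaround is to decompress the file first and pass the plain FASTA to the script. The sketch below uses only the Python standard library; the helper name decompress_to_temp is hypothetical and is not part of assemblyStatistics.

```python
import bz2
import gzip
import lzma
import shutil
import tempfile
from pathlib import Path

# Map file suffixes to the standard-library opener for that format.
OPENERS = {".gz": gzip.open, ".bz2": bz2.open, ".xz": lzma.open}


def decompress_to_temp(path):
    """Decompress a .gz/.bz2/.xz FASTA into a temporary plain file.

    Returns the path of the new file; the caller is responsible for
    deleting it. Paths with any other suffix are returned unchanged.
    """
    path = Path(path)
    opener = OPENERS.get(path.suffix.lower())
    if opener is None:
        return path
    with opener(path, "rb") as src, tempfile.NamedTemporaryFile(
        "wb", suffix=".fasta", delete=False
    ) as dst:
        shutil.copyfileobj(src, dst)
    return Path(dst.name)
```

The returned path can then be given to assemblyStatistics -f in the usual way.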