Calculate statistics for Oxford Nanopore sequencing data and alignments
Calculate various statistics from a long read sequencing dataset in FASTQ, FASTA, BAM, or Albacore sequencing summary format.
INSTALLATION
pip install nanostat
or
conda install -c bioconda nanostat
USAGE
NanoStat [-h] [-v] [-o OUTDIR] [-p PREFIX] [-n NAME] [-t N]
         [--barcoded] [--readtype {1D,2D,1D2}]
         (--fastq file [file ...] | --fasta file [file ...] |
          --summary file [file ...] | --bam file [file ...])

Calculate statistics of long read sequencing dataset.

General options:
  -h, --help                  Show the help and exit.
  -v, --version               Print version and exit.
  -o, --outdir OUTDIR         Specify directory in which output has to be created.
  -p, --prefix PREFIX         Specify an optional prefix to be used for the output file.
  -n, --name NAME             Specify a filename/path for the output, stdout is the default.
  -t, --threads N             Set the allowed number of threads to be used by the script.

Input options:
  --barcoded                  Use if you want to split the summary file by barcode.
  --readtype {1D,2D,1D2}      Which read type to extract information about from summary.
                              Options are 1D, 2D, 1D2.

Input data sources, one of these is required:
  --fastq file [file ...]     Data is in one or more (compressed) fastq file(s).
  --fasta file [file ...]     Data is in one or more (compressed) fasta file(s).
  --summary file [file ...]   Data is in one or more (compressed) summary file(s) generated by albacore.
  --bam file [file ...]       Data is in one or more sorted bam file(s).
EXAMPLES
NanoStat --fastq reads.fastq.gz --outdir statreports
NanoStat --summary sequencing_summary1.txt sequencing_summary2.txt sequencing_summary3.txt --readtype 1D2
NanoStat --bam alignment.bam alignment2.bam
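When NanoStat is called from a pipeline rather than typed by hand, the command lines above can be assembled programmatically. The following is a minimal Python sketch, not part of NanoStat itself: the helper names `build_nanostat_command` and `run_nanostat` are hypothetical, and only flags listed in the USAGE section (`--fastq`, `--outdir`, `--name`, `--threads`) are used. Running the command assumes the `NanoStat` executable is on the PATH.

```python
import shlex
import subprocess


def build_nanostat_command(fastq_files, outdir=None, name=None, threads=None):
    """Build an argument list for a NanoStat run on one or more fastq files.

    Hypothetical helper: it only assembles flags documented in the usage text.
    """
    if not fastq_files:
        raise ValueError("at least one fastq file is required")
    cmd = ["NanoStat", "--fastq", *fastq_files]
    if outdir is not None:
        cmd += ["--outdir", outdir]
    if name is not None:
        cmd += ["--name", name]
    if threads is not None:
        cmd += ["--threads", str(threads)]
    return cmd


def run_nanostat(cmd):
    """Run the command and return whatever NanoStat wrote to stdout.

    Raises subprocess.CalledProcessError if NanoStat exits with an error.
    """
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout


# Build (but do not run) the first example from this section, with 4 threads.
cmd = build_nanostat_command(["reads.fastq.gz"], outdir="statreports", threads=4)
print(shlex.join(cmd))
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces; `shlex.join` is used only to print a copy-pasteable version of the command.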
Example output
General summary:
  Number of reads:        3995
  Total bases:            11418359
  Median read length:     1221.0
  Mean read length:       2858.2
  Read length N50:        8676
  Active channels:        933
  Mean read quality:      10.2
  Median read quality:    10.6

Top 5 longest reads and their mean basecall quality score
  1: 36928 (10.8, [a9dbd2b5-718c-4d0c-afa8-a12a54a5a12a])
  2: 32830 (10.2, [b87fc717-1cf8-4526-9f96-3042fda5b769])
  3: 30474 (12.4, [ea3e43d8-6cbf-4687-95bd-66e6123512d4])
  4: 27531 (12.5, [74c0e08c-eb94-4825-b93b-21d63e05cf14])
  5: 26535 (10.4, [8e6ed505-8477-4462-9f0a-3a72783cbf60])

Top 5 highest mean basecall quality scores and their read lengths
  1: 14.8 (1040, [acf6f90b-ea22-4960-8049-6e6e694a3f9a])
  2: 14.7 (9603, [ec796da1-5c4a-4350-974b-6dabb8deb546])
  3: 14.6 (680, [792c485a-81cb-4ef7-8f23-01f10f9c7c23])
  4: 14.5 (2664, [d8092ffb-9919-42fb-ad41-34b1658f1bd5])
  5: 14.5 (909, [d55d3bf6-0729-4b46-82cd-0cef00bcf849])

Number and percentage of reads above quality cutoffs
  >Q5:  3559 (89.1%)
  >Q7:  3429 (85.8%)
  >Q10: 2705 (67.7%)
  >Q12: 1072 (26.8%)
  >Q15: 0 (0.0%)
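For readers unfamiliar with two of the reported metrics, the sketch below shows how a read length N50 and the "reads above quality cutoff" lines could be computed from plain lists of read lengths and mean read qualities. It uses the common definition of N50 (the length of the read at which the cumulative sum of lengths, sorted longest first, reaches half of the total bases) and assumes a strict "greater than" comparison for the cutoffs, as the `>Q5` labels suggest. The function names and the toy data are made up for illustration; this is not necessarily how NanoStat implements these calculations internally.

```python
def read_length_n50(lengths):
    """Return the N50 of a collection of read lengths (common definition)."""
    if not lengths:
        raise ValueError("no read lengths given")
    ordered = sorted(lengths, reverse=True)  # longest reads first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            # This read pushes the cumulative sum past half of all bases.
            return length


def reads_above_quality(qualities, cutoff):
    """Return (count, percentage) of reads with mean quality above cutoff.

    Assumes a strict comparison (quality > cutoff).
    """
    if not qualities:
        raise ValueError("no qualities given")
    count = sum(1 for q in qualities if q > cutoff)
    return count, 100.0 * count / len(qualities)


# Toy data, not taken from a real sequencing run.
toy_lengths = [10, 8, 6, 4, 2]
toy_qualities = [4.0, 6.0, 8.0, 11.0]

print(f"Read length N50: {read_length_n50(toy_lengths)}")
for cutoff in (5, 10):
    count, pct = reads_above_quality(toy_qualities, cutoff)
    print(f">Q{cutoff}: {count} ({pct:.1f}%)")
```

Because N50 weights reads by their length, it is typically much larger than the median read length in a dataset with many short reads, which matches the gap between the two values in the example output.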
I welcome all suggestions, bug reports, feature requests and contributions. Please leave an issue or open a pull request. I will usually respond within a day, or rarely within a few days.
Download files
Source distribution: NanoStat-1.1.0.tar.gz (5.4 kB)