
Plotting suite for Oxford Nanopore sequencing data and alignments


Plotting tool for Oxford Nanopore sequencing data and alignments.


Example plot


The example plot above shows a bivariate plot comparing log-transformed read length with average basecall Phred quality score. More examples can be found in the gallery on my blog ‘Gigabase Or Gigabyte’.
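The average basecall quality of a read can be computed in more than one way. Below is a minimal Python sketch of one common convention: each Phred score is converted to an error probability, those probabilities are averaged, and the mean is converted back to a Phred score. Whether NanoPlot uses exactly this formula is an assumption here, and the function names are hypothetical.

```python
import math


def phred_from_fastq(qual_line, offset=33):
    """Decode a FASTQ quality string (Phred+33 encoding) into integer scores."""
    return [ord(c) - offset for c in qual_line]


def mean_phred(quals):
    """Average Phred scores in error-probability space (assumed convention).

    Q -> 10 ** (-Q / 10) gives the error probability of each base; the mean
    error probability is then converted back with -10 * log10(p).
    """
    if not quals:
        raise ValueError("empty quality list")
    mean_err = sum(10 ** (-q / 10) for q in quals) / len(quals)
    return -10 * math.log10(mean_err)


quals = phred_from_fastq("IIII+")  # 'I' encodes Q40, '+' encodes Q10
print(round(mean_phred(quals), 2))  # dominated by the single low-quality base
print(sum(quals) / len(quals))      # naive arithmetic mean, for comparison
```

Averaging in probability space lets a few very poor bases pull the read's score down much more than a plain arithmetic mean of the scores would.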

In addition to the plots, a NanoStats file is created summarizing key features of the dataset.
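A typical summary statistic for read lengths is the N50, which NanoPlot also marks in the read length histogram: the length L such that reads of length L or longer together contain at least half of all sequenced bases. A minimal Python sketch follows; the helper names and the dictionary keys are hypothetical and do not reproduce the actual NanoStats field names.

```python
import statistics


def n50(lengths):
    """Length L such that reads of length >= L hold at least half of all bases."""
    if not lengths:
        raise ValueError("no reads")
    half = sum(lengths) / 2
    running = 0
    # Walk from the longest read down until half of the bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half:
            return length


def summarize(lengths):
    """A hypothetical subset of NanoStats-like summary fields."""
    return {
        "number_of_reads": len(lengths),
        "total_bases": sum(lengths),
        "mean_read_length": sum(lengths) / len(lengths),
        "median_read_length": statistics.median(lengths),
        "read_length_n50": n50(lengths),
    }


print(summarize([2, 3, 4, 5, 6, 8, 10]))
```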

This script performs data extraction from Oxford Nanopore sequencing data in the following formats:
- fastq files (can be bgzip, bzip2 or gzip compressed)
- fastq files generated by albacore or MinKNOW containing additional information (can be bgzip, bzip2 or gzip compressed)
- sorted bam files
- sequencing_summary.txt output table generated by albacore
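Reading the compressed fastq inputs listed above needs only the Python standard library, because bgzip output is a series of gzip members that the `gzip` module reads transparently. A minimal sketch, assuming the compression can be inferred from the file extension and that every record spans exactly four lines (no wrapped sequences); the function names are hypothetical.

```python
import bz2
import gzip


def open_fastq(path):
    """Open a plain, gzip/bgzip or bzip2 compressed fastq file in text mode.

    Assumption: compression is detected from the file extension only.
    """
    if path.endswith((".gz", ".bgz")):
        return gzip.open(path, "rt")
    if path.endswith(".bz2"):
        return bz2.open(path, "rt")
    return open(path)


def read_lengths_and_quals(path):
    """Yield (read_length, quality_string) for each four-line fastq record."""
    with open_fastq(path) as handle:
        while True:
            header = handle.readline()
            if not header:  # end of file
                break
            seq = handle.readline().rstrip("\r\n")
            handle.readline()  # '+' separator line, ignored
            qual = handle.readline().rstrip("\r\n")
            yield len(seq), qual
```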

INSTALLATION

pip install NanoPlot

Upgrade to a newer version using:
pip install NanoPlot --upgrade

or

conda install -c bioconda nanoplot

The script is written for Python 3.

OUTPUT

NanoPlot creates:
- a statistical summary
- a number of plots
- an HTML summary file

USAGE

NanoPlot [-h] [-v] [-t THREADS] [--verbose] [-o OUTDIR] [-p PREFIX]
                [--maxlength N] [--drop_outliers] [--downsample N]
                [--loglength] [--alength] [--minqual N]
                [--readtype {1D,2D,1D2}] [--barcoded] [-c COLOR]
                [-f {eps,jpeg,jpg,pdf,pgf,png,ps,raw,rgba,svg,svgz,tif,tiff}]
                [--plots [{kde,hex,dot,pauvre} [{kde,hex,dot,pauvre} ...]]]
                [--listcolors]
                (--fastq file [file ...] | --fastq_rich file [file ...] | --fastq_minimal file [file ...] | --summary file [file ...] | --bam file [file ...])

General options:
  -h, --help            Show this help message and exit.
  -v, --version         Print version and exit.
  -t, --threads THREADS
                        Set the allowed number of threads to be used by the script
  --verbose             Write log messages also to terminal.
  --store               Store the extracted data in a pickle file for future plotting using the --pickle input option
  --raw                 Store the extracted data in a tab-separated file.
  -o, --outdir OUTDIR   Specify directory in which output has to be created.
  -p, --prefix PREFIX   Specify an optional prefix to be used for the output files.

Input data sources, one of these is required:
  --fastq file [file ...]
                          Data is in one or more default fastq file(s).
  --fastq_rich file [file ...]
                          Data is in one or more fastq file(s) generated by albacore or MinKNOW with
                          additional information concerning channel and time.
  --fastq_minimal file [file ...]
                          Data is in one or more fastq file(s) generated by albacore or MinKNOW with
                          additional information concerning channel and time. Minimal data is extracted
                          swiftly without elaborate checks.
  --summary file [file ...]
                          Data is in one or more summary file(s) generated by albacore.
  --bam file [file ...]   Data is in one or more sorted bam file(s).
  --cram file [file ...]  Data is in one or more sorted cram file(s).
  --pickle pickle         Data is a pickle file stored earlier using the --store option


Each of these options can take one or multiple files, e.g.
  --summary summary1.txt summary2.txt summary3.txt
  --bam bam1.bam bam2.bam

Options for filtering or transforming input prior to plotting:
  --maxlength N         Drop reads longer than length specified.
  --minlength N         Drop reads shorter than length specified.
  --drop_outliers       Drop outlier reads with extremely long length.
  --downsample N        Reduce dataset to N reads by random sampling.
  --loglength           Logarithmic scaling of lengths in plots.
  --alength             Use aligned read lengths rather than sequenced length (bam mode)
  --minqual N           Drop reads with an average quality lower than specified.
  --readtype            Which read type to extract information about from summary. Options are 1D, 2D, 1D2
  --barcoded            Use if you want to split the summary file by barcode

Options for customizing the plots created:
  -c, --color COLOR     Specify a color for the plots, must be a valid matplotlib color
  -f, --format {eps,jpeg,jpg,pdf,pgf,png,ps,raw,rgba,svg,svgz,tif,tiff}
                        Specify the output format of the plots.
  --plots [{kde,hex,dot,pauvre} [{kde,hex,dot,pauvre} ...]]
                        Specify which bivariate plots have to be made.
  --no-N50              Hide the N50 mark in the read length histogram
  --title TITLE         Add a title to all plots, requires quoting if using spaces
  --listcolors          List the colors which are available for plotting and exit.
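The length and quality filters listed under "Options for filtering or transforming input prior to plotting" behave like simple predicates applied before plotting. A minimal sketch on (length, mean quality) pairs, assuming the semantics in the option descriptions (reads longer than `--maxlength`, shorter than `--minlength` or below `--minqual` are dropped). The helper `filter_reads` is hypothetical, and the order of steps (filter first, then downsample) is an assumption, not necessarily NanoPlot's internal order.

```python
import random


def filter_reads(reads, minlength=None, maxlength=None, minqual=None,
                 downsample=None, seed=None):
    """Hypothetical filter mirroring --minlength, --maxlength, --minqual
    and --downsample on a list of (length, mean_quality) tuples."""
    kept = [
        (length, qual) for length, qual in reads
        if (minlength is None or length >= minlength)   # drop shorter reads
        and (maxlength is None or length <= maxlength)  # drop longer reads
        and (minqual is None or qual >= minqual)        # drop low-quality reads
    ]
    # Random sampling without replacement down to N reads, if requested.
    if downsample is not None and downsample < len(kept):
        kept = random.Random(seed).sample(kept, downsample)
    return kept


reads = [(100, 5.0), (1000, 9.0), (50000, 12.0), (2000, 7.0)]
print(filter_reads(reads, minlength=500, maxlength=40000, minqual=8))
```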

EXAMPLE USAGE

NanoPlot --summary sequencing_summary.txt --loglength -o summary-plots-log-transformed
NanoPlot -t 2 --fastq reads1.fastq.gz reads2.fastq.gz --maxlength 40000 --plots hex dot
NanoPlot -t 12 --color yellow --bam alignment1.bam alignment2.bam alignment3.bam --downsample 10000 -o bamplots_downsampled
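Each input option accepts several files whose records are pooled into one dataset. For summary input, a minimal sketch using the standard `csv` module is shown below. It assumes tab-separated files with a header row and the albacore column names `sequence_length_template` and `mean_qscore_template`; these may differ between basecaller versions, so they are parameters of the hypothetical `read_summaries` helper.

```python
import csv


def read_summaries(paths, length_col="sequence_length_template",
                   qual_col="mean_qscore_template"):
    """Pool (read_length, mean_quality) pairs from one or more summary files.

    Assumption: tab-separated files with a header row containing the
    given column names.
    """
    records = []
    for path in paths:
        with open(path, newline="") as handle:
            for row in csv.DictReader(handle, delimiter="\t"):
                records.append((int(row[length_col]), float(row[qual_col])))
    return records
```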

This script now also provides read length vs mean quality plots in the ‘pauvre’-style from [@conchoecia](https://github.com/conchoecia).

ACKNOWLEDGMENTS

I welcome all suggestions, bug reports, feature requests and contributions. Please leave an issue or open a pull request. I will usually respond within a day, or rarely within a few days.

PLOTS GENERATED

| Plot | Fastq | Fastq_rich | Fastq_minimal | Bam | Summary | Options | Style |
|---|:-:|:-:|:-:|:-:|:-:|:-:|:-:|
| Histogram of read length | x | x | x | x | x | N50 | |
| Histogram of (log transformed) read length | x | x | x | x | x | N50 | |
| Bivariate plot of length against base call quality | x | x | | x | x | log transformation | dot, hex, kde, pauvre |
| Heatmap of reads per channel | | x | | | x | | |
| Cumulative yield plot | | x | x | | x | | |
| Violin plot of read length over time | | x | x | | x | | |
| Violin plot of base call quality over time | | x | | | x | | |
| Bivariate plot of aligned read length against sequenced read length | | | | x | | | dot, hex, kde |
| Bivariate plot of percent reference identity against read length | | | | x | | log transformation | dot, hex, kde |
| Bivariate plot of percent reference identity against base call quality | | | | x | | | dot, hex, kde |
| Bivariate plot of mapping quality against read length | | | | x | | log transformation | dot, hex, kde |
| Bivariate plot of mapping quality against basecall quality | | | | x | | | dot, hex, kde |
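The time- and channel-based plots in the table need per-read start times, lengths and channel numbers, which is why they are not available from plain fastq or bam input. A minimal sketch of the underlying aggregation; the hourly binning and the function names are assumptions for illustration, not NanoPlot's implementation.

```python
from collections import Counter


def cumulative_yield(reads, bin_hours=1.0):
    """Cumulative bases sequenced at the end of each time bin.

    `reads` is an iterable of (start_time_seconds, read_length) pairs.
    Returns a list of (bin_end_hours, cumulative_bases).
    """
    bin_seconds = bin_hours * 3600
    per_bin = Counter()
    for start, length in reads:
        per_bin[int(start // bin_seconds)] += length
    if not per_bin:
        return []
    total = 0
    result = []
    for b in range(max(per_bin) + 1):  # include empty bins so the curve is continuous
        total += per_bin[b]
        result.append(((b + 1) * bin_hours, total))
    return result


def reads_per_channel(channels):
    """Count reads per channel number, the data behind a channel heatmap."""
    return Counter(channels)
```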

COMPANION SCRIPTS

  • NanoComp: comparing multiple runs

  • NanoStat: statistic summary report of reads or alignments

  • NanoFilt: filtering and trimming of reads

  • NanoLyse: removing contaminant reads (e.g. lambda control DNA) from fastq


This version

1.8.0


Source Distribution

NanoPlot-1.8.0.tar.gz (12.3 kB view details)

Uploaded Source

File details

Details for the file NanoPlot-1.8.0.tar.gz.

File metadata

  • Download URL: NanoPlot-1.8.0.tar.gz
  • Upload date:
  • Size: 12.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for NanoPlot-1.8.0.tar.gz
Algorithm Hash digest
SHA256 3fff4e8dd032277dbd4e54af20e73f7f8405b1e24d760f5b06c8346872006ede
MD5 92589cddf811e5f908702a2e80ee975b
BLAKE2b-256 a5f3982ea21242910ac7ff492821275edd1f9bfdf4b20b4f7e420c848d292c88
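The SHA256 digest above can be used to check a downloaded copy of the source archive. A minimal sketch with `hashlib`, assuming the archive sits in the working directory; `sha256_of` and `verify` are hypothetical helper names.

```python
import hashlib

EXPECTED_SHA256 = "3fff4e8dd032277dbd4e54af20e73f7f8405b1e24d760f5b06c8346872006ede"


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA256 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED_SHA256):
    """Return True if the file's SHA256 matches the expected digest."""
    return sha256_of(path) == expected.lower()


# Usage: verify("NanoPlot-1.8.0.tar.gz")
```

pip can also enforce this itself in hash-checking mode, using `--hash=sha256:...` entries in a requirements file.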

