Fast sequencing quality metrics
sequali
Sequence quality metrics
Features:
Low memory footprint, small install size and fast execution times.
Informative graphs that allow for judging the quality of a sequence at a quick glance.
Overrepresentation analysis using 21 bp sequence fragments. Overrepresented sequences are checked against the NCBI UniVec database.
Duplication rate estimation using a fingerprint subsampling technique that is also used in filesystem duplication estimation.
Checks for 6 Illumina adapter sequences and 15 Nanopore adapter sequences.
Per-tile quality plots for Illumina reads.
Channel and other plots for Nanopore reads.
FASTQ and unaligned BAM are supported. See “Supported formats”.
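To give a feel for the fingerprint subsampling idea mentioned in the feature list, here is a minimal sketch of one simple hash-based variant. The function name `estimate_duplication_fraction`, the choice of BLAKE2b as the hash, and the `max_stored` limit are assumptions made for this illustration; this is not sequali's actual implementation.

```python
import hashlib


def estimate_duplication_fraction(sequences, max_stored=1000):
    """Estimate the fraction of duplicate sequences using bounded memory.

    Illustrative sketch only (hypothetical helper, not sequali's code).
    """
    mask_bits = 0  # a fingerprint is kept only if its lowest mask_bits bits are zero
    counts = {}    # fingerprint -> number of times it was seen
    for seq in sequences:
        digest = hashlib.blake2b(seq.encode("ascii"), digest_size=8).digest()
        fingerprint = int.from_bytes(digest, "little")
        if fingerprint & ((1 << mask_bits) - 1):
            continue  # fingerprint falls outside the current subsample
        counts[fingerprint] = counts.get(fingerprint, 0) + 1
        while len(counts) > max_stored:
            # Table is full: halve the subsample and evict entries that no longer fit.
            mask_bits += 1
            mask = (1 << mask_bits) - 1
            counts = {fp: n for fp, n in counts.items() if not (fp & mask)}
    total = sum(counts.values())
    return 1 - len(counts) / total if total else 0.0


print(estimate_duplication_fraction(["ACGT", "ACGT", "TTTT", "GGGG"]))  # 0.25
```

Because the selection rule only ever becomes stricter, any fingerprint still in the table has been counted since its first occurrence, so the counts of the retained subsample stay exact while memory remains bounded.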
Supported formats
FASTQ. Only the Sanger variation with a phred offset of 33 and the error rate calculation of 10 ^ (-phred/10) is supported. All current sequencers use this format.
For sequences called by Illumina base callers an additional plot with the per-tile quality will be provided.
For sequences called by Guppy additional plots for Nanopore-specific data will be provided.
Unaligned BAM. Any alignment flags are currently ignored.
For uBAM data as delivered by Dorado additional Nanopore plots will be provided.
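As an illustration of the Sanger encoding described above, the following sketch decodes a quality string with an offset of 33 and applies the 10 ^ (-phred/10) error rate formula. It also averages qualities through their error probabilities, which is the approach referred to in the acknowledgements. The helper names are hypothetical and are not part of sequali.

```python
import math


def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string into integer phred scores."""
    return [ord(char) - offset for char in quality_string]


def error_probability(phred):
    """Error probability for a phred score: 10 ^ (-phred / 10)."""
    return 10 ** (-phred / 10)


def mean_phred(quality_string):
    """Average quality via the mean error probability.

    Assumes a non-empty quality string.
    """
    scores = phred_scores(quality_string)
    mean_error = sum(error_probability(q) for q in scores) / len(scores)
    return -10 * math.log10(mean_error)


print(phred_scores("I5+"))  # [40, 20, 10]
print(mean_phred("I5+"))
```

For the qualities 40, 20 and 10 the error-based average is roughly 14.4, well below the arithmetic mean of about 23.3, because the lowest-quality base dominates the expected number of errors.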
Installation
Installation via pip is available with:
pip install sequali
Sequali is also distributed via bioconda. It can be installed with:
conda install -c conda-forge -c bioconda sequali
Usage
usage: sequali [-h] [--json JSON] [--html HTML] [--outdir OUTDIR]
               [--adapter-file ADAPTER_FILE]
               [--overrepresentation-threshold-fraction FRACTION]
               [--overrepresentation-min-threshold THRESHOLD]
               [--overrepresentation-max-threshold THRESHOLD]
               [--overrepresentation-max-unique-fragments N]
               [--overrepresentation-fragment-length LENGTH]
               [--overrepresentation-sample-every DIVISOR]
               [--deduplication-estimate-bits BITS] [-t THREADS] [--version]
               INPUT

Create a quality metrics report for sequencing data.

positional arguments:
  INPUT                 Input FASTQ or uBAM file. The format is autodetected
                        and compressed formats are supported.

options:
  -h, --help            show this help message and exit
  --json JSON           JSON output file. default: '<input>.json'.
  --html HTML           HTML output file. default: '<input>.html'.
  --outdir OUTDIR, --dir OUTDIR
                        Output directory for the report files. default:
                        current working directory.
  --adapter-file ADAPTER_FILE
                        File with adapters to search for. See default file for
                        formatting. Default: src/sequali/adapters/adapter_list.tsv.
  --overrepresentation-threshold-fraction FRACTION
                        At what fraction a sequence is determined to be
                        overrepresented. The threshold is calculated as
                        fraction times the number of sampled sequences.
                        Default: 0.0001 (1 in 10,000).
  --overrepresentation-min-threshold THRESHOLD
                        The minimum amount of occurrences for a sequence to be
                        considered overrepresented, regardless of the bound
                        set by the threshold fraction. Useful for smaller
                        files. Default: 100.
  --overrepresentation-max-threshold THRESHOLD
                        The amount of occurrences for a sequence to be
                        considered overrepresented, regardless of the bound
                        set by the threshold fraction. Useful for very large
                        files. Default: unlimited.
  --overrepresentation-max-unique-fragments N
                        The maximum amount of unique fragments to store.
                        Larger amounts increase the sensitivity of finding
                        overrepresented sequences at the cost of increasing
                        memory usage. Default: 5,000,000.
  --overrepresentation-fragment-length LENGTH
                        The length of the fragments to sample. The maximum is
                        31. Default: 21.
  --overrepresentation-sample-every DIVISOR
                        How often a read should be sampled. More samples lead
                        to better precision, lower speed, and more bias
                        towards the beginning of the file as the fragment
                        store gets filled up with more sequences from the
                        beginning. Default: 1 in 8.
  --deduplication-estimate-bits BITS
                        Determines how many sequences are maximally stored to
                        estimate the deduplication rate. Maximum stored
                        sequences: 2 ** bits * 7 // 10. Memory required: 2 **
                        bits * 24. Default: 21.
  -t THREADS, --threads THREADS
                        Number of threads to use. If greater than one, sequali
                        will use an additional thread for gzip decompression.
  --version             show program's version number and exit
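The arithmetic behind some of these options can be worked through with a short sketch. The function names below are hypothetical, the rounding of the threshold and the order in which the minimum and maximum bounds are applied are assumptions, and the memory figure is assumed to be in bytes. Only the formulas themselves come from the help text above.

```python
def overrepresentation_threshold(sampled, fraction=0.0001,
                                 min_threshold=100, max_threshold=None):
    """Occurrences needed before a fragment counts as overrepresented."""
    threshold = round(fraction * sampled)  # rounding mode is an assumption
    threshold = max(threshold, min_threshold)
    if max_threshold is not None:  # None stands in for "unlimited"
        threshold = min(threshold, max_threshold)
    return threshold


def deduplication_table_size(bits=21):
    """Return (maximum stored sequences, memory required) for a bit count."""
    slots = 2 ** bits
    return slots * 7 // 10, slots * 24


print(overrepresentation_threshold(10_000_000))  # 1000
print(overrepresentation_threshold(1_000))       # 100, the minimum applies
print(deduplication_table_size(21))              # (1468006, 50331648)
```

With the default of 21 bits, at most 1,468,006 sequences are stored and the table takes 50,331,648 units of memory, which is 48 MiB if the unit is bytes.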
Acknowledgements
FastQC for its excellent selection of relevant metrics. For this reason these metrics are also gathered by sequali.
The matplotlib team for their excellent work on colormaps. Their work was an inspiration for how to present the data and their RdBu colormap is used to represent quality score data. Check their writings on colormaps for a good introduction.
Wouter de Coster for his excellent post on how to correctly average phred scores.
Marcel Martin for providing very extensive feedback.
License
This project is licensed under the GNU Affero General Public License v3. This is mainly to prevent commercial parties from using it without notifying users that they can run it themselves. If you want to include code from sequali in your open source project, but your project's license is not compatible with the AGPL, please contact me and we can discuss a separate license.