Sequali is a QC tool that generates useful graphs for both short and long-read data.

Sequali

Sequence quality metrics for FASTQ and uBAM files.

Features:

  • MultiQC support since MultiQC version 1.22.

  • Low memory footprint, small install size and fast execution times.

    • Sequali typically needs less than 2 GB of memory and 3-30 minutes runtime when run on 2 cores (the default).

  • Informative graphs that allow for judging the quality of a sequence at a quick glance.

  • Overrepresentation analysis using 21 bp sequence fragments. Overrepresented sequences are checked against the NCBI UniVec database.

  • Estimates the duplication rate using a fingerprint subsampling technique that is also used in filesystem duplication estimation.

  • Checks for 6 Illumina adapter sequences and 17 Nanopore adapter sequences for single read data.

  • Determines adapters by overlap analysis for paired read data.

  • Insert size metrics for paired read data.

  • Per tile quality plots for Illumina reads.

  • Channel and other plots for Nanopore reads.

  • FASTQ and unaligned BAM are supported. See “Supported formats”.

  • Reproducible reports without timestamps.

Example reports:

  • GM24385_1.fastq.gz; HG002 (Genome In A Bottle) on ultra-long Nanopore Sequencing. ENA accession: ERR3988483.

  • GM24385_1_cut.fastq.gz; GM24385_1.fastq.gz processed with cutadapt: cutadapt -o GM24385_1_cut.fastq.gz --cut -64 --cut 64 --minimum-length 500 -Z --max-aer 0.1 GM24385_1.fastq.gz. The resulting file has 64 bp cut off from both ends of each read, after which reads are filtered for a minimum length of 500 and a maximum average error rate of 0.1.

  • 21C125_R1.fastq.gz; Illumina NovaSeq X paired-end sequencing of Campylobacter jejuni. ENA accession: ERR11204024.

For more information check the documentation.

Supported formats

  • FASTQ. Only the Sanger variation with a phred offset of 33 and the error rate calculation of 10 ^ (-phred/10) is supported. All sequencers use this format today.

    • Paired end sequencing data is supported.

    • For sequences called by Illumina base callers an additional plot with the per tile quality will be provided.

    • For sequences called by guppy additional plots for nanopore specific data will be provided.

  • (unaligned) BAM with single reads. Read-pair information is currently ignored.

    • For BAM data as delivered by dorado additional nanopore plots will be provided.

    • For aligned BAM files, secondary and supplementary reads are ignored, similar to how samtools fastq handles the data.
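
The Sanger encoding described in the FASTQ entry can be checked by hand. The following is a minimal shell sketch, not part of Sequali: the function names error_rate_of_phred and mean_error_rate are hypothetical, and a POSIX shell with awk is assumed. Each quality character is decoded as its ASCII code minus 33 and converted with 10 ^ (-phred/10). The mean is taken over the error rates, not over the phred scores.

```sh
# Hypothetical helpers for illustration; they are not part of Sequali.

# Convert a phred score to an error rate: 10 ^ (-phred / 10).
error_rate_of_phred() {
    awk -v q="$1" 'BEGIN { printf "%.6f\n", 10 ^ (-q / 10) }'
}

# Mean error rate of a Sanger-encoded (phred offset 33) quality string.
mean_error_rate() {
    printf '%s\n' "$1" | awk '
        # Build a lookup table from printable ASCII characters to their codes.
        BEGIN { for (code = 33; code <= 126; code++) ord[sprintf("%c", code)] = code }
        {
            total = 0
            n = length($0)
            for (i = 1; i <= n; i++) {
                phred = ord[substr($0, i, 1)] - 33
                total += 10 ^ (-phred / 10)   # sum error rates, not phred scores
            }
            if (n > 0) printf "%.6f\n", total / n
        }'
}
```

For example, mean_error_rate 'I+' averages a phred 40 base and a phred 10 base. The result is dominated by the low-quality base, which is why averaging the phred scores directly (giving 25) would understate the error.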

Installation

Installation via pip is available with:

pip install sequali

Sequali is also distributed via bioconda. It can be installed with:

conda install -c conda-forge -c bioconda sequali

Quickstart

sequali path/to/my.fastq.gz

This will create an HTML report my.fastq.gz.html and a JSON file my.fastq.gz.json in the current working directory.

To set the directory where the reports are created, use the --outdir flag. This is useful when using [MultiQC](https://github.com/multiqc/multiqc).

sequali --outdir /my/dir/all_sequali_reports my.fastq.gz
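
When several files need a report, the same flag can be applied in a loop so that all reports end up in one directory. This is a minimal sketch under the assumption that every input is a single-read file. The helper name sequali_batch is hypothetical and not part of Sequali.

```sh
# Hypothetical helper: run Sequali on each input file and collect all
# reports in one directory (first argument).
sequali_batch() {
    report_dir=$1
    shift
    mkdir -p "$report_dir"
    for input_file in "$@"; do
        # One Sequali run per file; reports are named after the input file.
        sequali --outdir "$report_dir" "$input_file"
    done
}
```

A possible invocation is sequali_batch /my/dir/all_sequali_reports /sequencing_data/*.fastq.gz, after which MultiQC can be pointed at the report directory with multiqc /my/dir/all_sequali_reports.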

The html and json filenames can be set separately.

sequali --html before_qc.html --json before_qc.json my.fastq.gz
sequali --html after_qc.html --json after_qc.json my.cutadapt.fastq.gz

Sequali can handle paired-end data.

sequali /sequencing_data/sample100_R1.fastq.gz /sequencing_data/sample100_R2.fastq.gz
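
For scripted use, the second file of a pair can be derived from the first. The sketch below assumes the _R1.fastq.gz / _R2.fastq.gz naming used in the example above, which is a convention and not something Sequali requires. The helper names mate_of and sequali_sample are hypothetical.

```sh
# Hypothetical helper: print the R2 path that belongs to an R1 path.
# Fails (returns 1) when the name does not end in _R1.fastq.gz.
mate_of() {
    case $1 in
        *_R1.fastq.gz) printf '%s\n' "${1%_R1.fastq.gz}_R2.fastq.gz" ;;
        *) return 1 ;;
    esac
}

# Hypothetical helper: run Sequali in paired mode when the mate file
# exists, and in single-read mode otherwise.
sequali_sample() {
    if mate=$(mate_of "$1") && [ -f "$mate" ]; then
        sequali "$1" "$mate"
    else
        sequali "$1"
    fi
}
```

With this, sequali_sample /sequencing_data/sample100_R1.fastq.gz runs the paired command shown above whenever sample100_R2.fastq.gz is present next to it.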

Additionally sequali can handle BAM data. Proper pair handling is not yet supported for BAM data, so this is primarily useful for ONT datasets.

sequali /sequencing_data/sample100_dorado_called_hac_v4.30.bam

Sequali by default uses one thread per compressed input file and one thread for the read processing, typically keeping two cores busy. Sequali can also use a single core, which is slower but typically more efficient for HPC scenarios where multiple files can be run simultaneously. A SLURM example:

sbatch -c 1 --time 59 --partition short \
--wrap 'sequali --threads 1 /cluster-scratch/myusername/my.fastq.gz'
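
To process many files this way, one single-core job can be submitted per file. The following sketch reuses the sbatch options from the example above. The time limit and the partition name short are site-specific, and the helper name submit_sequali_jobs is hypothetical. Paths that contain single quotes are not handled.

```sh
# Hypothetical helper: submit one single-threaded Sequali job per input file.
submit_sequali_jobs() {
    for input_file in "$@"; do
        # The path is single-quoted inside the wrapped command so that
        # spaces survive; paths containing single quotes are not supported.
        sbatch -c 1 --time 59 --partition short \
            --wrap "sequali --threads 1 '$input_file'"
    done
}
```

A possible invocation is submit_sequali_jobs /cluster-scratch/myusername/*.fastq.gz, which leaves it to the scheduler to run as many jobs in parallel as the cluster allows.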

Using a thread count higher than 2 has no effect. Due to the decompression bottleneck, bringing the full power of multithreading to Sequali has limited utility whilst having a disproportionally high cost in additional code complexity.

For all command line options, check out the usage documentation.

For more extensive information about the module options check the documentation on the module options.

Acknowledgements

  • FastQC for its excellent selection of relevant metrics. For this reason these metrics are also gathered by Sequali.

  • The matplotlib team for their excellent work on colormaps. Their work was an inspiration for how to present the data and their RdBu colormap is used to represent quality score data. Check their writings on colormaps for a good introduction.

  • Wouter de Coster for his excellent post on how to correctly average phred scores as well as the idea for using end-anchored plots from NanoQC.

  • Marcel Martin for providing very extensive feedback.

  • Agnès Barnabé for creating a Galaxy wrapper.

Citation

If you wish to credit Sequali please cite the Sequali article.

License

This project is licensed under the GNU Affero General Public License v3, mainly to prevent commercial parties from using it without notifying users that they can run it themselves. If you want to include code from Sequali in your open source project but that project is not compatible with the AGPL, please contact me and we can discuss a separate license.
