A fast FASTQ filter program.

Fastq-filter correctly takes into account that quality scores are log scores when calculating the mean.
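
To see why this matters: a Phred score Q corresponds to an error probability of 10^(-Q/10), so a mean has to be taken over the probabilities and then converted back to a score. The following is a minimal sketch of that arithmetic, assuming Phred+33 encoded quality strings. The function names are illustrative and are not fastq-filter's actual API.

```python
import math


def naive_mean_quality(qualities: str, offset: int = 33) -> float:
    """Arithmetic mean of the Phred scores (ignores the log scale)."""
    scores = [ord(c) - offset for c in qualities]
    return sum(scores) / len(scores)


def mean_quality(qualities: str, offset: int = 33) -> float:
    """Mean taken over error probabilities, then converted back to Phred."""
    probabilities = [10 ** (-(ord(c) - offset) / 10) for c in qualities]
    mean_error = sum(probabilities) / len(probabilities)
    return -10 * math.log10(mean_error)


# Two bases with Q10 ('+') and two bases with Q40 ('I').
read_qualities = "++II"
print(round(naive_mean_quality(read_qualities), 2))
print(round(mean_quality(read_qualities), 2))
```

For this read the naive mean is 25, while the probability-based mean is roughly 13, so under this calculation the read would not reach a threshold of 20.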

Installation

For the latest development version:

pip install git+https://github.com/LUMC/fastq-filter

Quickstart

fastq-filter mean_quality:20 my.fastq

This will filter out all FASTQ records that have a mean quality below 20.

Other filters are median_quality, min_length and max_length. For more information, run fastq-filter --help-filters or see the Filters section below.

Fastq-filter can also chain filters together:

fastq-filter 'min_length:100|mean_quality:20' my.fastq

It is advisable to put the fastest filters (length) before the slower ones (quality) to optimize performance.
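
The ordering advice follows from lazy evaluation: once a cheap check rejects a record, the expensive check never runs for it. Below is a rough sketch of how a filter string could be turned into a chain of predicates using Python's built-in filter. The names (parse_filters, apply_filters, FILTER_FACTORIES) and the tuple-based record are hypothetical and only illustrate the idea; they are not fastq-filter's internals.

```python
import math
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

# A record is modelled as (name, sequence, qualities). This is an assumption
# made for the sketch, not the record type fastq-filter uses.
Record = Tuple[str, str, str]
Predicate = Callable[[Record], bool]


def min_length(length: int) -> Predicate:
    return lambda record: len(record[1]) >= length


def max_length(length: int) -> Predicate:
    return lambda record: len(record[1]) <= length


def mean_quality(threshold: int) -> Predicate:
    def passes(record: Record) -> bool:
        if not record[2]:
            return False  # an empty read has no mean quality
        errors = [10 ** (-(ord(c) - 33) / 10) for c in record[2]]
        return -10 * math.log10(sum(errors) / len(errors)) >= threshold
    return passes


FILTER_FACTORIES: Dict[str, Callable[[int], Predicate]] = {
    "min_length": min_length,
    "max_length": max_length,
    "mean_quality": mean_quality,
}


def parse_filters(filter_string: str) -> List[Predicate]:
    """Turn 'min_length:100|mean_quality:20' into a list of predicates."""
    predicates = []
    for part in filter_string.split("|"):
        name, _, argument = part.partition(":")
        predicates.append(FILTER_FACTORIES[name](int(argument)))
    return predicates


def apply_filters(records: Iterable[Record], filter_string: str) -> Iterator[Record]:
    """Wrap the record stream in one built-in filter() per predicate."""
    for predicate in parse_filters(filter_string):
        records = filter(predicate, records)
    return iter(records)


records = [
    ("short", "ACGT", "IIII"),
    ("low_quality", "ACGTACGT", "++++++++"),
    ("good", "ACGTACGT", "IIIIIIII"),
]
kept = [name for name, _, _ in apply_filters(records, "min_length:5|mean_quality:20")]
print(kept)
```

In this sketch the record named "short" is dropped by the length check alone, so its quality string is never inspected.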

Usage

usage: fastq-filter [-h] [--help-filters] [-o OUTPUT] filters input

positional arguments:
  filters               Filters and arguments. For example: mean_quality:20,
                        for filtering all reads with an average quality below
                        20. Multiple filters can be applied by separating with
                        the | symbol. For example:
                        min_length:100|mean_quality:20. Make sure to use
                        faster filters (length) before slower ones (quality)
                        for optimal performance. Use --help-filters to print
                        all the available filters.
  input                 Input FASTQ file. Compression format automatically
                        detected.

optional arguments:
  -h, --help            show this help message and exit
  --help-filters        Print all the available filters.
  -o OUTPUT, --output OUTPUT
                        Output FASTQ file. Compression format automatically
                        determined by file extension. Default: stdout.

Filters

mean_quality:<quality>

The mean quality of the FASTQ record is equal to or above the given quality value.

median_quality:<quality>

The median quality of the FASTQ record is equal to or above the given quality value.

min_length:<length>

The length of the sequence in the FASTQ record is at least the given length.

max_length:<length>

The length of the sequence in the FASTQ record is at most the given length.

Optimizations

fastq-filter uses the following optimizations to be fast:

  • Filters can be chained together to minimize IO.

  • Python's built-in filter function is used, which is a shorthand for Python code that would otherwise need to be interpreted.

  • The mean and median quality algorithms are implemented in Cython.

  • The mean quality algorithm uses a lookup table, since only 94 Phred scores (0 to 93) can be encoded in FASTQ. That saves a lot of power calculations when computing the error probabilities.

  • The median quality algorithm implements a counting sort, which is really fast but not applicable for generic data. Since FASTQ qualities are uniquely suited for a counting sort, median calculation can be performed very quickly.

  • dnaio is used as the FASTQ parser. Its parser is written in Cython.

  • xopen is used to read and write files. This adds support for gzip-compressed files, which are opened using python-isal. python-isal reads gzip files 2 times faster and writes them 5 times faster than Python's gzip module.
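
The lookup-table and counting-sort ideas from the list above can be sketched in pure Python as follows. This is only an illustration of the two techniques, assuming Phred+33 encoding. The actual implementations are in Cython and may differ in detail, and the names below (ERROR_LOOKUP, mean_quality, median_quality) are chosen for this example. Averaging the two middle values for even-length reads is also a choice made here, not something the description above specifies.

```python
import math

PHRED_OFFSET = 33
# One precomputed error probability per possible quality character
# ('!' to '~'), so no power calculation is needed per base.
ERROR_LOOKUP = [10 ** (-score / 10) for score in range(127 - PHRED_OFFSET)]


def mean_quality(qualities: bytes) -> float:
    """Mean Phred score computed via table lookups of error probabilities."""
    total_error = 0.0
    for byte in qualities:
        total_error += ERROR_LOOKUP[byte - PHRED_OFFSET]
    return -10 * math.log10(total_error / len(qualities))


def median_quality(qualities: bytes) -> float:
    """Median Phred score via counting sort: tally each score, then walk
    the tallies until the middle position(s) are reached."""
    counts = [0] * len(ERROR_LOOKUP)
    for byte in qualities:
        counts[byte - PHRED_OFFSET] += 1

    length = len(qualities)
    lower_rank = (length - 1) // 2  # index of the lower middle element
    upper_rank = length // 2        # index of the upper middle element
    seen = 0
    lower_value = None
    for score, count in enumerate(counts):
        seen += count
        if lower_value is None and seen > lower_rank:
            lower_value = score
        if seen > upper_rank:
            return (lower_value + score) / 2
    raise ValueError("empty quality string")


print(median_quality(b"+5I"))            # scores 10, 20, 40
print(round(mean_quality(b"++II"), 2))   # scores 10, 10, 40, 40
```

The counting sort works here because the scores fall in a small fixed range, so sorting reduces to filling a fixed-size tally array in a single pass over the read.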
