A fast FASTQ filter program.
Fastq-filter correctly takes into account that quality scores are log scores when calculating the mean.
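To see why this matters, the following minimal sketch compares a naive arithmetic mean of Phred scores with a mean taken over the underlying error probabilities. It is not fastq-filter's actual code: it assumes the standard Phred definition Q = -10 * log10(P), and the function names are made up for illustration.

```python
import math


def naive_mean(phred_scores):
    """Arithmetic mean of the scores themselves (ignores the log scale)."""
    return sum(phred_scores) / len(phred_scores)


def log_aware_mean(phred_scores):
    """Average the error probabilities, then convert back to a Phred score."""
    probabilities = [10 ** (-q / 10) for q in phred_scores]
    mean_probability = sum(probabilities) / len(probabilities)
    return -10 * math.log10(mean_probability)


scores = [10, 40]  # one poor base call and one excellent one
print(naive_mean(scores))                # 25.0
print(round(log_aware_mean(scores), 2))  # 13.01
```

The naive mean suggests a respectable read, while the log-aware mean shows that the single poor base dominates the average error rate.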
Installation
For the latest development version
pip install git+https://github.com/LUMC/fastq-filter
Quickstart
fastq-filter mean_quality:20 my.fastq
This will filter out all FASTQ records that have a mean quality below 20.
Other filters are median_quality, min_length and max_length. For more information use: fastq-filter --help-filters or see the filters chapter below.
Fastq-filter can also chain filters together:
fastq-filter 'min_length:100|mean_quality:20' my.fastq
It is advisable to put the fastest filters (length) before the slower ones (quality) to optimize performance.
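The effect of filter order can be illustrated with a small sketch. It assumes a chain is built by wrapping the input in one lazy built-in filter() per predicate, which is one plausible reading of the description in this document rather than fastq-filter's actual implementation. The names parse_filters and apply_filters are hypothetical, records are reduced to plain sequence strings, and only the two length filters are modelled.

```python
def parse_filters(spec):
    """Turn 'min_length:4|max_length:10' into a list of predicates."""
    # Hypothetical registry: only the two cheap length filters are sketched.
    factories = {
        "min_length": lambda n: (lambda seq: len(seq) >= n),
        "max_length": lambda n: (lambda seq: len(seq) <= n),
    }
    predicates = []
    for part in spec.split("|"):
        name, _, argument = part.partition(":")
        predicates.append(factories[name](int(argument)))
    return predicates


def apply_filters(spec, sequences):
    """Wrap the input in one lazy filter() per predicate, left to right."""
    for predicate in parse_filters(spec):
        sequences = filter(predicate, sequences)
    return sequences


reads = ["ACGT", "ACGTACGTAC", "AC", "ACGTACGTACGTACGT"]
print(list(apply_filters("min_length:4|max_length:10", reads)))
# ['ACGT', 'ACGTACGTAC']
```

A record rejected by an earlier filter is never passed to a later one, so a cheap length check placed first reduces the number of records an expensive quality check has to examine.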
Usage
usage: fastq-filter [-h] [--help-filters] [-o OUTPUT] filters input

positional arguments:
  filters               Filters and arguments. For example: mean_quality:20,
                        for filtering all reads with an average quality below
                        20. Multiple filters can be applied by separating with
                        the | symbol. For example:
                        min_length:100|mean_quality:20. Make sure to use
                        faster filters (length) before slower ones (quality)
                        for optimal performance. Use --help-filters to print
                        all the available filters.
  input                 Input FASTQ file. Compression format automatically
                        detected.

optional arguments:
  -h, --help            show this help message and exit
  --help-filters        Print all the available filters.
  -o OUTPUT, --output OUTPUT
                        Output FASTQ file. Compression format automatically
                        determined by file extension. Default: stdout.
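As a rough illustration of the two compression rules in the help text (input detected automatically, output chosen by file extension), here is a standard-library sketch. It is an assumption about how such detection can work, not the logic fastq-filter or xopen actually uses: it handles gzip only, sniffs the gzip magic bytes for input, and the helper names open_input and open_output are hypothetical.

```python
import gzip
import os
import tempfile

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def open_input(path):
    """Open for reading; sniff the first two bytes for the gzip magic number."""
    with open(path, "rb") as handle:
        magic = handle.read(2)
    if magic == GZIP_MAGIC:
        return gzip.open(path, "rb")
    return open(path, "rb")


def open_output(path):
    """Open for writing; choose gzip when the file name ends in .gz."""
    if path.endswith(".gz"):
        return gzip.open(path, "wb")
    return open(path, "wb")


record = b"@read1\nACGT\n+\nIIII\n"
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.fastq.gz")
    with open_output(path) as out:
        out.write(record)
    with open_input(path) as handle:
        print(handle.read() == record)  # True
```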
Filters
Filter | Description |
---|---|
mean_quality:<quality> | The mean quality of the FASTQ record is equal to or above the given quality value. |
median_quality:<quality> | The median quality of the FASTQ record is equal to or above the given quality value. |
min_length:<length> | The length of the sequence in the FASTQ record is at least min_length. |
max_length:<length> | The length of the sequence in the FASTQ record is at most max_length. |
Optimizations
fastq-filter uses the following optimizations to be fast:
Filters can be chained together to minimize IO.
Python's built-in filter function is used, which is a shorthand for Python code that would otherwise need to be interpreted.
The mean and median quality algorithms are implemented in Cython.
The mean quality algorithm uses a lookup table, since FASTQ can encode only 94 distinct Phred scores (0 to 93). This avoids a costly power calculation for every base when converting scores to error probabilities.
The median quality algorithm implements a counting sort, which is really fast but not applicable for generic data. Since FASTQ qualities are uniquely suited for a counting sort, median calculation can be performed very quickly.
dnaio is used as the FASTQ parser. It parses FASTQ files with a parser written in Cython.
xopen is used to read and write files. It supports gzip-compressed files, which are opened using python-isal. python-isal reads gzip files 2 times faster and writes them 5 times faster than Python's gzip module.
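The lookup-table mean and the counting-sort median from the list above can be sketched in pure Python as follows. This is an illustration only, not the Cython code in fastq-filter: it assumes Phred+33 encoded, non-empty quality strings given as ASCII bytes, and it averages the two middle values for even-length reads, which may differ from the tool's behaviour. The function and constant names are hypothetical.

```python
import math

PHRED_OFFSET = 33  # assumption: Sanger / Illumina 1.8+ encoding
# One entry per ASCII byte value, so a quality byte can be used directly as
# an index. Entries below the offset are never produced by valid FASTQ.
ERROR_PROBABILITY = [10 ** (-(byte - PHRED_OFFSET) / 10) for byte in range(128)]


def mean_quality(qualities: bytes) -> float:
    """Mean Phred score via table lookups instead of a power per base."""
    total = sum(ERROR_PROBABILITY[byte] for byte in qualities)
    return -10 * math.log10(total / len(qualities))


def median_quality(qualities: bytes) -> float:
    """Median Phred score via a counting sort over the byte values."""
    counts = [0] * 128
    for byte in qualities:
        counts[byte] += 1
    n = len(qualities)
    lower_rank, upper_rank = (n - 1) // 2, n // 2  # equal when n is odd
    lower = upper = None
    seen = 0
    # Walk the counts in ascending order until both middle ranks are passed.
    for byte, count in enumerate(counts):
        seen += count
        if lower is None and seen > lower_rank:
            lower = byte
        if seen > upper_rank:
            upper = byte
            break
    return (lower + upper) / 2 - PHRED_OFFSET


quals = b"+5?I"  # Phred 10, 20, 30, 40
print(round(mean_quality(quals), 2))  # 15.56
print(median_quality(quals))          # 25.0
```

The counting sort works here because the values come from a small, fixed alphabet: a single pass fills the counts, and the median is found by walking at most 128 buckets, with no comparison sort needed.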