# filter_illumina_index
## Filter an Illumina FASTQ file based on index sequence.

Reads an Illumina FASTQ file and compares the sequence index in the
`sample number` position of each sequence identifier to a supplied sequence
index. Entries that match the sequence index are written to the *filtered
file* (if specified) and entries that don't match are written to the
*unfiltered file* (if specified). The counts of total, filtered and unfiltered
reads are displayed. Matching with mismatches (`-m` parameter) is supported,
as is gzip compression for input (detected from the file extension) and
output (enabled with the `-c` parameter).

For information on Illumina sequence identifiers in FASTQ files, see: http://support.illumina.com/content/dam/illumina-support/help/BaseSpaceHelp_v2/Content/Vault/Informatics/Sequencing_Analysis/BS/swSEQ_mBS_FASTQFiles.htm
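The matching described above can be illustrated with a minimal sketch. It is not the package's own code: the function names are hypothetical, it uses only the standard library rather than Biopython, and it assumes the header layout `@<instrument>:<run>:<flowcell>:<lane>:<tile>:<x>:<y> <read>:<is filtered>:<control number>:<sample number>` with the index stored in the last field. Rejecting indexes of unequal length is also an assumption.

```python
def extract_index(header: str) -> str:
    """Return the last ':'-separated field of the header's comment part."""
    # Assumed layout: "@<read id> <read>:<is filtered>:<control number>:<sample number>"
    parts = header.strip().split(" ", 1)
    if len(parts) != 2:
        raise ValueError(f"no comment field in header: {header!r}")
    return parts[1].rsplit(":", 1)[-1]


def count_mismatches(a: str, b: str) -> int:
    """Count positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        # Assumption: unequal lengths are an error; the tool may behave differently.
        raise ValueError("index sequences differ in length")
    return sum(x != y for x, y in zip(a, b))


def matches(header: str, index: str, max_mismatches: int = 0) -> bool:
    """True if the header's index is within max_mismatches of the given index."""
    return count_mismatches(extract_index(header), index) <= max_mismatches


# Made-up header whose index differs from GATCGTGT at the last base only.
header = "@M00001:1:000000000-ABCDE:1:1101:15589:1332 1:N:0:GATCGTGA"
print(extract_index(header))                          # GATCGTGA
print(matches(header, "GATCGTGT"))                    # False
print(matches(header, "GATCGTGT", max_mismatches=1))  # True
```

The `max_mismatches` argument plays the role of the `-m` parameter: with the default of 0 only exact matches pass.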

### Usage details

```
usage: filter_illumina_index [-h] [--version] [-f FILTERED] [-u UNFILTERED] -i
                             INDEX [-m MISMATCHES] [-c] [-v]
                             inputfile

positional arguments:
  inputfile             Input FASTQ file, compression supported

optional arguments:
  -h, --help            show this help message and exit
  --version             show program's version number and exit
  -f FILTERED, --filtered FILTERED
                        Output FASTQ file containing filtered (positive) reads
                        (default: None)
  -u UNFILTERED, --unfiltered UNFILTERED
                        Output FASTQ file containing unfiltered (negative)
                        reads (default: None)
  -m MISMATCHES, --mismatches MISMATCHES
                        Maximum number of mismatches to accept (default: 0)
  -c, --compressed      Compress output files (note: file extension not
                        modified) (default: False)
  -v, --verbose         Show verbose output (default: False)

required named arguments:
  -i INDEX, --index INDEX
                        Sequence index to filter for (default: None)
```

### Example usage

The directory `srv` contains example reads in FASTQ and compressed FASTQ format. The reads have index `GATCGTGT`, except for one read whose index contains a mismatch.

To test, run:

`filter_illumina_index srv/example_reads.fastq --index GATCGTGT --filtered var/filtered_reads.fastq --unfiltered var/unfiltered_reads.fastq`

This will process `srv/example_reads.fastq`, matching to index `GATCGTGT` with no mismatches allowed (default). Reads matching this index will be saved to `var/filtered_reads.fastq` and those not matching this index will be saved to `var/unfiltered_reads.fastq`. In addition, the following output will be displayed:

```
Total reads: 30
Filtered reads: 29
Unfiltered reads: 1
```
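The split into filtered and unfiltered reads, and the counts reported above, can be sketched as follows. This is an illustrative standard-library version rather than the package's implementation: the function names and the three demo records are made up, records are assumed to span exactly four lines, and choosing gzip from a `.gz` suffix is an assumed convention for "detected on the basis of file extension".

```python
import gzip
import io
from itertools import islice


def open_text(path, mode="rt"):
    """Open a file as text, via gzip when the name ends in .gz (assumed rule)."""
    return gzip.open(path, mode) if str(path).endswith(".gz") else open(path, mode)


def index_matches(header, index, max_mismatches=0):
    """Compare the header's last ':'-separated field with the wanted index."""
    found = header.rstrip().rsplit(":", 1)[-1]
    if len(found) != len(index):
        return False  # assumption: unequal lengths never match
    return sum(a != b for a, b in zip(found, index)) <= max_mismatches


def filter_fastq(reads, index, max_mismatches=0, filtered=None, unfiltered=None):
    """Split 4-line FASTQ records by index; return (total, filtered, unfiltered)."""
    reads = iter(reads)
    n_filtered = n_unfiltered = 0
    while True:
        record = list(islice(reads, 4))  # header, sequence, '+', qualities
        if not record:
            break
        if index_matches(record[0], index, max_mismatches):
            n_filtered += 1
            out = filtered
        else:
            n_unfiltered += 1
            out = unfiltered
        if out is not None:  # either output is optional
            out.writelines(record)
    return n_filtered + n_unfiltered, n_filtered, n_unfiltered


# Three made-up records; the second has one mismatch in its index.
demo = io.StringIO(
    "@r1 1:N:0:GATCGTGT\nACGT\n+\nIIII\n"
    "@r2 1:N:0:GATCGTGA\nACGT\n+\nIIII\n"
    "@r3 1:N:0:GATCGTGT\nACGT\n+\nIIII\n"
)
kept = io.StringIO()
total, n_f, n_u = filter_fastq(demo, "GATCGTGT", filtered=kept)
print(f"Total reads: {total}")
print(f"Filtered reads: {n_f}")
print(f"Unfiltered reads: {n_u}")
```

For files on disk, the handles returned by `open_text("reads.fastq.gz")` or `open_text("out.fastq", "wt")` could be passed in place of the `StringIO` objects; those file names are placeholders.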

---

### Additional details

* Author: Tet Woo Lee
* Copyright: © 2018 Tet Woo Lee
* Licence: GPLv3
* Dependencies: Biopython, tested on v1.72

### Change log

version 1.0.1 2018-12-19
: Speed up number of mismatches calculation

version 1.0 2018-12-14
: Minor updates for PyPI and conda packaging

version 1.0.dev1 2018-12-13
: First working version

