Filter an Illumina FASTQ file based on index sequence

# filter_illumina_index
## Filter an Illumina FASTQ file based on index sequence.

Reads an Illumina FASTQ file and compares the sequence index in the
`sample number` position of each sequence identifier to a supplied sequence
index. Entries that match the supplied index are written to the *filtered
file* (if specified) and entries that don't match are written to the
*unfiltered file* (if specified). Displays the count of total, filtered and
unfiltered reads, as well as a breakdown of reads by number of mismatches.
Matching can tolerate up to a set number of mismatches (`-m` parameter). Gzip
compression is supported for input (detected from the file extension) and for
output (enabled with the `-c` parameter).
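The comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not the tool's actual code: the helper names `extract_index` and `count_mismatches` are hypothetical, the example identifier is made up, and counting a length difference as extra mismatches is an assumption.

```python
def extract_index(identifier_line: str) -> str:
    """Return the text after the last colon of a FASTQ identifier line.

    In Illumina identifiers this is the `sample number` field, which is
    where the sequence index appears.
    """
    return identifier_line.strip().rsplit(":", 1)[-1]


def count_mismatches(observed: str, expected: str) -> int:
    """Count positions where the two index sequences differ.

    Assumption: a difference in length is counted as extra mismatches.
    """
    differing = sum(a != b for a, b in zip(observed, expected))
    return differing + abs(len(observed) - len(expected))


# Made-up identifier line whose index differs from GATCGTGT at the last base.
header = "@SIM:1:FCX:1:15:6329:1045 1:N:0:GATCGTGA"
observed_index = extract_index(header)
mismatches = count_mismatches(observed_index, "GATCGTGT")

# A read counts as filtered when its mismatches do not exceed the -m limit.
is_filtered_strict = mismatches <= 0   # default, -m 0
is_filtered_lenient = mismatches <= 1  # -m 1
```

Under these assumptions the example read has one mismatch, so it would be unfiltered with the default `-m 0` and filtered with `-m 1`.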

For information on Illumina sequence identifiers in FASTQ files, see: http://support.illumina.com/content/dam/illumina-support/help/BaseSpaceHelp_v2/Content/Vault/Informatics/Sequencing_Analysis/BS/swSEQ_mBS_FASTQFiles.htm

### Usage details

```
usage: filter_illumina_index [-h] [--version] [-f FILTERED] [-u UNFILTERED] -i
                             INDEX [-m MISMATCHES] [-c] [-v]
                             inputfile

positional arguments:
  inputfile             Input FASTQ file, compression supported

optional arguments:
  -h, --help            show this help message and exit
  --version             show program's version number and exit
  -f FILTERED, --filtered FILTERED
                        Output FASTQ file containing filtered (positive) reads
                        (default: None)
  -u UNFILTERED, --unfiltered UNFILTERED
                        Output FASTQ file containing unfiltered (negative)
                        reads (default: None)
  -m MISMATCHES, --mismatches MISMATCHES
                        Maximum number of mismatches to accept (default: 0)
  -c, --compressed      Compress output files (note: file extension not
                        modified) (default: False)
  -v, --verbose         Show verbose output (default: False)

required named arguments:
  -i INDEX, --index INDEX
                        Sequence index to filter for (default: None)
```
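The compression behaviour in the usage text (input detected from the file extension, output compressed on request with the file name left unchanged) could be handled along the following lines. This is a sketch only: the function names are hypothetical, and treating `.gz` as the extension that signals compression is an assumption.

```python
import gzip


def open_fastq_for_reading(path: str):
    """Open a FASTQ file as text.

    Assumption: a name ending in .gz means the file is gzip-compressed.
    """
    if path.lower().endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def open_fastq_for_writing(path: str, compressed: bool = False):
    """Open an output file as text, gzip-compressed if requested.

    The path is used exactly as given, mirroring the note that -c does
    not modify the file extension.
    """
    if compressed:
        return gzip.open(path, "wt")
    return open(path, "w")
```

Because the output name is not modified, it may be worth passing a name that already ends in `.gz` when using `-c`, so that later tools recognise the file as compressed.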

### Example usage

The `srv` directory contains example reads in FASTQ and compressed FASTQ format. The reads carry the index `GATCGTGT`, except for one read whose index has a single mismatch.

To test, run:

`filter_illumina_index srv/example_reads.fastq --index GATCGTGT --filtered var/filtered_reads.fastq --unfiltered var/unfiltered_reads.fastq`

This will process `srv/example_reads.fastq`, matching to index `GATCGTGT` with no mismatches allowed (default). Reads matching this index will be saved to `var/filtered_reads.fastq` and those not matching this index will be saved to `var/unfiltered_reads.fastq`. In addition, the following output will be displayed:

```
Total reads: 30
Filtered reads: 29
Unfiltered reads: 1
Reads with 0 mismatches: 29
Reads with 1 mismatches: 1
Reads with 2 mismatches: 0
Reads with 3 mismatches: 0
Reads with 4 mismatches: 0
Reads with 5 mismatches: 0
Reads with 6 mismatches: 0
Reads with 7 mismatches: 0
Reads with 8 mismatches: 0
```
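To show how a summary like the one above can arise, here is a rough sketch that tallies reads from an in-memory FASTQ. It is not the tool's implementation: `tally_reads` and `format_summary` are hypothetical names, the two reads are made up, records are assumed to span exactly four lines, and the output lines are only modelled on the format shown above.

```python
from collections import Counter
from itertools import islice


def tally_reads(lines, index, max_mismatches=0):
    """Return (total, filtered, Counter of reads per mismatch count)."""
    total = 0
    filtered = 0
    reads_by_mismatches = Counter()
    line_iter = iter(lines)
    while True:
        record = list(islice(line_iter, 4))  # assumes 4-line FASTQ records
        if len(record) < 4:
            break
        # The index is taken as the text after the last colon of the identifier.
        observed = record[0].strip().rsplit(":", 1)[-1]
        # Assumption: a length difference counts as extra mismatches.
        mismatches = sum(a != b for a, b in zip(observed, index))
        mismatches += abs(len(observed) - len(index))
        total += 1
        reads_by_mismatches[mismatches] += 1
        if mismatches <= max_mismatches:
            filtered += 1
    return total, filtered, reads_by_mismatches


def format_summary(total, filtered, reads_by_mismatches, index_length):
    """Build summary lines for 0..index_length mismatches."""
    summary = [
        f"Total reads: {total}",
        f"Filtered reads: {filtered}",
        f"Unfiltered reads: {total - filtered}",
    ]
    for n in range(index_length + 1):
        summary.append(f"Reads with {n} mismatches: {reads_by_mismatches[n]}")
    return "\n".join(summary)


# Two made-up reads: one exact match and one with a single mismatch.
example_lines = [
    "@SIM:1:FCX:1:15:6329:1045 1:N:0:GATCGTGT", "ACGT", "+", "IIII",
    "@SIM:1:FCX:1:15:6330:1046 1:N:0:GATCGTGA", "ACGT", "+", "IIII",
]
total, filtered, reads_by_mismatches = tally_reads(example_lines, "GATCGTGT")
print(format_summary(total, filtered, reads_by_mismatches, len("GATCGTGT")))
```

Passing `max_mismatches=1` to `tally_reads` would count both example reads as filtered, which corresponds to running the tool with `-m 1`.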

---

### Additional details

* Author: Tet Woo Lee
* Copyright: © 2018 Tet Woo Lee
* Licence: GPLv3
* Dependencies: Biopython, tested on v1.72

### Change log

version 1.0.2 2018-12-19
: Shows statistics on number of mismatches found

version 1.0.1 2018-12-19
: Speed up number of mismatches calculation

version 1.0 2018-12-14
: Minor updates for PyPI and conda packaging

version 1.0.dev1 2018-12-13
: First working version

