Filter an Illumina FASTQ file based on index sequence

Project description

filter_illumina_index

Filter an Illumina FASTQ file based on index sequence.

Reads an Illumina FASTQ file and compares the sequence index in the sample number position of each sequence identifier to a supplied sequence index. Entries that match the supplied index are written to the 'filtered' file (if specified), and entries that do not match are written to the 'unfiltered' file (if specified). The counts of total, filtered and unfiltered reads are displayed. Matching with mismatches and gzip compression of input and output are supported.

For information on Illumina sequence identifiers in FASTQ files, see: http://support.illumina.com/content/dam/illumina-support/help/BaseSpaceHelp_v2/Content/Vault/Informatics/Sequencing_Analysis/BS/swSEQ_mBS_FASTQFiles.htm
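
The matching step described above can be sketched in a few lines of Python. This is an illustration only, not the package's own code: the function names, the example header, and the choice to count a length difference as mismatches are all assumptions made here.

```python
def index_from_header(header: str) -> str:
    """Return the last colon-separated field of the header's comment part.

    Illumina sequence identifiers have the form
    @<instrument>:<run>:<flowcell>:<lane>:<tile>:<x>:<y> <read>:<is filtered>:<control>:<sample number>
    and the index sequence may appear in the sample number position.
    """
    _, _, comment = header.strip().partition(" ")
    return comment.rsplit(":", 1)[-1]


def count_mismatches(a: str, b: str) -> int:
    """Count differing positions; any length difference also counts (assumption)."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))


def matches_index(header: str, index: str, max_mismatches: int = 0) -> bool:
    """True if the header's index is within max_mismatches of the supplied index."""
    return count_mismatches(index_from_header(header), index) <= max_mismatches


# Hypothetical header, made up for this example.
header = "@SIM:1:FCX:1:15:6329:1045 1:N:0:GATCGTGT"
print(matches_index(header, "GATCGTGT"))                    # exact match
print(matches_index(header, "GATCGTGA"))                    # one mismatch, none allowed
print(matches_index(header, "GATCGTGA", max_mismatches=1))  # one mismatch allowed
```

With the default of zero mismatches only an exact match passes; raising the limit to 1 accepts the third call as well.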

Usage details

usage: filter_illumina_index [-h] [--version] [-f FILTERED] [-u UNFILTERED] -i
                             INDEX [-m MISMATCHES] [-c] [-v]
                             inputfile

positional arguments:
  inputfile             Input FASTQ file, compression supported

optional arguments:
  -h, --help            show this help message and exit
  --version             show program's version number and exit
  -f FILTERED, --filtered FILTERED
                        Output FASTQ file containing filtered (positive) reads
                        (default: None)
  -u UNFILTERED, --unfiltered UNFILTERED
                        Output FASTQ file containing unfiltered (negative)
                        reads (default: None)
  -m MISMATCHES, --mismatches MISMATCHES
                        Maximum number of mismatches to accept (default: 0)
  -c, --compressed      Compress output files (note: file extension not
                        modified) (default: False)
  -v, --verbose         Show verbose output (default: False)

required named arguments:
  -i INDEX, --index INDEX
                        Sequence index to filter for (default: None)
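
The help text says that compressed input is supported and that --compressed does not change the output file extension. One way such behaviour could be implemented is sketched below. How the tool itself detects compression is not documented here, so the magic-number check and the helper names open_input and open_output are assumptions.

```python
import gzip
import os
import tempfile


def open_input(path):
    """Open a FASTQ file as text, transparently handling gzip input."""
    with open(path, "rb") as probe:
        is_gzip = probe.read(2) == b"\x1f\x8b"  # gzip magic number
    return gzip.open(path, "rt") if is_gzip else open(path, "rt")


def open_output(path, compressed=False):
    """Open an output file as text; the path is used unchanged either way."""
    return gzip.open(path, "wt") if compressed else open(path, "wt")


with tempfile.TemporaryDirectory() as tmp:
    # The extension is deliberately left as .fastq even though the data is gzipped.
    path = os.path.join(tmp, "reads.fastq")
    with open_output(path, compressed=True) as out:
        out.write("@r1 1:N:0:GATCGTGT\nACGT\n+\nIIII\n")
    with open_input(path) as handle:
        first_line = handle.readline().rstrip()
    print(first_line)
```

Checking the file content rather than the extension means a compressed file written without a .gz suffix can still be read back.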

Example usage

filter_illumina_index srv\example_reads.fastq --index GATCGTGT --filtered var\filtered_reads.fastq --unfiltered var\unfiltered_reads.fastq

This processes srv\example_reads.fastq, matching against the index GATCGTGT with no mismatches allowed (the default). Reads that match this index are saved to var\filtered_reads.fastq, and reads that do not match are saved to var\unfiltered_reads.fastq. In addition, the following output is displayed:

Total reads: 30
Filtered reads: 29
Unfiltered reads: 1
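
The overall flow of the example (read each entry, route it to one of two outputs, report the counts) can be sketched as follows. This is a simplified stand-in, not the package's implementation: it assumes plain four-line FASTQ records rather than using Biopython, and the function names and the two-read input are invented for the example.

```python
import io
from itertools import islice


def read_records(handle):
    """Yield FASTQ records as lists of four lines (header, sequence, '+', quality)."""
    while True:
        record = list(islice(handle, 4))
        if not record:
            return
        if len(record) < 4:
            raise ValueError("truncated FASTQ record")
        yield record


def header_matches(header, index, max_mismatches=0):
    """Compare the last colon-separated header field to the supplied index."""
    found = header.strip().rsplit(":", 1)[-1]
    mismatches = sum(a != b for a, b in zip(found, index)) + abs(len(found) - len(index))
    return mismatches <= max_mismatches


def split_by_index(in_handle, index, filtered=None, unfiltered=None, max_mismatches=0):
    """Route each record to the filtered or unfiltered handle and count reads."""
    counts = {"total": 0, "filtered": 0, "unfiltered": 0}
    for record in read_records(in_handle):
        counts["total"] += 1
        if header_matches(record[0], index, max_mismatches):
            key, out = "filtered", filtered
        else:
            key, out = "unfiltered", unfiltered
        counts[key] += 1
        if out is not None:  # either output is optional
            out.writelines(record)
    return counts


# Two made-up reads held in memory instead of files.
reads = io.StringIO(
    "@SIM:1:FCX:1:15:6329:1045 1:N:0:GATCGTGT\nACGT\n+\nIIII\n"
    "@SIM:1:FCX:1:15:6330:1046 1:N:0:TTTTTTTT\nTGCA\n+\nIIII\n"
)
kept, rejected = io.StringIO(), io.StringIO()
counts = split_by_index(reads, "GATCGTGT", filtered=kept, unfiltered=rejected)
print(f"Total reads: {counts['total']}")
print(f"Filtered reads: {counts['filtered']}")
print(f"Unfiltered reads: {counts['unfiltered']}")
```

Passing None for either output handle skips writing that file while still counting the reads, which mirrors the optional --filtered and --unfiltered arguments.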

Additional details

  • Author: Tet Woo Lee
  • Copyright: © 2018 Tet Woo Lee
  • Licence: GPLv3
  • Dependencies: Biopython, tested on v1.72

Change log

version 1.0.dev1 2018-12-13 : First working version
