Filter an Illumina FASTQ file based on index sequence

Project description

filter_illumina_index

Filter an Illumina FASTQ file based on index sequence.

Reads an Illumina FASTQ file and compares the sequence index in the sample number position of each sequence identifier to a supplied sequence index. Entries that match the supplied index are written to the filtered file (if specified) and entries that don't match are written to the unfiltered file (if specified). Displays the count of total, filtered and unfiltered reads. Matching with mismatches (-m parameter) is supported, as is gzip compression for input (detected from the file extension) and output (enabled with the -c parameter).
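The comparison described above can be pictured with a short Python sketch. This is not the tool's actual code: the function names and the example header are hypothetical, and it assumes the index is the last colon-separated field of the identifier line and that mismatches are counted position by position.

```python
def index_from_header(header: str) -> str:
    """Return the last colon-separated field of an identifier line.

    Assumption: the identifier ends with ...:<sample number>, and that
    field holds the index sequence, as in the files this tool targets.
    """
    return header.strip().rsplit(":", 1)[-1]


def count_mismatches(observed: str, expected: str) -> int:
    """Count differing positions, plus any difference in length."""
    differing = sum(a != b for a, b in zip(observed, expected))
    return differing + abs(len(observed) - len(expected))


def matches_index(header: str, index: str, max_mismatches: int = 0) -> bool:
    """True if the header's index is within max_mismatches of index."""
    return count_mismatches(index_from_header(header), index) <= max_mismatches


# Hypothetical identifier line; only the final field matters here.
header = "@M00000:1:000000000-XXXXX:1:1101:10000:10000 1:N:0:GATCGTGA"

print(matches_index(header, "GATCGTGT"))     # False: one position differs
print(matches_index(header, "GATCGTGT", 1))  # True: one mismatch is allowed
```

With max_mismatches left at 0 the read would go to the unfiltered file; allowing one mismatch would move it to the filtered file.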

For information on Illumina sequence identifiers in FASTQ files, see: http://support.illumina.com/content/dam/illumina-support/help/BaseSpaceHelp_v2/Content/Vault/Informatics/Sequencing_Analysis/BS/swSEQ_mBS_FASTQFiles.htm

Usage details

usage: filter_illumina_index [-h] [--version] [-f FILTERED] [-u UNFILTERED] -i
                             INDEX [-m MISMATCHES] [-c] [-v]
                             inputfile

positional arguments:
  inputfile             Input FASTQ file, compression supported

optional arguments:
  -h, --help            show this help message and exit
  --version             show program's version number and exit
  -f FILTERED, --filtered FILTERED
                        Output FASTQ file containing filtered (positive) reads
                        (default: None)
  -u UNFILTERED, --unfiltered UNFILTERED
                        Output FASTQ file containing unfiltered (negative)
                        reads (default: None)
  -m MISMATCHES, --mismatches MISMATCHES
                        Maximum number of mismatches to accept (default: 0)
  -c, --compressed      Compress output files (note: file extension not
                        modified) (default: False)
  -v, --verbose         Show verbose output (default: False)

required named arguments:
  -i INDEX, --index INDEX
                        Sequence index to filter for (default: None)
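
The compression behaviour of inputfile and -c can be sketched in Python as follows. This is an illustration only, not the tool's implementation: the helper open_fastq is hypothetical, and it assumes that a ".gz" extension is what marks an input file as compressed.

```python
import gzip
import os
import tempfile


def open_fastq(path, mode="rt", compressed=None):
    """Open a FASTQ file as text, optionally through gzip.

    compressed=None mimics detection from the file extension (assumed
    here to be ".gz"); compressed=True mimics an explicit request such
    as -c, where the file name itself is left unchanged.
    """
    if compressed is None:
        compressed = path.lower().endswith(".gz")
    opener = gzip.open if compressed else open
    return opener(path, mode)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "reads.fastq.gz")
    # Write one made-up record; gzip is chosen from the extension.
    with open_fastq(path, "wt") as handle:
        handle.write("@read1 1:N:0:GATCGTGT\nACGT\n+\nIIII\n")
    # Read it back the same way.
    with open_fastq(path) as handle:
        first_line = handle.readline().strip()

print(first_line)
```

Because -c does not modify the file extension, a compressed output file named without ".gz" would need compressed=True when read back with a helper like this one.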

Example usage

The directory srv contains example reads, in FASTQ and compressed FASTQ format, with index GATCGTGT; one read has a mismatch in its index.

To test, run:

filter_illumina_index srv\example_reads.fastq --index GATCGTGT --filtered var\filtered_reads.fastq --unfiltered var\unfiltered_reads.fastq

This will process srv\example_reads.fastq, matching to index GATCGTGT with no mismatches allowed (default). Reads matching this index will be saved to var\filtered_reads.fastq and those not matching this index will be saved to var\unfiltered_reads.fastq. In addition, the following output will be displayed:

Total reads: 30
Filtered reads: 29
Unfiltered reads: 1

Additional details

Author: Tet Woo Lee
Copyright: © 2018 Tet Woo Lee
Licence: GPLv3
Dependencies: Biopython, tested on v1.72

Change log

version 1.0 2018-12-14 : Minor updates for PyPI and conda packaging

version 1.0.dev1 2018-12-13 : First working version

Release history

1.0.5 (Dec 14, 2023)
1.0.4 (Apr 11, 2020)
1.0.4.dev3 pre-release (Apr 11, 2020)
1.0.4.dev2 pre-release (Apr 10, 2020)
1.0.4.dev1 pre-release (Apr 10, 2020)
1.0.3.post2 (Apr 1, 2020)
1.0.3.post1 (Apr 1, 2020)
1.0.3 (Apr 1, 2020)
1.0.2 (Dec 19, 2018)
1.0.1 (Dec 19, 2018)
1.0 (Dec 14, 2018), this version
1.0.dev1 pre-release (Dec 13, 2018)

Download files

Source Distribution

filter_illumina_index-1.0.tar.gz (16.4 kB)

Uploaded Dec 14, 2018 Source

Built Distribution

filter_illumina_index-1.0-py3-none-any.whl (30.2 kB)

Uploaded Dec 14, 2018 Python 3

Hashes for filter_illumina_index-1.0.tar.gz
Algorithm	Hash digest
SHA256	`2fa46b6f03bfd353eba170ea88a3f6901e85c8848b3392bae948cebe6ef94488`
MD5	`faf40a514d27a82bf84698330b55c2cd`
BLAKE2b-256	`385fd823e113b0ea6e3e4c726ede26affcd0cb9dab5551f11a544362d6f976bb`

Hashes for filter_illumina_index-1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`5a3de2a3db4690cfc089187b709e5a0c0da64c8e955d0c3e955e79cf9ac5472d`
MD5	`0177242259feee89d8e159739b1fc677`
BLAKE2b-256	`51a838078b741f50a4af66eb8263c67ad0818d1050b9b92d63bd3511c827dd0a`
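
A downloaded file can be checked against the SHA256 digests listed above with Python's standard hashlib module. The sketch below is one way to do this; the helper names are hypothetical, and the file path assumes the source distribution was saved to the current directory.

```python
import hashlib
import os


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_sha256):
    """True if the file's SHA256 digest equals the expected hex string."""
    return sha256_of_file(path) == expected_sha256.strip().lower()


# SHA256 digest listed above for the source distribution.
EXPECTED = "2fa46b6f03bfd353eba170ea88a3f6901e85c8848b3392bae948cebe6ef94488"
# Assumed download location; adjust to wherever the file was saved.
DOWNLOAD = "filter_illumina_index-1.0.tar.gz"

if os.path.exists(DOWNLOAD):
    print("digest matches:", verify_download(DOWNLOAD, EXPECTED))
else:
    print("file not found:", DOWNLOAD)
```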