
FASTA/FASTQ demultiplexer.


# Demultiplex: demultiplex FASTA or FASTQ files based on a list of barcodes

## Installation

Via [PyPI](https://pypi.python.org/pypi/demultiplex):

    pip install demultiplex

From source:

    git clone https://github.com/jfjlaros/demultiplex
    cd demultiplex
    pip install .

## Command line interface

The demultiplex program provides several ways to demultiplex any number of FASTA or FASTQ files based on a list of barcodes. This list can either be provided via a file or guessed from the data. The demultiplexer can be set to search for the barcodes in the header or in the read itself. To allow for mismatches, two distance functions (edit distance and Hamming distance) are available.
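To illustrate the difference between the two distance functions, here is a minimal sketch in plain Python. It is not the code used by demultiplex; the function names are hypothetical, and the edit distance is assumed to be the usual Levenshtein distance.

```python
def hamming_distance(a: str, b: str) -> int:
    """Count the positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: fewest substitutions, insertions and deletions."""
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        current = [i]
        for j, y in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,             # delete x from a
                current[j - 1] + 1,          # insert y into a
                previous[j - 1] + (x != y),  # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]
```

The Hamming distance only counts substitutions, so it is only defined for barcodes of equal length. The edit distance also tolerates a missing or extra nucleotide.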

### Illumina FASTQ files

For Illumina FASTQ files, the barcodes can usually be found in the header of each FASTQ record. Currently, the demultiplex program supports two types of headers: the classical Illumina headers and the newer HiSeq X headers. These headers are detected automatically.
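As a rough illustration of what reading a barcode from a header involves, the sketch below handles two commonly documented Illumina header layouts. Both layouts, the example headers and the function name are assumptions made for this example; the layouts that demultiplex actually recognises, and how it tells them apart, may differ.

```python
def barcode_from_header(header: str) -> str:
    """Extract a barcode from a FASTQ header line (illustrative only)."""
    if "#" in header:
        # Assumed older layout: "...#BARCODE/1"
        return header.split("#", 1)[1].split("/", 1)[0]
    if ":" in header:
        # Assumed newer layout: the barcode is the last colon-separated field.
        return header.rsplit(":", 1)[1].strip()
    raise ValueError("no barcode found in header: " + header)


old_style = "@HWUSI-EAS100R:6:73:941:1973#ACGTAA/1"
new_style = "@EAS139:136:FC706VJ:2:2104:15343:197393 1:N:18:ATCACG"
print(barcode_from_header(old_style))
print(barcode_from_header(new_style))
```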

Demultiplexing is done with the demux subcommand by providing a list of barcodes. The barcodes file is formatted as follows:

    name sequence

So a typical barcodes file might look like this:

    index1 ACGTAA
    index2 GTAAGG
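For readers who want to inspect or validate such a file from a script, the following sketch reads it into a dictionary. It assumes the two columns are separated by whitespace, as in the example above; `read_barcodes` is a hypothetical helper and not part of the demultiplex package.

```python
import io


def read_barcodes(handle):
    """Map barcode names to sequences, one "name sequence" pair per line."""
    barcodes = {}
    for line in handle:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        name, sequence = fields
        barcodes[name] = sequence
    return barcodes


# An in-memory file stands in for barcodes.csv.
example = io.StringIO("index1 ACGTAA\nindex2 GTAAGG\n")
barcodes = read_barcodes(example)
print(barcodes)
```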

To use this barcodes file to demultiplex two FASTQ files, where we assume that the barcode can be found in the header of the first file, we use the following command:

    demultiplex demux barcodes.csv file_1.fq file_2.fq

This will generate six files:

    file_1_index1.fq
    file_2_index1.fq
    file_1_index2.fq
    file_2_index2.fq
    file_1_UNKNOWN.fq
    file_2_UNKNOWN.fq

The first four files will contain the records assigned to index1 and index2; the last two will contain anything that could not be assigned.
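The naming pattern of the output files can be reproduced with a few lines of Python, which may help when scripting downstream steps. This is a sketch inferred from the file names listed above, not the program's own code; `output_names` is a hypothetical helper.

```python
from pathlib import Path


def output_names(input_files, barcode_names):
    """List the expected output names: <stem>_<barcode name><suffix>."""
    names = []
    # Reads that match no barcode are assumed to go to an UNKNOWN file.
    for label in list(barcode_names) + ["UNKNOWN"]:
        for path in map(Path, input_files):
            names.append(f"{path.stem}_{label}{path.suffix}")
    return names


for name in output_names(["file_1.fq", "file_2.fq"], ["index1", "index2"]):
    print(name)
```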

If the list of barcodes is not known beforehand, the guess subcommand can be used to search for a top list of barcodes. For example, if we want to search for the top five barcodes in the first 1000 records, we use the following:

    demultiplex guess -o barcodes.csv -t 5 -n 1000 file.fq

This will generate the barcodes file that can be used for the demux subcommand.

If the number of barcodes is not known beforehand, an alternative selection method can be used which selects all barcodes with a minimum number of occurrences. The following command will generate a barcode file of all barcodes that occur at least five times in the first 1000 reads:

    demultiplex guess -o barcodes.csv -f -t 5 -n 1000 file.fq
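The two selection methods differ only in how the threshold is interpreted. The sketch below shows the idea with a simple counter; it assumes the barcodes have already been extracted from the records, and all function names are hypothetical rather than taken from the demultiplex source.

```python
from collections import Counter
from itertools import islice


def count_barcodes(barcodes, sample_size):
    """Count barcode occurrences in the first sample_size records."""
    return Counter(islice(barcodes, sample_size))


def top_barcodes(counts, amount):
    """Top-list selection: the amount most frequent barcodes."""
    return [barcode for barcode, _ in counts.most_common(amount)]


def frequent_barcodes(counts, minimum):
    """Frequency selection: every barcode seen at least minimum times."""
    return [barcode for barcode, count in counts.items() if count >= minimum]


# Made-up observations standing in for barcodes read from file.fq.
observed = ["AAA"] * 6 + ["CCC"] * 5 + ["GGG"] * 2 + ["TTT"]
counts = count_barcodes(observed, 1000)
print(top_barcodes(counts, 2))
print(frequent_barcodes(counts, 5))
```

With the top-list method the size of the result is fixed in advance, whereas the frequency method returns however many barcodes pass the threshold.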

### Other files

For platforms other than Illumina, or for alternative sequencing runs such as those coming from 10X experiments, barcodes sometimes end up at a specific location in each read. It is also possible that the barcode is in the header, but that only part of it is used; this happens when dual indexing is used, for example. To deal with these cases, both the guess and the demux subcommands can be instructed to look for the barcode in the read with the -r option, and a selection can be made by providing a start and an end coordinate via the -s and -e options. For example, if we want to search for barcodes in the first six nucleotides of a read, we use the following command:

    demultiplex demux -r -e 6 barcodes.csv file.fq
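The effect of the start and end coordinates can be pictured as taking a slice of the read. The sketch below assumes 1-based, inclusive coordinates, which is consistent with an end coordinate of 6 selecting the first six nucleotides; the exact convention used by the -s and -e options is an assumption here, and `select_barcode` is a hypothetical helper.

```python
def select_barcode(read, start=None, end=None):
    """Return the part of the read between start and end.

    Assumes 1-based, inclusive coordinates; a missing coordinate means
    "from the beginning" or "to the end" respectively.
    """
    first = 0 if start is None else start - 1
    return read[first:end]


read = "ACGTAAGGTT"
print(select_barcode(read, end=6))
print(select_barcode(read, start=3, end=6))
```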

## Library

```python
>>> from demultiplex import
>>>
```
