
Python package for demultiplexing raw sequencing reads in SAM/BAM format


pydemux

An easy-to-use Python tool for fast demultiplexing of raw sequence data in SAM/BAM format

Installation

pip install pydemux

or clone the repository

git clone git@github.com:dmalzl/pydemux.git

and run

cd pydemux
pip install .

verify the installation by typing

pydemux -h

Basic usage

pydemux demultiplexes sequence data in SAM/BAM format for either single-end or paired-end reads (paired-end reads have to be interleaved). The basic commands are listed below; you can test them using the files in the data directory.

pydemux single -b single_barcodes.tsv -o demux/ -s demux_stats.tsv single.bam
pydemux paired -b paired_barcodes.tsv -o demux/ -s demux_stats.tsv paired.bam

Note that to write the files to a specific output directory, the trailing / must be included in the -o/--output_prefix command-line argument; otherwise the files are written to the current working directory. To compress the output, add the -gz/--gzip command-line argument.

Changing the SAM tags from which the barcodes are read

By default, pydemux single looks for the barcode in the BC tag of each read, and pydemux paired looks for the barcodes in the BC and B2 tags of the paired-end reads. This can be changed with -t/--bctag for pydemux single, or with -t1/--bc1tag and/or -t2/--bc2tag for pydemux paired.
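If you are unsure which tag carries the barcodes in your data, it can help to look at a few records first. The snippet below is a minimal sketch, not part of pydemux: it assumes a plain-text SAM record (for example one produced by samtools view) and relies only on the SAM convention that optional fields follow the 11 mandatory columns as TAG:TYPE:VALUE. The helper name get_sam_tag and the example record are made up for illustration.

```python
def get_sam_tag(sam_line, tag="BC"):
    """Return the value of an optional SAM tag, or None if it is absent.

    Hypothetical helper for inspecting records; not a pydemux function.
    """
    fields = sam_line.rstrip("\n").split("\t")
    # Columns 1-11 are mandatory; optional fields follow as TAG:TYPE:VALUE.
    for field in fields[11:]:
        name, _type, value = field.split(":", 2)
        if name == tag:
            return value
    return None


# Made-up unmapped single-end record carrying a barcode in the BC tag.
example = "\t".join([
    "read1", "4", "*", "0", "0", "*", "*", "0", "0",
    "ACGTACGT", "FFFFFFFF", "BC:Z:ACGTCGTA", "RG:Z:lane1",
])

print(get_sam_tag(example))        # ACGTCGTA
print(get_sam_tag(example, "B2"))  # None
```

If the barcode turns up under a different tag than BC or B2, pass that tag name via the options described above.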

Speeding up processing

In order to speed up demultiplexing the algorithm can be run concurrently using the -p/--processes argument.

Optimizing yield

As with any sequencing-based data, barcodes are prone to sequencing errors. To optimize the read yield for each sample, the algorithm allows a given number of mismatches between the true and the sequenced barcodes, which can be set with -m/--mismatches. By default, only exact matches are assigned. If you allow one or more mismatches, make sure that the number of allowed mismatches does not exceed half of the minimum pairwise Hamming distance of all true barcodes minus 1 (i.e. min(pairwise_hamming_distance(true_barcodes)) // 2 - 1), since otherwise reads might be wrongly assigned.
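The bound above can be computed for a given barcode set before choosing a value for -m/--mismatches. The following is a minimal sketch, not code from pydemux: the function names hamming and max_safe_mismatches are illustrative, the barcodes are made up, and all barcodes are assumed to have the same length.

```python
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two equal-length barcodes differ."""
    if len(a) != len(b):
        raise ValueError("barcodes must have equal length")
    return sum(x != y for x, y in zip(a, b))


def max_safe_mismatches(barcodes):
    """Apply the rule of thumb above: min pairwise distance // 2 - 1.

    Needs at least two barcodes; the result is clamped at 0,
    which corresponds to exact matching only.
    """
    d_min = min(hamming(a, b) for a, b in combinations(barcodes, 2))
    return max(d_min // 2 - 1, 0)


# Made-up barcode set with a minimum pairwise Hamming distance of 4.
barcodes = ["AAAAAAAA", "AAAACCCC", "CCCCCCCC"]
print(max_safe_mismatches(barcodes))  # 1
```

A result of 0 means the barcodes are too similar to tolerate mismatches under this rule, so the default of exact matching should be kept.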

Barcode file format

The barcode file is a simple tab-separated file containing one sample and its associated barcodes per line. In case of pydemux single, each line consists of one barcode and one sample name, e.g.

barcode sample_name
ACGTCGTA sample_1

In case of pydemux paired, the file contains two barcodes and one sample name per line, e.g.

barcode1 barcode2 sample_name
ACGTCGTA CGTAGGAT sample_1
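Such a file can be written by hand or generated from a sample sheet. The snippet below is a minimal sketch using only the Python standard library; the function name write_barcode_file is illustrative and not part of pydemux. It reproduces the layout of the paired example above, including its first line. Whether pydemux treats that first line as a header is not stated here, so check against the example files in the data directory.

```python
import csv
import io


def write_barcode_file(handle, rows, first_line):
    """Write tab-separated rows in the layout of the examples above.

    Hypothetical helper; first_line mirrors the first line of the example.
    """
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(first_line)
    writer.writerows(rows)


# Written to an in-memory buffer here; pass an open file handle
# (opened with newline="") to write to disk instead.
buffer = io.StringIO()
write_barcode_file(
    buffer,
    rows=[("ACGTCGTA", "CGTAGGAT", "sample_1")],
    first_line=("barcode1", "barcode2", "sample_name"),
)
print(buffer.getvalue(), end="")
```

For pydemux single, the same helper works with two-column rows (barcode, sample name).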
