
Bioinformatics tool to remove sequencing artifacts caused by single-nucleotide errors and index hopping in barcode-based experiments

Project description

PhantomBuster is a bioinformatics tool that removes phantom barcode combinations arising from single-nucleotide sequencing errors and index hopping. It was written for lineage-tracing experiments and CRISPR screens, but can be used for any experimental setup in which only barcodes, and no genomic DNA, are measured.

Installation

PhantomBuster is available on PyPI and can be installed with standard Python tools such as pip or pipx.

pipx install phantombuster

QuickStart

PhantomBuster is a command-line tool that is run via the phantombuster command. It consists of four main steps: (1) demultiplexing, (2) error correction of random barcodes, (3) index hopping removal and (4) thresholding. For CRISPR screens, a separate script can calculate p-values for guides.

Demultiplexing

PhantomBuster demultiplexes BAM or FASTQ files and extracts all specified barcodes, error-correcting those barcodes that have known reference sequences. For demultiplexing, additional worker processes must be started.

phantombuster demultiplex [INPUTFILE] --outdir [DIR] --barcode-hierarchy-file [FILE] --regex-file [FILE] 
phantombuster worker --outdir [DIR]

INPUTFILE must be a CSV file that lists all BAM and FASTQ files to be processed.

Example input_files.csv:

file
101.bam

The barcode hierarchy file is a CSV file that lists all barcodes to be extracted. The order of the barcodes defines a hierarchy: barcodes higher up in the hierarchy are more general, while barcodes lower down are more specific. The hierarchy is used in the second step, error correction, in which two random barcode sequences are compared only if all barcodes higher up in the hierarchy are identical.

Example barcode_hierarchy.csv:

barcode,type,referencefile,threshold,min_length,max_length
sample,reference,sample_barcodes.csv,auto,-,-
lib,reference,library_barcodes.csv,1,-,-
lid,random,-,-,50,50
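The hierarchy rule can be illustrated with a small Python sketch that reads a file in the format above and lists, for a given random barcode, the barcodes that must be identical before two of its sequences are compared. The names BarcodeSpec, read_hierarchy and grouping_barcodes are hypothetical and not part of PhantomBuster's API; "-" is assumed to mean "not applicable", and threshold is kept as text because it can be a number or "auto".

```python
import csv
import io
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BarcodeSpec:
    """One row of barcode_hierarchy.csv ('-' columns become None)."""
    name: str
    kind: str                  # "reference" or "random"
    referencefile: Optional[str]
    threshold: Optional[str]   # a number or "auto", kept as text
    min_length: Optional[int]
    max_length: Optional[int]


def _opt(value: str) -> Optional[str]:
    return None if value == "-" else value


def read_hierarchy(text: str) -> List[BarcodeSpec]:
    """Parse the hierarchy CSV; row order is the hierarchy order."""
    specs = []
    for row in csv.DictReader(io.StringIO(text)):
        min_len = _opt(row["min_length"])
        max_len = _opt(row["max_length"])
        specs.append(BarcodeSpec(
            name=row["barcode"],
            kind=row["type"],
            referencefile=_opt(row["referencefile"]),
            threshold=_opt(row["threshold"]),
            min_length=int(min_len) if min_len is not None else None,
            max_length=int(max_len) if max_len is not None else None,
        ))
    return specs


def grouping_barcodes(specs: List[BarcodeSpec], name: str) -> List[str]:
    """Barcodes above `name` in the hierarchy; they must match before
    two sequences of the random barcode `name` are compared."""
    names = [s.name for s in specs]
    return names[: names.index(name)]


HIERARCHY = """barcode,type,referencefile,threshold,min_length,max_length
sample,reference,sample_barcodes.csv,auto,-,-
lib,reference,library_barcodes.csv,1,-,-
lid,random,-,-,50,50
"""
specs = read_hierarchy(HIERARCHY)
print(grouping_barcodes(specs, "lid"))  # ['sample', 'lib']
```

With the example file, two lid sequences are only compared when they share both the same sample and the same lib barcode.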

The regex file is a CSV file that specifies, for each read region, the regular expression used to extract barcodes from it.

Example regexes.csv:

tag,regex
b2,"^[ACGTN]{3}(?P<sample>[ACGTN]{5})"
query,"(?P<lid>[ACGTN]{5,6}(?P<lib>ACGT|GTAC){s<=1}[ACGTN]+)"
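A minimal sketch of per-read extraction followed by reference correction, using only Python's standard re module. It rests on several assumptions: each read is represented as a hypothetical dict mapping the tag column (b2, query) to that region's sequence; the {s<=1} clause of the example is dropped, because it is fuzzy-matching syntax of the third-party regex module (allowing one substitution) and re does not support it; and a numeric threshold in the barcode hierarchy file is treated as a maximum Hamming distance to the reference sequences ("auto" is not handled). This is not PhantomBuster's actual implementation.

```python
import re
from typing import Dict, List, Optional

# Hypothetical exact-match versions of the example regexes. The original
# query regex allows one substitution in lib via "{s<=1}" (regex module syntax).
REGEXES = {
    "b2": r"^[ACGTN]{3}(?P<sample>[ACGTN]{5})",
    "query": r"(?P<lid>[ACGTN]{5,6}(?P<lib>ACGT|GTAC)[ACGTN]+)",
}
COMPILED = {tag: re.compile(p) for tag, p in REGEXES.items()}


def extract_barcodes(read: Dict[str, str]) -> Optional[Dict[str, str]]:
    """Apply each region's regex and merge the named groups.
    Returns None if any region does not match."""
    barcodes: Dict[str, str] = {}
    for tag, pattern in COMPILED.items():
        match = pattern.search(read.get(tag, ""))
        if match is None:
            return None
        barcodes.update(match.groupdict())
    return barcodes


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions (callers ensure equal length)."""
    return sum(x != y for x, y in zip(a, b))


def correct_to_reference(seq: str, references: List[str],
                         max_dist: int) -> Optional[str]:
    """Map seq to the closest reference within max_dist mismatches.
    Returns None if nothing is close enough or the closest hit is ambiguous."""
    hits = [r for r in references
            if len(r) == len(seq) and hamming(r, seq) <= max_dist]
    if not hits:
        return None
    best = min(hamming(r, seq) for r in hits)
    best_hits = [r for r in hits if hamming(r, seq) == best]
    return best_hits[0] if len(best_hits) == 1 else None


read = {"b2": "GGGACGTACTTT", "query": "TTTTTACGTGGGG"}
print(extract_barcodes(read))
# {'sample': 'ACGTA', 'lid': 'TTTTTACGTGGGG', 'lib': 'ACGT'}
print(correct_to_reference("ACGTT", ["ACGTA", "TTTTT"], max_dist=1))  # ACGTA
```

Note that in the example, lid is defined around lib, so the extracted lid contains the lib sequence.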

The output directory (--outdir) contains all output and temporary files. The same output directory must be passed to every stage of PhantomBuster.

Error Correction

The error correction step uses the UMI-tools error correction algorithm to correct random barcode sequences. As with demultiplexing, additional worker processes must be started.

phantombuster error-correct --outdir [DIR] --barcode-hierarchy-file [FILE]
phantombuster worker --outdir [DIR]

The output directory and barcode hierarchy file must be the same as in the demultiplexing step.

Index Hopping Removal

The index hopping removal step removes barcode combinations that likely arose from index hopping. This step does not require worker processes.

phantombuster hopping-removal --outdir [DIR] [HOPPING_BARCODES]

The output directory must be the same as in the previous steps. Each barcode to test must be a single barcode from the barcode hierarchy (e.g. sample) or a comma-separated combination of barcodes (e.g. sample,lid). Several barcodes or combinations can be given, separated by spaces, and are then processed one after another (e.g. sample,lid lib). It is recommended to first test the combination of barcodes on the i5 index and then the combination of barcodes on the i7 index.
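The sketch below illustrates how the positional barcode arguments can be interpreted and which observations a hopping test compares: reads that agree on all barcodes except the tested ones, but appear with different values of the tested combination. It deliberately stops there; PhantomBuster's statistical decision about which of these combinations are phantoms is not reproduced. The function names and the table layout (barcode tuples in hierarchy order mapped to read counts) are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Key = Tuple[str, ...]


def parse_hopping_args(args: Sequence[str],
                       hierarchy: Sequence[str]) -> List[List[str]]:
    """['sample,lid', 'lib'] -> [['sample', 'lid'], ['lib']], processed in order."""
    combos = []
    for arg in args:
        names = arg.split(",")
        unknown = [n for n in names if n not in hierarchy]
        if unknown:
            raise ValueError(f"not in barcode hierarchy: {unknown}")
        combos.append(names)
    return combos


def shared_remainders(table: Dict[Key, int], hierarchy: Sequence[str],
                      tested: Sequence[str]) -> Dict[Key, Dict[Key, int]]:
    """Group read counts by the barcodes that are NOT tested. A remainder
    seen with several values of the tested combination is a hopping candidate."""
    names = list(hierarchy)
    idx_t = [names.index(n) for n in tested]
    idx_r = [i for i in range(len(names)) if i not in idx_t]
    by_rest: Dict[Key, Dict[Key, int]] = defaultdict(dict)
    for key, n in table.items():
        t = tuple(key[i] for i in idx_t)
        r = tuple(key[i] for i in idx_r)
        by_rest[r][t] = by_rest[r].get(t, 0) + n
    return {r: d for r, d in by_rest.items() if len(d) > 1}


hierarchy = ["sample", "lib", "lid"]
print(parse_hopping_args(["sample,lid", "lib"], hierarchy))
# [['sample', 'lid'], ['lib']]
table = {("s1", "L1", "AAAA"): 100, ("s2", "L1", "AAAA"): 2,
         ("s2", "L1", "CCCC"): 50}
print(shared_remainders(table, hierarchy, ["sample"]))
# {('L1', 'AAAA'): {('s1',): 100, ('s2',): 2}}
```

Here the two reads of lineage AAAA in sample s2 are the kind of observation an index hopping test has to judge, since the same lib and lid combination is abundant in s1.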

Thresholding

Thresholding removes all barcode combinations with a read count below a user-defined threshold. Separate thresholds can be chosen for different values of a barcode, for example for each sample. Thresholding does not require worker processes.

phantombuster threshold --outdir [DIR] --threshold-file [FILE]

Example thresholds.csv:

sample,threshold
sample1,10
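A minimal sketch of per-value thresholding, assuming that the first column of the threshold file names the hierarchy barcode the thresholds apply to, that combinations with a read count strictly below the threshold are dropped, and that values not listed in the file fall back to a hypothetical default of 0 (so they are kept). PhantomBuster's handling of unlisted values may differ; the function names and table layout are illustrative only.

```python
import csv
import io
from typing import Dict, Sequence, Tuple

Key = Tuple[str, ...]


def read_thresholds(text: str) -> Tuple[str, Dict[str, int]]:
    """Parse thresholds.csv; returns (barcode column, {value: min reads})."""
    reader = csv.DictReader(io.StringIO(text))
    column = reader.fieldnames[0]  # e.g. "sample"
    thresholds = {row[column]: int(row["threshold"]) for row in reader}
    return column, thresholds


def apply_thresholds(table: Dict[Key, int], hierarchy: Sequence[str],
                     column: str, thresholds: Dict[str, int],
                     default: int = 0) -> Dict[Key, int]:
    """Keep combinations whose read count reaches the threshold for their
    value of `column`; unlisted values use `default` (an assumption)."""
    idx = list(hierarchy).index(column)
    return {key: n for key, n in table.items()
            if n >= thresholds.get(key[idx], default)}


column, thresholds = read_thresholds("sample,threshold\nsample1,10\n")
table = {("sample1", "L1", "AAAA"): 12, ("sample1", "L1", "CCCC"): 3,
         ("sample2", "L1", "GGGG"): 3}
print(apply_thresholds(table, ["sample", "lib", "lid"], column, thresholds))
# {('sample1', 'L1', 'AAAA'): 12, ('sample2', 'L1', 'GGGG'): 3}
```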


Download files


Source Distribution

phantombuster-0.14.0.tar.gz (54.1 kB)

Uploaded Source

Built Distributions


phantombuster-0.14.0-cp312-cp312-manylinux_2_17_x86_64.whl (1.0 MB)

Uploaded CPython 3.12, manylinux: glibc 2.17+ x86-64

phantombuster-0.14.0-cp311-cp311-manylinux_2_17_x86_64.whl (1.1 MB)

Uploaded CPython 3.11, manylinux: glibc 2.17+ x86-64

phantombuster-0.14.0-cp310-cp310-manylinux_2_17_x86_64.whl (1.0 MB)

Uploaded CPython 3.10, manylinux: glibc 2.17+ x86-64

phantombuster-0.14.0-cp39-cp39-manylinux_2_17_x86_64.whl (1.0 MB)

Uploaded CPython 3.9, manylinux: glibc 2.17+ x86-64

File details

Details for the file phantombuster-0.14.0.tar.gz.

File metadata

  • Download URL: phantombuster-0.14.0.tar.gz
  • Upload date:
  • Size: 54.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.3 CPython/3.12.4 Linux/6.8.9-arch1-1

File hashes

Hashes for phantombuster-0.14.0.tar.gz
Algorithm Hash digest
SHA256 6843572a8799e0005cff7f8d3096bfb40f0448446ed622dd0467563ec19bcace
MD5 ef45cc93c601fe527626d3522902f88a
BLAKE2b-256 2a0f58cd034c185fb3adbe97797eb8535f9406d0789e15b85fa32ed4e4ddefb0


File details

Details for the file phantombuster-0.14.0-cp312-cp312-manylinux_2_17_x86_64.whl.

File hashes

Hashes for phantombuster-0.14.0-cp312-cp312-manylinux_2_17_x86_64.whl
Algorithm Hash digest
SHA256 f4513f555c1c33de3b62d79a72ea95b16c64107f122b98dec0667b3fa9ba89f7
MD5 02a838bda38686e2c3d87fc95eb049f8
BLAKE2b-256 4daccfc0539b6e12687fb4e10a837f3a37ebd50d1cad962de80336419a877913


File details

Details for the file phantombuster-0.14.0-cp311-cp311-manylinux_2_17_x86_64.whl.

File hashes

Hashes for phantombuster-0.14.0-cp311-cp311-manylinux_2_17_x86_64.whl
Algorithm Hash digest
SHA256 a58dc3053dcbeb88f32d3fec44e8563898d5609290e2b41a52728c88baf2f8bd
MD5 d34c23883cfb0079089a5a338dc5b71b
BLAKE2b-256 67caaab4485f18cc9b014d8175d0ac22983923953286da29074d9edd1b1082ee


File details

Details for the file phantombuster-0.14.0-cp310-cp310-manylinux_2_17_x86_64.whl.

File hashes

Hashes for phantombuster-0.14.0-cp310-cp310-manylinux_2_17_x86_64.whl
Algorithm Hash digest
SHA256 dc3c3006b01a4aa5e680ba31a59c46f35e8c59a311fcefcc4f195709adc7d916
MD5 3c8cd852ee8b76171e516a1bf833eb3b
BLAKE2b-256 32bed15c3734226e0a19dccedb14746c57807382823b12fe0aee810e8a752643


File details

Details for the file phantombuster-0.14.0-cp39-cp39-manylinux_2_17_x86_64.whl.

File hashes

Hashes for phantombuster-0.14.0-cp39-cp39-manylinux_2_17_x86_64.whl
Algorithm Hash digest
SHA256 c36218db5647ecc4d130ae65337d9c7f745587f48cb592080a4766d27f94f9fc
MD5 ab384bf4ed2e6ef36850f3fb8bf15dd8
BLAKE2b-256 7ff3dbf1dc811d125056878c35cee77765bb0e4e76643c314038f4334e977ef6

