Bioinformatics tool that removes sequencing artifacts caused by single-nucleotide errors and index hopping in barcode-based experiments

Project description

PhantomBuster is a bioinformatics tool that removes phantom barcode combinations caused by single-nucleotide sequencing errors and index hopping. It is written for lineage-tracing experiments and CRISPR screens, but can be used for any experimental setup in which only barcodes, and no genomic DNA, are measured.

Installation

PhantomBuster is available on PyPI and can be installed with standard Python tools such as pip or pipx.

pipx install phantombuster

QuickStart

PhantomBuster is a command-line tool that is run via the phantombuster command. It consists of four main steps: (1) demultiplexing, (2) error correction of random barcodes, (3) hopping removal and (4) thresholding. For CRISPR screens, a separate script can calculate p-values for guides.
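The order of the four steps can be sketched as a small Python helper that only assembles the command lines and does not run them. The subcommands and flags are the ones shown in the sections of this description; the file names, the `out` directory and the helper name `pipeline_commands` are hypothetical placeholders.

```python
import shlex


def pipeline_commands(input_csv, outdir, hierarchy_csv, regex_csv,
                      threshold_csv, hopping_barcodes):
    """Return the four stage commands as argument lists, in execution order.

    The same outdir is passed to every stage. The first two stages also
    need `phantombuster worker --outdir <outdir>` processes running.
    """
    return [
        ["phantombuster", "demultiplex", input_csv, "--outdir", outdir,
         "--barcode-hierarchy-file", hierarchy_csv, "--regex-file", regex_csv],
        ["phantombuster", "error-correct", "--outdir", outdir,
         "--barcode-hierarchy-file", hierarchy_csv],
        ["phantombuster", "hopping-removal", "--outdir", outdir,
         *hopping_barcodes],
        ["phantombuster", "threshold", "--outdir", outdir,
         "--threshold-file", threshold_csv],
    ]


# Hypothetical file names, used only for illustration.
commands = pipeline_commands(
    "input_files.csv", "out", "barcode_hierarchy.csv",
    "regexes.csv", "thresholds.csv", ["sample,lid", "lib"],
)
for argv in commands:
    print(shlex.join(argv))
```

Building argument lists rather than one shell string keeps each path a separate argument, so paths with spaces need no manual quoting.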

Demultiplexing

PhantomBuster demultiplexes BAM or FASTQ files and extracts all specified barcodes, error-correcting those barcodes that have known reference sequences. Demultiplexing requires additional worker processes to be started.

phantombuster demultiplex [INPUTFILE] --outdir [DIR] --barcode-hierarchy-file [FILE] --regex-file [FILE] 
phantombuster worker --outdir [DIR]

INPUTFILE must be a CSV file that lists all BAM and FASTQ files to be processed.

Example input_files.csv:

file
101.bam
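For many sequencing files, the input CSV can be generated rather than written by hand. The following is a minimal sketch, assuming the files sit in one directory and are listed by bare file name as in the example above; the suffix list and the helper name `write_input_csv` are assumptions, and how PhantomBuster resolves relative paths is not specified here.

```python
import csv
import tempfile
from pathlib import Path

# Assumed file extensions; adjust to the naming used by your sequencer.
SEQUENCING_SUFFIXES = (".bam", ".fastq", ".fastq.gz")


def write_input_csv(data_dir, csv_path):
    """Write a one-column CSV (header `file`) listing sequencing files."""
    files = sorted(
        p.name for p in Path(data_dir).iterdir()
        if p.is_file() and p.name.endswith(SEQUENCING_SUFFIXES)
    )
    with open(csv_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["file"])
        writer.writerows([name] for name in files)
    return files


# Demonstration on a temporary directory with empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("101.bam", "102.fastq.gz", "notes.txt"):
        (Path(tmp) / name).touch()
    csv_file = Path(tmp) / "input_files.csv"
    listed = write_input_csv(tmp, csv_file)
    with open(csv_file, newline="") as handle:
        written_rows = list(csv.reader(handle))
```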

The barcode hierarchy file is a CSV file that lists all barcodes to be extracted. The order of the barcodes defines a hierarchy: barcodes higher up are more general, and barcodes lower down are more specific. The hierarchy is used in the second step, error correction, in which two random barcode sequences are compared only if all barcode sequences higher up the hierarchy are identical.

Example barcode_hierarchy.csv:

barcode,type,referencefile,threshold,min_length,max_length
sample,reference,sample_barcodes.csv,auto,-,-
lib,reference,library_barcodes.csv,1,-,-
lid,random,-,-,50,50
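The effect of the hierarchy on error correction can be illustrated with a short sketch: random barcodes are grouped by the values of all barcodes above them, and only sequences within the same group are candidates for comparison. This is an illustration of the rule described above, not PhantomBuster's implementation; the records, the shortened `lid` sequences and the helper name `comparison_groups` are hypothetical.

```python
from collections import defaultdict

# Barcode order as in the example barcode_hierarchy.csv.
HIERARCHY = ["sample", "lib", "lid"]


def comparison_groups(records, hierarchy, random_barcode):
    """Group random barcode sequences by all barcodes higher in the hierarchy."""
    parents = hierarchy[:hierarchy.index(random_barcode)]
    groups = defaultdict(set)
    for record in records:
        key = tuple(record[name] for name in parents)
        groups[key].add(record[random_barcode])
    return dict(groups)


# Hypothetical reads; real lid sequences would be 50 nt long in the example.
records = [
    {"sample": "s1", "lib": "A", "lid": "ACGTT"},
    {"sample": "s1", "lib": "A", "lid": "ACGTA"},
    {"sample": "s2", "lib": "A", "lid": "ACGTT"},
]
groups = comparison_groups(records, HIERARCHY, "lid")
```

Here the two `lid` sequences of sample `s1` fall into one group and may be compared, while the identical sequence in sample `s2` is kept in a separate group.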

The regex file is a CSV file that specifies, for each read region, the regular expression used to extract barcodes.

Example regexes.csv:

tag,regex
b2,"^[ACGTN]{3}(?P<sample>[ACGTN]{5})"
query,"(?P<lid>[ACGTN]{5,6}(?P<lib>ACGT|GTAC){s<=1}[ACGTN]+)"
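Each named group in a regex corresponds to a barcode name from the hierarchy file. A minimal sketch with Python's standard `re` module shows how the `b2` pattern above yields the `sample` barcode; the read sequence and the helper name `extract_barcodes` are hypothetical. The `{s<=1}` term in the `query` pattern appears to be the fuzzy-matching syntax of the third-party `regex` package (at most one substitution), which the standard `re` module does not support, so that pattern is not run here.

```python
import re

# The b2 pattern from the example regexes.csv: skip 3 bases, capture 5.
B2_PATTERN = re.compile(r"^[ACGTN]{3}(?P<sample>[ACGTN]{5})")


def extract_barcodes(pattern, read):
    """Return a dict of named groups, or None if the read does not match."""
    match = pattern.search(read)
    return match.groupdict() if match else None


# Hypothetical read region.
hit = extract_barcodes(B2_PATTERN, "TTTACGTAGG")
miss = extract_barcodes(B2_PATTERN, "TT")
```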

The outdir is a directory that contains all output and temporary files. The same outdir must be passed to all stages of PhantomBuster.

Error Correction

The error correction step uses the UMI-tools error correction algorithm to correct random barcode sequences. Error correction requires additional worker processes to be started.

phantombuster error-correct --outdir [DIR] --barcode-hierarchy-file [FILE]
phantombuster worker --outdir [DIR]

The outdir and barcode hierarchy file must be the same as in the demultiplexing step.
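To give an intuition for this kind of correction, the following is a simplified sketch of the "directional" network method described by UMI-tools: a barcode absorbs a neighbour at Hamming distance 1 when its read count is at least twice the neighbour's count minus one. Whether PhantomBuster uses exactly this variant and these parameters is an assumption; the counts and function names are hypothetical.

```python
from collections import Counter


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def directional_parents(counts, max_dist=1):
    """Map every barcode to the high-count barcode it is merged into."""
    order = sorted(counts, key=lambda bc: (-counts[bc], bc))
    parent = {}
    for root in order:
        if root in parent:
            continue
        parent[root] = root
        stack = [root]
        while stack:
            node = stack.pop()
            for other in order:
                if other in parent or len(other) != len(node):
                    continue
                # Directional rule: the absorbing node must be well supported.
                if (hamming(node, other) <= max_dist
                        and counts[node] >= 2 * counts[other] - 1):
                    parent[other] = root
                    stack.append(other)
    return parent


def collapse(counts):
    """Sum read counts of each barcode into its parent barcode."""
    merged = Counter()
    for barcode, n in counts.items():
        merged[directional_parents(counts)[barcode]] += n
    return dict(merged)


# Hypothetical read counts within one hierarchy group.
corrected = collapse({"AAAA": 100, "AAAT": 3, "AATT": 1, "CCCC": 50})
```

In this example `AAAT` is merged into `AAAA`, `AATT` follows through `AAAT`, and `CCCC` stays separate because it is too distant from the other sequences.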

Index Hopping Removal

The index hopping removal step removes barcode combinations that likely arose from index hopping. Index hopping removal does not require worker processes.

phantombuster hopping-removal --outdir [DIR] [HOPPING_BARCODES]

The outdir must be the same as in the previous steps. The barcodes to test must correspond to one barcode or a combination of barcodes from the barcode hierarchy (sample). Combinations are given separated by commas (sample,lid). Multiple barcodes or combinations can be given and are then processed one after another (sample,lid lib). It is recommended to test the combination of barcodes on the i5 index first and then the combination of barcodes on the i7 index.
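The argument format can be made concrete with a small sketch that splits the space-separated arguments into comma-separated combinations and checks each name against the hierarchy. It only mirrors the syntax described above; it is not PhantomBuster's own parser, and the helper name `parse_hopping_barcodes` is hypothetical.

```python
# Barcode names from the example barcode_hierarchy.csv.
HIERARCHY_NAMES = ["sample", "lib", "lid"]


def parse_hopping_barcodes(args, known=HIERARCHY_NAMES):
    """Turn ['sample,lid', 'lib'] into [['sample', 'lid'], ['lib']]."""
    combinations = []
    for arg in args:
        names = [name.strip() for name in arg.split(",") if name.strip()]
        unknown = [name for name in names if name not in known]
        if unknown:
            raise ValueError(f"not in barcode hierarchy: {unknown}")
        combinations.append(names)
    return combinations


# Two tests, processed one after another: first (sample, lid), then lib.
tests_in_order = parse_hopping_barcodes(["sample,lid", "lib"])
```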

Thresholding

Thresholding removes all barcode combinations with a read count below a user-defined threshold. Separate thresholds can be chosen for different barcode values, for example for each sample. Thresholding does not require worker processes.

phantombuster threshold --outdir [DIR] --threshold-file [FILE]

Example thresholds.csv:

sample,threshold
sample1,10
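The thresholding rule can be sketched in a few lines: a barcode combination is kept only if its read count is not below the threshold for its sample. The rows, the `reads` column name and the behaviour for samples missing from the threshold file (kept, via a default of 0) are assumptions made for illustration.

```python
import csv
import io

# Content of the example thresholds.csv.
THRESHOLDS_CSV = "sample,threshold\nsample1,10\n"


def load_thresholds(text):
    """Parse threshold CSV text into a {sample: threshold} dict."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["sample"]: int(row["threshold"]) for row in reader}


def apply_thresholds(rows, thresholds, default=0):
    """Keep rows whose read count is at least the sample's threshold."""
    return [
        row for row in rows
        if row["reads"] >= thresholds.get(row["sample"], default)
    ]


# Hypothetical barcode combinations with read counts.
rows = [
    {"sample": "sample1", "lid": "ACGTT", "reads": 25},
    {"sample": "sample1", "lid": "ACGTA", "reads": 10},
    {"sample": "sample1", "lid": "TTGCA", "reads": 9},
]
kept = apply_thresholds(rows, load_thresholds(THRESHOLDS_CSV))
```

With a threshold of 10, the combination with 9 reads is dropped, and the one with exactly 10 reads is kept because only counts below the threshold are removed.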
