
Bioinformatics tool that removes sequencing artifacts caused by single-nucleotide errors and index hopping in barcode-based experiments


PhantomBuster is a bioinformatics tool that removes phantom barcode combinations caused by single-nucleotide sequencing errors and index hopping. It was written for lineage-tracing experiments and CRISPR screens, but can be used for any experimental setup in which only barcodes and no genomic DNA are measured.

Installation

PhantomBuster is available on PyPI and can be installed with standard Python tools such as pip or pipx.

pipx install phantombuster

QuickStart

PhantomBuster is a command-line tool that is run via the phantombuster command. It consists of four main steps: (1) demultiplexing, (2) error correction of random barcodes, (3) index hopping removal and (4) thresholding. For CRISPR screens, a separate script can calculate p-values for guides.

Demultiplexing

PhantomBuster demultiplexes BAM or FASTQ files and extracts all specified barcodes, error-correcting those barcodes that have known reference sequences. Demultiplexing requires additional worker processes to be started.

phantombuster demultiplex [INPUTFILE] --outdir [DIR] --barcode-hierarchy-file [FILE] --regex-file [FILE]
phantombuster worker --outdir [DIR]

INPUTFILE must be a CSV file that lists all BAM and FASTQ files to be processed.

Example input_files.csv:

file
101.bam
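An input file like this can be generated rather than typed by hand. The following is a minimal sketch, assuming a single `file` column as in the example above; the helper name `write_input_csv`, the list of accepted file suffixes, and the choice to write paths as found on disk are illustrative assumptions, not part of PhantomBuster.

```python
import csv
from pathlib import Path


def write_input_csv(data_dir, out_csv="input_files.csv"):
    """Write a one-column CSV listing every BAM/FASTQ file found in data_dir."""
    # Assumed suffixes; adjust to whatever your sequencing facility delivers.
    suffixes = (".bam", ".fastq", ".fastq.gz", ".fq", ".fq.gz")
    files = sorted(
        p for p in Path(data_dir).iterdir()
        if p.is_file() and p.name.endswith(suffixes)
    )
    with open(out_csv, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["file"])  # header taken from the example above
        for path in files:
            writer.writerow([str(path)])
    return [str(p) for p in files]
```

Calling `write_input_csv("raw_data")` would write input_files.csv to the current directory and return the listed paths.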

The barcode hierarchy file is a CSV file that lists all barcodes to be extracted. The order of the barcodes defines a hierarchy: barcodes higher in the hierarchy are more general, and barcodes lower in the hierarchy are more specific. The hierarchy is used in the second step, error correction, in which two random barcode sequences are compared only if all barcode sequences higher in the hierarchy are identical.

Example barcode_hierarchy.csv:

barcode,type,referencefile,threshold,min_length,max_length
sample,reference,sample_barcodes.csv,auto,-,-
lib,reference,library_barcodes.csv,1,-,-
lid,random,-,-,50,50
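The comparison rule implied by the hierarchy can be sketched as a grouping step: random barcodes are only candidates for comparison when they share every barcode above them. This is an illustration of the rule as described, not PhantomBuster's implementation; the read dictionaries and the function name `group_for_comparison` are hypothetical.

```python
from collections import defaultdict

# Barcode order as in the example hierarchy file; "lid" is the random barcode.
HIERARCHY = ["sample", "lib", "lid"]


def group_for_comparison(reads, hierarchy=HIERARCHY, random_barcode="lid"):
    """Group random-barcode sequences by all barcodes above them in the hierarchy."""
    upper = hierarchy[: hierarchy.index(random_barcode)]
    groups = defaultdict(set)
    for read in reads:
        key = tuple(read[name] for name in upper)
        groups[key].add(read[random_barcode])
    return dict(groups)


reads = [
    {"sample": "s1", "lib": "A", "lid": "ACGTA"},
    {"sample": "s1", "lib": "A", "lid": "ACGTT"},
    {"sample": "s2", "lib": "A", "lid": "ACGTA"},
]
groups = group_for_comparison(reads)
# The two s1 sequences land in the same group and may be compared;
# the s2 sequence is kept apart even though it equals one of them.
```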

The regex file is a CSV file that specifies, for each read region, a regular expression used to extract barcodes.

Example regexes.csv:

tag,regex
b2,"^[ACGTN]{3}(?P<sample>[ACGTN]{5})"
query,"(?P<lid>[ACGTN]{5,6}(?P<lib>ACGT|GTAC){s<=1}[ACGTN]+)"
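To see how named groups map to barcode names, the first pattern above can be tried with Python's standard `re` module. This is only a sketch of the extraction idea; the helper `extract_barcodes` is hypothetical. The `{s<=1}` construct in the second pattern looks like the fuzzy-matching syntax of the third-party `regex` package (at most one substitution); the standard `re` module does not implement fuzzy matching, so that pattern is not used here.

```python
import re

# First pattern from the example regexes.csv (tag b2): skip 3 bases, capture 5.
SAMPLE_REGEX = re.compile(r"^[ACGTN]{3}(?P<sample>[ACGTN]{5})")


def extract_barcodes(pattern, sequence):
    """Return {barcode_name: sequence} for all named groups, or None if no match."""
    match = pattern.search(sequence)
    return match.groupdict() if match else None


print(extract_barcodes(SAMPLE_REGEX, "NNNACGTACCCC"))
# {'sample': 'ACGTA'}
```

The group name (`sample`) is what ties the captured sequence back to a row of the barcode hierarchy file.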

The outdir is a directory that holds all output and temporary files. The same outdir must be passed to all stages of PhantomBuster.

Error Correction

The error correction step uses the UMI-tools error correction algorithm to correct random barcode sequences. Error correction requires additional worker processes to be started.

phantombuster error-correct --outdir [DIR] --barcode-hierarchy-file [FILE]
phantombuster worker --outdir [DIR]

The outdir and barcode hierarchy file must be the same as in the demultiplexing step.
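For intuition, the following is a minimal sketch of the "directional" network method from UMI-tools, in which a low-count sequence is merged into a neighbour at Hamming distance 1 when the neighbour's count is at least twice its own minus one. Whether PhantomBuster uses exactly this variant and these parameters is an assumption; the function names are illustrative, and the quadratic neighbour search is only suitable for small examples.

```python
from collections import Counter, deque


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def directional_correct(counts):
    """Map each barcode to the representative (seed) of its cluster."""
    # Visit barcodes from most to least abundant; ties broken alphabetically.
    order = sorted(counts, key=lambda s: (-counts[s], s))
    assigned = {}
    for seed in order:
        if seed in assigned:
            continue
        assigned[seed] = seed
        queue = deque([seed])
        while queue:
            parent = queue.popleft()
            for child in order:
                if child in assigned or len(child) != len(parent):
                    continue
                # Directional edge: one mismatch and count(parent) >= 2*count(child) - 1
                if hamming(parent, child) == 1 and counts[parent] >= 2 * counts[child] - 1:
                    assigned[child] = seed
                    queue.append(child)
    return assigned


def corrected_counts(counts):
    """Sum the read counts of every cluster onto its representative."""
    mapping = directional_correct(counts)
    merged = Counter()
    for seq, n in counts.items():
        merged[mapping[seq]] += n
    return dict(merged)


counts = {"ACGT": 100, "ACGA": 3, "ACCA": 1, "TTTT": 50}
print(corrected_counts(counts))
# {'ACGT': 104, 'TTTT': 50}
```

In this example ACGA is one mismatch from ACGT and ACCA is one mismatch from ACGA, so both are absorbed into ACGT, while TTTT stays separate. In line with the hierarchy rule, such a correction would be applied separately within each group of reads that share all higher-level barcodes.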

Index Hopping Removal

The index hopping removal step removes barcode combinations that likely arose due to index hopping. Index hopping removal does not require any worker processes.

phantombuster hopping-removal --outdir [DIR] [HOPPING_BARCODES]

The outdir must be the same as in the previous steps. Each entry in HOPPING_BARCODES must be a single barcode from the barcode hierarchy (sample) or a combination of such barcodes. Combinations are given separated by commas (sample,lid). Multiple barcodes or combinations can be given and are then processed one after another (sample,lid lib). It is recommended to test the combination of barcodes on the i5 index first and then the combination of barcodes on the i7 index.
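The argument format can be illustrated with a small parser: each space-separated argument is one test, and commas inside an argument join barcodes into a combination. This sketch only mirrors the format described above and is not PhantomBuster's own argument handling; `parse_hopping_barcodes` is a hypothetical helper.

```python
def parse_hopping_barcodes(args, hierarchy):
    """Split each argument on commas and check the names against the hierarchy."""
    combos = []
    for arg in args:
        combo = [name.strip() for name in arg.split(",") if name.strip()]
        unknown = [name for name in combo if name not in hierarchy]
        if unknown:
            raise ValueError(f"not in barcode hierarchy: {unknown}")
        combos.append(combo)
    return combos


# Corresponds to the command-line arguments: sample,lid lib
print(parse_hopping_barcodes(["sample,lid", "lib"], ["sample", "lib", "lid"]))
# [['sample', 'lid'], ['lib']]
```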

Thresholding

Thresholding removes all barcode combinations with a read count below a user-defined threshold. Separate thresholds can be chosen for different barcode values, for example one threshold per sample. Thresholding does not require any worker processes.

phantombuster threshold --outdir [DIR] --threshold-file [FILE]

Example thresholds.csv:

sample,threshold
sample1,10
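The effect of per-sample thresholds can be sketched in a few lines. This assumes read counts keyed by (sample, lid) and keeps samples that have no entry in the threshold file unchanged; both the data layout and that fallback are assumptions made for illustration, not documented PhantomBuster behaviour.

```python
import csv
import io


def load_thresholds(handle):
    """Read a thresholds.csv-style file into {sample: threshold}."""
    return {row["sample"]: int(row["threshold"]) for row in csv.DictReader(handle)}


def apply_thresholds(read_counts, thresholds):
    """Keep barcode combinations whose read count is not below the sample's threshold."""
    kept = {}
    for (sample, lid), n in read_counts.items():
        # Assumption: a sample without a listed threshold is not filtered.
        if n >= thresholds.get(sample, 0):
            kept[(sample, lid)] = n
    return kept


thresholds = load_thresholds(io.StringIO("sample,threshold\nsample1,10\n"))
read_counts = {
    ("sample1", "ACGTA"): 25,
    ("sample1", "ACGTT"): 9,
    ("sample2", "GGGGA"): 2,
}
print(apply_thresholds(read_counts, thresholds))
# {('sample1', 'ACGTA'): 25, ('sample2', 'GGGGA'): 2}
```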
