
Bioinformatics genetic barcode demultiplexing (Spatial Transcriptomics)


TagGD: Barcode Demultiplexing Utilities for Spatial Transcriptomics Data

License: MIT. Supported Python versions: 3.10, 3.11, 3.12.

TagGD is a Python-based barcode demultiplexer for Spatial Transcriptomics data. It provides a generalized, optimized, and up-to-date version of the original C++ demultiplexer "findIndexes," available here.

For the original peer-reviewed reference to the program, see PLOS ONE.

Overview

The primary goal of TagGD is to extract cDNA barcodes from input files (FASTQ, FASTA, SAM, or BAM) and match them against a list of reference barcodes using a k-mer-based approach. Matched reads are output with barcode and spatial information added to each record.
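The k-mer idea can be illustrated with a minimal Python sketch. This is not TagGD's implementation: the function names are hypothetical, the sketch assumes fixed-length barcodes that have already been extracted from the read, and it uses Hamming distance as a stand-in scoring metric. Reference barcodes are indexed by their k-mers, so only references sharing at least one k-mer with the read barcode are scored, instead of the whole reference list.

```python
from collections import defaultdict


def kmers(seq: str, k: int):
    """Yield every overlapping k-mer of seq (nothing if seq is shorter than k)."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_kmer_index(barcodes: list[str], k: int) -> dict[str, set[str]]:
    """Map each k-mer to the set of reference barcodes that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for bc in barcodes:
        for km in kmers(bc, k):
            index[km].add(bc)
    return dict(index)


def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def match_barcode(read_bc: str, index: dict[str, set[str]], k: int,
                  max_dist: int) -> tuple[str, list[str]]:
    """Classify a read barcode as 'matched', 'ambiguous' or 'unmatched'."""
    # 1. Candidate lookup: keep only references sharing at least one k-mer.
    candidates: set[str] = set()
    for km in kmers(read_bc, k):
        candidates |= index.get(km, set())
    # 2. Score same-length candidates and keep those within max_dist.
    hits = [(hamming(read_bc, bc), bc) for bc in candidates
            if len(bc) == len(read_bc)]
    hits = [(d, bc) for d, bc in hits if d <= max_dist]
    if not hits:
        return "unmatched", []
    # 3. Keep every candidate tied at the best distance.
    best = min(d for d, _ in hits)
    best_hits = sorted(bc for d, bc in hits if d == best)
    status = "matched" if len(best_hits) == 1 else "ambiguous"
    return status, best_hits
```

With k=4, the read barcode ACGTACGA shares k-mers only with ACGTACGT and differs from it at one position, so match_barcode("ACGTACGA", build_kmer_index(["ACGTACGT", "TGCATGCA"], 4), 4, 1) returns ("matched", ["ACGTACGT"]). The trade-off of k-mer filtering is that a read with errors in every k-mer yields no candidates at all; a smaller k lowers that risk at the cost of scoring more candidates.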

TagGD is versatile: given a reference file, it can demultiplex any type of index. For general-purpose demultiplexing tasks, users can even assign placeholder spatial coordinates (X, Y) to each index.

Key Features

  • Supports FASTQ, FASTA, SAM, and BAM formats.
  • Handles multiple indexes per read.
  • K-mer-based matching for efficient and accurate demultiplexing.
  • Outputs matched, unmatched, and ambiguous reads with annotated barcodes.
  • Configurable options and a choice of distance metrics.
  • Fast and memory-efficient.
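The README does not name the available distance metrics here; run taggd_demultiplex -h for the actual choices. To illustrate why the choice matters, the sketch below implements two common metrics for comparing barcodes. It is a generic example, not code taken from TagGD.

```python
def hamming(a: str, b: str) -> int:
    """Substitutions only; defined for equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires equal-length sequences")
    return sum(x != y for x, y in zip(a, b))


def levenshtein(a: str, b: str) -> int:
    """Minimum number of substitutions, insertions and deletions turning a into b.

    Classic dynamic programming, keeping only the previous row of the table.
    """
    prev = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca from a
                curr[j - 1] + 1,           # insert cb into a
                prev[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]
```

A single-base shift (ACGTACGT versus CGTACGTA) costs 8 under Hamming distance but only 2 under Levenshtein distance. Indel-aware metrics therefore tolerate insertions and deletions in the read, at a higher computational cost (quadratic rather than linear in the barcode length).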

Requirements

  • Python 3.10 or higher
  • Cython
  • pysam
  • numpy
  • dnaio
  • pytest (for running the tests)
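A quick way to confirm that an environment meets these requirements before building is sketched below. The module names are assumptions based on the usual import names of these packages (Cython, for example, is imported as Cython), and pytest is treated as optional because it is only needed for testing.

```python
import importlib.util
import sys

# Assumed import names for the packages listed above.
REQUIRED = ["Cython", "pysam", "numpy", "dnaio"]
OPTIONAL = ["pytest"]  # only needed to run the test suite


def missing_modules(names: list[str]) -> list[str]:
    """Return the names that cannot be found on the current import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def check_environment() -> bool:
    """Print any problems and return True only if all hard requirements are met."""
    ok = True
    if sys.version_info < (3, 10):
        print(f"Python 3.10 or higher is required, found {sys.version.split()[0]}")
        ok = False
    missing = missing_modules(REQUIRED)
    if missing:
        print("Missing required packages: " + ", ".join(missing))
        ok = False
    optional_missing = missing_modules(OPTIONAL)
    if optional_missing:
        print("Optional packages not installed: " + ", ".join(optional_missing))
    return ok
```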

Installation

From Source

If you are working in a virtual environment (for example, a conda environment from Anaconda), build and install from source:

git clone https://github.com/your-repo/taggd.git
cd taggd
python setup.py build
python setup.py install

Or, using pip:

git clone https://github.com/your-repo/taggd.git
cd taggd
pip install .

Using pip

Install directly from PyPI:

pip install taggd

Building the Project

If you are contributing, testing or making changes to the code, you may need to build or rebuild the Cython extensions:

python setup.py build_ext --inplace

Testing the Project

pytest

Usage

Basic Command

To see all available options, run:

taggd_demultiplex -h

Input Reference File Format

The reference file should list one barcode per line, optionally followed by its spatial coordinates, with tab-separated columns in the following order:

BARCODE X Y

Example:

ACGTACGT 0 0
TGCATGCA 1 1
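A minimal parser for this format might look like the following. It is a hypothetical helper, not part of TagGD: it assumes integer coordinates, accepts either a bare barcode or BARCODE X Y on each line, splits on any whitespace (tabs or spaces), skips blank lines, and rejects duplicate barcodes, since the same sequence listed twice could not be assigned to a single position.

```python
from collections.abc import Iterable
from typing import NamedTuple, Optional


class RefBarcode(NamedTuple):
    barcode: str
    x: Optional[int]  # None when the line has no coordinates
    y: Optional[int]


def parse_reference(lines: Iterable[str]) -> dict[str, RefBarcode]:
    """Parse 'BARCODE' or 'BARCODE X Y' lines into a dict keyed by barcode."""
    refs: dict[str, RefBarcode] = {}
    for lineno, line in enumerate(lines, start=1):
        fields = line.split()  # splits on tabs or runs of spaces
        if not fields:
            continue  # skip blank lines
        if len(fields) == 1:
            x = y = None
        elif len(fields) == 3:
            x, y = int(fields[1]), int(fields[2])  # assumes integer coordinates
        else:
            raise ValueError(f"line {lineno}: expected BARCODE or BARCODE X Y, got {line!r}")
        barcode = fields[0]
        if barcode in refs:
            raise ValueError(f"line {lineno}: duplicate barcode {barcode}")
        refs[barcode] = RefBarcode(barcode, x, y)
    return refs
```

Usage: with open("barcodes.tsv") as fh: refs = parse_reference(fh).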

Example Commands

taggd_demultiplex \
  --k 6 \
  --max-edit-distance 3 \
  --overhang 2 \
  --subprocesses 4 \
  --seed randomseed \
  <barcodes.tsv> \
  <input_file> \
  <output_prefix>
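When TagGD runs as one step of a Python pipeline, the same command can be assembled and launched with the standard subprocess module. The wrapper functions are hypothetical; they only use the flags shown in the example command, and their default values simply mirror that example rather than the tool's own defaults.

```python
import subprocess


def taggd_command(barcodes: str, input_file: str, output_prefix: str, *,
                  k: int = 6, max_edit_distance: int = 3, overhang: int = 2,
                  subprocesses: int = 4, seed: str = "randomseed") -> list[str]:
    """Build the argument list for taggd_demultiplex (no shell quoting needed)."""
    return [
        "taggd_demultiplex",
        "--k", str(k),
        "--max-edit-distance", str(max_edit_distance),
        "--overhang", str(overhang),
        "--subprocesses", str(subprocesses),
        "--seed", seed,
        barcodes, input_file, output_prefix,
    ]


def run_taggd(barcodes: str, input_file: str, output_prefix: str,
              **options) -> subprocess.CompletedProcess:
    """Run the command and raise CalledProcessError on a non-zero exit status."""
    return subprocess.run(taggd_command(barcodes, input_file, output_prefix, **options),
                          check=True)
```

Passing a list instead of a single string avoids shell quoting issues with file paths; for example, run_taggd("barcodes.tsv", "reads.fastq", "sample1", k=8) overrides only the k-mer length.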

Output

TagGD generates the following output files:

  • <output_prefix>_matched.*: Reads that matched reference barcodes.
  • <output_prefix>_unmatched.*: Reads that did not match any reference barcodes.
  • <output_prefix>_ambiguous.*: Reads that matched multiple barcodes.
  • <output_prefix>_results.tsv: Summary statistics of the run.
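For a quick read count per output category, the sketch below counts records in the matched, unmatched, and ambiguous files. It assumes uncompressed FASTQ output with exactly four lines per record and a .fastq extension, which may not match your input format; the helper names are hypothetical, and the _results.tsv file already reports summary statistics of the run.

```python
from collections.abc import Iterable
from pathlib import Path

CATEGORIES = ("matched", "unmatched", "ambiguous")


def count_fastq_records(lines: Iterable[str]) -> int:
    """Count FASTQ records, assuming exactly four lines per record (no wrapping)."""
    n_lines = sum(1 for _ in lines)
    if n_lines % 4:
        raise ValueError(f"{n_lines} lines is not a multiple of 4")
    return n_lines // 4


def summarize_outputs(prefix: str, ext: str = "fastq") -> dict[str, int]:
    """Count records in <prefix>_<category>.<ext>; a missing file counts as 0."""
    counts = {}
    for category in CATEGORIES:
        path = Path(f"{prefix}_{category}.{ext}")
        if path.exists():
            with path.open() as fh:
                counts[category] = count_fastq_records(fh)
        else:
            counts[category] = 0
    return counts


def fraction_matched(counts: dict[str, int]) -> float:
    """Share of all counted reads that landed in the matched output."""
    total = sum(counts.values())
    return counts.get("matched", 0) / total if total else 0.0
```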

Options

Run taggd_demultiplex -h to view all available options and their descriptions.


Contact

For questions, bug reports, or contributions, please contact:



Download files


Source Distribution

taggd-0.4.0.tar.gz (431.2 kB)

Uploaded Source

Built Distribution


taggd-0.4.0-cp310-cp310-macosx_11_0_arm64.whl (199.9 kB)

Uploaded CPython 3.10, macOS 11.0+ ARM64

File details

Details for the file taggd-0.4.0.tar.gz.

File metadata

  • Download URL: taggd-0.4.0.tar.gz
  • Upload date:
  • Size: 431.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.10.16

File hashes

Hashes for taggd-0.4.0.tar.gz
Algorithm Hash digest
SHA256 452af6d66615c00de2a4a205a5cc2a22d94a259b4c99e5c16981baaf34fca18e
MD5 10e1d1ce41559b5c9cf195dd07b5abee
BLAKE2b-256 2bfb909594bd17cc81c283192643bbb8abb0b93bca53f713bd9fe951d5717732


File details

Details for the file taggd-0.4.0-cp310-cp310-macosx_11_0_arm64.whl.

File metadata

File hashes

Hashes for taggd-0.4.0-cp310-cp310-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 d2ca2cdcb4733f61b80095a9d92ffcae4681ac6bdc350dd8db829390f498162d
MD5 94d8999a5375adc1737a32a58cf5f5d1
BLAKE2b-256 fb24d4d6ae804bf1621c37a7b02a4d5c00e06db00271f7d1800912e6acef1310

