Bioinformatics genetic barcode demultiplexing (Spatial Transcriptomics)
TagGD: Barcode Demultiplexing Utilities for Spatial Transcriptomics Data
TagGD is a Python-based barcode demultiplexer for Spatial Transcriptomics data. It provides a generalized, optimized, and up-to-date version of the original C++ demultiplexer "findIndexes," available here.
For the original peer-reviewed reference to the program, see PLOS ONE.
Overview
The primary goal of TagGD is to extract cDNA barcodes from input files (FASTQ, FASTA, SAM, or BAM) and match them against a list of reference barcodes using a k-mer-based approach. Matched reads are output with barcode and spatial information added to each record.
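The idea behind k-mer-based matching can be sketched as a two-step process: index every reference barcode by its k-mers, then score a read's barcode only against references that share at least one k-mer with it. The following is an illustrative sketch and not TagGD's actual implementation; the function names, the use of a plain mismatch count as the distance, and the values of `k` and `max_dist` are all assumptions made for the example.

```python
from collections import defaultdict


def kmers(seq, k):
    """Return the set of all overlapping substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(barcodes, k):
    """Map each k-mer to the set of reference barcodes that contain it."""
    index = defaultdict(set)
    for barcode in barcodes:
        for kmer in kmers(barcode, k):
            index[kmer].add(barcode)
    return index


def mismatches(a, b):
    """Count differing positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def match(read_barcode, index, k, max_dist):
    """Return the closest reference barcodes within max_dist mismatches.

    Only references sharing at least one k-mer with the read are scored.
    An empty list means "unmatched"; more than one entry means "ambiguous".
    """
    candidates = set()
    for kmer in kmers(read_barcode, k):
        candidates |= index.get(kmer, set())
    scored = [
        (mismatches(read_barcode, candidate), candidate)
        for candidate in candidates
        if len(candidate) == len(read_barcode)
    ]
    scored = [(dist, candidate) for dist, candidate in scored if dist <= max_dist]
    if not scored:
        return []
    best = min(dist for dist, _ in scored)
    return sorted(candidate for dist, candidate in scored if dist == best)


references = ["ACGTACGT", "TGCATGCA"]
index = build_index(references, k=4)
print(match("ACGTACGA", index, k=4, max_dist=2))  # ['ACGTACGT']
```

The k-mer index keeps the number of expensive distance computations small, because most reference barcodes never become candidates for a given read.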
TagGD is versatile and can be used to demultiplex any type of index if a reference file is provided. Users can even create fake spatial coordinates (X, Y) for general-purpose demultiplexing tasks.
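For general-purpose demultiplexing, a reference file with placeholder coordinates could be generated along these lines. This is a minimal sketch: the helper name, the choice of X as a running index with Y fixed at 1, and the tab separator are assumptions made for illustration, so check them against the reference format your TagGD version expects.

```python
def with_placeholder_coordinates(barcodes):
    """Return 'BARCODE<TAB>X<TAB>Y' lines with arbitrary coordinates.

    X is the 1-based position of the barcode in the input list and Y is
    always 1. The values carry no spatial meaning (assumption for the sketch).
    """
    return [f"{barcode}\t{i}\t1" for i, barcode in enumerate(barcodes, start=1)]


def write_reference(path, barcodes):
    """Write the placeholder reference lines to a text file."""
    with open(path, "w", encoding="ascii") as handle:
        for line in with_placeholder_coordinates(barcodes):
            handle.write(line + "\n")


for line in with_placeholder_coordinates(["ACGTACGT", "TGCATGCA"]):
    print(line)
```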
Key Features
- Supports FASTQ, FASTA, SAM, and BAM formats.
- Handles multiple indexes per read.
- K-mer-based matching for efficient and accurate demultiplexing.
- Outputs matched, unmatched, and ambiguous reads with annotated barcodes.
- Multiple options and distance metrics.
- Fast and memory efficient.
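To illustrate why the choice of distance metric matters, the sketch below compares two metrics commonly used for barcode matching: Hamming distance (substitutions only) and Levenshtein distance (substitutions, insertions, and deletions). These are textbook implementations written for this example, not code from TagGD; the metrics TagGD actually offers are listed by `taggd_demultiplex -h`.

```python
def hamming(a, b):
    """Count positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires equal-length sequences")
    return sum(x != y for x, y in zip(a, b))


def levenshtein(a, b):
    """Minimum number of substitutions, insertions, and deletions to turn a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                      # delete char_a
                current[j - 1] + 1,                   # insert char_b
                previous[j - 1] + (char_a != char_b)  # substitute or keep
            ))
        previous = current
    return previous[-1]


# A one-base shift looks very different under the two metrics.
print(hamming("ACGTACGT", "CGTACGTA"))      # 8
print(levenshtein("ACGTACGT", "CGTACGTA"))  # 2
```

A metric that allows insertions and deletions tolerates shifted barcodes, while a substitution-only metric is cheaper to compute but treats a shift as a complete mismatch.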
Requirements
- Python 3.10 or higher
- cython
- pysam
- numpy
- dnaio
- pytest (testing)
Installation
From Source
Clone the repository and build it, ideally inside a virtual environment (for example, Anaconda):
git clone https://github.com/your-repo/taggd.git
cd taggd
python setup.py build
python setup.py install
Or, using pip:
git clone https://github.com/your-repo/taggd.git
cd taggd
pip install .
Using pip
Install directly from PyPI:
pip install taggd
Building the Project
If you are contributing, testing or making changes to the code, you may need to build or rebuild the Cython extensions:
python setup.py build_ext --inplace
Testing the Project
pytest
Usage
Basic Command
To see all available options, run:
taggd_demultiplex -h
Input Reference File Format
The reference file should contain barcodes and optional spatial coordinates, formatted as follows:
BARCODE X Y
Example:
ACGTACGT 0 0
TGCATGCA 1 1
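Before a long run it can help to sanity-check the reference file. The sketch below parses lines in the format shown above; the function name, the whitespace splitting, the integer coordinates, and the duplicate check are assumptions made for this example and do not describe TagGD's own parser.

```python
def parse_reference(lines):
    """Parse 'BARCODE X Y' lines into a {barcode: (x, y)} dictionary.

    Coordinates are optional; a barcode without them maps to None.
    Blank lines are skipped. Malformed or duplicate entries raise ValueError.
    """
    reference = {}
    for line_number, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue
        barcode = fields[0]
        if len(fields) == 1:
            coordinates = None
        elif len(fields) == 3:
            # Assumption: coordinates are integers, as in the example above.
            coordinates = (int(fields[1]), int(fields[2]))
        else:
            raise ValueError(f"line {line_number}: expected 'BARCODE' or 'BARCODE X Y'")
        if barcode in reference:
            raise ValueError(f"line {line_number}: duplicate barcode {barcode}")
        reference[barcode] = coordinates
    return reference


print(parse_reference(["ACGTACGT 0 0", "TGCATGCA 1 1"]))
```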
Example Commands
Example
taggd_demultiplex --k 6 --max-edit-distance 3 --overhang 2 --subprocesses 4 --seed randomseed <barcodes.tsv> <input_file> <output_prefix>
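When TagGD is driven from a pipeline script, the command above can be assembled as an argument list. The sketch below uses only the flags shown in the example command; the helper name and the file names are hypothetical, and the command is printed rather than executed.

```python
import shlex


def build_command(barcodes, input_file, output_prefix, k=6,
                  max_edit_distance=3, overhang=2, subprocesses=4, seed=None):
    """Assemble a taggd_demultiplex call as an argument list.

    Options come first, followed by the three positional arguments, in the
    same order as the example command. The seed is added only if given.
    """
    command = [
        "taggd_demultiplex",
        "--k", str(k),
        "--max-edit-distance", str(max_edit_distance),
        "--overhang", str(overhang),
        "--subprocesses", str(subprocesses),
    ]
    if seed is not None:
        command += ["--seed", str(seed)]
    command += [barcodes, input_file, output_prefix]
    return command


# Hypothetical file names, for illustration only.
command = build_command("barcodes.tsv", "reads.fastq", "demux/sample")
print(shlex.join(command))
```

Passing the resulting list to `subprocess.run(command, check=True)` would run the tool without going through a shell, which avoids quoting problems with unusual file names.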
Output
TagGD generates the following output files:
- <output_prefix>_matched.*: Reads that matched reference barcodes.
- <output_prefix>_unmatched.*: Reads that did not match any reference barcodes.
- <output_prefix>_ambiguous.*: Reads that matched multiple barcodes.
- <output_prefix>_results.tsv: Summary statistics of the run.
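After a run, the output files can be collected by category using the naming pattern described above. This is a small sketch; the helper name is hypothetical, and the file extensions are left to a wildcard because they depend on the input format.

```python
from pathlib import Path

CATEGORIES = ("matched", "unmatched", "ambiguous", "results")


def find_outputs(output_prefix):
    """Return {category: [paths]} for files named <output_prefix>_<category>.*"""
    prefix = Path(output_prefix)
    found = {}
    for category in CATEGORIES:
        pattern = f"{prefix.name}_{category}.*"
        found[category] = sorted(prefix.parent.glob(pattern))
    return found


# Hypothetical prefix; categories with no files map to an empty list.
for category, paths in find_outputs("demux/sample").items():
    print(category, [path.name for path in paths])
```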
Options
Run taggd_demultiplex -h to view all available options and their descriptions.
Contact
For questions, bug reports, or contributions, please contact:
- Jose Fernandez Navarro: jc.fernandez.navarro@gmail.com
File details
Details for the file taggd-0.4.0.tar.gz.
File metadata
- Download URL: taggd-0.4.0.tar.gz
- Upload date:
- Size: 431.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.10.16
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 452af6d66615c00de2a4a205a5cc2a22d94a259b4c99e5c16981baaf34fca18e |
| MD5 | 10e1d1ce41559b5c9cf195dd07b5abee |
| BLAKE2b-256 | 2bfb909594bd17cc81c283192643bbb8abb0b93bca53f713bd9fe951d5717732 |
File details
Details for the file taggd-0.4.0-cp310-cp310-macosx_11_0_arm64.whl.
File metadata
- Download URL: taggd-0.4.0-cp310-cp310-macosx_11_0_arm64.whl
- Upload date:
- Size: 199.9 kB
- Tags: CPython 3.10, macOS 11.0+ ARM64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.10.16
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d2ca2cdcb4733f61b80095a9d92ffcae4681ac6bdc350dd8db829390f498162d |
| MD5 | 94d8999a5375adc1737a32a58cf5f5d1 |
| BLAKE2b-256 | fb24d4d6ae804bf1621c37a7b02a4d5c00e06db00271f7d1800912e6acef1310 |