
A pipeline to identify pathogenic microorganisms from scRNA-seq raw data



PathogenTrack

PathogenTrack is an unsupervised computational tool that uses unmapped single-cell RNA-seq reads to characterize intracellular pathogens at the single-cell level. It is a Python-based script that identifies and quantifies reads from intracellular pathogenic viruses and bacteria in individual cells. PathogenTrack has been tested on a range of simulated and real scRNA-seq datasets and performed robustly. The details are described in our paper, "Decoding Intracellular Pathogens of scRNA-seq experiments with PathogenTrack and SCKIT".

System Requirements

PathogenTrack has been tested on Linux with the CentOS 7 operating system. The test machine had 120 GB of RAM and 40 computational threads.

Installation

PathogenTrack can be installed in two steps:

1. Install Miniconda on Linux. For details, please refer to Miniconda Installation.

wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh

2. Install PathogenTrack.

conda env create -f environment.yml

Alternatively, users can install the dependencies manually. The dependencies and their tested versions are listed below.

Package     Version
python      3.6.10
biopython   1.78
star        2.7.5a
umi_tools   1.1.1
kraken2     2.1.1
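
After a manual installation, it can help to confirm that the command-line tools are actually reachable before starting a long run. The following is a minimal sketch, not part of PathogenTrack itself. It assumes the executables are named STAR, umi_tools and kraken2, which are the usual command names for these packages; the helper name find_missing_tools is hypothetical.

```python
import shutil

# Assumption: these are the executable names installed by the packages
# listed above (STAR, umi_tools, kraken2). Adjust if your setup differs.
REQUIRED_TOOLS = ("STAR", "umi_tools", "kraken2")


def find_missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the names in `tools` that cannot be found on PATH."""
    return [name for name in tools if which(name) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Missing from PATH:", ", ".join(missing))
    else:
        print("All required tools were found on PATH.")
```

Run the script inside the activated conda environment, since the tools are only on PATH there.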

Database Preparation

1. Prepare the human genome database

Download the human GRCh38 genome and the genome annotation file, then decompress them:

wget ftp://ftp.ensembl.org/pub/release-101/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.toplevel.fa.gz
gzip -d Homo_sapiens.GRCh38.dna.toplevel.fa.gz
wget ftp://ftp.ensembl.org/pub/release-101/gtf/homo_sapiens/Homo_sapiens.GRCh38.101.gtf.gz
gzip -d Homo_sapiens.GRCh38.101.gtf.gz

Build the STAR index with the following command:

STAR --runThreadN 16 --runMode genomeGenerate --genomeDir ./ \
     --genomeFastaFiles ./Homo_sapiens.GRCh38.dna.toplevel.fa \
     --sjdbGTFfile ./Homo_sapiens.GRCh38.101.gtf \
     --sjdbOverhang 100

2. Prepare the Kraken2 database

wget ftp://ftp.ccb.jhu.edu/pub/data/kraken2_dbs/minikraken_8GB_202003.tgz
tar zxf minikraken_8GB_202003.tgz
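
A truncated download or an incomplete extraction is easy to miss until the pipeline fails hours later. The sketch below checks that a directory contains the three files a built Kraken2 database normally consists of (hash.k2d, opts.k2d and taxo.k2d). The function name missing_kraken2_files is hypothetical, and the directory path in the example is an assumption that should be replaced with the location of your extracted database.

```python
from pathlib import Path

# Files that a built Kraken2 database directory normally contains.
KRAKEN2_DB_FILES = ("hash.k2d", "opts.k2d", "taxo.k2d")


def missing_kraken2_files(db_dir):
    """Return the expected Kraken2 database files that are absent from db_dir."""
    db_path = Path(db_dir).expanduser()
    return [name for name in KRAKEN2_DB_FILES if not (db_path / name).is_file()]


if __name__ == "__main__":
    # Assumption: replace this with the directory created by the tar command.
    missing = missing_kraken2_files("~/database/minikraken_8GB_20200312/")
    print("Missing files:", missing if missing else "none")
```

The check only confirms that the files exist; it does not validate their contents.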

How to use PathogenTrack?

Before running PathogenTrack, run cellranger or alevin to obtain the single-cell gene expression matrix. Here, we use simulated 10x sequencing data as an example:

First, we use cellranger to get the scRNA-seq expression matrix and the valid barcodes:

cellranger count --id cellranger_out --transcriptome /path/to/cellranger_database/ --fastqs /path/to/fastqs/

Then we run PathogenTrack to identify and quantify pathogen expression at the single-cell level:

conda activate PathogenTrack
python PathogenTrack.py count --project_id PathogenTrack_out --pattern CCCCCCCCCCCCCCCCNNNNNNNNNN \
                              --min_reads 10 --confidence 0.11 --star_index ~/database/STAR_index/ \
                              --kraken_db ~/database/minikraken_8GB_20200312/ --barcode barcodes.tsv \
                              --read1 simulation_S1_L001_R1_001.fastq.gz \
                              --read2 simulation_S1_L001_R2_001.fastq.gz 

IMPORTANT: Read 1 in the example consists of a 16 bp cell barcode (CB) followed by a 10 bp UMI, so the --pattern is CCCCCCCCCCCCCCCCNNNNNNNNNN (16 C and 10 N). Users must adjust the pattern to match the structure of their own Read 1.
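
To make the pattern rule concrete, here is a small illustrative sketch that builds a pattern string from the barcode and UMI lengths and shows how a Read 1 sequence splits under it. It assumes the pattern follows the convention described above (C marks a cell barcode base, N marks a UMI base). The helper names build_pattern and split_read1 are hypothetical and are not part of PathogenTrack.

```python
def build_pattern(cb_length, umi_length):
    """Build a pattern with cb_length 'C' characters followed by umi_length 'N' characters."""
    if cb_length <= 0 or umi_length <= 0:
        raise ValueError("barcode and UMI lengths must be positive")
    return "C" * cb_length + "N" * umi_length


def split_read1(sequence, pattern):
    """Split a Read 1 sequence into (cell_barcode, umi) according to the pattern."""
    if len(sequence) < len(pattern):
        raise ValueError("Read 1 is shorter than the pattern")
    # Bases aligned with 'C' form the cell barcode; bases aligned with 'N' form the UMI.
    cell_barcode = "".join(b for b, p in zip(sequence, pattern) if p == "C")
    umi = "".join(b for b, p in zip(sequence, pattern) if p == "N")
    return cell_barcode, umi


if __name__ == "__main__":
    pattern = build_pattern(16, 10)  # the layout used in the example above
    print(pattern)
    print(split_read1("ACGTACGTACGTACGT" + "TTTTTTTTTT", pattern))
```

For a library whose Read 1 carries a 16 bp barcode and a 12 bp UMI, build_pattern(16, 12) would give the corresponding pattern; check the layout of your own chemistry before relying on it.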

Note: Processing one sample may take 4-6 hours, depending on the available computational resources and the size of the raw single-cell data.

Please see QUICK_START.md for a complete tutorial.

Questions

For questions and suggestions about the pipeline or the code, please contact admin@ncrna.net and ty12260@rjh.com.cn. We will try our best to provide support, address new issues, and keep improving this software.
