ST Pipeline: An automated pipeline for spatial mapping of unique transcripts
Project description
The ST Pipeline contains the tools and scripts needed to process and analyze the raw files generated with the Spatial Transcriptomics or Visium in FASTQ format to generate datasets for down-stream analysis. The ST pipeline can also be used to process single cell RNA-seq data as long as a file with barcodes identifying each cell is provided. The ST Pipeline can also process RNA-Seq datasets generated with or without UMIs.
The ST Pipeline has been optimized for speed, robustness and it is very easy to use with many parameters to adjust all the settings. The ST Pipeline is fully parallel and has constant memory use. The ST Pipeline allows to skip any of the steps and to use the genome or the transcriptome as reference.
The following files/parameters are commonly required:
FASTQ files (Read 1 containing the spatial information and the UMI and read 2 containing the genomic sequence)
A genome index generated with STAR
An annotation file in GTF or GFF format (optional)
- The file containing the barcodes and array coordinates
(look at the folder “ids” and chose the correct one). Basically this file contains 3 columns (BARCODE, X and Y), so if you provide this file with barcodes identinfying cells (for example), the ST pipeline can be used for single cell data. This file is optional too.
A name for the dataset
The ST pipeline has multiple parameters mostly related to trimming, mapping and annotation but generally the default values are good enough. You can see a full description of the parameters typing “st_pipeline_run.py –help” after you have installed the ST pipeline.
The input FASTQ files can be given in gzip/bzip format as well.
Basically what the ST pipeline does is:
- Quality trimming (read 1 and read 2):
Remove low quality bases
Sanity check (reads same length, reads order, etc..)
Check quality UMI (if provided)
Remove artifacts (PolyT, PolyA, PolyG, PolyN and PolyC) of user defined length
Check for AT and GC content
Discard reads with a minimum number of bases of that failed any of the checks above
Contamimant filter e.x. rRNA genome (Optional)
Mapping with STAR (only read 2)
Demultiplexing with [Taggd](https://github.com/SpatialTranscriptomicsResearch/taggd) (only read 1)
Keep reads (read 2) that contain a valid barcode and are correctly mapped
Annotate the reads with htseq-count (optional)
Group annotated reads by barcode(spot position) and gene to get a read count
In the grouping/counting only unique molecules (UMIs) are kept.
You can see a graphical more detailed description of the workflow in the documents workflow.pdf and workflow_extended.pdf
The output will be a matrix of counts (genes as columns, spots as rows), The ST pipeline will also output a log file with useful information and stats.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distributions
Hashes for stpipeline-1.8.1-py3.7-macosx-10.9-x86_64.egg
Algorithm | Hash digest | |
---|---|---|
SHA256 | 15fbf6984eafbab980a598921eabb615bd51412d49dbcd455517fecc5a289e87 |
|
MD5 | b4a218f8a1e9515b3448eb1c536b2f70 |
|
BLAKE2b-256 | 411a5220226e668e98f5a0ab1fbdb92bfd836c965678909cc7d963dea137152b |
Hashes for stpipeline-1.8.1-py3.6-macosx-10.9-x86_64.egg
Algorithm | Hash digest | |
---|---|---|
SHA256 | 478f556b663f6022c6be7a29a8687aa46c3b65c36fa08f8534f6b73ba43b4479 |
|
MD5 | 019be15551220f5c720d04a6618a6417 |
|
BLAKE2b-256 | 179b02916fa37520a3c21984d61476095d8d453793dc560661fcef04144be553 |