
ST Pipeline: An automated pipeline for spatial mapping of unique transcripts


The ST Pipeline contains the tools and scripts needed to process and analyze the raw FASTQ files generated with Spatial Transcriptomics or Visium and to produce datasets for downstream analysis. The ST Pipeline can also process single-cell RNA-seq data, as long as a file with barcodes identifying each cell is provided, as well as RNA-seq datasets generated with or without UMIs.

The ST Pipeline has been optimized for speed and robustness, and it is easy to use, with many parameters to adjust its settings. The ST Pipeline is fully parallel and has constant memory use. It lets you skip any of the steps and use either the genome or the transcriptome as the reference.

The following files/parameters are commonly required:

  • FASTQ files (Read 1 containing the spatial information and the UMI and read 2 containing the genomic sequence)

  • A genome index generated with STAR

  • An annotation file in GTF or GFF format (optional)

  • The file containing the barcodes and array coordinates

    (look in the folder “ids” and choose the correct one). This file contains three columns (BARCODE, X and Y), so if you provide a file with barcodes identifying cells, for example, the ST pipeline can be used for single-cell data. This file is also optional.

  • A name for the dataset
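As an illustration of the three-column barcode file described above, the following is a minimal sketch of how such a file could be read into a lookup table. It assumes the columns are separated by whitespace and that X and Y are integers; neither detail is stated above, and the function name `read_barcode_file` is hypothetical, not part of the ST pipeline.

```python
import io
from typing import Dict, TextIO, Tuple


def read_barcode_file(handle: TextIO) -> Dict[str, Tuple[int, int]]:
    """Map each barcode to its (x, y) array coordinates.

    Assumption: one record per line, three whitespace-separated columns
    (BARCODE, X, Y) with integer coordinates.
    """
    coordinates = {}
    for line in handle:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 3:
            raise ValueError(f"expected 3 columns, got {len(fields)}: {line!r}")
        barcode, x, y = fields
        coordinates[barcode] = (int(x), int(y))
    return coordinates


# Made-up example content; real files live in the "ids" folder.
example = io.StringIO("ACGTACGT 1 1\nTTGACCAA 1 2\n")
barcodes = read_barcode_file(example)
print(barcodes["TTGACCAA"])
```

For single-cell data, the same structure would hold one barcode per cell, with the coordinates serving only as identifiers.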

The ST pipeline has multiple parameters, mostly related to trimming, mapping and annotation, but the default values are generally good enough. You can see a full description of the parameters by typing “st_pipeline_run.py --help” after you have installed the ST pipeline.

The input FASTQ files can be given in gzip/bzip format as well.
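To show what accepting compressed input can look like in practice, here is a minimal sketch that opens a FASTQ file transparently whether it is plain, gzip or bzip2 compressed. It selects the decompressor from the file extension and assumes standard four-line FASTQ records; the helper names `open_fastq` and `read_fastq` are hypothetical, and the ST pipeline itself may detect compression differently.

```python
import bz2
import gzip


def open_fastq(path):
    """Open a FASTQ file as text, choosing the decompressor by extension."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    if path.endswith(".bz2"):
        return bz2.open(path, "rt")
    return open(path, "r")


def read_fastq(path):
    """Yield (name, sequence, quality) for each four-line FASTQ record."""
    with open_fastq(path) as handle:
        while True:
            header = handle.readline().rstrip("\n")
            if not header:
                break  # end of file
            sequence = handle.readline().rstrip("\n")
            handle.readline()  # the '+' separator line
            quality = handle.readline().rstrip("\n")
            yield header[1:], sequence, quality  # drop the leading '@'


# Usage, with a hypothetical file name:
# for name, sequence, quality in read_fastq("reads_R2.fastq.gz"):
#     print(name, len(sequence))
```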

In outline, the ST pipeline performs the following steps:

  • Quality trimming (read 1 and read 2):
    • Remove low quality bases

    • Sanity check (reads have the same length, reads are in the same order, etc.)

    • Check the quality of the UMI (if provided)

    • Remove artifacts (PolyT, PolyA, PolyG, PolyN and PolyC) of user defined length

    • Check for AT and GC content

    • Discard reads that are left with fewer than a minimum number of bases or that failed any of the checks above

  • Contaminant filter, e.g. against an rRNA genome (optional)

  • Mapping with STAR (only read 2)

  • Demultiplexing with [Taggd](https://github.com/SpatialTranscriptomicsResearch/taggd) (only read 1)

  • Keep reads (read 2) that contain a valid barcode and are correctly mapped

  • Annotate the reads with htseq-count (optional)

  • Group annotated reads by barcode (spot position) and gene to get a read count

  • In the grouping/counting only unique molecules (UMIs) are kept.
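Some of the quality-trimming checks listed above can be sketched in a few lines. The example below removes homopolymer artifacts of a user-defined length, measures AT content, and discards reads that end up too short or too AT-rich. All function names and threshold values are illustrative assumptions rather than the ST pipeline's own code or defaults, and the sketch simply deletes the homopolymer runs, which may differ from how the pipeline handles them.

```python
import re


def remove_homopolymers(sequence, min_length=10):
    """Delete runs of one base (A, C, G, T or N) at least min_length long."""
    pattern = re.compile(r"([ACGTN])\1{%d,}" % (min_length - 1))
    return pattern.sub("", sequence)


def at_content(sequence):
    """Return the percentage of A and T bases in the sequence."""
    if not sequence:
        return 0.0
    return 100.0 * sum(base in "AT" for base in sequence) / len(sequence)


def filter_read(sequence, min_length=20, max_at=90.0, homopolymer_length=10):
    """Return the cleaned sequence, or None if the read should be discarded.

    The thresholds are made-up example values, not the pipeline defaults.
    """
    cleaned = remove_homopolymers(sequence, homopolymer_length)
    if len(cleaned) < min_length:
        return None  # too few bases left after artifact removal
    if at_content(cleaned) > max_at:
        return None  # low-complexity, AT-rich read
    return cleaned


read = "ACGTGCATGCTAGCTAGGCTAGCTAGGCT" + "A" * 15  # read with a polyA tail
print(filter_read(read))      # the sequence without its polyA tail
print(filter_read("A" * 40))  # None: nothing is left after removal
```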

You can see a more detailed graphical description of the workflow in the documents workflow.pdf and workflow_extended.pdf.

The output will be a matrix of counts (genes as columns, spots as rows). The ST pipeline will also output a log file with useful information and stats.
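The grouping and counting steps that produce this matrix can be sketched as follows. The example assumes each annotated read has already been reduced to a (barcode, gene, UMI) tuple and treats two reads as the same molecule only when their UMIs match exactly; the ST pipeline may use a more tolerant comparison. The function names and the example data are hypothetical.

```python
from collections import defaultdict


def count_unique_molecules(reads):
    """Count distinct UMIs per (barcode, gene) pair.

    reads: iterable of (barcode, gene, umi) tuples, one per annotated read.
    """
    umis = defaultdict(set)
    for barcode, gene, umi in reads:
        umis[(barcode, gene)].add(umi)
    return {key: len(values) for key, values in umis.items()}


def to_matrix(counts):
    """Arrange the counts with spots (barcodes) as rows and genes as columns."""
    spots = sorted({barcode for barcode, _ in counts})
    genes = sorted({gene for _, gene in counts})
    rows = [[counts.get((spot, gene), 0) for gene in genes] for spot in spots]
    return spots, genes, rows


# Made-up reads: the first two share a UMI and count as one molecule.
reads = [
    ("ACGT", "GeneA", "UMI1"),
    ("ACGT", "GeneA", "UMI1"),
    ("ACGT", "GeneA", "UMI2"),
    ("TTGA", "GeneB", "UMI1"),
]
spots, genes, rows = to_matrix(count_unique_molecules(reads))
print(genes)
for spot, row in zip(spots, rows):
    print(spot, row)
```

Keeping a set of UMIs per (barcode, gene) pair means duplicate reads from the same molecule are collapsed as they are read, so no separate deduplication pass is needed in this sketch.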
