A bioinformatic pipeline for the analysis of spatial transcriptomic data

Spatial transcriptomics sequencing

Structure of the pipeline

This repository collects all scripts and tools used for analyzing the sequencing side of the spatial transcriptomics datasets. The following steps are currently performed:

Demultiplex the data

This assumes that the sample sheet has been provided and that the raw data has been copied to the basecalls folder. The tool bcl2fastq is used to demultiplex the data.
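
As a rough illustration of this step, the bcl2fastq call could be assembled from Python as sketched below. The function name and the example paths are hypothetical, and the exact options used by this pipeline are not stated here; the sketch only relies on the standard `--runfolder-dir`, `--sample-sheet` and `--output-dir` options of bcl2fastq.

```python
import subprocess


def bcl2fastq_command(run_folder, sample_sheet, output_dir):
    """Build the bcl2fastq argument list for one sample sheet.

    All three paths are placeholders supplied by the caller.
    """
    return [
        "bcl2fastq",
        "--runfolder-dir", run_folder,    # folder holding the raw basecalls
        "--sample-sheet", sample_sheet,   # sample sheet describing the run
        "--output-dir", output_dir,       # where demultiplexed fastq files go
    ]


def run_bcl2fastq(run_folder, sample_sheet, output_dir):
    """Run bcl2fastq and raise if it exits with a non-zero status."""
    subprocess.run(
        bcl2fastq_command(run_folder, sample_sheet, output_dir),
        check=True,
    )
```

Building the argument list separately from running it makes it easy to inspect the command before anything is executed.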

Rename the fastq files

It is important to rename the .fastq files so that their names are meaningful.

Reverse the fastq files

Read 1 needs to be reversed to match the barcodes of the optical side.
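
A minimal sketch of what reversing Read 1 might look like is given below. It assumes that "reversed" means a plain reversal of the bases and of the matching quality string (not a reverse complement), and that the input is a gzipped four-line-per-record fastq file; neither assumption is confirmed by the text above, and the function names are hypothetical.

```python
import gzip


def reverse_fastq_record(header, sequence, plus, quality):
    """Reverse the bases and the matching quality string of one record."""
    return header, sequence[::-1], plus, quality[::-1]


def reverse_fastq(in_path, out_path):
    """Write a copy of a gzipped fastq file with every read reversed."""
    with gzip.open(in_path, "rt") as src, gzip.open(out_path, "wt") as dst:
        while True:
            # A fastq record is four lines: header, sequence, '+', quality.
            record = [src.readline().rstrip("\n") for _ in range(4)]
            if not record[0]:
                break  # end of file reached
            dst.write("\n".join(reverse_fastq_record(*record)) + "\n")
```

The quality string is reversed together with the sequence so that each quality score stays attached to its base.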

Run FastQC on the fastq files

Run FastQC on all fastq files to check the quality of the reads.
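
The FastQC step could be driven from Python roughly as follows. This is only a sketch: the function names and the default thread count are assumptions, and the pipeline may call FastQC with different options. It uses the `--outdir` and `--threads` options of FastQC.

```python
import os
import subprocess


def fastqc_command(fastq_files, output_dir, threads=1):
    """Build the FastQC argument list for a batch of fastq files."""
    return ["fastqc", "--outdir", output_dir, "--threads", str(threads), *fastq_files]


def run_fastqc(fastq_files, output_dir, threads=1):
    """Create the output directory (FastQC expects it to exist) and run FastQC."""
    os.makedirs(output_dir, exist_ok=True)
    subprocess.run(fastqc_command(fastq_files, output_dir, threads), check=True)
```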

Run the sequencing analysis pipeline

Next, the sequences are analyzed. This step requires (i) the species to map onto and (ii) the filename of the sample.

Produce the QC sheet of the sequencing data

After everything is finished, a python script is run to produce the QC sheet for the sample. The qc_sequencing_parameters.yaml file contains metadata for the experiment/sample and currently needs to be created manually. This could be automated by taking part of the information from the sample sheet.

Snakemake

The pipeline is implemented in snakemake. All metadata of the experiments (experiment_name, flowcell_id, species, etc.) should be put in a config.yaml file. An example config.yaml file is in the root of this repo.
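
Before starting a run it can help to check that the metadata is complete. The sketch below assumes a flat configuration with the three keys named above; the real config.yaml in the repo may be nested or contain further fields, and the function name is hypothetical.

```python
# Keys taken from the text above; the actual config.yaml may require more.
REQUIRED_KEYS = ("experiment_name", "flowcell_id", "species")


def missing_config_keys(config):
    """Return the required keys that are absent or empty in a config dict."""
    return [key for key in REQUIRED_KEYS if not config.get(key)]


# Example values are made up for illustration only.
example_config = {
    "experiment_name": "sts_017",
    "flowcell_id": "",
    "species": "mouse",
}
print(missing_config_keys(example_config))
```

The dictionary would normally come from parsing config.yaml, for example with `yaml.safe_load` from PyYAML.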

To run the snakemake script, the snakemake python library is required (installed with pip or conda). The script requires at least 6 threads to run, because several commands are piped into one another to decrease runtime.

Example run:

    snakemake --snakefile path_to_snakefile --configfile path_to_configfile

This will create the output in the directory in which the command is run. Note that all samplesheet-flowcell_id pairs should ideally be kept in one configfile.
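
The example run and the six-thread requirement can be combined in a small launcher, sketched below. The function names are hypothetical; the sketch passes the thread count through snakemake's `--cores` option and sets the working directory explicitly, since the output is written to wherever the command is run.

```python
import subprocess

MIN_CORES = 6  # minimum number of threads stated above


def snakemake_command(snakefile, configfile, cores=MIN_CORES):
    """Build the snakemake argument list, refusing fewer than six cores."""
    if cores < MIN_CORES:
        raise ValueError(f"the pipeline needs at least {MIN_CORES} threads")
    return [
        "snakemake",
        "--snakefile", snakefile,
        "--configfile", configfile,
        "--cores", str(cores),
    ]


def run_pipeline(snakefile, configfile, output_dir, cores=MIN_CORES):
    """Run snakemake inside output_dir, where the results will be created."""
    subprocess.run(snakemake_command(snakefile, configfile, cores), cwd=output_dir, check=True)
```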

Produced directory structure

The following directory structure will be produced by the snakemake file:

    .
    |-- demultiplex_data                            # demultiplexed data folders, one per samplesheet
    |   |-- 200110_STS_017_4-7-8STS_018_1           # directory names are identical to the samplesheet names
    |   |   |-- Stats
    |   |   |-- sts_017
    |   |   |-- sts_018
    |   |   |-- Undetermined_S0_R1_001.fastq.gz
    |   |   `-- Undetermined_S0_R2_001.fastq.gz
    |   `-- 20191206_spatseq_smples3-4
    |       |-- indicator.log
    |       |-- Reports
    |       |-- Stats
    |       |-- sts_0xxx
    |       |-- Undetermined_S0_R1_001.fastq.gz
    |       `-- Undetermined_S0_R2_001.fastq.gz
    |-- sts_017                                     # root output directory, one per project
    |   |-- data
    |   |   |-- sts_017_4                           # directory containing results of running the pipeline. one per sample 
    |   |   |-- sts_017_7
    |   |   `-- sts_017_8
    |   `-- reads                                   # reads directory, one per sample
    |       |-- fastqc
    |       |-- raw
    |       `-- reversed
    |-- sts_018
    |   |-- data
    |   |   `-- sts_018_1
    |   `-- reads
    |       |-- fastqc
    |       |-- raw
    |       `-- reversed
    `-- sts_0xxx
        |-- data
        |   |-- sts_01
        |   `-- sts_02
        `-- reads
            |-- fastqc
            |-- raw
            `-- reversed
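
The per-project part of this layout can be described programmatically, for instance to check that a finished run produced every expected folder. The helper below is a sketch based only on the tree shown above; its name is hypothetical and it does not cover the demultiplex_data folder.

```python
from pathlib import PurePosixPath

READ_SUBDIRS = ("fastqc", "raw", "reversed")


def expected_project_dirs(project, samples):
    """List the directories expected under one project root.

    project -- project name, e.g. "sts_018"
    samples -- sample names, e.g. ["sts_018_1"]
    """
    root = PurePosixPath(project)
    dirs = [root / "data" / sample for sample in samples]    # one per sample
    dirs += [root / "reads" / sub for sub in READ_SUBDIRS]   # reads folders
    return [str(d) for d in dirs]


print(expected_project_dirs("sts_018", ["sts_018_1"]))
```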
