
spaceranger wrangling tools for Oxford Nanopore Technologies' data

Reason this release was yanked:

Just testing release pipeline


Percula

Percula is a Python package that provides a shim between spatial single-cell data output by Oxford Nanopore Technologies' sequencing devices and 10X Genomics' Space Ranger.

At the time of writing, Space Ranger does not natively support long-read sequencing data from Nanopore devices. Percula converts the output of the MinKNOW device software into a format that Space Ranger can ingest, primarily to obtain spatial barcodes and UMIs for long-read sequencing data. This information can then be fed into wf-single-cell for long-read single-cell analysis.

Installation

Percula can be obtained as either a conda or pip package. For conda, it can be installed with:

conda create -n percula -c conda-forge -c bioconda -c nanoporetech percula
conda activate percula

Usage

The primary function of Percula is to convert the output of MinKNOW into a format that Space Ranger can handle. Because it takes over some steps otherwise performed by wf-single-cell, its secondary function is to perform read dechimerisation and read trimming.

Percula is run with the following command:

percula preprocess <output> <inputs> ...

where <output> is the directory to which the output files will be written, and <inputs> are the inputs to be processed. Each input may be either a single BAM file or a directory; directories are searched recursively for BAM files.
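The documented input handling (BAM files used as given, directories searched recursively) can be sketched in Python as below. `expand_inputs` is a hypothetical helper written for illustration only; it is not part of Percula's API, and Percula's own implementation may differ, for example in file ordering or in how it treats non-BAM paths.

```python
from pathlib import Path
from typing import Iterable, List


def expand_inputs(inputs: Iterable[str]) -> List[Path]:
    """Resolve a mix of BAM files and directories into a list of BAM paths.

    Hypothetical helper illustrating the documented behaviour of
    `percula preprocess`; it is not Percula's own code.
    """
    bams: List[Path] = []
    for item in inputs:
        path = Path(item)
        if path.is_dir():
            # Directories are searched recursively for files ending in ".bam".
            bams.extend(sorted(p for p in path.rglob("*.bam") if p.is_file()))
        elif path.is_file():
            # Explicitly named files are taken as given (assumed to be BAM).
            bams.append(path)
        else:
            raise FileNotFoundError(f"input not found: {path}")
    return bams
```

For example, `expand_inputs(["run_dir/", "extra/reads.bam"])` would return every `.bam` file below `run_dir/` followed by `extra/reads.bam`.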

See the Onward Processing section below for information on how to use the output files with Space Ranger and wf-single-cell.

For additional support running Percula, please contact Oxford Nanopore Support. Noting that your request is for the attention of the Customer Workflows team may speed up the response.

Fastq Inputs

Although Percula primarily works with BAM files, it can also be used with FASTQ files with the help of fastcat. Fastcat aggregates files while preserving the metadata written by either the MinKNOW device software or the dorado basecaller (the two write metadata in slightly different ways).

Note: do not use samtools import to aggregate FASTQ files, as metadata may not be preserved correctly when converting to BAM.

To use Percula with FASTQ files, you can run the following command:

fastcat --bam_out --threads 4 --recurse <inputs> ...  \
    | percula preprocess <path_to_output_directory> -

where <inputs> are the FASTQ inputs to be processed. The - at the end tells Percula to read from the standard input stream. As with percula preprocess, each <inputs> argument to fastcat can be a single FASTQ file or a directory containing FASTQ files.
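For scripted runs, the same pipe can be driven from Python with the standard-library `subprocess` module. This is a sketch that assumes `fastcat` and `percula` are both on `PATH`; the function names are hypothetical, and the flags are exactly those in the fastcat command above.

```python
import subprocess
from typing import List, Sequence, Tuple


def build_fastq_pipeline(
    inputs: Sequence[str], output_dir: str, threads: int = 4
) -> Tuple[List[str], List[str]]:
    """Return argv lists mirroring the documented fastcat | percula command."""
    fastcat_cmd = ["fastcat", "--bam_out", "--threads", str(threads), "--recurse", *inputs]
    # The trailing "-" tells percula to read from standard input.
    percula_cmd = ["percula", "preprocess", output_dir, "-"]
    return fastcat_cmd, percula_cmd


def run_fastq_pipeline(inputs: Sequence[str], output_dir: str, threads: int = 4) -> None:
    """Run fastcat and stream its BAM output into percula."""
    fastcat_cmd, percula_cmd = build_fastq_pipeline(inputs, output_dir, threads)
    with subprocess.Popen(fastcat_cmd, stdout=subprocess.PIPE) as fastcat:
        with subprocess.Popen(percula_cmd, stdin=fastcat.stdout) as percula:
            # Drop the parent's copy of the pipe so that only percula reads it;
            # fastcat then sees a broken pipe if percula exits early.
            fastcat.stdout.close()
    # Each context manager has waited for its process to finish.
    for proc, cmd in ((fastcat, fastcat_cmd), (percula, percula_cmd)):
        if proc.returncode != 0:
            raise subprocess.CalledProcessError(proc.returncode, cmd)
```

Keeping the argument construction in a separate function makes it easy to log or inspect the exact commands before anything is executed.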

Outputs

Three outputs are generated by percula preprocess:

  • configs.json: A JSON file containing adapter configurations found within reads.
  • SAMPLE_S1_L001.bam: A BAM file containing the reads that have been processed.
  • SAMPLE_S1_L001_R[1,2]_001.fastq.gz: A pair of pseudo paired-end FASTQ files containing the processed reads. The R1 file contains the forward reads, and the R2 file contains the reverse reads.

The first two files are required for downstream processing with wf-single-cell, while the paired-end read files should be provided to Space Ranger for demultiplexing.
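Before moving on, it can help to confirm that all of these files were produced. The sketch below assumes the outputs are written directly into the <output> directory under exactly the names listed above; `missing_outputs` is a hypothetical helper, not part of Percula.

```python
from pathlib import Path
from typing import List

# File names as listed in the Outputs section.
EXPECTED_OUTPUTS = (
    "configs.json",
    "SAMPLE_S1_L001.bam",
    "SAMPLE_S1_L001_R1_001.fastq.gz",
    "SAMPLE_S1_L001_R2_001.fastq.gz",
)


def missing_outputs(output_dir: str) -> List[str]:
    """Return the expected percula preprocess outputs absent from output_dir."""
    root = Path(output_dir)
    return [name for name in EXPECTED_OUTPUTS if not (root / name).is_file()]
```

An empty list means every expected file is present.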

Onward Processing

Once the data have been processed with Percula, they can be processed with Space Ranger and subsequently with wf-single-cell.

Space Ranger processing

The short-read FASTQ output files from Percula can be used with Space Ranger as they would be with any other FASTQ files. For example:

spaceranger count \
    --id <SAMPLE_ID> --slide=<SLIDE_ID> --area=<AREA> \
    --create-bam=true \
    --transcriptome=<TRANSCRIPTOME_REFERENCE> \
    --cytaimage=<VISIUM IMAGE> \
    --fastqs=<PERCULA OUTPUT DIRECTORY>

Note that the --create-bam=true option is required here: it produces a BAM file of the sequencing reads annotated with spatial barcode and UMI information, which wf-single-cell requires for downstream processing.
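Space Ranger finds FASTQ files in the --fastqs directory by their names, which follow the Illumina bcl2fastq convention `<SampleName>_S<number>_L<lane>_<ReadType>_001.fastq.gz`; Percula's R1/R2 outputs use this pattern with the sample name `SAMPLE`. The parser below is a simplified, hypothetical reading of that convention for checking file names, not Space Ranger's own matching logic, which may accept other forms.

```python
import re
from typing import NamedTuple, Optional

# bcl2fastq-style name: <sample>_S<n>_L<lane>_<read>_001.fastq.gz
FASTQ_NAME = re.compile(
    r"^(?P<sample>.+)_S(?P<sample_number>\d+)_L(?P<lane>\d{3})"
    r"_(?P<read>[RI][12])_001\.fastq\.gz$"
)


class FastqName(NamedTuple):
    sample: str
    sample_number: int
    lane: int
    read: str


def parse_fastq_name(filename: str) -> Optional[FastqName]:
    """Split a bcl2fastq-style FASTQ name into its parts, or return None."""
    match = FASTQ_NAME.match(filename)
    if match is None:
        return None
    return FastqName(
        sample=match["sample"],
        sample_number=int(match["sample_number"]),
        lane=int(match["lane"]),
        read=match["read"],
    )
```

Applied to Percula's outputs, both FASTQ files parse with sample `SAMPLE`, sample number 1 and lane 1, differing only in the read type (`R1` or `R2`).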

The required BAM file will be written to the Space Ranger output directory as:

<SPACE_RANGER_OUTPUT>/outs/possorted_genome_bam.bam 

For further help running Space Ranger, please refer to 10X Genomics' documentation.

wf-single-cell processing

The output from Space Ranger can be combined with the output of Percula to run wf-single-cell.

This command is subject to change.

nextflow run wf-single-cell \
    --bam <PERCULA_OUT>/SAMPLE_S1_L001.bam \
    --tags_bam <SPACE_RANGER_OUTPUT>/outs/possorted_genome_bam.bam

The --bam argument should point to the BAM file produced by Percula, while the --tags_bam argument should point to the BAM file produced by Space Ranger. The former is the same option used when running the workflow on other 10X Genomics data. The latter is specific to the processing of Visium HD data: it supplies the spatial barcodes and UMI information to the workflow, which then skips its usual read preprocessing and demultiplexing steps. The workflow still performs full-length, isoform-specific processing such as long-read alignment and isoform quantification.
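Because the command is subject to change, it may be convenient to assemble it in one place. The sketch below builds the command shown above from the Percula and Space Ranger output directories and checks that both BAM files exist first; `wf_single_cell_command` is a hypothetical helper, and the file locations are the ones given in this document.

```python
from pathlib import Path
from typing import List


def wf_single_cell_command(percula_out: str, spaceranger_out: str) -> List[str]:
    """Build the documented wf-single-cell command, checking both BAMs exist."""
    reads_bam = Path(percula_out) / "SAMPLE_S1_L001.bam"
    tags_bam = Path(spaceranger_out) / "outs" / "possorted_genome_bam.bam"
    for bam in (reads_bam, tags_bam):
        if not bam.is_file():
            raise FileNotFoundError(f"expected BAM not found: {bam}")
    return [
        "nextflow", "run", "wf-single-cell",
        "--bam", str(reads_bam),      # reads processed by Percula
        "--tags_bam", str(tags_bam),  # spatial barcode and UMI tags from Space Ranger
    ]
```

The resulting list can be passed to `subprocess.run(..., check=True)` to launch the workflow.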

See the wf-single-cell documentation for further information on how to run the workflow, or contact Oxford Nanopore Support.



Download files


Source Distribution

percula-0.0.1.tar.gz (26.9 kB)

Uploaded Source

File details

Details for the file percula-0.0.1.tar.gz.

File metadata

  • Download URL: percula-0.0.1.tar.gz
  • Upload date:
  • Size: 26.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.3

File hashes

Hashes for percula-0.0.1.tar.gz
  • SHA256: 91992702a18038a1323846747f3bdf75c96bce957a044f283631cc23132470e5
  • MD5: a4f1f7d649acb265c5248951e71c3fb4
  • BLAKE2b-256: 118ae7f954f96c87a09125e3fc63b96ec3cbbb0425c81e7b3a7fe575abfb3a16

