A cross-platform toolkit for building RNA velocity-ready spliced/unspliced matrices

velocity-kit

## Overview

Standard RNA velocity methods expect **spliced** and **unspliced** counts, but many modern single-cell platforms don't directly output these layers. `velocity-kit` provides platform-specific tools to generate velocity-compatible matrices using the **dual-run subtraction method**.

### Supported Platforms

- **Fluent BioSciences (PIPseq)** - via PIPseeker
- **10x Genomics** - via CellRanger with `--include-introns`
- 🚧 **Parse Biosciences** - Coming soon

## Installation

### From PyPI (recommended)

```bash
pip install velocity-kit
```

From source

git clone https://github.com/yourusername/velocitykit.git
cd velocitykit
pip install -e .

Optional dependencies

To run scVelo preprocessing:

pip install velocity-kit[scvelo]

For development:

pip install velocity-kit[dev]

Quick Start

PIPseq (PIPseeker)

# Step 1: Generate velocity-compatible matrices
velocity-kit prep-pipseq \
  --total /path/to/pipseeker_total_run \
  --exonic /path/to/pipseeker_exons_only_run \
  --out-loom output.loom

# Step 2: Generate analysis report (optional)
velocity-kit run-scvelo output.loom -o reports/sample1

10x Genomics (CellRanger)

# Step 1: Generate velocity-compatible matrices
velocity-kit prep-tenx \
  --total /path/to/cellranger_with_introns/raw_feature_bc_matrix \
  --exonic /path/to/cellranger_standard/raw_feature_bc_matrix \
  --out-loom output.loom

# Step 2: Generate analysis report (optional)
velocity-kit run-scvelo output.loom -o reports/sample1

Note: You can specify just --out-h5ad or just --out-loom if you only need one format.

Usage

Command Structure

velocity-kit <command> [options]

Available commands:

  • prep-pipseq - Prepare velocity matrices from PIPseeker outputs
  • prep-tenx - Prepare velocity matrices from 10x Genomics CellRanger outputs
  • prep-parse - Prepare velocity matrices from Parse Biosciences outputs (coming soon)
  • prep-scalebio - Prepare velocity matrices from ScaleBio outputs (coming soon)
  • run-scvelo - Run scVelo analysis and generate comprehensive report from loom file

PIPseq Detailed Usage

Required Arguments

  • --total: Directory with PIPseeker run that includes introns (total counts)
  • --exonic: Directory with PIPseeker --exons-only run using the RAW/UNFILTERED count matrix
  • At least one of:
    • --out-h5ad: Output .h5ad file path
    • --out-loom: Output .loom file path

Optional Arguments

  • --genes-col: Column index in features.tsv to use as gene ID (default: 0)
  • -v, --verbose: Increase verbosity level (use -v for info, -vv for debug)
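
The argument rules above (two required inputs, at least one output format, a repeatable -v flag) can be sketched with argparse. This is only an illustration of the documented behaviour, not velocity-kit's actual parser: the program name and the functions build_parser and parse_args are hypothetical.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented prep-pipseq options."""
    parser = argparse.ArgumentParser(prog="prep-sketch")
    parser.add_argument("--total", required=True, help="run that includes introns")
    parser.add_argument("--exonic", required=True, help="raw exons-only run")
    parser.add_argument("--out-h5ad", help="output .h5ad path")
    parser.add_argument("--out-loom", help="output .loom path")
    # action="count" turns -v into 1 and -vv into 2
    parser.add_argument("-v", "--verbose", action="count", default=0)
    return parser


def parse_args(argv):
    parser = build_parser()
    args = parser.parse_args(argv)
    # Enforce "at least one of --out-h5ad / --out-loom"
    if not (args.out_h5ad or args.out_loom):
        parser.error("at least one of --out-h5ad or --out-loom is required")
    return args


args = parse_args(["--total", "total_run", "--exonic", "exonic_run",
                   "--out-loom", "velocity.loom", "-vv"])
print(args.out_loom, args.verbose)
```

Checking the "at least one output" rule after parsing keeps both output options individually optional while still rejecting a call that names neither.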

Example

# Generate velocity-compatible matrices
velocity-kit prep-pipseq \
  --total Analysis/total_run \
  --exonic Analysis/exonic_raw_run \
  --out-loom velocity.loom \
  -v

# Generate comprehensive analysis report
velocity-kit run-scvelo velocity.loom \
  -o reports/sample1 \
  -n Sample1

10x Genomics Detailed Usage

Required Arguments

  • --total: Directory with CellRanger run using --include-introns flag (or path to raw_feature_bc_matrix)
  • --exonic: Directory with standard CellRanger run (exons only). Use RAW/UNFILTERED raw_feature_bc_matrix, NOT filtered_feature_bc_matrix
  • At least one of:
    • --out-h5ad: Output .h5ad file path
    • --out-loom: Output .loom file path

Optional Arguments

  • --genes-col: Column index in features.tsv to use as gene ID (default: 1 for gene symbols)
  • -v, --verbose: Increase verbosity level (use -v for info, -vv for debug)
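
To make the --genes-col option concrete, here is a small sketch of picking one column out of a 10x-style features.tsv, where each row holds the gene ID, the gene symbol and the feature type. It assumes the column index is zero-based, which matches "1 for gene symbols" in that layout; the function name read_gene_ids is hypothetical and is not part of velocity-kit.

```python
import csv
import io


def read_gene_ids(handle, genes_col=1):
    """Return the chosen column from a tab-separated features file."""
    reader = csv.reader(handle, delimiter="\t")
    return [row[genes_col] for row in reader if row]


# Two rows in the usual 10x layout: gene ID, gene symbol, feature type.
example = (
    "ENSG00000243485\tMIR1302-2HG\tGene Expression\n"
    "ENSG00000237613\tFAM138A\tGene Expression\n"
)

print(read_gene_ids(io.StringIO(example), genes_col=1))  # gene symbols
print(read_gene_ids(io.StringIO(example), genes_col=0))  # Ensembl IDs

# A real features.tsv.gz could be opened with gzip.open(path, "rt")
# and passed in the same way.
```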

Example

# Method 1: Point to the count directories directly
velocity-kit prep-tenx \
  --total cellranger_introns/outs/raw_feature_bc_matrix \
  --exonic cellranger_standard/outs/raw_feature_bc_matrix \
  --out-loom velocity.loom \
  -v

# Generate analysis report
velocity-kit run-scvelo velocity.loom -o reports/sample1

# Method 2: Point to the parent directories (will auto-find raw_feature_bc_matrix)
velocity-kit prep-tenx \
  --total cellranger_introns/outs \
  --exonic cellranger_standard/outs \
  --out-loom velocity.loom
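
How prep-tenx locates raw_feature_bc_matrix under a parent directory is not spelled out here. One plausible lookup, sketched below, accepts either the matrix directory itself or a parent that contains it. The helper resolve_matrix_dir and the check for matrix.mtx or matrix.mtx.gz are assumptions for illustration only.

```python
import tempfile
from pathlib import Path


def resolve_matrix_dir(path):
    """Return the directory holding matrix.mtx(.gz), given it or its parent."""
    path = Path(path)
    for candidate in (path, path / "raw_feature_bc_matrix"):
        if (candidate / "matrix.mtx.gz").is_file() or (candidate / "matrix.mtx").is_file():
            return candidate
    raise FileNotFoundError(f"no matrix.mtx(.gz) found under {path}")


# Small demonstration on a throwaway directory tree.
with tempfile.TemporaryDirectory() as tmp:
    raw_dir = Path(tmp) / "outs" / "raw_feature_bc_matrix"
    raw_dir.mkdir(parents=True)
    (raw_dir / "matrix.mtx.gz").touch()

    # Method 1 style: the matrix directory is given directly.
    print(resolve_matrix_dir(raw_dir).name)
    # Method 2 style: the parent "outs" directory is given.
    print(resolve_matrix_dir(Path(tmp) / "outs").name)
```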

How to Generate the Required CellRanger Runs

  1. Standard run (exonic only):

    cellranger count --id=sample_exonic \
      --transcriptome=/path/to/refdata \
      --fastqs=/path/to/fastqs \
      --sample=MySample
    
  2. Run with introns:

    cellranger count --id=sample_with_introns \
      --transcriptome=/path/to/refdata \
      --fastqs=/path/to/fastqs \
      --sample=MySample \
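      --include-introns

Note: the commands above use the pre-7.0 form of the flag. From CellRanger 7.0 onward, intronic reads are counted by default and the flag takes a value, so pass --include-introns=false for the exonic-only run and --include-introns=true for the run with introns.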

scVelo Analysis Report

Generate a comprehensive HTML report with QC plots, velocity analysis, and visualizations from a loom file.

Required Arguments

  • loom_path: Path to input .loom file (generated by prep-* commands)

Optional Arguments

  • -o, --output-dir: Output directory for plots and HTML report (default: scvelo_analysis)
  • -n, --sample-name: Sample name for report title (default: derived from loom filename)
  • -v, --verbose: Increase verbosity level (use -v for info, -vv for debug)

Requirements

This command requires scvelo and scanpy to be installed:

pip install scvelo scanpy
# or
pip install velocity-kit[scvelo]
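
Before calling run-scvelo from a script, it can help to confirm that the optional packages are importable. The sketch below uses only the standard library; the helper name missing_modules is hypothetical and is not part of velocity-kit.

```python
from importlib.util import find_spec


def missing_modules(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if find_spec(name) is None]


missing = missing_modules(["scvelo", "scanpy"])
if missing:
    print("Install before running run-scvelo:", ", ".join(missing))
else:
    print("scvelo and scanpy are available")
```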

Example

# Generate analysis report from loom file
velocity-kit run-scvelo velocity.loom \
  -o reports/sample1 \
  -n Sample1 \
  -v

# Use default output directory and auto-detect sample name
velocity-kit run-scvelo velocity.loom

Output

The report includes:

  • QC plots: Total counts, gene counts, spliced/unspliced proportions
  • Velocity embeddings: UMAP with velocity arrows and stream plots
  • Top velocity genes: Ranked genes driving velocity patterns
  • HTML report: All plots combined in an interactive HTML file
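
As an illustration of the spliced/unspliced proportion shown in the QC plots, the per-cell unspliced fraction can be computed from the two count layers. This is a minimal NumPy sketch on toy cells × genes arrays, not the code run-scvelo uses; the function name unspliced_fraction is hypothetical, and returning NaN for empty cells is an assumption.

```python
import numpy as np


def unspliced_fraction(spliced, unspliced):
    """Per-cell fraction of counts that are unspliced (NaN for empty cells)."""
    s = np.asarray(spliced).sum(axis=1).astype(float)
    u = np.asarray(unspliced).sum(axis=1).astype(float)
    total = s + u
    # Divide only where the cell has counts; other entries stay NaN.
    return np.divide(u, total, out=np.full_like(total, np.nan), where=total > 0)


spliced = np.array([[3, 1], [0, 0], [2, 2]])
unspliced = np.array([[1, 0], [0, 0], [4, 0]])
frac = unspliced_fraction(spliced, unspliced)
print(frac)
```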

Python API

from velocitykit import load_10x_mtx, align_and_union, build_velocity_adata
from pathlib import Path

# Load matrices
X_total, bc_total, g_total = load_10x_mtx(
    Path("total_run/matrix.mtx.gz"),
    Path("total_run/barcodes.tsv.gz"),
    Path("total_run/features.tsv.gz")
)

X_exon, bc_exon, g_exon = load_10x_mtx(
    Path("exonic_run/matrix.mtx.gz"),
    Path("exonic_run/barcodes.tsv.gz"),
    Path("exonic_run/features.tsv.gz")
)

# Align to union of genes and barcodes
X_total_u, X_exon_u, genes_u, bc_u = align_and_union(
    X_total, bc_total, g_total,
    X_exon, bc_exon, g_exon
)

# Build velocity-compatible AnnData
adata = build_velocity_adata(X_total_u, X_exon_u, genes_u, bc_u)

# Save
adata.write_h5ad("output.h5ad")
adata.write_loom("output.loom")
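
The internals of align_and_union are not shown here. The general idea of aligning two matrices to the union of their barcodes and genes can be sketched as follows. The sketch assumes small dense cells × genes NumPy arrays and uses hypothetical helpers (reindex_to_union, align_to_union_sketch). Real count matrices would normally stay sparse, and the package's own function may differ in orientation and ordering.

```python
import numpy as np


def reindex_to_union(X, barcodes, genes, all_barcodes, all_genes):
    """Copy X (cells x genes) into a zero matrix indexed by the union labels."""
    row_pos = {b: i for i, b in enumerate(all_barcodes)}
    col_pos = {g: j for j, g in enumerate(all_genes)}
    out = np.zeros((len(all_barcodes), len(all_genes)), dtype=X.dtype)
    rows = [row_pos[b] for b in barcodes]
    cols = [col_pos[g] for g in genes]
    out[np.ix_(rows, cols)] = X
    return out


def align_to_union_sketch(X_a, bc_a, genes_a, X_b, bc_b, genes_b):
    """Align both matrices to sorted unions of barcodes and genes."""
    all_bc = sorted(set(bc_a) | set(bc_b))
    all_genes = sorted(set(genes_a) | set(genes_b))
    A = reindex_to_union(X_a, bc_a, genes_a, all_bc, all_genes)
    B = reindex_to_union(X_b, bc_b, genes_b, all_bc, all_genes)
    return A, B, all_genes, all_bc


X_total = np.array([[1, 2], [3, 4]])   # cells c1, c2 x genes g1, g2
X_exon = np.array([[2, 1], [7, 0]])    # cells c2, c3 x genes g2, g3
total_u, exon_u, genes_u, bc_u = align_to_union_sketch(
    X_total, ["c1", "c2"], ["g1", "g2"],
    X_exon, ["c2", "c3"], ["g2", "g3"],
)
print(bc_u, genes_u)
print(total_u)
print(exon_u)
```

Entries missing from one run are filled with zeros, so both outputs share the same shape and label order.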

Why Dual-Run Subtraction?

For platforms that use complex molecular counting (MI correction, deduplication, multi-mapping resolution), BAM-based velocity methods can produce invalid counts, because these counting transformations are not reflected in the BAM file.

The dual-run subtraction approach:

  1. Run your pipeline normally → counts include exonic + intronic molecules
  2. Run with exons-only mode on the raw/unfiltered matrix → spliced-only molecules
  3. Compute: unspliced = total - spliced

This preserves the platform's counting model and produces valid velocity layers.
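
The subtraction in step 3 can be sketched with NumPy on a toy cells × genes matrix. This is a minimal illustration rather than velocity-kit's own implementation: the function name subtract_layers is hypothetical, and clipping negative differences to zero is an assumption about how to treat genes where the exonic run reports more counts than the total run.

```python
import numpy as np


def subtract_layers(total, spliced):
    """Return (spliced, unspliced) with unspliced = total - spliced, floored at 0."""
    total = np.asarray(total, dtype=np.int64)
    spliced = np.asarray(spliced, dtype=np.int64)
    if total.shape != spliced.shape:
        raise ValueError("matrices must be aligned to the same cells and genes")
    # Assumption: negative differences are treated as noise and set to zero.
    unspliced = np.clip(total - spliced, 0, None)
    return spliced, unspliced


total = np.array([[5, 2, 0], [3, 1, 4]])    # run that includes introns
exonic = np.array([[4, 2, 1], [1, 0, 4]])   # exons-only run
spliced, unspliced = subtract_layers(total, exonic)
print(unspliced)
```

Both inputs must already share the same barcodes and genes in the same order, which is why alignment comes before subtraction.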

When to Use Dual-Run Subtraction

  • PIPseq: Always use dual-run (BAM-based methods are incorrect)
  • 10x Genomics: Recommended for consistency, especially with CellRanger ≥7.0
  • ⚠️ Other platforms: Evaluate whether platform-specific counting differs from simple read counting

Important Notes

⚠️ For PIPseq: The --exonic directory must point to the RAW/UNFILTERED exons-only run.

Do NOT use a filtered exonic matrix, because the called-cell set may not match the total matrix. This will cause barcode mismatches and incorrect velocity estimates.
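
A quick way to spot the barcode mismatch described above is to compare the barcode lists of the two runs before building the matrices. The sketch below uses plain Python sets; the function name barcode_overlap is hypothetical and the barcodes are toy values.

```python
def barcode_overlap(total_barcodes, exonic_barcodes):
    """Summarise how many barcodes the total and exonic runs share."""
    total_set, exonic_set = set(total_barcodes), set(exonic_barcodes)
    shared = total_set & exonic_set
    return {
        "shared": len(shared),
        "only_total": len(total_set - exonic_set),
        "only_exonic": len(exonic_set - total_set),
        "fraction_of_total": len(shared) / len(total_set) if total_set else 0.0,
    }


summary = barcode_overlap(
    ["AAAC", "AAAG", "AAAT", "AACA"],   # barcodes from the total run
    ["AAAG", "AAAT", "AACC"],           # barcodes from the exonic run
)
print(summary)
```

A low shared fraction suggests that one of the inputs is a filtered matrix or comes from a different sample.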

Requirements

  • Python ≥ 3.8
  • anndata ≥ 0.8.0
  • h5py ≥ 3.8.0
  • loompy ≥ 3.0.6
  • numpy ≥ 1.21.0 (< 2.0.0 to avoid breaking changes)
  • pandas ≥ 1.3.0
  • scipy ≥ 1.7.0
  • tqdm ≥ 4.60.0

Optional:

  • scvelo ≥ 0.2.4 (for preprocessing)

Note: Python 3.7 support was dropped in v0.2.0. For older Python versions, use velocity-kit v0.1.x.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

MIT License - see LICENSE file for details.

Citation

If you use this tool in your research, please cite:

[Add citation information here]

Contact

For questions or issues, please email ccrsfifx@nih.gov or open an issue on GitHub.

Changelog

v0.1.0 (Initial Release)

  • PIPseq/PIPseeker support
  • Modular platform architecture
  • Python API for custom workflows
