A cross-platform toolkit for building RNA velocity-ready spliced/unspliced matrices
# velocity-kit

💡 Tip: For comprehensive velocity analysis with QC plots, use the `run-scvelo` command. See scVelo Analysis Report.
## Overview
Standard RNA velocity methods expect **spliced** and **unspliced** counts, but many modern single-cell platforms don't directly output these layers. `velocity-kit` provides platform-specific tools to generate velocity-compatible matrices using the **dual-run subtraction method**.
### Supported Platforms
- ✅ **Fluent BioSciences (PIPseq)** - via PIPseeker
- ✅ **10x Genomics** - via CellRanger with `--include-introns`
- 🚧 **Parse Biosciences** - Coming soon
## Installation
### From PyPI (recommended)
```bash
pip install velocity-kit
```

### From source

```bash
git clone https://github.com/yourusername/velocitykit.git
cd velocitykit
pip install -e .
```

### Optional dependencies

To run scVelo preprocessing:

```bash
pip install "velocity-kit[scvelo]"
```

For development:

```bash
pip install "velocity-kit[dev]"
```
## Quick Start

### PIPseq (PIPseeker)

```bash
# Step 1: Generate velocity-compatible matrices
velocity-kit prep-pipseq \
    --total /path/to/pipseeker_total_run \
    --exonic /path/to/pipseeker_exons_only_run \
    --out-loom output.loom

# Step 2: Generate analysis report (optional)
velocity-kit run-scvelo output.loom -o reports/sample1
```

### 10x Genomics (CellRanger)

```bash
# Step 1: Generate velocity-compatible matrices
velocity-kit prep-tenx \
    --total /path/to/cellranger_with_introns/raw_feature_bc_matrix \
    --exonic /path/to/cellranger_standard/raw_feature_bc_matrix \
    --out-loom output.loom

# Step 2: Generate analysis report (optional)
velocity-kit run-scvelo output.loom -o reports/sample1
```

Note: You can specify just `--out-h5ad` or just `--out-loom` if you only need one format.
## Usage

### Command Structure

```bash
velocity-kit <platform-command> [options]
```

Available platform commands:

- `prep-pipseq` - Prepare velocity matrices from PIPseeker outputs
- `prep-tenx` - Prepare velocity matrices from 10x Genomics CellRanger outputs
- `prep-parse` - Prepare velocity matrices from Parse Biosciences outputs (coming soon)
- `prep-scalebio` - Prepare velocity matrices from ScaleBio outputs (coming soon)
- `run-scvelo` - Run scVelo analysis and generate a comprehensive report from a loom file
### PIPseq Detailed Usage

#### Required Arguments

- `--total`: Directory with the PIPseeker run that includes introns (total counts)
- `--exonic`: Directory with the PIPseeker `--exons-only` run, using the RAW/UNFILTERED count matrix
- At least one of:
  - `--out-h5ad`: Output `.h5ad` file path
  - `--out-loom`: Output `.loom` file path

#### Optional Arguments

- `--genes-col`: Column index in `features.tsv` to use as gene ID (default: 0)
- `-v, --verbose`: Increase verbosity level (use `-v` for info, `-vv` for debug)
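The `-v` / `-vv` behaviour listed above is commonly built on a counting flag. The sketch below shows one way that mapping could work; it is not taken from the velocity-kit source. The parser, the helper `verbosity_to_level`, and the `WARNING` default are all assumptions made for illustration.

```python
import argparse
import logging


def verbosity_to_level(count: int) -> int:
    """Map the number of -v flags to a logging level (assumed mapping)."""
    if count >= 2:
        return logging.DEBUG
    if count == 1:
        return logging.INFO
    return logging.WARNING  # assumed default when no -v is given


# Hypothetical parser: only the -v/--verbose option is modelled here.
parser = argparse.ArgumentParser(prog="velocity-kit")
parser.add_argument("-v", "--verbose", action="count", default=0)

args = parser.parse_args(["-vv"])
level = verbosity_to_level(args.verbose)
print(logging.getLevelName(level))  # prints DEBUG
```

With `action="count"`, repeating the flag increments an integer, so `-v` and `-vv` need no separate option definitions.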
#### Example

```bash
# Generate velocity-compatible matrices
velocity-kit prep-pipseq \
    --total Analysis/total_run \
    --exonic Analysis/exonic_raw_run \
    --out-loom velocity.loom \
    -v

# Generate comprehensive analysis report
velocity-kit run-scvelo velocity.loom \
    -o reports/sample1 \
    -n Sample1
```
### 10x Genomics Detailed Usage

#### Required Arguments

- `--total`: Directory with the CellRanger run that used the `--include-introns` flag (or path to `raw_feature_bc_matrix`)
- `--exonic`: Directory with the standard CellRanger run (exons only). Use the RAW/UNFILTERED `raw_feature_bc_matrix`, NOT `filtered_feature_bc_matrix`
- At least one of:
  - `--out-h5ad`: Output `.h5ad` file path
  - `--out-loom`: Output `.loom` file path

#### Optional Arguments

- `--genes-col`: Column index in `features.tsv` to use as gene ID (default: 1 for gene symbols)
- `-v, --verbose`: Increase verbosity level (use `-v` for info, `-vv` for debug)
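To make the `--genes-col` index concrete: a CellRanger `features.tsv.gz` is tab-separated, with the gene ID in column 0, the gene symbol in column 1, and the feature type in column 2. The sketch below reads one column from such a file. The helper `read_feature_column` is hypothetical and is not part of the velocity-kit API; the two-row file is a toy example.

```python
import csv
import gzip
import tempfile
from pathlib import Path
from typing import List


def read_feature_column(features_path: Path, genes_col: int) -> List[str]:
    """Return one column of a tab-separated features file (gzipped or plain)."""
    opener = gzip.open if features_path.suffix == ".gz" else open
    with opener(features_path, "rt", newline="") as handle:
        return [row[genes_col] for row in csv.reader(handle, delimiter="\t")]


# Toy features file in the 10x layout: gene ID, gene symbol, feature type.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "features.tsv.gz"
    with gzip.open(path, "wt") as handle:
        handle.write("ENSG00000243485\tMIR1302-2HG\tGene Expression\n")
        handle.write("ENSG00000237613\tFAM138A\tGene Expression\n")

    gene_ids = read_feature_column(path, 0)      # like --genes-col 0
    gene_symbols = read_feature_column(path, 1)  # like --genes-col 1

print(gene_ids)
print(gene_symbols)
```

Gene symbols are easier to read but are not guaranteed to be unique, which is one reason to prefer column 0 when downstream tools need unique identifiers.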
#### Example

```bash
# Method 1: Point to the count directories directly
velocity-kit prep-tenx \
    --total cellranger_introns/outs/raw_feature_bc_matrix \
    --exonic cellranger_standard/outs/raw_feature_bc_matrix \
    --out-loom velocity.loom \
    -v

# Generate analysis report
velocity-kit run-scvelo velocity.loom -o reports/sample1

# Method 2: Point to the parent directories (raw_feature_bc_matrix is found automatically)
velocity-kit prep-tenx \
    --total cellranger_introns/outs \
    --exonic cellranger_standard/outs \
    --out-loom velocity.loom
```
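Methods 1 and 2 above differ only in which directory is passed. A minimal sketch of how a parent directory might be resolved to the matrix directory is shown below. The function `resolve_matrix_dir` and its search order are assumptions for illustration; the actual lookup in velocity-kit may differ.

```python
import tempfile
from pathlib import Path


def resolve_matrix_dir(path: Path, subdir: str = "raw_feature_bc_matrix") -> Path:
    """Return the directory that holds matrix.mtx(.gz), searching one level down."""
    def has_matrix(directory: Path) -> bool:
        return any((directory / name).is_file() for name in ("matrix.mtx.gz", "matrix.mtx"))

    if has_matrix(path):
        return path
    candidate = path / subdir
    if has_matrix(candidate):
        return candidate
    raise FileNotFoundError(f"No matrix.mtx(.gz) found in {path} or {candidate}")


# Demonstrate with a throwaway directory tree that mimics CellRanger's outs/.
with tempfile.TemporaryDirectory() as tmp:
    outs = Path(tmp) / "outs"
    raw = outs / "raw_feature_bc_matrix"
    raw.mkdir(parents=True)
    (raw / "matrix.mtx.gz").touch()

    from_parent = resolve_matrix_dir(outs)  # Method 2 style input
    from_direct = resolve_matrix_dir(raw)   # Method 1 style input
    same_target = from_parent == from_direct == raw

print(same_target)  # prints True
```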
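#### How to Generate the Required CellRanger Runs

- Standard run (exonic only):

  ```bash
  cellranger count --id=sample_exonic \
      --transcriptome=/path/to/refdata \
      --fastqs=/path/to/fastqs \
      --sample=MySample
  ```

- Run with introns:

  ```bash
  cellranger count --id=sample_with_introns \
      --transcriptome=/path/to/refdata \
      --fastqs=/path/to/fastqs \
      --sample=MySample \
      --include-introns
  ```

Note: CellRanger 7.0 and later count intronic reads by default, so with those versions the exonic-only run needs `--include-introns=false`. Check `cellranger count --help` for the syntax your version accepts.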
### scVelo Analysis Report

Generate a comprehensive HTML report with QC plots, velocity analysis, and visualizations from a loom file.

#### Required Arguments

- `loom_path`: Path to input `.loom` file (generated by `prep-*` commands)

#### Optional Arguments

- `-o, --output-dir`: Output directory for plots and HTML report (default: `scvelo_analysis`)
- `-n, --sample-name`: Sample name for report title (default: derived from loom filename)
- `-v, --verbose`: Increase verbosity level (use `-v` for info, `-vv` for debug)

#### Requirements

This command requires scvelo and scanpy to be installed:

```bash
pip install scvelo scanpy
# or
pip install "velocity-kit[scvelo]"
```

#### Example

```bash
# Generate analysis report from loom file
velocity-kit run-scvelo velocity.loom \
    -o reports/sample1 \
    -n Sample1 \
    -v

# Use default output directory and auto-detect sample name
velocity-kit run-scvelo velocity.loom
```

#### Output
The report includes:
- QC plots: Total counts, gene counts, spliced/unspliced proportions
- Velocity embeddings: UMAP with velocity arrows and stream plots
- Top velocity genes: Ranked genes driving velocity patterns
- HTML report: All plots combined in an interactive HTML file
## Python API

```python
from pathlib import Path

from velocitykit import load_10x_mtx, align_and_union, build_velocity_adata

# Load matrices
X_total, bc_total, g_total = load_10x_mtx(
    Path("total_run/matrix.mtx.gz"),
    Path("total_run/barcodes.tsv.gz"),
    Path("total_run/features.tsv.gz"),
)
X_exon, bc_exon, g_exon = load_10x_mtx(
    Path("exonic_run/matrix.mtx.gz"),
    Path("exonic_run/barcodes.tsv.gz"),
    Path("exonic_run/features.tsv.gz"),
)

# Align to union of genes and barcodes
X_total_u, X_exon_u, genes_u, bc_u = align_and_union(
    X_total, bc_total, g_total,
    X_exon, bc_exon, g_exon,
)

# Build velocity-compatible AnnData
adata = build_velocity_adata(X_total_u, X_exon_u, genes_u, bc_u)

# Save
adata.write_h5ad("output.h5ad")
adata.write_loom("output.loom")
```
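To illustrate what aligning two runs "to the union of genes and barcodes" can mean, here is a small NumPy sketch. It is not the implementation of `align_and_union`: the function `align_to_union`, its argument order, the cells-by-genes orientation, the sorted label order, and the zero-fill for missing entries are all assumptions. A real implementation would most likely use sparse matrices.

```python
import numpy as np


def align_to_union(x_a, rows_a, cols_a, x_b, rows_b, cols_b):
    """Re-index two labelled matrices onto the sorted union of their labels."""
    rows_u = sorted(set(rows_a) | set(rows_b))
    cols_u = sorted(set(cols_a) | set(cols_b))
    row_pos = {label: i for i, label in enumerate(rows_u)}
    col_pos = {label: j for j, label in enumerate(cols_u)}

    def expand(x, rows, cols):
        # Entries absent from this run stay at zero (assumed fill value).
        out = np.zeros((len(rows_u), len(cols_u)), dtype=x.dtype)
        r_idx = [row_pos[r] for r in rows]
        c_idx = [col_pos[c] for c in cols]
        out[np.ix_(r_idx, c_idx)] = x
        return out

    return expand(x_a, rows_a, cols_a), expand(x_b, rows_b, cols_b), rows_u, cols_u


# Toy data: rows are barcodes, columns are genes.
total = np.array([[5, 2], [3, 0]])
exonic = np.array([[4], [1]])
total_u, exonic_u, barcodes_u, genes_u = align_to_union(
    total, ["AAAC", "AAAG"], ["GeneA", "GeneB"],
    exonic, ["AAAC", "AAAT"], ["GeneA"],
)
print(barcodes_u, genes_u)
print(total_u)
print(exonic_u)
```

After alignment both matrices share the same shape and label order, which is the precondition for subtracting one from the other element by element.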
## Why Dual-Run Subtraction?

For platforms that use complex molecular counting (MI correction, deduplication, multi-mapping resolution), BAM-based velocity methods can give invalid results because these counting transformations are not preserved in the BAM file.

The dual-run subtraction approach:

- Run your pipeline normally → counts include exonic + intronic molecules
- Run with exons-only mode on the raw/unfiltered matrix → spliced-only molecules
- Compute: `unspliced = total - spliced`
This preserves the platform's counting model and produces valid velocity layers.
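The subtraction step can be sketched in a few lines of NumPy, assuming the total and exonic matrices have already been aligned to the same barcodes and genes. The function `split_layers` is hypothetical, and clipping negative differences to zero is an assumption made here; how velocity-kit handles entries where the exonic count exceeds the total is not stated in this README.

```python
import numpy as np


def split_layers(total: np.ndarray, exonic: np.ndarray):
    """Return (spliced, unspliced) count layers from aligned total and exonic matrices."""
    if total.shape != exonic.shape:
        raise ValueError("total and exonic matrices must be aligned to the same shape")
    spliced = exonic.copy()
    # Assumption: clip at zero, because the two runs can disagree slightly
    # and a molecule count cannot be negative. Cast to a signed type first
    # so unsigned inputs do not wrap around.
    diff = total.astype(np.int64) - exonic.astype(np.int64)
    unspliced = np.clip(diff, 0, None)
    return spliced, unspliced


total = np.array([[10, 4, 0], [7, 2, 1]])
exonic = np.array([[6, 4, 1], [2, 0, 1]])
spliced, unspliced = split_layers(total, exonic)
print(unspliced)
```

In this toy input the third gene of the first cell has an exonic count of 1 but a total of 0, so its unspliced value is clipped to 0 rather than stored as -1.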
### When to Use Dual-Run Subtraction
- ✅ PIPseq: Always use dual-run (BAM-based methods are incorrect)
- ✅ 10x Genomics: Recommended for consistency, especially with CellRanger ≥7.0
- ⚠️ Other platforms: Evaluate whether platform-specific counting differs from simple read counting
## Important Notes

⚠️ For PIPseq: The `--exonic` directory must point to the RAW/UNFILTERED exons-only run. Do NOT use a filtered exonic matrix, because its called-cell set may not match the total matrix. Using one causes barcode mismatches and incorrect velocity estimates.
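A quick way to see the mismatch described above is to compare the barcode sets of the two runs before building the layers. The helper `barcode_overlap` below is a hypothetical check, not part of velocity-kit, and the barcodes are toy values.

```python
from typing import Dict, Iterable


def barcode_overlap(total_barcodes: Iterable[str], exonic_barcodes: Iterable[str]) -> Dict[str, int]:
    """Count barcodes shared by both runs and those unique to each run."""
    total, exonic = set(total_barcodes), set(exonic_barcodes)
    return {
        "shared": len(total & exonic),
        "total_only": len(total - exonic),
        "exonic_only": len(exonic - total),
    }


total_bc = ["AAAC", "AAAG", "AAAT", "AACA"]
filtered_exonic_bc = ["AAAC", "AAAT"]  # a filtered matrix keeps only called cells
report = barcode_overlap(total_bc, filtered_exonic_bc)
print(report)  # prints {'shared': 2, 'total_only': 2, 'exonic_only': 0}
```

If barcodes missing from the exonic run were filled with zeros (an assumption about the alignment step), every molecule in the `total_only` cells would be counted as unspliced, which is why the raw exonic matrix is required.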
## Requirements
- Python ≥ 3.8
- anndata ≥ 0.8.0
- h5py ≥ 3.8.0
- loompy ≥ 3.0.6
- numpy ≥ 1.21.0 (< 2.0.0 to avoid breaking changes)
- pandas ≥ 1.3.0
- scipy ≥ 1.7.0
- tqdm ≥ 4.60.0
Optional:
- scvelo ≥ 0.2.4 (for preprocessing)
Note: Python 3.7 support was dropped in v0.2.0. For older Python versions, use velocity-kit v0.1.x.
## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

## License

MIT License - see LICENSE file for details.

## Citation

If you use this tool in your research, please cite:

[Add citation information here]

## Contact

For questions or issues, please email ccrsfifx@nih.gov or open an issue on GitHub.
## Changelog

### v0.1.0 (Initial Release)
- PIPseq/PIPseeker support
- Modular platform architecture
- Python API for custom workflows