Pure-Python MARVEL workflows for plate- and droplet-based single-cell splicing analysis.
marvel-py
A pure-Python reimplementation of MARVEL for single-cell alternative splicing analysis across plate-based and 10x droplet workflows.
- AnnData-native workflow controller for Scanpy-style pipelines, with MARVEL inputs in adata.uns and results written back to adata.uns / adata.obsm
- No rpy2; flat TSV / Matrix Market inputs remain supported as the backend compatibility layer
- Implements PSI quantification, modality summaries, differential gene / splicing analysis, variable splicing event selection, PCA helpers, isoform switching, and rMATS / cryptic-splice-site utilities
- R MARVEL parity is tracked with committed reference fixtures and full external replay benchmarks
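PSI (percent spliced-in) quantification is the central plate-level quantity. As a rough illustration only, the sketch below computes PSI for one skipped-exon (SE) event from junction read counts using a common formulation (mean of the two inclusion junctions over inclusion plus skipping reads) and reports events below a coverage threshold as missing. The function name se_psi, the exact formula, and the rule for applying the threshold are assumptions for illustration, not MARVEL's or marvel_py's implementation.

```python
import math


def se_psi(inc_up: int, inc_down: int, skip: int, coverage_threshold: float = 10.0) -> float:
    """Hypothetical sketch: PSI for one skipped-exon event from junction counts.

    Assumptions (not taken from MARVEL): inclusion support is the mean of the
    upstream and downstream inclusion junctions, and the coverage threshold is
    applied to inclusion + skipping support.
    """
    inclusion = (inc_up + inc_down) / 2
    total = inclusion + skip
    if total < coverage_threshold:
        # Too few supporting reads: report the event as missing rather than 0 or 1.
        return math.nan
    return inclusion / total


print(se_psi(30, 10, 20))  # inclusion 20, total 40 -> 0.5
print(se_psi(2, 2, 1))     # total 3 < 10 -> nan
```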
This package is a standalone Python port of the public user-facing MARVEL workflow surface. The R package remains the reference implementation; this repo keeps Python behavior close to R MARVEL through R-vs-Python tests, audit fixtures, and external benchmark reports.
Install
From this repository:
```bash
pip install -e .
```
For benchmark report generation, also install matplotlib, or use the uv run --with ... commands shown below.
The bundled tutorial notebooks are expected to run in the ove micromamba environment and import the package normally:
```python
import marvel_py as mp
```
They do not modify sys.path.
Quick Start
AnnData-native plate workflow
```python
import marvel_py as mp

# Build an AnnData object from the flat MARVEL plate inputs.
adata = mp.setup_plate_anndata(
    splice_pheno="splice_pheno.tsv",
    splice_junction="splice_junction.tsv",
    intron_counts="intron_counts.tsv",
    gene_feature="gene_feature.tsv",
    exp="exp.tsv",
    gtf="annotation.gtf",
    splice_feature={
        "SE": "SE_feature.tsv",
        "MXE": "MXE_feature.tsv",
        "RI": "RI_feature.tsv",
        "A5SS": "A5SS_feature.tsv",
        "A3SS": "A3SS_feature.tsv",
    },
)

adata = mp.check_alignment(adata, level="SJ")
adata = mp.compute_psi(adata, event_type="SE", coverage_threshold=10.0)

# Results stay with the AnnData object.
adata.uns["marvel"]["tables"]["psi"]["SE"]
adata.obsm["X_marvel_psi_se"]

# Keep only cells that passed sequencing QC.
pass_ids = adata.obs.loc[adata.obs["qc.seq"] == "pass", "sample.id"]
adata = mp.subset_samples(adata, sample_ids=pass_ids.tolist())

# Log2-transform expression values (offset 1.0).
adata = mp.transform_exp_values(adata, offset=1.0, transformation="log2", threshold_lower=1.0)

# Compare iPSC vs Endoderm cells at the gene level.
g1 = adata.obs.loc[adata.obs["cell.type"] == "iPSC", "sample.id"].tolist()
g2 = adata.obs.loc[adata.obs["cell.type"] == "Endoderm", "sample.id"].tolist()
adata = mp.compare_values(adata, cell_group_g1=g1, cell_group_g2=g2, level="gene", method="wilcox")
```
The old function names are AnnData-aware. Passing a MarvelPlate / Marvel10x object preserves the legacy behavior; passing an AnnData object updates that AnnData in place and returns it.
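The dual calling convention described above can be pictured as type-based dispatch. The sketch below uses functools.singledispatch with two hypothetical stand-in classes (LegacyObject in place of MarvelPlate / Marvel10x, AnnDataLike in place of anndata.AnnData) and a hypothetical function record_result. It only illustrates the "update in place and return the same AnnData" contract; it is not a description of marvel_py's internals.

```python
from dataclasses import dataclass, field
from functools import singledispatch
from typing import Any, Dict


@dataclass
class LegacyObject:
    """Hypothetical stand-in for a MarvelPlate / Marvel10x object."""
    tables: Dict[str, Any] = field(default_factory=dict)


@dataclass
class AnnDataLike:
    """Hypothetical stand-in for anndata.AnnData (only .uns is modelled)."""
    uns: Dict[str, Any] = field(default_factory=dict)


@singledispatch
def record_result(obj: Any, name: str, value: Any) -> Any:
    raise TypeError(f"unsupported input type: {type(obj).__name__}")


@record_result.register
def _(obj: LegacyObject, name: str, value: Any) -> LegacyObject:
    # Legacy path (hypothetical): store the result on the object itself.
    obj.tables[name] = value
    return obj


@record_result.register
def _(obj: AnnDataLike, name: str, value: Any) -> AnnDataLike:
    # AnnData path: write under uns["marvel"]["tables"], mutate in place, return the same object.
    tables = obj.uns.setdefault("marvel", {}).setdefault("tables", {})
    tables[name] = value
    return obj


adata = AnnDataLike()
out = record_result(adata, "psi", {"SE": [0.5]})
print(out is adata, adata.uns["marvel"]["tables"]["psi"])
```

Dispatching on type keeps one public function name per MARVEL step while letting the legacy object path stay untouched.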
AnnData-native 10x droplet workflow
```python
import marvel_py as mp

# Build an AnnData object from the 10x Matrix Market / TSV inputs.
adata = mp.setup_10x_anndata(
    gene_norm_matrix="gene_norm_matrix.mtx",
    gene_norm_pheno="gene_norm_pheno.tsv",
    gene_norm_feature="gene_norm_feature.tsv",
    gene_count_matrix="gene_count_matrix.mtx",
    gene_count_pheno="gene_count_pheno.tsv",
    gene_count_feature="gene_count_feature.tsv",
    sj_count_matrix="sj_count_matrix.mtx",
    sj_count_pheno="sj_count_pheno.tsv",
    sj_count_feature="sj_count_feature.tsv",
    pca="pca.tsv",
    gtf="annotation.gtf",
)
# Large 10x Matrix Market inputs are lazy by default. Pass
# load_matrices=True when you want matrices loaded into adata.X/layers
# during setup instead of during MARVEL processing.

# Annotate, validate, and filter genes and splice junctions, then check alignment.
adata = mp.annotate_genes_10x(adata)
adata = mp.annotate_sj_10x(adata)
adata = mp.validate_sj_10x(adata)
adata = mp.filter_genes_10x(adata)
adata = mp.check_alignment_10x(adata)

# Compare iPSC vs Cardio day 10 cells.
g1 = adata.obs.loc[adata.obs["cell.type"] == "iPSC", "sample.id"].tolist()
g2 = adata.obs.loc[adata.obs["cell.type"] == "Cardio day 10", "sample.id"].tolist()
adata = mp.plot_pct_expr_cells_genes_10x(adata, cell_group_g1=g1, cell_group_g2=g2)
adata = mp.plot_pct_expr_cells_sj_10x(adata, cell_group_g1=g1, cell_group_g2=g2)
adata = mp.compare_values_sj_10x(adata, cell_group_g1=g1, cell_group_g2=g2, n_iterations=10)
adata = mp.compare_values_genes_10x(adata)
```
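The lazy-loading behaviour noted in the setup comments can be sketched as a wrapper that defers reading a Matrix Market file until the matrix is first accessed. LazyMatrix is a hypothetical class, not part of marvel_py. Its default loader uses scipy.io.mmread and assumes a coordinate-format (sparse) file, as 10x count matrices usually are; the loader argument exists so the deferral can be demonstrated without SciPy or a real file.

```python
from functools import cached_property
from typing import Any, Callable, Optional


class LazyMatrix:
    """Hypothetical sketch: read a Matrix Market file only on first access."""

    def __init__(self, path: str, loader: Optional[Callable[[str], Any]] = None) -> None:
        self.path = path
        self._loader = loader

    @cached_property
    def value(self) -> Any:
        loader = self._loader
        if loader is None:
            # Imported lazily so SciPy is only needed when a real file is read.
            from scipy.io import mmread

            def loader(p: str) -> Any:
                # mmread returns a COO sparse matrix for coordinate-format files.
                return mmread(p).tocsr()
        return loader(self.path)

    @property
    def is_loaded(self) -> bool:
        # cached_property stores the computed value in the instance __dict__.
        return "value" in self.__dict__


reads = []
lazy = LazyMatrix("sj_count_matrix.mtx", loader=lambda p: reads.append(p) or f"matrix from {p}")
print(lazy.is_loaded)           # nothing has been read yet
print(lazy.value)               # first access triggers the read
print(lazy.value, len(reads))   # second access reuses the cached value
```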
Tabular outputs are mirrored under adata.uns["marvel"]["tables"], and plate PSI matrices are exposed in adata.obsm. A runtime backend object is cached internally so consecutive function calls on the same AnnData keep MARVEL state without placing a non-serializable object in adata.uns.
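One way to keep a non-serializable backend next to an AnnData object without putting it in adata.uns is a cache that lives outside the object and is keyed by object identity. The BackendCache class below is a hypothetical sketch, not marvel_py's actual mechanism. It keys on id() so the cached object does not need to be hashable, and uses weakref.finalize to drop the entry when that object is garbage-collected.

```python
import weakref
from typing import Any, Callable, Dict


class BackendCache:
    """Hypothetical sketch: one backend per live object, stored outside the object."""

    def __init__(self) -> None:
        self._backends: Dict[int, Any] = {}

    def get(self, obj: Any, factory: Callable[[Any], Any]) -> Any:
        key = id(obj)
        if key not in self._backends:
            self._backends[key] = factory(obj)
            # When obj is garbage-collected, drop its entry so a reused id()
            # can never return a stale backend.
            weakref.finalize(obj, self._backends.pop, key, None)
        return self._backends[key]

    def __len__(self) -> int:
        return len(self._backends)
```

Under this design the backend must not hold a strong reference back to the object it serves; otherwise the cache itself keeps that object alive and the finalizer never runs.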
Both the AnnData functions and the low-level object helpers (MarvelPlate, Marvel10x) are supported. The preferred public API is import marvel_py as mp, followed by AnnData setup and the existing MARVEL function names. Existing flat-file functions such as mp.create_marvel_object(...) and mp.create_marvel_object_10x(...) remain available for benchmarks and legacy scripts. The old nested api facade and the legacy py_marvel mirror are not supported interfaces.
Workflow Coverage
| Area | Plate | 10x droplet |
|---|---|---|
| Object creation and alignment | yes | yes |
| Gene and splice-junction annotation | via input feature tables / GTF helpers | yes |
| PSI / splice-junction expression summaries | yes | yes |
| Differential gene and splicing analysis | yes | yes |
| Variable splicing event selection | yes | no |
| PCA / plotting helper tables | yes | yes |
| Isoform switching helpers | yes | yes |
| rMATS and cryptic splice-site utilities | yes | no |
The implementation targets the public MARVEL workflow surface documented in the R package's man/ pages rather than private R internals.
Benchmarks
The external benchmark replays R MARVEL and marvel_py on the full external tutorial datasets:
- external_plate_data: iPSC vs Endoderm, qc.seq == pass, with SE/MXE/RI/A5SS/A3SS inputs available
- external_droplet_data: iPSC vs Cardio day 10, using 10 permutations for tractable full-data replay
- Plate R replay rebuilds from flat files and recomputes PSI. Droplet R replay uses the bundled R MARVEL object and recomputes downstream summaries / DE; Python replay loads the flat Matrix Market / TSV inputs.
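The droplet replay is limited to 10 permutations. To show why the permutation count matters, the sketch below implements a generic two-sample permutation test on the difference of group means. With n iterations and the usual +1 correction, the smallest attainable p-value is 1/(n + 1), about 0.09 for n = 10. This is a textbook permutation test, not necessarily the statistic compare_values_sj_10x computes; the function name and the toy values are made up for illustration.

```python
import random
from statistics import mean
from typing import Sequence


def permutation_pvalue(g1: Sequence[float], g2: Sequence[float],
                       n_iterations: int = 10, seed: int = 0) -> float:
    """Two-sided permutation test on the absolute difference of group means (generic sketch)."""
    rng = random.Random(seed)
    observed = abs(mean(g1) - mean(g2))
    pooled = list(g1) + list(g2)
    n1 = len(g1)
    hits = 0
    for _ in range(n_iterations):
        rng.shuffle(pooled)  # random relabelling of cells into the two groups
        if abs(mean(pooled[:n1]) - mean(pooled[n1:])) >= observed:
            hits += 1
    # +1 in numerator and denominator counts the observed labelling itself.
    return (hits + 1) / (n_iterations + 1)


# Made-up toy values for two well-separated groups.
group_a = [0.9, 0.8, 0.95, 0.85, 0.9]
group_b = [0.1, 0.2, 0.15, 0.05, 0.1]
print(permutation_pvalue(group_a, group_b, n_iterations=10))    # cannot go below 1/11
print(permutation_pvalue(group_a, group_b, n_iterations=1000))  # finer resolution, more compute
```

Fewer iterations keep a full-data replay tractable at the cost of p-value resolution, which is the trade-off the benchmark setting makes.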
Latest validated run:
| Run | Artifacts | Numeric metrics | Nonzero row deltas | Max absolute difference | Minimum Pearson r |
|---|---|---|---|---|---|
| benchmark/runs/20260525T070728Z | 10 | 323 | 0 | 8.91e-4 | 0.9999961188133987 |
Runtime from the latest run:
| Section | R | Python | Speedup |
|---|---|---|---|
| Plate full replay | 1116.16 s | 345.39 s | 3.23x |
| Plate variable splicing only | 2.71 s | 0.86 s | 3.15x |
| Droplet replay | 1412.29 s | 477.30 s | 2.96x |
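Speedup is the R wall-clock time divided by the Python wall-clock time. The snippet below recomputes that column from the values in the runtime table.

```python
# Wall-clock seconds (R, Python) copied from the runtime table above.
runtimes = {
    "Plate full replay": (1116.16, 345.39),
    "Plate variable splicing only": (2.71, 0.86),
    "Droplet replay": (1412.29, 477.30),
}

# Speedup = R time / Python time; values above 1 mean Python finished faster.
speedups = {section: r_s / py_s for section, (r_s, py_s) in runtimes.items()}

for section, speedup in speedups.items():
    print(f"{section}: {speedup:.2f}x")
```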
Artifact-level agreement:
| Section | Artifact | Rows R | Rows Python | Max abs diff | Min Pearson r |
|---|---|---|---|---|---|
| Plate | psi_se | 20509 | 20509 | 6.66e-16 | 0.9999999999999976 |
| Plate | psi_ri | 8295 | 8295 | 6.66e-16 | 0.9999999999999988 |
| Plate | de_gene | 21059 | 21059 | 5.15e-14 | 0.9999999999999998 |
| Plate | count_events_iPSC | 5 | 5 | 0 | 1.0 |
| Plate | count_events_endoderm | 5 | 5 | 0 | 0.9999999999999999 |
| Plate | variable_splicing | 13318 | 13318 | 8.91e-4 | 0.9999961188133987 |
| Droplet | pct_expr_gene | 20187 | 20187 | 0 | 1.0 |
| Droplet | pct_expr_sj | 1861 | 1861 | 0 | 0.9999999999999998 |
| Droplet | de_sj | 2937 | 2937 | 8.88e-15 | 0.9999999999999999 |
| Droplet | de_gene | 831 | 831 | 7.11e-15 | 0.9999999999999999 |
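The three agreement measures in these tables (row delta, maximum absolute difference, minimum Pearson r) can be sketched for a single numeric column as follows. compare_artifact is a hypothetical helper, not the benchmark's code; it assumes rows are matched by an ID and that row delta means Python rows minus R rows.

```python
import math
from typing import Dict, Mapping


def compare_artifact(r_rows: Mapping[str, float], py_rows: Mapping[str, float]) -> Dict[str, float]:
    """Hypothetical sketch of per-column agreement metrics between R and Python outputs.

    Assumptions: rows are matched by ID, and row_delta = Python rows - R rows.
    """
    shared = sorted(set(r_rows) & set(py_rows))
    if not shared:
        raise ValueError("no shared rows to compare")
    r = [r_rows[k] for k in shared]
    p = [py_rows[k] for k in shared]
    max_abs_diff = max(abs(a - b) for a, b in zip(r, p))
    mean_r, mean_p = sum(r) / len(r), sum(p) / len(p)
    cov = sum((a - mean_r) * (b - mean_p) for a, b in zip(r, p))
    var_r = sum((a - mean_r) ** 2 for a in r)
    var_p = sum((b - mean_p) ** 2 for b in p)
    # Pearson r is undefined when either column is constant.
    pearson_r = cov / math.sqrt(var_r * var_p) if var_r > 0 and var_p > 0 else math.nan
    return {"row_delta": len(py_rows) - len(r_rows), "max_abs_diff": max_abs_diff, "pearson_r": pearson_r}


r_out = {"ev1": 0.25, "ev2": 0.5, "ev3": 0.75}
py_out = {"ev1": 0.25, "ev2": 0.5, "ev3": 0.7500001}
print(compare_artifact(r_out, py_out))
```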
The retained summary figures are in benchmark/results/figures:
- benchmark_summary_figure.png
- artifact_row_delta.png
- top20_metric_max_abs_diff.png
- lowest20_metric_pearson_r.png
- plate_metric_heatmap.png
- droplet_metric_heatmap.png
- runtime_r_vs_python.png
- per-artifact scatter plots such as plate__psi_se__scatter.png, plate__variable_splicing__scatter.png, and droplet__de_sj__scatter.png
To rerun the external benchmark and archive the results:
```bash
# Optional: only needed when Rscript is not already on PATH.
export MARVEL_RSCRIPT=/path/to/Rscript

uv run --with matplotlib \
  python benchmark/scripts/run_external_benchmark_archive.py

uv run --with matplotlib \
  python benchmark/scripts/plot_benchmark_summary_figure.py \
  --run-dir benchmark/runs/<run_id>
```
R benchmarks require an R environment with MARVEL and its dependencies installed. MARVEL_RSCRIPT can point to any compatible R installation; it is not tied to a local micromamba path.
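The MARVEL_RSCRIPT override amounts to a simple lookup order: use the environment variable if it is set, otherwise fall back to Rscript on PATH, as the comment in the benchmark commands suggests. The sketch below assumes that order and uses a hypothetical function name; the benchmark scripts' actual resolution logic may differ.

```python
import os
import shutil
from typing import Mapping, Optional


def resolve_rscript(env: Optional[Mapping[str, str]] = None) -> str:
    """Hypothetical: pick the Rscript executable for the R side of the benchmark."""
    env = os.environ if env is None else env
    explicit = env.get("MARVEL_RSCRIPT")
    if explicit:
        return explicit  # explicit override wins, wherever R is installed
    found = shutil.which("Rscript")
    if found is None:
        raise FileNotFoundError("Rscript not found on PATH; set MARVEL_RSCRIPT")
    return found
```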
The archive contains copied results, code snapshots, command logs, metric_summary.tsv, artifact_summary.tsv, and report figures.
Examples
| Notebook | What it covers |
|---|---|
| examples/plate_data.ipynb | Plate-based MARVEL workflow using import marvel_py as mp |
| examples/Droplet_data.ipynb | 10x droplet MARVEL workflow using import marvel_py as mp |
Scripted demos are also available:
| Script | Purpose |
|---|---|
| scripts/run_plate_ref_python.py | Run the core plate tutorial workflow from exported flat files |
| scripts/run_ref1_python.py | Run the core droplet ref1 workflow from Matrix Market / TSV inputs |
| scripts/export_plate_demo_inputs.R | Export R MARVEL plate demo data into Python-friendly inputs |
| scripts/export_marvel_demo_inputs.R | Export R MARVEL droplet demo data into Python-friendly inputs |
Relationship to R MARVEL
R MARVEL is the canonical package:
- Reference package: MARVEL
- Python package: this repo, importing as marvel_py
- Benchmark code: benchmark/scripts
If you need exact R package behavior for unsupported private internals, use R MARVEL directly. If you need the implemented public workflows from Python, use marvel_py.
Relationship to omicverse
Developed following the omicverse-to-developer py- conventions (pure-Python, no rpy2 in production code, AnnData-native I/O, Numba only on hot kernels). Upstream integration plan:
- Canonical implementation: omicverse.external.copykat_py (pending)
- Standalone mirror (this repo): same code, same API, without the full omicverse packaging
Citation
If you use this package, please cite the original MARVEL paper:
Wei Xiong Wen, Adam J Mead, Supat Thongjuea, MARVEL: an integrated alternative splicing analysis platform for single-cell RNA sequencing data, Nucleic Acids Research, Volume 51, Issue 5, 21 March 2023, Page e29, https://doi.org/10.1093/nar/gkac1260
and acknowledge omicverse / this repo for the Python port and optimisations.
License
GNU GPLv3.
Download files
File details
Details for the file marvel_python-0.1.0.tar.gz.
File metadata
- Download URL: marvel_python-0.1.0.tar.gz
- Upload date:
- Size: 106.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 2836e3ba1eb5f839d048920e724d77da96c9378933f7f834c1689190d6ac1b74 |
| MD5 | 1eda34f560488f380f4b053364e4ab73 |
| BLAKE2b-256 | f5b88e73008b6cc0dfbfe08cc9bd7b984b98c3fc4de60bb43726df84de625ebf |
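To check a downloaded sdist against the published SHA256 digest, hash the file in chunks with Python's hashlib and compare. The helper names below are illustrative. pip can enforce the same check at install time through its hash-checking mode (--require-hashes, with --hash=sha256:... entries in a requirements file).

```python
import hashlib

# SHA256 digest published for marvel_python-0.1.0.tar.gz (from the table above).
EXPECTED_SDIST_SHA256 = "2836e3ba1eb5f839d048920e724d77da96c9378933f7f834c1689190d6ac1b74"


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path: str, expected: str = EXPECTED_SDIST_SHA256) -> bool:
    # hexdigest() is lowercase, so normalise the expected value before comparing.
    return sha256_of(path) == expected.lower()
```

For example, matches_published_digest("marvel_python-0.1.0.tar.gz") returns True only for an unmodified copy of the sdist.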
Provenance
The following attestation bundles were made for marvel_python-0.1.0.tar.gz:
Publisher: workflow.yml on omicverse/py-MARVEL
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: marvel_python-0.1.0.tar.gz
- Subject digest: 2836e3ba1eb5f839d048920e724d77da96c9378933f7f834c1689190d6ac1b74
- Sigstore transparency entry: 1633990810
- Permalink: omicverse/py-MARVEL@adaf8da0453f0bbefd8e5d64fcea87b4073e181d
- Branch / Tag: refs/heads/main
- Owner: https://github.com/omicverse
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: workflow.yml@adaf8da0453f0bbefd8e5d64fcea87b4073e181d
- Trigger Event: workflow_dispatch
File details
Details for the file marvel_python-0.1.0-py3-none-any.whl.
File metadata
- Download URL: marvel_python-0.1.0-py3-none-any.whl
- Upload date:
- Size: 97.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 27b3e5935bf43646ee399fe99e1b9a3e2e4e000b0d5df3a080c27f8859624ad4 |
| MD5 | f38fbc74d4d27a20493c8b9e1713df13 |
| BLAKE2b-256 | 28e1627951d30e6130e66da76e2359d679fd3f3cd9fbcd9e8f3616b48275ef41 |
Provenance
The following attestation bundles were made for marvel_python-0.1.0-py3-none-any.whl:
Publisher: workflow.yml on omicverse/py-MARVEL
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: marvel_python-0.1.0-py3-none-any.whl
- Subject digest: 27b3e5935bf43646ee399fe99e1b9a3e2e4e000b0d5df3a080c27f8859624ad4
- Sigstore transparency entry: 1633990830
- Permalink: omicverse/py-MARVEL@adaf8da0453f0bbefd8e5d64fcea87b4073e181d
- Branch / Tag: refs/heads/main
- Owner: https://github.com/omicverse
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: workflow.yml@adaf8da0453f0bbefd8e5d64fcea87b4073e181d
- Trigger Event: workflow_dispatch