Python implementation of deconvolution algorithms

           ██████╗  ██╗   ██╗  ██████╗  ███████╗  ██████╗  ██████╗  ███╗   ██╗ ██╗   ██╗
           ██╔══██╗ ╚██╗ ██╔╝  ██╔══██╗ ██╔════╝ ██╔════╝ ██╔═══██╗ ████╗  ██║ ██║   ██║
           ██████╔╝  ╚████╔╝   ██║  ██║ █████╗   ██║      ██║   ██║ ██╔██╗ ██║ ██║   ██║
           ██╔═══╝    ╚██╔╝    ██║  ██║ ██╔══╝   ██║      ██║   ██║ ██║╚██╗██║ ╚██╗ ██╔╝
           ██║         ██║     ██████╔╝ ███████╗ ╚██████╗ ╚██████╔╝ ██║ ╚████║  ╚████╔╝
           ╚═╝         ╚═╝     ╚═════╝  ╚══════╝  ╚═════╝  ╚═════╝  ╚═╝  ╚═══╝   ╚═══╝  

Python implementation of bulk RNAseq deconvolution algorithms

How to install

package

pip install pydeconv

dev

uv sync --all-groups --all-extras

How to use

from pydeconv import SignatureMatrix
from pydeconv.model import OLS, NNLS, DWLS, Tape, Scaden, MixupVI, NuSVR, RLR, WNNLS
import anndata as ad

signature_matrix = SignatureMatrix.load("path/to/signature_matrix.csv") # index: gene names, columns: cell types
solver = NNLS(signature_matrix)

adata = ad.read_h5ad("path/to/adata.h5ad") # index: sample_id, columns: gene_names
adata.layers["raw_counts"] = ... # apply your preprocessing step (check rnaxplorer)

cell_prop = solver.transform(adata, layer="raw_counts", ratio=True)
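To give an intuition for what a signature-based solver such as NNLS computes, here is a minimal sketch of non-negative least squares deconvolution written directly with SciPy. It is not pydeconv's implementation: the helper `nnls_proportions`, the toy signature values, and the reading of `ratio=True` as "rescale the coefficients so they sum to one" are all assumptions made for illustration.

```python
import numpy as np
from scipy.optimize import nnls


def nnls_proportions(signature, bulk, ratio=True):
    """Estimate cell type contributions for one bulk sample.

    signature: array of shape (n_genes, n_cell_types)
    bulk: array of shape (n_genes,)
    Hypothetical helper, not part of pydeconv.
    """
    # Solve min ||signature @ x - bulk||_2 subject to x >= 0.
    coefs, _residual_norm = nnls(signature, bulk)
    if ratio:
        total = coefs.sum()
        if total > 0:
            # Assumption: "ratio" means proportions that sum to one.
            coefs = coefs / total
    return coefs


# Toy signature matrix: 4 genes (rows) x 3 cell types (columns), made-up values.
signature = np.array([
    [10.0, 1.0, 0.0],
    [0.0, 8.0, 2.0],
    [1.0, 0.0, 9.0],
    [5.0, 5.0, 5.0],
])
true_props = np.array([0.5, 0.3, 0.2])

# A noiseless bulk sample is a weighted mix of the cell type profiles.
bulk = signature @ true_props

estimated = nnls_proportions(signature, bulk)
```

On this noiseless toy mixture the estimate matches the true proportions up to numerical tolerance; on real data the fit is only approximate and depends on preprocessing and on the quality of the signature matrix.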

How to use (full)

1. Load an already registered signature matrix

from pydeconv.signature_matrix.registry import sig_matrix_laughney_lung_cancer
signature_matrix = sig_matrix_laughney_lung_cancer()

[!NOTE] See the project documentation for descriptions of the other registered signature matrices.

2. Load a custom signature matrix

from pydeconv import SignatureMatrix
signature_matrix = SignatureMatrix.load("path/to/signature_matrix.csv") # index: gene names, columns: cell types

[!NOTE] For the moment, only the .csv format is supported. You can pass any keyword arguments accepted by pd.read_csv after the path.
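As a rough illustration of the expected layout (gene names as the index, one column per cell type), the sketch below parses a small in-memory CSV with pandas. The gene and cell type labels and the numeric values are made up, and passing `index_col=0` is an assumption about how such a file would typically be read, not a documented requirement of `SignatureMatrix.load`.

```python
import io

import pandas as pd

# Made-up example content: first column holds gene names,
# the remaining columns hold one expression profile per cell type.
csv_text = """gene,T_cell,B_cell
CD3E,12.0,0.1
CD19,0.2,9.5
MS4A1,0.1,11.3
"""

# index_col=0 turns the first column into the row index (gene names).
sig_df = pd.read_csv(io.StringIO(csv_text), index_col=0)

n_genes, n_cell_types = sig_df.shape
```

Since the note above says extra keyword arguments are forwarded to pd.read_csv, options such as `sep` or `index_col` would presumably be passed the same way after the path.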

3. Predict

import anndata as ad
from pydeconv.model import Tape, Scaden

adata = ad.read_h5ad("path/to/adata.h5ad") # index: sample_id, columns: gene_names
adata.layers["counts_sum"] = ...

solver = Scaden(weights_version="cti_2nd_level_granularity")
cell_prop = solver.transform(adata, layer="counts_sum", ratio=True)

[!NOTE] The model checks that your input data contains the gene names it expects.
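If you want to verify gene coverage yourself before calling the model, a check along the following lines may help. This is a sketch of the general idea only: `check_genes` is a hypothetical helper, and how pydeconv performs its own check (and what it does on a mismatch) is not described here.

```python
def check_genes(required_genes, available_genes):
    """Return the required genes in order, or raise if any are missing.

    Hypothetical helper, not part of pydeconv.
    """
    available = set(available_genes)
    missing = [gene for gene in required_genes if gene not in available]
    if missing:
        # Show only the first few names to keep the message readable.
        raise ValueError(
            f"{len(missing)} required gene(s) missing from input: {missing[:5]}"
        )
    return list(required_genes)


# Made-up gene lists; in practice the available names would come from
# adata.var_names and the required ones from the model or signature matrix.
model_genes = ["CD3E", "CD19"]
input_genes = ["CD19", "CD3E", "MS4A1"]

ordered = check_genes(model_genes, input_genes)
```

Returning the genes in the model's order makes it easy to subset and reorder the input columns before prediction.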

4. Predict (signature based method)

import anndata as ad
from pydeconv.model import OLS, NNLS, DWLS

signature_matrix = ...
adata = ad.read_h5ad("path/to/adata.h5ad")
adata.layers["relative_counts"] = ...

solver = DWLS(signature_matrix)
cell_prop = solver.transform(adata, layer="relative_counts", ratio=True)
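The content of the `relative_counts` layer is left to your preprocessing. One common choice, shown here as an assumption rather than the preprocessing used by the authors, is to divide each sample's raw counts by that sample's total so that every row sums to one.

```python
import numpy as np

# Toy raw count matrix: 2 samples (rows) x 3 genes (columns), made-up values.
raw_counts = np.array([
    [10.0, 30.0, 60.0],
    [0.0, 5.0, 15.0],
])

# Per-sample library size, kept as a column so it broadcasts across genes.
totals = raw_counts.sum(axis=1, keepdims=True)

# Divide each row by its total; rows with a zero total stay at zero.
relative_counts = np.divide(
    raw_counts,
    totals,
    out=np.zeros_like(raw_counts),
    where=totals > 0,
)
```

The resulting array has the same shape as the input, so it could be stored as `adata.layers["relative_counts"]` for a dense matrix; a sparse `adata.X` would need converting or a sparse-aware variant.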

Benchmark

We benchmarked several deconvolution algorithms on the CTI dataset, including our own method, MixUpVI. This repository and the proposed methods accompany the paper "Joint probabilistic modeling of pseudobulk and single-cell transcriptomics enables accurate estimation of cell type composition", published at the Generative AI & Biology workshop of ICML 2025.

The results are shown below.

To run the benchmark, you can use the following command:

python benchmark/run_benchmark.py

[!NOTE] The repository only provides inference capabilities. It does not provide capabilities to train MixUpVI and other deep learning methods, or create signature matrices. Therefore, we provide the weights from the trained models presented in the publication, and pre-computed signature matrices. To use these models on other datasets, one must provide their own weights and/or pre-computed signature matrices.

Results 1st granularity

[Figure: benchmark results, 1st granularity]

Results 2nd granularity

[Figure: benchmark results, 2nd granularity]

[!NOTE] These results were computed with, and are only guaranteed for, the adata.raw.X layer of the CTI dataset available on cellxgene. The dataset is downloaded automatically when running the benchmark.

Cite

If you found our work useful in your research, please consider citing it:

@inproceedings{
grouard2025joint,
title={Joint Probabilistic Modeling of Pseudobulk and Single-Cell Transcriptomics Enables Accurate Estimation of Cell Composition},
author={Simon Grouard and Khalil Ouardini and Yann Rodriguez and Jean-Philippe Vert and Almudena Espin-Perez},
booktitle={ICML 2025 Generative AI and Biology (GenBio) Workshop},
year={2025},
url={https://openreview.net/forum?id=JhDJ0MGo2z}
}
