Yada is a Python package for deconvoluting RNA-seq data.

YADA Deconvolution Package

[Figure: YADA flow]

Overview

YADA is a biological deconvolution algorithm developed by the author and a collaborator as part of doctoral research in the Systems Biomedicine Lab, under the supervision of Professor Efroni. It estimates the proportions of distinct cell types within complex, heterogeneous gene expression samples.

The premise behind YADA is that the transcriptomic signatures of pure cell populations can be used to deconvolute mixed expression profiles and quantify the relative abundance of each constituent cell type. In other words, a bulk expression profile is treated as a combination of cell-type-specific profiles, and deconvolution recovers the contribution of each.
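The premise can be illustrated with a toy linear mixing model. The sketch below is not YADA's algorithm: it is a generic illustration that assumes a bulk profile is a weighted sum of pure profiles, and it recovers the weights with ordinary least squares followed by clipping and rescaling. The matrix values and variable names are hypothetical.

```python
import numpy as np

# Hypothetical signature matrix: 4 genes (rows) x 3 cell types (columns).
pure = np.array([
    [10.0, 1.0, 0.5],
    [0.5, 8.0, 1.0],
    [1.0, 0.5, 12.0],
    [5.0, 5.0, 5.0],
])

# Ground-truth proportions used to simulate one noise-free bulk sample.
true_fractions = np.array([0.6, 0.3, 0.1])
mix = pure @ true_fractions

# Ordinary least squares, then clip negatives and rescale so fractions sum to 1.
coef, *_ = np.linalg.lstsq(pure, mix, rcond=None)
estimated = np.clip(coef, 0.0, None)
estimated = estimated / estimated.sum()

print(np.round(estimated, 3))
```

With real, noisy data the simple clipping step is usually replaced by a properly constrained solver; the example only shows why pure signatures make the proportions recoverable.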

YADA estimates immune and other cell type fractions from bulk transcriptomic data. It grew out of the author's doctoral work on computational methods for systems-level biomedical data and was designed, validated, and optimized under the guidance of Professor Efroni to address a need in computational immunology.

Key Features

YADA implements two approaches for deconvolution:

  • Marker-based deconvolution using curated gene signatures representative of each cell type.
  • Reference-based deconvolution utilizing a complete gene expression matrix from pure cell populations.
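To make the difference between the two input styles concrete, the sketch below shows what a marker-based input might look like (a list of gene symbols per cell type, with no expression values) and computes a naive marker score from it. This is only an illustration of the marker-based idea, not YADA's scoring method; the expression values and the `marker_scores` helper are hypothetical.

```python
from statistics import mean

# Hypothetical marker lists (marker-based input): cell type -> gene symbols.
markers = {
    "B cells": ["CD19", "MS4A1"],
    "T cells": ["CD3D", "CD3E"],
}

# Hypothetical bulk expression for one mixed sample (non-log scale).
mix_sample = {"CD19": 40.0, "MS4A1": 60.0, "CD3D": 120.0, "CD3E": 180.0, "ACTB": 900.0}


def marker_scores(sample, marker_sets):
    """Average the expression of each cell type's markers found in the sample."""
    scores = {}
    for cell_type, genes in marker_sets.items():
        present = [sample[g] for g in genes if g in sample]
        scores[cell_type] = mean(present) if present else 0.0
    return scores


scores = marker_scores(mix_sample, markers)
total = sum(scores.values())
# Relative scores, not calibrated proportions.
relative = {cell_type: s / total for cell_type, s in scores.items()}
print(relative)
```

A reference-based input would instead supply a full expression value for every gene and cell type, as described under Input File Requirements.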

The key features and innovations of YADA include:

  • High performance: YADA achieves state-of-the-art accuracy on benchmark datasets, as evidenced by top results in a recent challenge.
  • Flexibility: Works with either marker genes or full reference profiles.
  • Broad applicability: Core algorithm supports gene expression data from various sequencing platforms.
  • Computational efficiency: Optimized parallel implementation allows rapid deconvolution of large datasets.
  • User-friendly Python API: One of the few deconvolution libraries natively implemented in Python.

Installation

Install from PyPI with pip install yada_deconv, or clone this repository with git clone.

Training Dataset

The training data for YADA consists of the benchmark datasets in the "data" folder. The collection includes data from publicly available sources as well as synthetic datasets generated by the authors. These datasets were used for training and validation during the DREAM challenge, a community-based deconvolution benchmarking effort.

Input File Requirements

The input data for YADA consists of two files:

  1. pure.csv: A gene expression matrix for purified cell populations, with dimensions (n_genes) x (k_cell_types). For marker-based deconvolution, this file should contain only gene symbols. Refer to the sample notebook for formatting details.
  2. mix.csv: A gene expression matrix for mixed cell samples, with dimensions (n_genes) x (m_mixtures). The first row must contain mixture labels.
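As a rough illustration of the layout described above, the following sketch builds small reference-style pure.csv and mix.csv contents in memory. The header label `gene`, the presence of a cell-type header row in pure.csv, and all values are assumptions made for the example; the sample notebook remains the authoritative reference for formatting.

```python
import csv
import io


def to_csv_text(header, rows):
    """Serialize a header and data rows to CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


# Reference-style pure.csv: gene symbols first, then one column per cell type.
pure_text = to_csv_text(
    ["gene", "B cells", "T cells"],
    [["CD19", 95.0, 1.0], ["CD3E", 2.0, 110.0]],
)

# mix.csv: first row holds the mixture labels, first column the gene symbols.
mix_text = to_csv_text(
    ["gene", "mix1", "mix2"],
    [["CD19", 30.0, 70.0], ["CD3E", 80.0, 25.0]],
)

print(pure_text)
print(mix_text)
```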

Additional guidelines for input files:

  • Gene symbols should be provided in the first column for both files.
  • It is acceptable for some genes to be missing in either the pure or mixed file.
  • Expression data should be in non-log scale. If the maximum expression value is below 50, the data are assumed to be log-transformed and an anti-log transformation is applied automatically.
  • YADA performs internal marker gene selection; therefore, not all provided signature genes may be utilized.

By following these specifications, users can ensure their input data is properly formatted for YADA to perform accurate deconvolution.
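The guidelines above can be sketched as a small pre-processing step. This is a hedged illustration, not YADA's internal code: the helper names are hypothetical, the log base of 2 is an assumption (the text does not state which base is used), and applying the check per file is also an assumption.

```python
def undo_log_if_needed(matrix, threshold=50.0, base=2.0):
    """Return a copy of matrix (gene -> list of values) on a linear scale.

    If the largest value is below the threshold, the data are treated as
    log-transformed and base**value is applied (base 2 is an assumption).
    """
    max_value = max(v for values in matrix.values() for v in values)
    if max_value < threshold:
        return {g: [base ** v for v in values] for g, values in matrix.items()}
    return {g: list(values) for g, values in matrix.items()}


def shared_genes(pure, mix):
    """Genes present in both files; genes missing from either are skipped."""
    return sorted(set(pure) & set(mix))


# Hypothetical inputs: pure looks log-scaled, mix is already linear.
pure = {"CD19": [6.5, 0.2], "CD3E": [1.0, 7.0], "MS4A1": [5.0, 0.1]}
mix = {"CD19": [300.0, 20.0], "CD3E": [40.0, 500.0], "ACTB": [900.0, 950.0]}

pure_linear = undo_log_if_needed(pure)
mix_linear = undo_log_if_needed(mix)
genes = shared_genes(pure_linear, mix_linear)
print(genes)
```

Checking the scale before running YADA helps avoid surprises, since a low-expression linear dataset could otherwise be mistaken for log-scale data.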

Usage

from YADA import *

# Marker gene list.
pure_file_path = '../data/xCell/pure.csv'

# Mixture file: columns are mix1, mix2, ...; rows are gene names.
mix_file_path = '../data/xCell/mix.csv'

# Run the deconvolution on the two input files.
result = run_yada(pure_file_path, mix_file_path)

Sample Notebooks

Contributing

We welcome contributions! Please see our Contributing Guidelines for more information on how to get involved.

License

YADA is available under the MIT license. See the LICENSE file for more details.

Support

For questions, issues, or feature requests, please open an issue on our GitHub repository. For additional support, contact: zurkin at yahoo dot com.

Citation

If you use YADA in your research, please cite our paper: Livne Dani, Snir Tom, Efroni Sol. YADA - Reference Free Deconvolution of RNA Sequencing Data. Current Bioinformatics, Volume 19, 2024, e050624230728. DOI: 10.2174/0115748936304034240405034414, https://www.eurekaselect.com/article/140845

Acknowledgments

We thank the scientific community for their valuable feedback and contributions to this project.

Resources
