
Pseudo-Bulking Single-Cell RNA-seq


adpbulk

Summary

Performs pseudobulking of an AnnData object based on columns available in the .obs dataframe. It was originally intended to pseudo-bulk single-cell RNA-seq data into higher-order groupings so that existing RNA-seq differential expression tools such as edgeR and DESeq2 can be used. For example, cells can be pseudobulked by cluster, sample of origin, or CRISPRi guide identity. It works on both individual categories (any one of the examples above) and combinations of categories (two of the three, and so on).

Installation

From PyPI

pip install adpbulk

From Github

git clone https://github.com/noamteyssier/adpbulk
cd adpbulk
pip install .
pytest -v 

Usage

This package is intended to be used as a Python module.

Single Category Pseudo-Bulk

The simplest use case is to aggregate on a single category. This aggregates all observations that share the same value of the category and returns a pseudo-bulked matrix with one pseudo-bulk sample per unique value of the category.

from adpbulk import ADPBulk

# initialize the object
adpb = ADPBulk(adat, "category_name")

# perform the pseudobulking
pseudobulk_matrix = adpb.fit_transform()

# retrieve the sample meta data (useful for easy incorporation with edgeR)
sample_meta = adpb.get_meta()
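
To make the aggregation concrete, here is a minimal pandas sketch of what summing over a single category amounts to. It does not call adpbulk; the toy counts, the `cluster` annotation, and the gene names are made up for illustration, and the orientation of the result (groups as rows, genes as columns) is an assumption rather than a statement about adpbulk's output.

```python
import pandas as pd

# toy count matrix: 4 cells x 2 genes (values are made up)
counts = pd.DataFrame(
    [[1, 0], [2, 3], [4, 1], [0, 5]],
    index=["cell0", "cell1", "cell2", "cell3"],
    columns=["geneA", "geneB"],
)

# one categorical annotation per cell, analogous to a column in `.obs`
cluster = pd.Series(["c1", "c1", "c2", "c2"], index=counts.index, name="cluster")

# sum the counts of all cells that share a category value
pseudobulk = counts.groupby(cluster).sum()
print(pseudobulk)
```

Four cells collapse into two pseudo-bulk samples, one per value of `cluster`.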

Multiple Category Pseudo-Bulk

A common use case is to aggregate on multiple categories. This aggregates all observations belonging to the same combination of values across the categories and returns a pseudo-bulked matrix with one pseudo-bulk sample per non-empty combination (combinations that contain no observations are omitted).

from adpbulk import ADPBulk

# initialize the object
adpb = ADPBulk(adat, ["category_a", "category_b"])

# perform the pseudobulking
pseudobulk_matrix = adpb.fit_transform()

# retrieve the sample meta data (useful for easy incorporation with edgeR)
sample_meta = adpb.get_meta()
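
The point about non-empty combinations can be illustrated with a small pandas sketch, again independent of adpbulk. The `sample` and `cluster` annotations and the counts are hypothetical; the sketch only shows that grouping on two columns yields one row per combination that actually occurs in the data.

```python
import pandas as pd

# toy count matrix: 4 cells x 2 genes (values are made up)
counts = pd.DataFrame(
    [[1, 0], [2, 3], [4, 1], [0, 5]],
    index=["cell0", "cell1", "cell2", "cell3"],
    columns=["geneA", "geneB"],
)

# two annotations per cell, analogous to two columns in `.obs`
obs = pd.DataFrame(
    {"sample": ["s1", "s1", "s2", "s2"], "cluster": ["c1", "c1", "c1", "c2"]},
    index=counts.index,
)

# group on both annotations; the combination s1/c2 has no cells and never appears
pseudobulk_multi = counts.groupby([obs["sample"], obs["cluster"]]).sum()

# number of combinations that would exist if every pairing were populated
n_possible = obs["sample"].nunique() * obs["cluster"].nunique()
print(len(pseudobulk_multi), "of", n_possible, "combinations are populated")
```

Here two samples and two clusters allow four combinations, but only three contain cells, so the aggregated matrix has three pseudo-bulk samples.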

Pseudo-Bulk using raw counts

Some differential expression software expects untransformed counts. SCANPY uses the .raw attribute of its AnnData objects to store the initial AnnData object before transformation. To perform the pseudo-bulk aggregation on these raw counts, pass the use_raw=True flag.

from adpbulk import ADPBulk

# initialize the object w. aggregation on the `.raw` attribute
adpb = ADPBulk(adat, ["category_a", "category_b"], use_raw=True)

# perform the pseudobulking
pseudobulk_matrix = adpb.fit_transform()

# retrieve the sample meta data (useful for easy incorporation with edgeR)
sample_meta = adpb.get_meta()
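
A short numeric sketch shows why the choice of input matters: aggregating transformed values does not give the same result as transforming aggregated counts. The example assumes a log1p transform, which is common in single-cell workflows but is only an assumption here, and the two counts are made up.

```python
import numpy as np

# raw counts of one gene in two cells of the same group (made-up values)
raw = np.array([10.0, 90.0])

# aggregate the raw counts first, then transform
sum_then_log = np.log1p(raw.sum())

# transform each cell first, then aggregate
log_then_sum = np.log1p(raw).sum()

print(sum_then_log, log_then_sum)
```

The two numbers differ, so summing already-transformed values cannot stand in for a sum of raw counts.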

Alternative Aggregation Options

It may also be useful to aggregate with a function other than the sum. The method option lets you choose between sum, mean, and median as the aggregation function; the default is sum.

from adpbulk import ADPBulk

# initialize the object w. an alternative aggregation option
# aggregation options are: sum, mean, and median
# default aggregation is sum
adpb = ADPBulk(adat, "category", method="mean")

# perform the pseudobulking
pseudobulk_matrix = adpb.fit_transform()

# retrieve the sample meta data (useful for easy incorporation with edgeR)
sample_meta = adpb.get_meta()
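
To see how the three aggregation functions differ on the same data, here is a small pandas sketch that is independent of adpbulk. The counts and the `cluster` labels are hypothetical; one group contains an outlier so that the mean and the median diverge.

```python
import pandas as pd

# counts of one gene across five cells (made-up values)
counts = pd.DataFrame(
    {"geneA": [1, 2, 9, 0, 4]},
    index=[f"cell{i}" for i in range(5)],
)

# group label for each cell
group = pd.Series(["c1", "c1", "c1", "c2", "c2"], index=counts.index, name="cluster")

# compute all three aggregations side by side for comparison
summary = counts.groupby(group)["geneA"].agg(["sum", "mean", "median"])
print(summary)
```

The sum grows with the number of cells in a group, while the mean and median do not, which is worth keeping in mind when group sizes are unequal.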

Alternative Formatting Options

The name_delim and group_delim arguments control the delimiters used when formatting the names of the pseudo-bulk samples.

from adpbulk import ADPBulk

# initialize the object w. alternative name formatting options
adpb = ADPBulk(adat, ["category_a", "category_b"], name_delim=".", group_delim="::")

# perform the pseudobulking
pseudobulk_matrix = adpb.fit_transform()

# retrieve the sample meta data (useful for easy incorporation with edgeR)
sample_meta = adpb.get_meta()
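
One way to picture the two delimiters is the sketch below. The `format_name` helper is hypothetical and is not part of adpbulk; it assumes that each category name is joined to its value with `name_delim` and that the resulting pairs are joined with `group_delim`. The default delimiters in the helper are placeholders and may not match the package defaults.

```python
def format_name(categories, values, name_delim="-", group_delim="."):
    """Hypothetical helper: join each category to its value, then join the pairs."""
    pairs = [f"{cat}{name_delim}{val}" for cat, val in zip(categories, values)]
    return group_delim.join(pairs)


categories = ["category_a", "category_b"]
values = ["x", "y"]

# placeholder defaults
default_name = format_name(categories, values)

# the delimiters used in the example above
custom_name = format_name(categories, values, name_delim=".", group_delim="::")

print(default_name)
print(custom_name)
```

Choosing delimiters that do not occur in the category names or values keeps the sample names easy to split back into their parts downstream.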

Example AnnData Function

Here is a function that generates an AnnData object, which you can use to test the module or to experiment with if you are unfamiliar with AnnData.

import numpy as np
import pandas as pd
import anndata as ad

def build_adat(SIZE_N=100, SIZE_M=100):
    """
    creates an anndata for testing
    """
    # generates random values (mock transformed data)
    mat = np.random.random((SIZE_N, SIZE_M))

    # generates random values (mock raw count data)
    raw = np.random.randint(0, 1000, (SIZE_N, SIZE_M))

    # creates the observations and categories
    # (each category draws its labels from a random number of classes, 1 to 5)
    obs = pd.DataFrame({
        "cell": [f"b{idx}" for idx in np.arange(SIZE_N)],
        "cA": np.random.choice(np.random.choice(5)+1, SIZE_N),
        "cB": np.random.choice(np.random.choice(5)+1, SIZE_N),
        "cC": np.random.choice(np.random.choice(5)+1, SIZE_N),
        "cD": np.random.choice(np.random.choice(5)+1, SIZE_N),
        }).set_index("cell")

    # creates the variables (genes) and categories
    var = pd.DataFrame({
        "symbol": [f"g{idx}" for idx in np.arange(SIZE_M)],
        "cA": np.random.choice(np.random.choice(5)+1, SIZE_M),
        "cB": np.random.choice(np.random.choice(5)+1, SIZE_M),
        "cC": np.random.choice(np.random.choice(5)+1, SIZE_M),
        "cD": np.random.choice(np.random.choice(5)+1, SIZE_M),
        }).set_index("symbol")

    # Creates the `AnnData` object
    adat = ad.AnnData(
            X=mat,
            obs=obs,
            var=var)

    # Creates an `AnnData` object to simulate the `.raw` attribute
    adat_raw = ad.AnnData(
            X=raw,
            obs=obs,
            var=var)

    # Sets the `.raw` attribute
    adat.raw = adat_raw

    return adat

adat = build_adat()
