Differential Representation with Hypergeometric Tests

HGSig

This tool measures the differential representation of grouped objects across clusters. The original motivation was CRISPRi single-cell sequencing data, specifically measuring the differential representation of individual knockdowns in each of the Leiden clusters. This was used to determine whether a knockdown's representation differed significantly from that of the non-targeting controls, which can hint at the potential function of that knockdown. This tool generalizes that code to any set of clusters and groups with references, and provides an API for testing different differential representation strategies in a reproducible way.
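At its core, this kind of comparison starts from a table of how many observations of each group fall into each cluster. The following is a minimal sketch of building that table with NumPy from label arrays like the ones used in the usage examples; the helper name `count_matrix` is hypothetical and not part of the hgsig API.

```python
import numpy as np


def count_matrix(clusters, groups):
    """Hypothetical helper: count observations per (group, cluster) pair.

    Returns the sorted unique group labels, the sorted unique cluster
    labels, and an integer matrix of shape (n_groups, n_clusters).
    """
    group_labels, g_idx = np.unique(groups, return_inverse=True)
    cluster_labels, c_idx = np.unique(clusters, return_inverse=True)
    counts = np.zeros((group_labels.size, cluster_labels.size), dtype=np.int64)
    # add one count per observation at its (group, cluster) cell
    np.add.at(counts, (g_idx, c_idx), 1)
    return group_labels, cluster_labels, counts


clusters = np.array(["c0", "c1", "c0", "c1", "c0"])
groups = np.array(["g0", "g0", "g1", "g1", "g1"])
g_labels, c_labels, counts = count_matrix(clusters, groups)
print(counts)  # rows: g0, g1; columns: c0, c1 -> [[1 1] [2 1]]
```

Each row is one group's cluster profile; a reference profile would be formed from the rows of the reference groups.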

Installation

pip

pip install hgsig

github

git clone https://github.com/noamteyssier/hgsig
cd hgsig
pip install .
pytest -v

Usage: Differential Representation Testing

This tool is intended to be used as a Python module.

Multiple References

import numpy as np
from hgsig import HGSig

# Number of observations
size = 10000

# Number of Groups
n_groups = 50

# Number of Clusters
n_clusters = 8

# randomly assign clusters
clusters = np.array([
    f"c{i}" for i in np.random.choice(n_clusters, size=size)
])

# randomly assign groups
groups = np.array([
    f"g{i}" for i in np.random.choice(n_groups, size=size)
])

# initialize object
hgs = HGSig(
    clusters,
    groups,
    reference=["g0", "g3"]
)

# run testing
hgs.fit()
pval = hgs.get_pval()
pcc = hgs.get_pcc()
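For intuition about what a hypergeometric test of one group against its references might look like, here is a minimal sketch using `scipy.stats.hypergeom`. It assumes the references have been summed into a single reference profile and that the reference defines the population being drawn from; this parameterization is an assumption for illustration and may differ from what hgsig does internally. The helper `hypergeom_pvals` is hypothetical.

```python
import numpy as np
from scipy.stats import hypergeom


def hypergeom_pvals(group_counts, ref_counts):
    """Hypothetical helper: per-cluster enrichment p-values P(X >= k).

    Assumed parameterization (scipy's hypergeom(M, n, N)):
      M = total reference observations (population)
      n = reference observations in the cluster (successes in population)
      N = total observations in the tested group (draws)
      k = tested-group observations in the cluster (observed successes)
    """
    group_counts = np.asarray(group_counts)
    ref_counts = np.asarray(ref_counts)
    M = ref_counts.sum()
    N = group_counts.sum()
    # the draws cannot exceed the population under this parameterization
    if N > M:
        raise ValueError("tested group is larger than the reference population")
    # for a discrete distribution, sf(k - 1) equals P(X >= k)
    return hypergeom.sf(group_counts - 1, M, ref_counts, N)


ref = np.array([50, 50])  # summed reference counts per cluster
grp = np.array([10, 0])   # tested group's counts per cluster
print(hypergeom_pvals(grp, ref))
```

This is a one-sided (enrichment) test; a depletion test would use `hypergeom.cdf(k, M, n, N)` instead.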

Fisher's Exact Test

import numpy as np
from hgsig import HGSig

# Number of observations
size = 10000

# Number of Groups
n_groups = 50

# Number of Clusters
n_clusters = 8

# randomly assign clusters
clusters = np.array([
    f"c{i}" for i in np.random.choice(n_clusters, size=size)
])

# randomly assign groups
groups = np.array([
    f"g{i}" for i in np.random.choice(n_groups, size=size)
])

# initialize object
hgs = HGSig(
    clusters,
    groups,
    reference=["g0", "g3"],
    method="fishers"
)

# run testing
hgs.fit()
pval = hgs.get_pval()
pcc = hgs.get_pcc()
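A Fisher's exact comparison can be sketched per cluster with `scipy.stats.fisher_exact` on a 2x2 table of in-cluster versus out-of-cluster counts for the tested group and the (summed) reference. The table layout, the two-sided alternative, and the helper name `fisher_pvals` are assumptions for illustration, not necessarily what `method="fishers"` does internally.

```python
import numpy as np
from scipy.stats import fisher_exact


def fisher_pvals(group_counts, ref_counts):
    """Hypothetical helper: two-sided Fisher's exact p-value per cluster.

    For each cluster, builds the 2x2 table
        [[group in cluster,     group elsewhere],
         [reference in cluster, reference elsewhere]]
    and tests it with scipy.stats.fisher_exact.
    """
    group_counts = np.asarray(group_counts, dtype=np.int64)
    ref_counts = np.asarray(ref_counts, dtype=np.int64)
    group_total, ref_total = group_counts.sum(), ref_counts.sum()
    pvals = np.empty(group_counts.size)
    for i, (g, r) in enumerate(zip(group_counts, ref_counts)):
        table = [[g, group_total - g], [r, ref_total - r]]
        _, pvals[i] = fisher_exact(table, alternative="two-sided")
    return pvals


# the tested group is larger than the reference here; Fisher's test still applies
print(fisher_pvals([60, 60], [50, 50]))
```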

Single Reference Group

A Fisher's exact test is highly recommended here, because the conditions of the hypergeometric test will generally not be satisfied when using only a single reference group. If the groups are of roughly equal size, a tested group will often contain more observations than the reference group, which fails the prerequisites of the hypergeometric test. Fisher's exact test has no such requirement, so it should be used in this case.

import numpy as np
from hgsig import HGSig

# Number of observations
size = 10000

# Number of Groups
n_groups = 50

# Number of Clusters
n_clusters = 8

# randomly assign clusters
clusters = np.array([
    f"c{i}" for i in np.random.choice(n_clusters, size=size)
])

# randomly assign groups
groups = np.array([
    f"g{i}" for i in np.random.choice(n_groups, size=size)
])

# initialize object
hgs = HGSig(
    clusters,
    groups,
    reference="g0",
    method="fishers"
)

# run testing
hgs.fit()
pval = hgs.get_pval()
pcc = hgs.get_pcc()
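The size argument from the single-reference case can be checked numerically. This sketch simulates group labels the way the examples do and counts how many tested groups are no larger than the reference, treating "the draws must not exceed the population" as the hypergeometric precondition. That reading of the precondition is an assumption about how hgsig sets up the test, and `hypergeom_feasible` is a hypothetical helper.

```python
import numpy as np


def hypergeom_feasible(group_size, reference_size):
    """Hypothetical check: a group (the draws) must be no larger than the reference (the population)."""
    return group_size <= reference_size


# simulate group labels as in the examples; integer labels stand in for "g{i}"
rng = np.random.default_rng(0)
size, n_groups = 10000, 50
groups = rng.choice(n_groups, size=size)
group_sizes = np.bincount(groups, minlength=n_groups)

# compare the same tested groups against one reference vs. two pooled references
tested = np.setdiff1d(np.arange(n_groups), [0, 3])
single_ref = group_sizes[0]
pooled_ref = group_sizes[0] + group_sizes[3]

frac_single = np.mean(hypergeom_feasible(group_sizes[tested], single_ref))
frac_pooled = np.mean(hypergeom_feasible(group_sizes[tested], pooled_ref))
print(f"feasible vs g0 only: {frac_single:.2f}; vs g0 + g3: {frac_pooled:.2f}")
```

With equally likely groups, roughly half of the tested groups tend to be larger than a single reference, while a summed pair of references is larger than essentially all of them.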

Multiple References with an Alternative Aggregation Function

By default, the counts of the reference groups are aggregated by summing them across the references, but alternative aggregation strategies (such as the mean) can also be used if they are of interest.

import numpy as np
from hgsig import HGSig

# Number of observations
size = 10000

# Number of Groups
n_groups = 50

# Number of Clusters
n_clusters = 8

# randomly assign clusters
clusters = np.array([
    f"c{i}" for i in np.random.choice(n_clusters, size=size)
])

# randomly assign groups
groups = np.array([
    f"g{i}" for i in np.random.choice(n_groups, size=size)
])

# initialize object
hgs = HGSig(
    clusters,
    groups,
    reference=["g0", "g1", "g2"],
    method="fishers",
    agg="mean"
)

# run testing
hgs.fit()
pval = hgs.get_pval()
pcc = hgs.get_pcc()
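To make the difference between the default summing and `agg="mean"` concrete, the sketch below aggregates a small, made-up matrix of per-cluster reference counts both ways with NumPy. It assumes that aggregation is applied per cluster across the reference groups; the numbers are illustrative only.

```python
import numpy as np

# made-up per-cluster counts for three reference groups (rows) over four clusters (columns)
ref_matrix = np.array([
    [10, 20, 30, 40],
    [12, 18, 33, 37],
    [8, 22, 27, 43],
])

# default-style aggregation: total reference counts per cluster
ref_sum = ref_matrix.sum(axis=0)
# mean-style aggregation: average reference counts per cluster
ref_mean = ref_matrix.mean(axis=0)

print(ref_sum)   # [ 30  60  90 120]
print(ref_mean)  # [10. 20. 30. 40.]
```

Both profiles have the same cluster proportions here, but the mean keeps the reference on the scale of a single group, whereas the sum grows with the number of references.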


Download files

Source Distribution

hgsig-0.1.6.tar.gz (12.2 kB)

Uploaded Source

Built Distribution

hgsig-0.1.6-py2.py3-none-any.whl (7.8 kB)

Uploaded Python 2, Python 3

File details

Details for the file hgsig-0.1.6.tar.gz.

File metadata

  • Download URL: hgsig-0.1.6.tar.gz
  • Upload date:
  • Size: 12.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: python-requests/2.28.1

File hashes

Hashes for hgsig-0.1.6.tar.gz
Algorithm Hash digest
SHA256 8053c6552584f9bad76e61feb35cb785dbb51c5c31889f7808c54b26b933569d
MD5 4b5da996cf037fc43795cf199b980b7c
BLAKE2b-256 78ef684d49e2cd176feb32a1516ee6123e8b73469244748b6381f4f470b574c1

File details

Details for the file hgsig-0.1.6-py2.py3-none-any.whl.

File metadata

  • Download URL: hgsig-0.1.6-py2.py3-none-any.whl
  • Upload date:
  • Size: 7.8 kB
  • Tags: Python 2, Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: python-requests/2.28.1

File hashes

Hashes for hgsig-0.1.6-py2.py3-none-any.whl
Algorithm Hash digest
SHA256 a0bdeeaedf6ccf3ee4e9d3b722356fd22db485ed77ac544429cc6cc8796ecf50
MD5 8cfd59e1c15f5bfb26e0b89492eec8fc
BLAKE2b-256 63538bdee64aff5be3131bf5c82a1ab80def96761a35bdefdf0491a76f83b641
