An implementation of the Gaussian multi-Graphical Model
GmGM-python
A Python package for the GmGM algorithm. Read the pre-print here.
This is a very early version, so the API is subject to change.
Installation
We recommend installing this package in a conda environment, by first running:
conda create -n {YOUR ENVIRONMENT NAME} "python>=3.9,<3.12"
conda activate {YOUR ENVIRONMENT NAME}
Afterwards you can install it via pip.
# Pip
python -m pip install GmGM
Conda install coming soon.
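If an install misbehaves, it can help to confirm that the interpreter falls inside the supported range and that the package is visible to it. The following is a minimal sketch using only the standard library; `python_supported` and `installed_version` are hypothetical helper names (not part of GmGM), and the version bounds are copied from the conda command above.

```python
import sys
from importlib import metadata
from typing import Optional, Tuple


def python_supported(version: Tuple[int, int] = tuple(sys.version_info[:2])) -> bool:
    """Return True if (major, minor) satisfies python>=3.9,<3.12."""
    return (3, 9) <= version < (3, 12)


def installed_version(package: str = "GmGM") -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("Python supported:", python_supported())
    print("GmGM version:", installed_version())
```

A result of `None` for the version means pip did not install the package into the currently active environment.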
About
This package learns a graphical representation of every "axis" of your data. For example, if you had a paired scRNA+scATAC multi-omics dataset, then your axes would be "genes" (columns of the scRNA matrix), "peaks" (columns of the scATAC matrix), and "cells" (rows of both matrices).
This package works on any dataset that can be expressed as multiple tensors of arbitrary length (multi-omics, videos, etc.). The only restriction is that no tensor can contain the same axis twice (no "genes x genes" matrix). An axis can be shared by several tensors, as long as it appears at most once in each of them.
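The restriction can be checked before running anything. The snippet below is a small sketch in plain Python; `repeated_axes` is a hypothetical helper, not a GmGM function, and the dictionary layout (tensor name mapped to a tuple of axis names) is an assumption chosen for illustration.

```python
from collections import Counter
from typing import Dict, List, Tuple


def repeated_axes(structure: Dict[str, Tuple[str, ...]]) -> Dict[str, List[str]]:
    """Map each tensor name to the axes that occur more than once within it."""
    problems = {}
    for name, axes in structure.items():
        dupes = [axis for axis, count in Counter(axes).items() if count > 1]
        if dupes:
            problems[name] = dupes
    return problems


# "cell" is shared by two tensors, which is allowed.
ok = {"scRNA": ("cell", "gene"), "scATAC": ("cell", "peak")}
# "gene" appears twice within a single tensor, which is not allowed.
bad = {"coexpression": ("gene", "gene")}

print(repeated_axes(ok))   # {}
print(repeated_axes(bad))  # {'coexpression': ['gene']}
```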
Usage
For an example, we recommend looking at the danio_rerio.ipynb notebook.
With AnnData
If you already have your data stored as an AnnData object, GmGM can be used directly. Suppose you had a single-cell RNA sequencing dataset scRNA.
GmGM(
    scRNA,
    to_keep={
        "obs": 10,
        "var": 10,
    }
)
"obs": 10 tells the algorithm to keep 10 edges per cell (the 'obs' axis of AnnData) and "var": 10 tells the algorithm to keep 10 edges per gene (the 'var' axis of AnnData).
This modifies the AnnData object in place, storing the resultant graphs in scRNA.obsp["obs_gmgm_connectivities"] and scRNA.varp["var_gmgm_connectivities"].
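To make "keep 10 edges per cell" more concrete, here is a minimal NumPy sketch of one way to keep a fixed number of edges per node in a symmetric matrix. It is an illustration under assumptions, not GmGM's actual thresholding code: `keep_top_k_per_row` is a hypothetical helper, and ranking by absolute value and symmetrizing with a union are choices made for this sketch only.

```python
import numpy as np


def keep_top_k_per_row(matrix: np.ndarray, k: int) -> np.ndarray:
    """Zero all but the k largest-magnitude off-diagonal entries of each row.

    An edge is kept if either endpoint selected it, so the result stays
    symmetric and a row may end up with more than k nonzero entries.
    """
    n = matrix.shape[0]
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < number of nodes")
    strength = np.abs(matrix).astype(float)
    np.fill_diagonal(strength, -np.inf)  # never select self-edges
    # Column indices of the k strongest entries in each row.
    top = np.argpartition(-strength, kth=k - 1, axis=1)[:, :k]
    mask = np.zeros((n, n), dtype=bool)
    mask[np.arange(n)[:, None], top] = True
    mask |= mask.T  # keep an edge if either node chose it
    return np.where(mask, matrix, 0.0)


toy = np.array([
    [0.0, 5.0, 1.0],
    [5.0, 0.0, 2.0],
    [1.0, 2.0, 0.0],
])
print(keep_top_k_per_row(toy, k=1))
```

With `k=1`, the weak edge between the first and third nodes is dropped, while the edge between the second and third nodes survives because the third node selected it.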
With MuData
MuData support is very similar to AnnData support. Suppose we had a MuData object mudata with scATAC and scRNA data, then:
GmGM(
    mudata,
    to_keep={
        "obs": 10,
        "rna-var": 10,
        "atac-var": 10
    }
)
# Cell graph
mudata.obsp["obs_gmgm_connectivities"]
# Gene graph
mudata.varp["rna-var_gmgm_connectivities"]
# Peak graph
mudata.varp["atac-var_gmgm_connectivities"]
In general, features are accessed by prepending the name of the modality to "-var", e.g. "metabolomics-var" if the MuData has a metabolomics modality.
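The naming pattern can be generated rather than typed by hand. The sketch below builds a `to_keep` dictionary from a list of modality names; `build_to_keep` is a hypothetical helper, not part of GmGM, and it assumes the same edge count is wanted for every axis.

```python
from typing import Dict, Iterable


def build_to_keep(modalities: Iterable[str], edges_per_node: int) -> Dict[str, int]:
    """Build a `to_keep` dictionary using the "<modality>-var" naming pattern."""
    to_keep = {"obs": edges_per_node}
    for modality in modalities:
        to_keep[f"{modality}-var"] = edges_per_node
    return to_keep


print(build_to_keep(["rna", "atac", "metabolomics"], 10))
# {'obs': 10, 'rna-var': 10, 'atac-var': 10, 'metabolomics-var': 10}
```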
Note that MuData objects created with the axis=1 or axis=-1 parameters are not currently supported; support for them is planned.
General Usage (i.e. Without AnnData/MuData)
The first step is to express your dataset as a Dataset object. Suppose you had a cells x genes scRNA matrix and cells x peaks scATAC matrix, then you could create a Dataset object like:
from GmGM import Dataset
dataset: Dataset = Dataset(
    dataset={
        "scRNA": scRNA,
        "scATAC": scATAC
    },
    structure={
        "scRNA": ("cell", "gene"),
        "scATAC": ("cell", "peak")
    }
)
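Because "cell" is shared by both matrices, the two matrices need the same number of rows. A quick pre-flight check along these lines can catch shape mismatches early. This is a sketch only: `axis_sizes` is a hypothetical helper written for this example (it is not a GmGM function), and the toy arrays stand in for real data.

```python
import numpy as np
from typing import Dict, Tuple


def axis_sizes(
    matrices: Dict[str, np.ndarray],
    structure: Dict[str, Tuple[str, ...]],
) -> Dict[str, int]:
    """Return the length of every named axis, raising if a shared axis disagrees."""
    sizes: Dict[str, int] = {}
    for name, axes in structure.items():
        shape = matrices[name].shape
        if len(shape) != len(axes):
            raise ValueError(f"{name}: {len(axes)} axis names for a {len(shape)}-d array")
        for axis, length in zip(axes, shape):
            # setdefault records the first size seen and returns it afterwards.
            if sizes.setdefault(axis, length) != length:
                raise ValueError(f"axis '{axis}' has sizes {sizes[axis]} and {length}")
    return sizes


toy_matrices = {"scRNA": np.zeros((100, 2000)), "scATAC": np.zeros((100, 5000))}
toy_structure = {"scRNA": ("cell", "gene"), "scATAC": ("cell", "peak")}
print(axis_sizes(toy_matrices, toy_structure))
# {'cell': 100, 'gene': 2000, 'peak': 5000}
```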
Running GmGM is then as simple as:
GmGM(
    dataset,
    to_keep={
        "cell": 10,
        "gene": 10,
        "peak": 10
    }
)
to_keep tells the algorithm how many edges to keep per cell/gene/peak.
The final results are stored in dataset.precision_matrices["cell"], dataset.precision_matrices["gene"], and dataset.precision_matrices["peak"], respectively.
Roadmap
- Add direct support for AnnData objects
- Add direct support for MuData objects (so that conversion to Dataset is not needed)
- Stabilize API
- Add comprehensive docs
- Have generate_data directly generate Dataset objects
- Add conda distribution
- Add example notebook
- Make sure regularizers still work
- Make sure priors still work
- Make sure covariance thresholding trick still works
- Add unit tests
- Allow forcing subset of axes to have given precision matrices
- Add random generation of count matrix data