An implementation of the Gaussian multi-Graphical Model

GmGM-python

A Python package for the GmGM algorithm. Read the pre-print here.

This is a very early version, so the API is subject to change.

Installation

We recommend installing this package in a conda environment, by first running:

conda create -n {YOUR ENVIRONMENT NAME} "python>=3.9,<3.12"
conda activate {YOUR ENVIRONMENT NAME}

Afterwards you can install it via pip.

# Pip
python -m pip install GmGM

Conda install coming soon.
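
If you are not using conda, it may help to confirm that your interpreter falls inside the version range given in the conda command above before installing. The snippet below is a minimal sketch of that check; `is_supported` is a hypothetical helper written for this README, not part of GmGM.

```python
import sys
from typing import Tuple


def is_supported(version: Tuple[int, ...]) -> bool:
    """Return True if `version` satisfies python>=3.9,<3.12.

    Hypothetical helper for illustration; the range is copied from the
    conda command in the installation instructions.
    """
    # Compare only (major, minor); patch releases do not matter here.
    return (3, 9) <= tuple(version[:2]) < (3, 12)


# Check the interpreter that is currently running.
print(is_supported(sys.version_info))
```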

About

This package learns a graphical representation of every "axis" of your data. For example, if you have a paired scRNA+scATAC multi-omics dataset, your axes are "genes" (columns of the scRNA matrix), "peaks" (columns of the scATAC matrix), and "cells" (rows of both matrices).

This package works on any dataset that can be expressed as multiple tensors, each with an arbitrary number of axes (so multi-omics, videos, etc.). The only restriction is that no tensor can contain the same axis twice (no "genes x genes" matrix). An axis can be shared by several tensors, as long as it appears at most once in each of them.
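
The restriction above can be sketched as a small validation step. This is an illustration only: `check_structure` is a hypothetical helper, not a GmGM function, and it works on shape tuples rather than real data. It also assumes that a shared axis must have the same length in every tensor that uses it, which the text implies ("cells" are the rows of both matrices) but does not state outright.

```python
from typing import Dict, Tuple


def check_structure(
    shapes: Dict[str, Tuple[int, ...]],
    structure: Dict[str, Tuple[str, ...]],
) -> Dict[str, int]:
    """Validate axis names against tensor shapes and return each axis's size.

    Hypothetical helper for illustration only; not part of the GmGM API.
    """
    axis_sizes: Dict[str, int] = {}
    for name, axes in structure.items():
        shape = shapes[name]
        if len(axes) != len(shape):
            raise ValueError(
                f"{name}: {len(axes)} axis names for {len(shape)} dimensions"
            )
        # Restriction from the text: an axis may appear at most once per tensor.
        if len(set(axes)) != len(axes):
            raise ValueError(f"{name}: axis repeated within one tensor: {axes}")
        for axis, size in zip(axes, shape):
            # Assumption: a shared axis has the same length wherever it appears.
            if axis_sizes.setdefault(axis, size) != size:
                raise ValueError(f"axis {axis!r} has inconsistent sizes")
    return axis_sizes


# "cell" is shared by both matrices; "gene" and "peak" each appear once.
sizes = check_structure(
    shapes={"scRNA": (100, 2000), "scATAC": (100, 5000)},
    structure={"scRNA": ("cell", "gene"), "scATAC": ("cell", "peak")},
)
print(sizes)  # {'cell': 100, 'gene': 2000, 'peak': 5000}
```

A "genes x genes" matrix, i.e. `structure={"bad": ("gene", "gene")}`, raises a `ValueError` in this sketch.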

Usage

For an example, we recommend looking at the danio_rerio.ipynb notebook.

With AnnData

If you already have your data stored as an AnnData object, GmGM can be used directly. Suppose you had a single-cell RNA sequencing dataset scRNA.

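# Assumption: GmGM is importable from the package top level, like Dataset.
from GmGM import GmGM

# scRNA is an existing AnnData object (cells x genes).
GmGM(
    scRNA,
    to_keep={
        "obs": 10,  # edges to keep per cell
        "var": 10,  # edges to keep per gene
    }
)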

"obs": 10 tells the algorithm to keep 10 edges per cell (the 'obs' axis of AnnData) and "var": 10 tells the algorithm to keep 10 edges per gene (the 'var' axis of AnnData).
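
To make "keep 10 edges per cell" concrete, here is a minimal sketch of per-node edge selection on a small weight matrix. It is an illustration of the general idea only: `keep_top_edges` is a hypothetical function, it operates on plain nested lists, and GmGM's own thresholding may differ in its details.

```python
from typing import List


def keep_top_edges(weights: List[List[float]], k: int) -> List[List[float]]:
    """Keep, for each node, its k strongest off-diagonal edges by absolute weight.

    Illustrative only; GmGM's internal thresholding may differ.
    """
    n = len(weights)
    kept = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Rank the other nodes by edge strength, strongest first.
        neighbours = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: abs(weights[i][j]),
            reverse=True,
        )
        for j in neighbours[:k]:
            # Keep the edge in both directions so the graph stays undirected.
            kept[i][j] = weights[i][j]
            kept[j][i] = weights[j][i]
    return kept


weights = [
    [0.0, 0.9, 0.1, 0.3],
    [0.9, 0.0, 0.5, 0.2],
    [0.1, 0.5, 0.0, 0.8],
    [0.3, 0.2, 0.8, 0.0],
]
pruned = keep_top_edges(weights, k=1)
print(pruned[0])  # [0.0, 0.9, 0.0, 0.0]
```

Because each kept edge is mirrored, a node can end up with more than `k` edges when other nodes select it. Whether GmGM symmetrizes in this way is not stated here.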

This modifies the AnnData object in place, storing the resultant graphs in scRNA.obsp["obs_gmgm_connectivities"] and scRNA.varp["var_gmgm_connectivities"].

With MuData

Native support for MuData is coming soon.

General Usage (i.e. Without AnnData/MuData)

The first step is to express your dataset as a Dataset object. Suppose you have a cells x genes scRNA matrix and a cells x peaks scATAC matrix. You can then create a Dataset object like this:

from GmGM import Dataset
dataset: Dataset = Dataset(
    dataset={
        "scRNA": scRNA,
        "scATAC": scATAC
    },
    structure={
        "scRNA": ("cell", "gene"),
        "scATAC": ("cell", "peak")
    }
)

Running GmGM is then as simple as:

# Assumption: GmGM is importable from the package top level, like Dataset.
from GmGM import GmGM

GmGM(
    dataset,
    to_keep={
        "cell": 10,
        "gene": 10,
        "peak": 10
    }
)

to_keep tells the algorithm how many edges to keep per cell/gene/peak.

The final results are stored in dataset.precision_matrices["cell"], dataset.precision_matrices["gene"], and dataset.precision_matrices["peak"], respectively.
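
In a Gaussian graphical model, the off-diagonal entries of a precision matrix P relate to partial correlations through -P[i][j] / sqrt(P[i][i] * P[j][j]), which can make edge strengths easier to interpret. The sketch below applies that standard formula to a small hand-written matrix. `partial_correlations` is a hypothetical helper, and it assumes a dense matrix with a positive diagonal; the storage format of `dataset.precision_matrices` is not specified here, so some conversion may be needed first.

```python
import math
from typing import List


def partial_correlations(precision: List[List[float]]) -> List[List[float]]:
    """Convert a precision matrix into a matrix of partial correlations.

    Hypothetical helper for illustration; assumes a dense, square matrix
    with strictly positive diagonal entries.
    """
    n = len(precision)
    result = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                # By convention, the diagonal of the result is set to 1.
                result[i][j] = 1.0
            else:
                scale = math.sqrt(precision[i][i] * precision[j][j])
                result[i][j] = -precision[i][j] / scale
    return result


# Toy precision matrix: nodes 0 and 1 are linked, node 2 is isolated.
precision = [
    [2.0, -1.0, 0.0],
    [-1.0, 2.0, 0.0],
    [0.0, 0.0, 1.0],
]
pcorr = partial_correlations(precision)
print(pcorr[0][1])  # 0.5
```

A zero entry in the precision matrix gives a zero partial correlation, which corresponds to a missing edge in the graph.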

Roadmap

  • Add direct support for AnnData objects
  • Add direct support for MuData objects (so that conversion to Dataset is not needed)
  • Stabilize API
  • Add comprehensive docs
  • Have generate_data directly generate Dataset objects
  • Add conda distribution
  • Add example notebook
  • Make sure regularizers still work
  • Make sure priors still work
  • Make sure covariance thresholding trick still works
  • Add unit tests
  • Allow forcing subset of axes to have given precision matrices
