Bi-cross validation of NMF and signature generation and analysis
cvanmf
An implementation of bicrossvalidation for Non-negative Matrix Factorisation (NMF) rank selection, along with methods for analysis and visualisation of NMF decomposition.
For details on the method, please see:
- Enterosignatures define common bacterial guilds in the human gut microbiome, Frioux, Clémence et al., Cell Host & Microbe, Volume 31, Issue 7, 1111 - 1125.e6 (https://doi.org/10.1016/j.chom.2023.05.024)
Graphical Abstract
The left section is a schematic depicting the procedures implemented in cvaNMF; on the right is a summary of results reported in the manuscript (in preparation).
Documentation
Documentation can be found at readthedocs.
Installation
cvanmf is available from bioconda
conda install --name {envname} -c bioconda -c conda-forge cvanmf
or pip
pip install cvanmf
Overview
NMF is an unsupervised machine learning technique which represents a numeric input matrix $X$ as a mixture of $k$ underlying parts. In this package we refer to each of these parts as a signature. Each signature is described by how much each feature contributes to it. For example, we can represent the abundance of bacteria in the human gut as a mixture of 5 signatures.
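To make the decomposition concrete, the sketch below factorises a small synthetic matrix $X$ (features on rows, samples on columns) into a feature-by-signature matrix `W` and a signature-by-sample matrix `H`, using the classic multiplicative update rule in plain NumPy. The function name `nmf_multiplicative`, the matrix names `W` and `H`, and the synthetic data are illustrative assumptions. This is not the cvanmf API, and the solver cvanmf uses may differ.

```python
import numpy as np


def nmf_multiplicative(X, k, n_iter=500, seed=0, eps=1e-10):
    """Approximate a non-negative matrix X (features x samples) as W @ H.

    Illustrative helper (not part of cvanmf): W is features x k and
    H is k x samples, both kept non-negative by multiplicative updates.
    """
    rng = np.random.default_rng(seed)
    n_features, n_samples = X.shape
    # Random non-negative starting point.
    W = rng.random((n_features, k)) + eps
    H = rng.random((k, n_samples)) + eps
    for _ in range(n_iter):
        # Each update rescales the current factor by a non-negative ratio,
        # so entries never become negative.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


# Synthetic example: 30 features, 40 samples, built from 5 signatures.
rng = np.random.default_rng(42)
X = rng.random((30, 5)) @ rng.random((5, 40))

W, H = nmf_multiplicative(X, k=5)
rel_error = np.linalg.norm(X - W @ H) / np.linalg.norm(X)
print(f"relative reconstruction error: {rel_error:.4f}")
```

Each column of `W` is one signature (how much each feature contributes to it), and each column of `H` gives the mixture of signatures that describes one sample.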
The number of signatures (or rank, $k$) has to be specified when performing NMF, and selecting an appropriate value for $k$ is an important step. We implement bicrossvalidation with Gabriel-style holdouts. Broadly speaking, this method holds out one block of the matrix ($A$) and makes an estimate of it ($A'$) using the remainder of the matrix. How closely $A'$ resembles $A$ is used to identify an appropriate rank.
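The holdout idea can be sketched for a single block as follows. The matrix is split into four blocks, with $A$ held out, $D$ sharing neither rows nor columns with $A$, and $B$ and $C$ sharing rows or columns with $A$ respectively. NMF is fitted on $D$, and the fitted factors are carried over to $A$ through $B$ and $C$. The block names `B`, `C`, `D`, the helper names, the use of an unconstrained pseudo-inverse to carry the factors over, and the synthetic data are all assumptions made for illustration. This is a simplified sketch rather than cvanmf's implementation.

```python
import numpy as np


def nmf(X, k, n_iter=500, seed=0, eps=1e-10):
    """Minimal multiplicative-update NMF (illustrative, not cvanmf's solver)."""
    rng = np.random.default_rng(seed)
    W = rng.random((X.shape[0], k)) + eps
    H = rng.random((k, X.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


def gabriel_holdout_error(X, k, row_held, col_held):
    """Relative error of estimating the held-out block A at rank k.

    row_held and col_held are boolean masks selecting the rows and
    columns of the held-out block.
    """
    row_kept, col_kept = ~row_held, ~col_held
    A = X[np.ix_(row_held, col_held)]  # held-out block
    B = X[np.ix_(row_held, col_kept)]  # shares rows with A
    C = X[np.ix_(row_kept, col_held)]  # shares columns with A
    D = X[np.ix_(row_kept, col_kept)]  # shares nothing with A
    W_D, H_D = nmf(D, k)
    # Simplification: least-squares fits via the pseudo-inverse,
    # without a non-negativity constraint.
    W_A = B @ np.linalg.pinv(H_D)  # row factors for A's rows
    H_A = np.linalg.pinv(W_D) @ C  # column factors for A's columns
    A_est = W_A @ H_A
    return np.linalg.norm(A - A_est) / np.linalg.norm(A)


# Synthetic data built from 3 signatures plus a little non-negative noise.
rng = np.random.default_rng(1)
X = rng.random((40, 3)) @ rng.random((3, 60)) + 0.01 * rng.random((40, 60))

row_held = np.arange(40) < 10  # hold out the first 10 rows ...
col_held = np.arange(60) < 15  # ... and the first 15 columns

errors = {k: gabriel_holdout_error(X, k, row_held, col_held) for k in range(1, 6)}
for k, err in errors.items():
    print(f"k={k}: relative holdout error {err:.4f}")
```

A full bicrossvalidation would shuffle rows and columns, repeat this over every block of a grid, and compare the distribution of errors across candidate ranks; the sketch covers only one holdout.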
Input
Any non-negative numeric matrix can be used as input, with samples on columns and features on rows. Each row should describe the same kind of quantity, e.g. each is the abundance of a microbe, or the abundance of a transcript. A minimum of 2 samples is required. When the number of samples $n$ is close to the number of signatures $k$, signatures are likely to represent individual samples rather than broad patterns.
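The requirements above can be checked before running a decomposition. The helper below is a hypothetical pre-flight check and not part of cvanmf. The name `check_nmf_input` and the `min_ratio` threshold (how many samples per signature counts as "not close") are arbitrary assumptions chosen for illustration. The non-negativity check reflects the general requirement of NMF.

```python
import numpy as np


def check_nmf_input(X, k, min_ratio=3.0):
    """Return a list of problems found in X (features x samples) for rank k.

    Hypothetical helper: min_ratio is an illustrative threshold, not a
    value recommended by cvanmf.
    """
    X = np.asarray(X, dtype=float)
    if X.ndim != 2:
        return ["input must be a 2-D matrix"]
    problems = []
    n_features, n_samples = X.shape
    if not np.isfinite(X).all():
        problems.append("input contains NaN or infinite values")
    if (X < 0).any():
        problems.append("input must be non-negative")
    if n_samples < 2:
        problems.append("at least 2 samples (columns) are required")
    if n_samples < min_ratio * k:
        problems.append(
            f"only {n_samples} samples for k={k}: signatures may describe "
            "individual samples rather than broad patterns"
        )
    return problems


# Rows are features (e.g. microbes), columns are samples.
abundance = np.array(
    [
        [0.5, 0.2, 0.1],
        [0.3, 0.3, 0.6],
        [0.2, 0.5, 0.3],
    ]
)
for problem in check_nmf_input(abundance, k=2):
    print("warning:", problem)
```

If a table has samples on rows instead, transposing it (for example with `X.T`) gives the orientation described above.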
Container
We provide a container image for linux/amd64 through the GitHub Container Registry (GHCR), with the current
version being ghcr.io/apduncan/cvanmf:latest.
This is intended either for running the cvanmf command-line tools, or for use as a container when running cvanmf within
pipelines.
Please see the documentation for more details.
References
If you use this tool please cite:
- Enterosignatures define common bacterial guilds in the human gut microbiome, Frioux, Clémence et al., Cell Host & Microbe, Volume 31, Issue 7, 1111 - 1125.e6 (https://doi.org/10.1016/j.chom.2023.05.024)
For background on NMF see:
- Lee & Seung, 1999 (https://doi.org/10.1038/44565) for the paper introducing NMF
- Jiang et al, 2012 (https://doi.org/10.1007/s00285-011-0428-2) for a good description of the method and application to metagenomic data