A kernel-based nonparametric regression and classification framework for compositional data.
KernelBiome package
The KernelBiome Python package can be installed from PyPI via
pip install kernelbiome
or directly from the GitHub repository via
python -m pip install git+https://github.com/shimenghuang/KernelBiome.git
Small usage example:
import numpy as np
from kernelbiome.kernelbiome import KernelBiome
# Simulate some compositional data
n = 100
X1 = np.random.normal(0, 1, n)
X2 = np.random.normal(0, 1, n)
X3 = np.random.normal(0, 1, n)
X4 = np.random.normal(0, 1, n)
X = np.exp(np.c_[X1, X2, X3, X4])
X /= X.sum(axis=1)[:, None]
y = 5*(X[:, 0]+X[:, 1])/(X[:, 0]+X[:, 1]+X[:, 2]) + np.random.normal(0, 1, n)/2
# Fit KernelBiome
models = {
    'linear': None,
    'aitchison': {'c': np.logspace(-7, -3, 5)},
}
KB = KernelBiome(kernel_estimator='KernelRidge',
                 center_kmat=True,
                 models=models,  # pass `models=None` to use all default models
                 verbose=1)
KB.fit(X, y)
# Calculate the in-sample root mean squared error
RMSE = np.sqrt(np.mean((KB.predict(X) - y)**2))
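To make the `'aitchison'` entry in `models` a bit more concrete, here is a NumPy-only sketch of kernel ridge regression with an Aitchison-type kernel on the same kind of simulated data. It assumes that the kernel is the inner product of centered log-ratio (clr) transforms after adding a pseudocount `c`, that `center_kmat=True` corresponds to double-centering the kernel matrix, and it uses a hypothetical ridge penalty `lam`. The helper names are made up for illustration; this is not the package's implementation.

```python
import numpy as np


def clr(X, c):
    """Centered log-ratio transform after adding pseudocount c (assumed role of c)."""
    L = np.log(X + c)
    return L - L.mean(axis=1, keepdims=True)


def aitchison_kernel(X, Y, c):
    """Aitchison-type kernel: inner products of clr-transformed rows."""
    return clr(X, c) @ clr(Y, c).T


def center_kernel(K):
    """Double-center a square kernel matrix: H K H with H = I - 11^T / n."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H


def fit_predict_train(X, y, c=1e-5, lam=1e-2):
    """Kernel ridge regression on a centered kernel; returns in-sample predictions."""
    K = center_kernel(aitchison_kernel(X, X, c))
    y_mean = y.mean()
    # Solve (K + lam * I) alpha = y - mean(y); the mean is added back afterwards
    alpha = np.linalg.solve(K + lam * np.eye(len(y)), y - y_mean)
    return K @ alpha + y_mean


# Same style of simulated compositional data as in the example above
rng = np.random.default_rng(0)
n = 100
X = np.exp(rng.normal(0, 1, size=(n, 4)))
X /= X.sum(axis=1, keepdims=True)  # close each row onto the simplex
y = 5 * (X[:, 0] + X[:, 1]) / (X[:, 0] + X[:, 1] + X[:, 2]) + rng.normal(0, 1, n) / 2

pred = fit_predict_train(X, y)
rmse = np.sqrt(np.mean((pred - y) ** 2))
print(f"in-sample RMSE: {rmse:.3f}")
```

With only four components this kernel is linear in clr space, so it mainly serves as a readable baseline; KernelBiome's other kernels and its hyperparameter search are not reproduced here.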
For a complete usage example, see kernelbiome_illustration.py.
Reproducible Code
This repository contains the Python package KernelBiome and code that can reproduce the results in the paper Supervised Learning and Model Analysis with Compositional Data (Huang et al., 2022).
All scripts producing the results in the paper can be found in the experiments folder, with helper functions for the experiment scripts located in the helpers folder. Scripts whose names start with "run_" run the computations and save the results, and scripts starting with "summarize_" load and summarize the results, e.g., as figures. The data_original and data_processed folders are for placing the original datasets and saving the processed datasets, respectively. See the README files therein for details.
prediction
Prediction performance comparison on the 33 publicly available classification and regression datasets.
post_analysis
Post-analysis including CFI and kernel PCA for two of the public datasets, cirrhosis and centralpark.
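Kernel PCA, one of the post-analysis tools mentioned here, can be sketched in a few lines of NumPy: double-center the kernel matrix, take its leading eigenvectors, and scale them to obtain sample scores. This is the textbook procedure; the kernel below is a hypothetical Aitchison-type kernel (clr inner products with an assumed pseudocount `c`) on simulated compositions rather than the cirrhosis or centralpark data, and the repository's own routine may differ in details.

```python
import numpy as np


def kernel_pca(K, n_components=2):
    """Return kernel PCA scores and eigenvalues for the samples behind K."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    Kc = H @ K @ H                          # center in feature space
    eigvals, eigvecs = np.linalg.eigh(Kc)   # ascending order for symmetric matrices
    idx = np.argsort(eigvals)[::-1][:n_components]
    lam = np.clip(eigvals[idx], 0, None)    # guard against tiny negative round-off
    # Score of sample i on component k is sqrt(lambda_k) * v_k[i]
    return eigvecs[:, idx] * np.sqrt(lam), lam


# Hypothetical compositional data and Aitchison-type kernel (pseudocount c assumed)
rng = np.random.default_rng(1)
X = np.exp(rng.normal(0, 1, size=(50, 5)))
X /= X.sum(axis=1, keepdims=True)
c = 1e-5
L = np.log(X + c)
Z = L - L.mean(axis=1, keepdims=True)       # clr transform
K = Z @ Z.T

scores, lam = kernel_pca(K, n_components=2)
print(scores.shape, lam)
```

For this linear-in-clr kernel the result coincides with ordinary PCA on the clr-transformed data; a nonlinear kernel would change only how `K` is built.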
tree_visualization
Visualization of CFI based on weighted and unweighted KernelBiome.
consistency
Simulations illustrating the consistency results in the paper.
toy_examples
log_contrast_example.py: Illustration of CFI and CPD for a log-contrast model using simulated data.
rescale_matters_example.py: Comparison of CFI and CPD with relative influence (RI) and partial dependence plots (PDP) on simulated data.
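As a rough illustration of two ideas behind these toy examples, the sketch below (i) evaluates a log-contrast model f(x) = sum_j beta_j log(x_j) with sum_j beta_j = 0, which is unchanged when a composition is multiplied by a positive constant, and (ii) contrasts a PDP-style intervention that overwrites one component, which leaves the simplex, with one that rescales the remaining components proportionally so that each row still sums to one. The coefficients and data are made up, and proportional rescaling is only one natural choice; it is not claimed to be the exact definition of CPD used in the paper.

```python
import numpy as np

# Hypothetical log-contrast coefficients; they must sum to zero.
beta = np.array([1.0, -0.5, -0.5, 0.0])
assert np.isclose(beta.sum(), 0.0)


def log_contrast(X):
    """f(x) = sum_j beta_j * log(x_j), evaluated row-wise."""
    return np.log(X) @ beta


def replace_component(X, j, value):
    """PDP-style intervention: overwrite column j, leave the other columns as is."""
    Xn = X.copy()
    Xn[:, j] = value
    return Xn


def rescale_component(X, j, value):
    """Set column j to `value` and rescale the others so each row sums to 1."""
    Xn = X.copy()
    others = np.arange(X.shape[1]) != j
    Xn[:, others] *= ((1 - value) / (1 - X[:, j]))[:, None]
    Xn[:, j] = value
    return Xn


rng = np.random.default_rng(2)
X = np.exp(rng.normal(0, 1, size=(5, 4)))
X /= X.sum(axis=1, keepdims=True)

# (i) Scale invariance of the log-contrast model
print(np.allclose(log_contrast(X), log_contrast(3.0 * X)))  # True

# (ii) Only the rescaled intervention stays on the simplex
print(replace_component(X, 0, 0.5).sum(axis=1))
print(rescale_component(X, 0, 0.5).sum(axis=1))
```

The rescaled intervention also preserves the ratios among the untouched components, which is what keeps it inside the simplex.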
Download files
File details
Details for the file kernelbiome-1.0.2.tar.gz.
File metadata
- Download URL: kernelbiome-1.0.2.tar.gz
- Upload date:
- Size: 32.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.9.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 73c1556cff6727c87a89622fdb4d8a461a755692176f49fc0ea97b0cd978c4be |
| MD5 | 9ab8570686d90c64f46e43254e68b0c2 |
| BLAKE2b-256 | 431f58fa5e0ca82336256ae8ae3568319923d41446366b21712fb67afe725521 |
File details
Details for the file kernelbiome-1.0.2-py3-none-any.whl.
File metadata
- Download URL: kernelbiome-1.0.2-py3-none-any.whl
- Upload date:
- Size: 34.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.9.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | de2b491656cd17cd25dae7dfd89f7ff7c96af03bc5d2b8835a2b24b4a0a20409 |
| MD5 | 2855083c5b89fca4b6b538b1314f574c |
| BLAKE2b-256 | 6732c582d966b4607390e27fe50d55f09c1f4e2e5af538f24efabeb6ce1ff238 |
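The digests listed above can be used to check a downloaded file before installing it. Below is a minimal sketch using Python's standard `hashlib`. It assumes the source distribution has been saved locally as `kernelbiome-1.0.2.tar.gz`; that path and the helper name `file_digest` are illustrative assumptions.

```python
import hashlib
from pathlib import Path


def file_digest(path, algorithm="sha256", chunk_size=1 << 16):
    """Hash a file in chunks so large downloads need not fit in memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


# SHA256 digest of kernelbiome-1.0.2.tar.gz as listed above
EXPECTED_SHA256 = "73c1556cff6727c87a89622fdb4d8a461a755692176f49fc0ea97b0cd978c4be"

path = Path("kernelbiome-1.0.2.tar.gz")  # assumed local download location
if path.exists():
    ok = file_digest(path) == EXPECTED_SHA256
    print("SHA256 matches" if ok else "SHA256 MISMATCH: do not install")
else:
    print(f"{path} not found; download it first")
```

pip can also enforce digests itself in hash-checking mode, e.g. with a requirements line such as `kernelbiome==1.0.2 --hash=sha256:<digest>`.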