A kernel-based nonparametric regression and classification framework for compositional data.

Project description

KernelBiome package

The KernelBiome Python package can be installed from PyPI via

pip install kernelbiome

or directly from the GitHub repository via

python -m pip install git+https://github.com/shimenghuang/KernelBiome.git

Small usage example:

import numpy as np
from kernelbiome.kernelbiome import KernelBiome

# Simulate some compositional data (each row of X sums to 1)
n = 100
X1 = np.random.normal(0, 1, n)
X2 = np.random.normal(0, 1, n)
X3 = np.random.normal(0, 1, n)
X4 = np.random.normal(0, 1, n)
X = np.exp(np.c_[X1, X2, X3, X4])
X /= X.sum(axis=1)[:, None]
y = 5*(X[:, 0]+X[:, 1])/(X[:, 0]+X[:, 1]+X[:, 2]) + np.random.normal(0, 1, n)/2

# Fit KernelBiome
models = {
    'linear': None,
    'aitchison': {'c': np.logspace(-7, -3, 5)},
}
KB = KernelBiome(kernel_estimator='KernelRidge',
                 center_kmat=True,
                 models=models, # `models=None` for using all default models
                 verbose=1)
KB.fit(X, y)

# Calculate the in-sample root mean squared error (RMSE)
RMSE = np.sqrt(np.mean((KB.predict(X) - y)**2))

For a complete usage example, see kernelbiome_illustration.py.
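
To make the 'aitchison' entry in the models dictionary above more concrete, the following is a minimal NumPy sketch of kernel ridge regression with an Aitchison-type kernel, evaluated on held-out data rather than in-sample. It does not call KernelBiome and is not the package's implementation. The helper names (clr, aitchison_kernel, fit_kernel_ridge), the ridge penalty lam, the train/test split, and the reading of c as a pseudo-count added before the log transform are all assumptions made for illustration.

```python
import numpy as np

def clr(X, c=1e-5):
    """Centered log-ratio transform; c is a pseudo-count (assumed role of `c`)."""
    logX = np.log(X + c)
    return logX - logX.mean(axis=1, keepdims=True)

def aitchison_kernel(A, B, c=1e-5):
    """Inner product between CLR-transformed compositions (rows of A and B)."""
    return clr(A, c) @ clr(B, c).T

def fit_kernel_ridge(K, y, lam=1.0):
    """Solve (K + lam * I) alpha = y - mean(y); returns (alpha, intercept)."""
    intercept = y.mean()
    alpha = np.linalg.solve(K + lam * np.eye(K.shape[0]), y - intercept)
    return alpha, intercept

# Simulate compositional data in the same way as the usage example
rng = np.random.default_rng(0)
n = 200
X = np.exp(rng.normal(0, 1, size=(n, 4)))
X /= X.sum(axis=1, keepdims=True)
y = 5 * (X[:, 0] + X[:, 1]) / (X[:, 0] + X[:, 1] + X[:, 2]) + rng.normal(0, 1, n) / 2

# Hypothetical split: first 150 rows for fitting, last 50 for evaluation
X_train, X_test = X[:150], X[150:]
y_train, y_test = y[:150], y[150:]

K_train = aitchison_kernel(X_train, X_train)
alpha, intercept = fit_kernel_ridge(K_train, y_train, lam=1.0)

# Predict on held-out rows using the cross-kernel between test and train points
y_pred = aitchison_kernel(X_test, X_train) @ alpha + intercept
rmse_test = np.sqrt(np.mean((y_pred - y_test) ** 2))
print(f"held-out RMSE: {rmse_test:.3f}")
```

In practice the pseudo-count and the ridge penalty would be chosen by cross-validation, which is presumably what the grid np.logspace(-7, -3, 5) for c in the usage example is for.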

Reproducible Code

This repository contains the Python package KernelBiome and the code to reproduce the results in the paper Supervised Learning and Model Analysis with Compositional Data (Huang et al., 2022).

All scripts that produce the results in the paper are in the experiments folder, and helper functions used by those scripts are in the helpers folder. Scripts starting with "run_" run the computations and save the results; scripts starting with "summarize_" load the saved results and summarize them, e.g. in figures. The data_original folder holds the original datasets and the data_processed folder holds the processed datasets. See the README files in those folders for details.

prediction

Prediction comparison on 33 publicly available classification and regression datasets.

post_analysis

Post-analysis including CFI and kernel PCA for two of the public datasets, cirrhosis and centralpark.
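
As a rough illustration of the kernel PCA step mentioned above, here is a minimal NumPy sketch that double-centers a kernel matrix and projects the samples onto its leading eigenvectors. It is not the repository's post_analysis code: the function names, the simulated data, and the choice of a CLR inner-product kernel are assumptions made for this example.

```python
import numpy as np

def center_kernel(K):
    """Double-center a kernel matrix: H K H with H = I - (1/n) 11^T."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H

def kernel_pca(K, n_components=2):
    """Return sample scores and eigenvalues for the leading components."""
    Kc = center_kernel(K)
    eigvals, eigvecs = np.linalg.eigh(Kc)          # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    top_vals = np.clip(eigvals[order], 0.0, None)  # guard against tiny negatives
    scores = eigvecs[:, order] * np.sqrt(top_vals)
    return scores, top_vals

# Simulated compositions (hypothetical data, not one of the public datasets)
rng = np.random.default_rng(0)
X = np.exp(rng.normal(0, 1, size=(50, 4)))
X /= X.sum(axis=1, keepdims=True)

# CLR inner-product kernel, used here only as an example kernel
Z = np.log(X) - np.log(X).mean(axis=1, keepdims=True)
K = Z @ Z.T

scores, eigvals = kernel_pca(K, n_components=2)
print(scores.shape)
```

Each row of scores gives the coordinates of one sample in the first two kernel principal components, which can then be plotted and colored by the response.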

tree_visualization

Visualization of CFI based on weighted and unweighted KernelBiome.

consistency

Simulations illustrating the consistency results in the paper.

toy_examples

log_contrast_example.py: Illustration of CFI and CPD for a log-contrast model using simulated data.

rescale_matters_example.py: Comparison of CFI and CPD with relative influence (RI) and partial dependence plots (PDP) based on simulated data.
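
For readers unfamiliar with log-contrast models, the sketch below shows the property that makes them natural for compositional data: when the coefficients sum to zero, the model output does not change if each sample is rescaled, so raw positive measurements and their normalized compositions give the same prediction. This is not the content of log_contrast_example.py; the function name and coefficient values are hypothetical.

```python
import numpy as np

def log_contrast(X, beta):
    """Evaluate sum_j beta_j * log(x_j) under the constraint sum_j beta_j = 0."""
    beta = np.asarray(beta, dtype=float)
    if not np.isclose(beta.sum(), 0.0):
        raise ValueError("log-contrast coefficients must sum to zero")
    return np.log(X) @ beta

rng = np.random.default_rng(0)
raw = np.exp(rng.normal(0, 1, size=(5, 3)))    # unnormalized positive measurements
comp = raw / raw.sum(axis=1, keepdims=True)    # same samples as compositions

# Coefficients summing to zero: the output is invariant to per-sample rescaling
beta = np.array([1.0, -0.5, -0.5])
invariant = np.allclose(log_contrast(raw, beta), log_contrast(comp, beta))

# Without the constraint, the output shifts by sum(beta) * log(row total)
beta_free = np.array([1.0, 0.5, 0.5])
shift = np.log(raw) @ beta_free - np.log(comp) @ beta_free

print(invariant)
```

The shift in the unconstrained case depends only on each sample's total, which carries no information in compositional data; this is one way to see why rescaling matters when interpreting feature influence.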

Download files

Source Distribution

kernelbiome-1.0.2.tar.gz (32.0 kB)

Uploaded Source

Built Distribution

kernelbiome-1.0.2-py3-none-any.whl (34.0 kB)

Uploaded Python 3

File details

Details for the file kernelbiome-1.0.2.tar.gz.

File metadata

  • Download URL: kernelbiome-1.0.2.tar.gz
  • Upload date:
  • Size: 32.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.9.9

File hashes

Hashes for kernelbiome-1.0.2.tar.gz
Algorithm Hash digest
SHA256 73c1556cff6727c87a89622fdb4d8a461a755692176f49fc0ea97b0cd978c4be
MD5 9ab8570686d90c64f46e43254e68b0c2
BLAKE2b-256 431f58fa5e0ca82336256ae8ae3568319923d41446366b21712fb67afe725521

File details

Details for the file kernelbiome-1.0.2-py3-none-any.whl.

File metadata

  • Download URL: kernelbiome-1.0.2-py3-none-any.whl
  • Upload date:
  • Size: 34.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.9.9

File hashes

Hashes for kernelbiome-1.0.2-py3-none-any.whl
Algorithm Hash digest
SHA256 de2b491656cd17cd25dae7dfd89f7ff7c96af03bc5d2b8835a2b24b4a0a20409
MD5 2855083c5b89fca4b6b538b1314f574c
BLAKE2b-256 6732c582d966b4607390e27fe50d55f09c1f4e2e5af538f24efabeb6ce1ff238
