Compositional data (CoDa) analysis tools for Python

pyCoDaMath

pyCoDaMath provides compositional data (CoDa) analysis tools for Python

Getting Started

This package extends the Pandas DataFrame object with CoDa tools, available through the df.coda accessor. It also provides a set of plotting functions for CoDa figures.

Installation

Clone the git repo to your local hard drive:

git clone https://bitbucket.org/genomicepidemiology/pycodamath.git

Enter the directory and install:

pip install .

Usage

Load the pyCoDaMath module with

import pycodamath

Once the module is imported, the CoDa tools are available on any Pandas DataFrame. To get CLR values from a DataFrame df, call

df.coda.clr()

Documentation

CLR transformation - point estimate

df.coda.clr()

Returns centered logratio (CLR) coefficients. If the dataframe contains zeros, the values are replaced by the Aitchison mean point estimate before the transformation.
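For reference, the centered logratio of a composition x with D parts is clr(x)_i = ln(x_i) - (1/D) * sum_j ln(x_j). The following is a minimal plain-Python sketch of that arithmetic for a single zero-free row. The helper name clr_row is illustrative and is not part of pyCoDaMath.

```python
import math

def clr_row(row):
    """Centered logratio of one strictly positive composition (illustrative helper)."""
    logs = [math.log(x) for x in row]
    mean_log = sum(logs) / len(logs)   # log of the geometric mean of the row
    return [v - mean_log for v in logs]

coefficients = clr_row([10.0, 30.0, 60.0])
# CLR coefficients sum to zero, up to floating-point rounding.
print(coefficients, sum(coefficients))
```

Because only ratios between parts matter, rescaling the row (for example to 1, 3, 6) gives the same coefficients.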

CLR transformation - standard deviation

df.coda.clr_std(n_samples=5000)

Returns the standard deviation of n_samples random draws in CLR space.

Parameters

  • n_samples (int) - Number of random draws from a Dirichlet distribution.
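One way to picture this estimate is as a Monte Carlo procedure: draw compositions from a Dirichlet distribution parameterised by the counts, transform each draw to CLR space, and take the standard deviation per part. The sketch below does this in plain Python for a single row. Using counts + 1 as the Dirichlet parameters, and all helper names, are assumptions made for illustration and may not match the package's implementation.

```python
import math
import random
import statistics

def dirichlet_draw(params, rng):
    """One Dirichlet draw, built from normalised gamma variates."""
    gammas = [rng.gammavariate(a, 1.0) for a in params]
    total = sum(gammas)
    return [g / total for g in gammas]

def clr_row(row):
    logs = [math.log(x) for x in row]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]

def clr_std_row(counts, n_samples=5000, seed=0):
    """Per-part standard deviation of CLR values over n_samples Dirichlet draws."""
    rng = random.Random(seed)
    params = [c + 1.0 for c in counts]   # assumed flat prior, not taken from the package
    draws = [clr_row(dirichlet_draw(params, rng)) for _ in range(n_samples)]
    return [statistics.stdev(column) for column in zip(*draws)]

spread = clr_std_row([0, 12, 250], n_samples=2000)
print(spread)  # the part with a zero count has the widest spread
```

Parts with few or no observed counts carry the most uncertainty, which is why their CLR values vary most between draws.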

ALR transformation - point estimate

df.coda.alr(part=None)

Returns additive logratio values. If part is None, the last part of the composition is used as the denominator.

Parameters

  • part (str) - Name of the part to use as denominator.
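The additive logratio of part i with denominator part d is alr(x)_i = ln(x_i / x_d). The following plain-Python sketch shows the calculation for one sample stored as a dict. The helper alr_row is hypothetical, and dropping the denominator from the output is an assumption about the result's shape.

```python
import math

def alr_row(row, part=None):
    """Additive logratio of one composition given as a dict of part -> value."""
    names = list(row)
    denominator = names[-1] if part is None else part   # last part by default
    return {name: math.log(row[name] / row[denominator])
            for name in names if name != denominator}

sample = {"SiO2": 50.0, "MgO": 10.0, "FeO": 40.0}
print(alr_row(sample))               # FeO, the last part, is the denominator
print(alr_row(sample, part="MgO"))   # MgO is the denominator
```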

ALR transformation - standard deviation

df.coda.alr_std(part=None, n_samples=5000)

Returns the standard deviation of n_samples random draws in ALR space.

Parameters

  • part (str) - Name of the part to use as denominator.
  • n_samples (int) - Number of random draws from a Dirichlet distribution.

ILR transformation - point estimate

df.coda.ilr(psi=None)

Returns isometric logratio values. If no basis is given, a default sequential binary partition basis is used.

Parameters

  • psi (array_like) - Orthonormal basis. If None, the default SBP basis is used.

ILR inverse transformation

df.coda.ilr_inv(psi=None)

Returns the composition corresponding to a set of ILR coordinates. The same basis used for the forward transform must be supplied.

Parameters

  • psi (array_like) - Orthonormal basis. If None, the default SBP basis is used.
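To illustrate why the same basis is needed in both directions, the sketch below performs an ILR round trip in plain Python: the forward transform projects the CLR coefficients onto the basis vectors, and the inverse rebuilds the CLR coefficients from the coordinates and closes the result. The basis built by default_basis is one common sequential construction and is an assumption here; the package's default SBP basis may differ. All helper names are hypothetical.

```python
import math

def default_basis(n_parts):
    """An orthonormal sequential basis with n_parts - 1 rows (assumed, for illustration)."""
    basis = []
    for i in range(1, n_parts):
        scale = math.sqrt(i / (i + 1))
        # First i parts against part i + 1; remaining parts are not involved.
        basis.append([scale / i] * i + [-scale] + [0.0] * (n_parts - i - 1))
    return basis

def ilr_row(row, psi):
    """Project the CLR coefficients of one row onto each basis vector."""
    logs = [math.log(x) for x in row]
    mean_log = sum(logs) / len(logs)
    clr = [v - mean_log for v in logs]
    return [sum(c * b for c, b in zip(clr, vec)) for vec in psi]

def ilr_inv_row(coords, psi, total=1.0):
    """Rebuild the CLR coefficients from the coordinates, exponentiate, and close."""
    n_parts = len(psi[0])
    clr = [sum(z * vec[j] for z, vec in zip(coords, psi)) for j in range(n_parts)]
    raw = [math.exp(v) for v in clr]
    raw_sum = sum(raw)
    return [total * r / raw_sum for r in raw]

psi = default_basis(3)
coords = ilr_row([0.2, 0.3, 0.5], psi)
restored = ilr_inv_row(coords, psi)
print(coords, restored)  # restored matches the input composition up to rounding
```

Inverting with a different basis would map the same coordinates to a different composition.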

Aitchison point estimate

df.coda.aitchison_mean(alpha=1.0)

Returns the Bayesian point estimate based on the Dirichlet concentration parameter alpha. Use values between 0.5 (sparse prior) and 1.0 (flat prior).

Parameters

  • alpha (float) - Dirichlet concentration parameter. Defaults to 1.0.

Bayesian zero replacement

df.coda.zero_replacement(n_samples=5000)

Returns a count table with zero values replaced by finite values using Bayesian inference.

Parameters

  • n_samples (int) - Number of random draws from a Dirichlet distribution.

Closure

df.coda.closure(N)

Closes the composition to the constant N, rescaling each row so that its parts sum to N.

Parameters

  • N (float) - Closure constant.
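As a small worked example of closure, the sketch below rescales a single row in plain Python. The helper close_row is illustrative and not part of the package.

```python
def close_row(row, total=100.0):
    """Rescale one row so that its parts sum to `total`."""
    row_sum = sum(row)
    return [total * x / row_sum for x in row]

closed = close_row([2.0, 6.0, 12.0], total=100.0)
print(closed)  # [10.0, 30.0, 60.0]
```

Closure changes only the scale of a row; the ratios between parts are unchanged.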

Variance matrix

df.coda.varmatrix(nmp=False)

Returns the total variation matrix of a composition. For large datasets, variance is estimated from at most 500 rows.

Parameters

  • nmp (bool) - If True, return a numpy array instead of a DataFrame. Defaults to False.

Total variance

df.coda.totvar()

Returns the total variance of a set of compositions, computed as the sum of the variance matrix divided by twice the number of parts.
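The relationship between the variance matrix and the total variance can be sketched in plain Python. Entry (i, j) of the matrix is the variance over samples of ln(x_i / x_j), and the total variance is the sum of all entries divided by twice the number of parts. The helper names are hypothetical, and using the sample variance (n - 1 in the denominator) is an assumption.

```python
import math
import statistics

def variation_matrix(rows):
    """T[i][j] = variance over samples of ln(x_i / x_j)."""
    n_parts = len(rows[0])
    return [[statistics.variance([math.log(r[i] / r[j]) for r in rows])
             for j in range(n_parts)] for i in range(n_parts)]

def total_variance(rows):
    """Sum of the variation matrix divided by twice the number of parts."""
    matrix = variation_matrix(rows)
    return sum(map(sum, matrix)) / (2 * len(matrix))

samples = [[10.0, 30.0, 60.0], [20.0, 30.0, 50.0],
           [15.0, 45.0, 40.0], [5.0, 25.0, 70.0]]
print(variation_matrix(samples))
print(total_variance(samples))
```

The matrix is symmetric with a zero diagonal, and the resulting total variance equals the sum of the per-part variances of the CLR coefficients.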

Geometric mean

df.coda.gmean()

Returns the geometric mean of a set of compositions as percentages.
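A minimal sketch of this calculation, assuming the geometric mean is taken part by part across samples and then closed to 100, could look as follows. The helper gmean_percent is illustrative only.

```python
import math

def gmean_percent(rows):
    """Geometric mean of each part across samples, closed to 100."""
    n_parts = len(rows[0])
    means = [math.exp(sum(math.log(r[j]) for r in rows) / len(rows))
             for j in range(n_parts)]
    total = sum(means)
    return [100.0 * m / total for m in means]

samples = [[10.0, 30.0, 60.0], [20.0, 30.0, 50.0], [15.0, 45.0, 40.0]]
centre = gmean_percent(samples)
print(centre, sum(centre))
```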

Power transformation

df.coda.power(alpha)

Applies compositional scalar multiplication (power transformation).

Parameters

  • alpha (float) - Scalar multiplier.
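In Aitchison geometry, the power transformation raises every part to the power alpha and closes the result. The plain-Python sketch below shows this for one row. The helper power_row is hypothetical, and closing to 1 is an assumption about the closure constant.

```python
def power_row(row, alpha):
    """Raise each part to the power alpha, then close the row to 1."""
    raised = [x ** alpha for x in row]
    total = sum(raised)
    return [r / total for r in raised]

sharpened = power_row([0.2, 0.3, 0.5], alpha=2.0)
flattened = power_row([0.2, 0.3, 0.5], alpha=0.0)
print(sharpened)   # differences between parts are amplified
print(flattened)   # alpha = 0 gives the uniform composition
```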

Perturbation

df.coda.perturbation(comp)

Applies a compositional perturbation (Aitchison addition) with another composition.

Parameters

  • comp (array_like) - Composition to perturb with.
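Perturbation multiplies two compositions part by part and closes the product. The following plain-Python sketch shows the operation on a single row. The helper perturb_row is illustrative, and closing to 1 is an assumption.

```python
def perturb_row(row, comp):
    """Multiply two compositions part by part, then close the result to 1."""
    product = [x * c for x, c in zip(row, comp)]
    total = sum(product)
    return [p / total for p in product]

base = [0.2, 0.3, 0.5]
shifted = perturb_row(base, [1.0, 2.0, 1.0])              # doubles the weight of the second part
neutral = perturb_row(base, [1.0 / x for x in base])      # perturbing by the inverse
print(shifted)
print(neutral)   # the uniform composition, which is the neutral element
```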

Scaling

df.coda.scale()

Scales the composition by the reciprocal of the square root of the total variance.
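The effect of scaling can be checked with a short plain-Python sketch: powering every row by 1 / sqrt(total variance) yields a data set whose total variance is 1. The helper names are hypothetical, and the sample variance and closure to 1 are assumptions.

```python
import math
import statistics

def total_variance(rows):
    """Sum of var(ln(x_i / x_j)) over all part pairs, divided by twice the number of parts."""
    n_parts = len(rows[0])
    pair_sum = sum(statistics.variance([math.log(r[i] / r[j]) for r in rows])
                   for i in range(n_parts) for j in range(n_parts))
    return pair_sum / (2 * n_parts)

def scale_rows(rows):
    """Power-transform every row by 1 / sqrt(total variance), then close to 1."""
    exponent = 1.0 / math.sqrt(total_variance(rows))
    powered = [[x ** exponent for x in r] for r in rows]
    return [[p / sum(r) for p in r] for r in powered]

samples = [[10.0, 30.0, 60.0], [20.0, 30.0, 50.0],
           [15.0, 45.0, 40.0], [5.0, 25.0, 70.0]]
scaled = scale_rows(samples)
print(total_variance(samples), total_variance(scaled))
```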

Centering

df.coda.center()

Centers the composition by perturbing with the reciprocal of the geometric mean.
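Centering can be sketched in plain Python as dividing each part by its geometric mean across samples and closing each row again. After centering, the geometric mean of the data is the uniform composition. The helper names below are illustrative, and closing to 1 is an assumption.

```python
import math

def column_gmean(rows):
    """Geometric mean of each part across samples (not closed)."""
    n_parts = len(rows[0])
    return [math.exp(sum(math.log(r[j]) for r in rows) / len(rows))
            for j in range(n_parts)]

def center_rows(rows):
    """Perturb every row with the reciprocal of the geometric mean, then close to 1."""
    gmean = column_gmean(rows)
    shifted = [[x / g for x, g in zip(r, gmean)] for r in rows]
    return [[s / sum(r) for s in r] for r in shifted]

samples = [[10.0, 30.0, 60.0], [20.0, 30.0, 50.0], [15.0, 45.0, 40.0]]
centered = center_rows(samples)
new_gmean = column_gmean(centered)
print([g / sum(new_gmean) for g in new_gmean])  # all parts equal after centering
```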


Plotting functions

Ternary diagram

pycodamath.plot.ternary(data, descr=None, center=False, conf=False)

Plots a ternary diagram from a three-part composition closed to 100.

Parameters

  • data (DataFrame) - Three-part compositional data, closed to 100.
  • descr (Series) - Optional grouping variable; if provided, points are coloured by group.
  • center (bool) - If True, the composition is centred before plotting. Defaults to False.
  • conf (bool) - If True, a 95% confidence ellipse is overlaid. Defaults to False.

Scree plot

pycodamath.pca.scree_plot(axis, eig_val)

Plots a scree plot of explained variance from singular values.

Parameters

  • axis - A Matplotlib axes object.
  • eig_val (array_like) - Singular values from SVD.
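A scree plot typically shows the share of variance explained by each component. Assuming that convention, the share for component k follows from the singular values as s_k^2 / sum_j s_j^2. The sketch below computes these shares in plain Python. The helper explained_variance is hypothetical, and the exact quantity drawn by scree_plot is not confirmed here.

```python
def explained_variance(singular_values):
    """Share of total variance carried by each component, from singular values."""
    squared = [s ** 2 for s in singular_values]
    total = sum(squared)
    return [sq / total for sq in squared]

ratios = explained_variance([4.0, 2.0, 1.0, 1.0])
print(ratios)  # shares sum to 1; the first component dominates
```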

PCA biplot

class pycodamath.pca.Biplot(data, axis=None, default=True)

Creates a PCA biplot based on a centered log-ratio transformation of the data.

Parameters

  • data (DataFrame) - Compositional count data to analyse.
  • axis - A Matplotlib axes object. If None, a new figure is created.
  • default (bool) - If True, loadings and scores are plotted immediately. Defaults to True.

The following methods are available for customising the biplot:

  • plotloadings(cutoff=0, scale=None, labels=None, cluster=False) — plot loading arrows. Set cutoff (as a fraction of the maximum loading length) to suppress short loadings. Set cluster=True to reduce the number of loadings by hierarchical clustering; the resulting cluster legend is accessible as biplot.clusterlegend.
  • plotloadinglabels(labels=None, loadings=None, cutoff=0) — add text labels to loadings.
  • adjustloadinglabels() — shift loading labels to reduce overlap.
  • plotscores(group=None, palette=None, legend=True, labels=None) — plot sample scores as points, optionally coloured by group.
  • plotscorelabels(labels=None) — add text labels to the scores.
  • plotellipses(group, palette=None, legend=False) — plot 90% confidence ellipses for each group (requires at least 3 samples per group).
  • plotcentroids(group, palette=None, legend=False) — plot the centroid of each group.
  • plothulls(group, palette=None, legend=True) — plot convex hulls around each group (requires at least 3 samples per group).
  • plotcontours(group, palette=None, legend=True, plot_outliers=True, percent_outliers=0.1, linewidth=2.2) — plot kernel density contours for each group. Samples outside the outermost contour are optionally shown as individual points.
  • labeloutliers(group, conf=3.0) — label samples more than conf standard deviations from their group centroid.
  • displaylegend(loc=2) — display the group legend at Matplotlib legend location loc.
  • removepatches() — remove loading arrows and hull polygons from the plot.
  • removescores() — remove score points from the plot.
  • removelabels() — remove text labels from the plot.
  • removecontours() — remove contour fills from the plot.

The keyword labels is a list of label names. If labels is None, all labels are plotted.

The keyword group is a Pandas Series with an index matching the data index.

The keyword palette is a dict mapping each unique group value to a colour.

Example

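import pycodamath as coda
import pandas as pd
# Load the example dataset.
data = pd.read_csv('example/kilauea_iki_chem.csv')
# Create the biplot; with default=True, loadings and scores are plotted immediately.
mypca = coda.pca.Biplot(data)
# Remove the text labels, then plot clustered loadings and print the cluster legend.
mypca.removelabels()
mypca.plotloadings(cluster=True)
print(mypca.clusterlegend)
# Remove the labels again, plot loadings labelled for selected parts only, and reduce label overlap.
mypca.removelabels()
mypca.plotloadings(labels=['FeO', 'Al2O3', 'CaO'], cluster=False)
mypca.adjustloadinglabels()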
