Compositional data (CoDa) analysis tools for Python

pyCoDaMath

pyCoDaMath provides compositional data (CoDa) analysis tools for Python.

Getting Started

This package extends the Pandas dataframe object with various CoDa tools. It also provides a set of plotting functions for CoDa figures.

Installation

Clone the git repo to your local hard drive:

git clone https://bitbucket.org/genomicepidemiology/pycodamath.git

Enter the directory and install:

pip install .

Usage

The pyCoDaMath module is loaded as:

import pycodamath

Once the module is imported, the CoDa tools are available under df.coda on any Pandas DataFrame df. For example, to get CLR values:

df.coda.clr()

Documentation

CLR transformation - point estimate

df.coda.clr()

Returns centered logratio coefficients. If the dataframe contains zeros, values will be replaced by the Aitchison mean point estimate.
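
For orientation, the centered logratio of a composition is the log of each part divided by the geometric mean of all parts. The following is a minimal sketch in plain Python, written independently of pyCoDaMath's own implementation; the function name clr and the sample values are illustrative assumptions. It requires strictly positive parts, which is why zeros have to be replaced before the transform.

```python
import math

def clr(parts):
    """Centered logratio of one strictly positive composition."""
    logs = [math.log(x) for x in parts]
    mean_log = sum(logs) / len(logs)  # log of the geometric mean
    return [value - mean_log for value in logs]

sample = [10.0, 20.0, 70.0]           # hypothetical three-part composition
coords = clr(sample)

# CLR coordinates sum to zero and do not depend on the closure constant
rescaled = clr([x / 100.0 for x in sample])
```

Because only ratios between parts enter the result, the same coordinates are obtained whether the composition is closed to 1, to 100, or left as raw counts.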

CLR transformation - standard deviation

df.coda.clr_std(n_samples=5000)

Returns the standard deviation of n_samples random draws in CLR space.

Parameters

  • n_samples (int) - Number of random draws from a Dirichlet distribution.

ALR transformation - point estimate

df.coda.alr(part=None)

Returns additive logratio values. If part is None, the last part of the composition is used as the denominator.

Parameters

  • part (str) - Name of the part to use as denominator.
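
As an illustration of what the additive logratio computes, the sketch below divides every remaining part by the chosen denominator part and takes the log. It is plain Python and not the package's code; the function name alr, the dictionary layout, and the oxide values are assumptions made for the example.

```python
import math

def alr(parts, denominator):
    """Additive logratio: log of each remaining part over the denominator part."""
    reference = parts[denominator]
    return {name: math.log(value / reference)
            for name, value in parts.items()
            if name != denominator}

sample = {"SiO2": 50.0, "MgO": 10.0, "FeO": 40.0}  # hypothetical composition
coords = alr(sample, "FeO")                         # FeO is the denominator
```

The result has one coordinate fewer than the composition has parts, since the denominator part drops out.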

ALR transformation - standard deviation

df.coda.alr_std(part=None, n_samples=5000)

Returns the standard deviation of n_samples random draws in ALR space.

Parameters

  • part (str) - Name of the part to use as denominator.
  • n_samples (int) - Number of random draws from a Dirichlet distribution.

ILR transformation - point estimate

df.coda.ilr(psi=None)

Returns isometric logratio values. If no basis is given, a default sequential binary partition basis is used.

Parameters

  • psi (array_like) - Orthonormal basis. If None, the default SBP basis is used.
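
To make the idea of an orthonormal basis concrete, the sketch below computes ILR coordinates for one common sequential choice, in which each coordinate contrasts the geometric mean of the first i parts with part i+1. This is an assumption for illustration only: this document does not state which sequential binary partition the package uses by default, and the function names are hypothetical.

```python
import math

def ilr_sequential(parts):
    """ILR coordinates for one sequential orthonormal basis (illustrative choice)."""
    logs = [math.log(x) for x in parts]
    coords = []
    for i in range(1, len(parts)):
        mean_first = sum(logs[:i]) / i  # log geometric mean of the first i parts
        coords.append(math.sqrt(i / (i + 1)) * (mean_first - logs[i]))
    return coords

def clr(parts):
    logs = [math.log(x) for x in parts]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]

sample = [10.0, 20.0, 70.0]  # hypothetical composition
z = ilr_sequential(sample)

# An isometric transform preserves the norm of the CLR vector
norm_ilr = math.sqrt(sum(v * v for v in z))
norm_clr = math.sqrt(sum(v * v for v in clr(sample)))
```

A composition with D parts yields D-1 ILR coordinates, and the matching norms are what "isometric" refers to.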

ILR inverse transformation

df.coda.ilr_inv(psi=None)

Returns the composition corresponding to a set of ILR coordinates. The same basis used for the forward transform must be supplied.

Parameters

  • psi (array_like) - Orthonormal basis. If None, the default SBP basis is used.

Aitchison point estimate

df.coda.aitchison_mean(alpha=1.0)

Returns the Bayesian point estimate based on the Dirichlet concentration parameter alpha. Use values between 0.5 (sparse prior) and 1.0 (flat prior).

Parameters

  • alpha (float) - Dirichlet concentration parameter. Defaults to 1.0.

Bayesian zero replacement

df.coda.zero_replacement(n_samples=5000)

Returns a count table with zero values replaced by finite values using Bayesian inference.

Parameters

  • n_samples (int) - Number of random draws from a Dirichlet distribution.

Closure

df.coda.closure(N)

Closes the composition to the constant N, i.e. rescales the parts so that they sum to N.

Parameters

  • N (float) - Closure constant.
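
Closure is a plain rescaling, as the following sketch shows. It is written in plain Python for illustration and is not taken from the package; the function name and the counts are assumptions.

```python
def closure(parts, total):
    """Rescale the parts so that they sum to `total`."""
    current_sum = sum(parts)
    return [total * x / current_sum for x in parts]

counts = [2.0, 3.0, 5.0]        # hypothetical raw counts
closed = closure(counts, 100)   # [20.0, 30.0, 50.0]
```

Ratios between parts are unchanged by closure, so logratio transforms give the same result before and after it.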

Variance matrix

df.coda.varmatrix(nmp=False)

Returns the total variation matrix of a composition. For large datasets, variance is estimated from at most 500 rows.

Parameters

  • nmp (bool) - If True, return a numpy array instead of a DataFrame. Defaults to False.

Total variance

df.coda.totvar()

Returns the total variance of a set of compositions, computed as the sum of the variance matrix divided by twice the number of parts.
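
The relation between the variance matrix and the total variance can be sketched as follows, in plain Python. The function names and data are illustrative assumptions, and the sketch uses the sample variance (denominator n-1); which estimator the package uses is not stated here.

```python
import math
from statistics import variance  # sample variance, denominator n - 1

def variation_matrix(rows):
    """Entry (i, j) is the variance of log(x_i / x_j) across all rows."""
    d = len(rows[0])
    return [[variance([math.log(r[i] / r[j]) for r in rows])
             for j in range(d)] for i in range(d)]

def total_variance(rows):
    """Sum of the variation matrix divided by twice the number of parts."""
    matrix = variation_matrix(rows)
    return sum(sum(line) for line in matrix) / (2 * len(matrix))

def clr(parts):
    logs = [math.log(x) for x in parts]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]

rows = [[10.0, 20.0, 70.0], [15.0, 25.0, 60.0], [30.0, 30.0, 40.0]]  # hypothetical
matrix = variation_matrix(rows)
tv = total_variance(rows)

# Cross-check: the total variance equals the summed variances of the CLR coordinates
clr_rows = [clr(r) for r in rows]
clr_variance_sum = sum(variance([c[j] for c in clr_rows]) for j in range(3))
```

The matrix is symmetric with a zero diagonal, because log(x_i / x_j) is the negative of log(x_j / x_i) and a part's ratio with itself is constant.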

Geometric mean

df.coda.gmean()

Returns the geometric mean of a set of compositions as percentages.

Power transformation

df.coda.power(alpha)

Applies compositional scalar multiplication (power transformation).

Parameters

  • alpha (float) - Scalar multiplier.
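
As a rough illustration, the power transformation raises every part to the power alpha and closes the result again. The sketch below is plain Python, not the package's code; the function name and the closure constant of 100 are assumptions.

```python
def power(parts, alpha, total=100.0):
    """Raise each part to `alpha`, then close to `total` (assumed to be 100 here)."""
    powered = [x ** alpha for x in parts]
    powered_sum = sum(powered)
    return [total * p / powered_sum for p in powered]

sample = [10.0, 20.0, 70.0]  # hypothetical composition
squared = power(sample, 2)
```

Every logratio between two parts is multiplied by alpha, which is why this operation plays the role of scalar multiplication for compositions.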

Perturbation

df.coda.perturbation(comp)

Applies a compositional perturbation (Aitchison addition) with another composition.

Parameters

  • comp (array_like) - Composition to perturb with.
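
Perturbation multiplies two compositions part by part and closes the product. A minimal plain-Python sketch follows; the function name, the sample values, and the closure constant of 100 are assumptions for the example rather than the package's behaviour.

```python
def perturb(x, y, total=100.0):
    """Part-wise product of two compositions, closed to `total`."""
    product = [a * b for a, b in zip(x, y)]
    product_sum = sum(product)
    return [total * p / product_sum for p in product]

sample = [10.0, 20.0, 70.0]   # hypothetical composition
shift = [2.0, 1.0, 1.0]       # hypothetical perturbing composition
shifted = perturb(sample, shift)

# Perturbing with the part-wise reciprocal gives the uniform composition
neutral = perturb(sample, [1.0 / x for x in sample])
```

The uniform composition is the neutral element of this operation, so perturbing with a reciprocal acts like subtraction.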

Scaling

df.coda.scale()

Scales the composition by the reciprocal of the square root of the total variance.
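
The effect of scaling can be checked numerically: applying the power transformation with the factor 1/sqrt(total variance) yields a data set whose total variance is 1. The sketch below is plain Python with illustrative names and data, uses the sample variance, and closes to 100; all three are assumptions, not documented package behaviour.

```python
import math
from statistics import variance

def total_variance(rows):
    """Sum of all pairwise logratio variances over twice the number of parts."""
    d = len(rows[0])
    total = 0.0
    for i in range(d):
        for j in range(d):
            if i != j:
                total += variance([math.log(r[i] / r[j]) for r in rows])
    return total / (2 * d)

def power(parts, alpha, total=100.0):
    powered = [x ** alpha for x in parts]
    powered_sum = sum(powered)
    return [total * p / powered_sum for p in powered]

rows = [[10.0, 20.0, 70.0], [15.0, 25.0, 60.0], [30.0, 30.0, 40.0]]  # hypothetical
factor = 1.0 / math.sqrt(total_variance(rows))
scaled = [power(r, factor) for r in rows]
scaled_total_variance = total_variance(scaled)
```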

Centering

df.coda.center()

Centers the composition by perturbing with the reciprocal of the geometric mean.
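
Centering can be sketched as follows: compute the geometric mean of each part across all samples, then perturb every sample with its reciprocal. The code is plain Python with assumed names, data, and a closure constant of 100; it illustrates the operation and is not the package's implementation.

```python
import math

def closure(parts, total=100.0):
    parts_sum = sum(parts)
    return [total * x / parts_sum for x in parts]

def geometric_mean(rows):
    """Part-wise geometric mean across samples, closed to 100."""
    n = len(rows)
    d = len(rows[0])
    return closure([math.exp(sum(math.log(r[j]) for r in rows) / n)
                    for j in range(d)])

def center(rows):
    """Perturb every sample with the reciprocal of the geometric mean."""
    g = geometric_mean(rows)
    return [closure([x / gj for x, gj in zip(r, g)]) for r in rows]

rows = [[10.0, 20.0, 70.0], [15.0, 25.0, 60.0], [30.0, 30.0, 40.0]]  # hypothetical
centered = center(rows)
new_center = geometric_mean(centered)
```

After centering, the geometric mean of the data set is the uniform composition, which places the data in the middle of a ternary diagram.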


Plotting functions

Ternary diagram

pycodamath.plot.ternary(data, descr=None, center=False, conf=False)

Plots a ternary diagram from a three-part composition closed to 100.

Parameters

  • data (DataFrame) - Three-part compositional data, closed to 100.
  • descr (Series) - Optional grouping variable; if provided, points are coloured by group.
  • center (bool) - If True, the composition is centred before plotting. Defaults to False.
  • conf (bool) - If True, a 95% confidence ellipse is overlaid. Defaults to False.

Scree plot

pycodamath.pca.scree_plot(axis, eig_val)

Plots a scree plot of explained variance from singular values.

Parameters

  • axis - A Matplotlib axes object.
  • eig_val (array_like) - Singular values from SVD.
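
For reference, the share of variance explained by each principal component is commonly obtained from the squared singular values. The sketch below shows that standard calculation in plain Python; the function name and input values are assumptions, and this document does not confirm that scree_plot computes its bar heights in exactly this way.

```python
def explained_variance(singular_values):
    """Fraction of total variance per component, from squared singular values."""
    squared = [s * s for s in singular_values]
    total = sum(squared)
    return [value / total for value in squared]

shares = explained_variance([3.0, 2.0, 1.0])  # hypothetical singular values
```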

PCA biplot

class pycodamath.pca.Biplot(data, axis=None, default=True)

Creates a PCA biplot based on a centered log-ratio transformation of the data.

Parameters

  • data (DataFrame) - Compositional count data to analyse.
  • axis - A Matplotlib axes object. If None, a new figure is created.
  • default (bool) - If True, loadings and scores are plotted immediately. Defaults to True.

The following methods are available for customising the biplot:

  • plotloadings(cutoff=0, scale=None, labels=None, cluster=False) — plot loading arrows. Set cutoff (as a fraction of the maximum loading length) to suppress short loadings. Set cluster=True to reduce the number of loadings by hierarchical clustering; the resulting cluster legend is accessible as biplot.clusterlegend.
  • plotloadinglabels(labels=None, loadings=None, cutoff=0) — add text labels to loadings.
  • adjustloadinglabels() — shift loading labels to reduce overlap.
  • plotscores(group=None, palette=None, legend=True, labels=None) — plot sample scores as points, optionally coloured by group.
  • plotscorelabels(labels=None) — add text labels to the scores.
  • plotellipses(group, palette=None, legend=False) — plot 90% confidence ellipses for each group (requires at least 3 samples per group).
  • plotcentroids(group, palette=None, legend=False) — plot the centroid of each group.
  • plothulls(group, palette=None, legend=True) — plot convex hulls around each group (requires at least 3 samples per group).
  • plotcontours(group, palette=None, legend=True, plot_outliers=True, percent_outliers=0.1, linewidth=2.2) — plot kernel density contours for each group. Samples outside the outermost contour are optionally shown as individual points.
  • labeloutliers(group, conf=3.0) — label samples more than conf standard deviations from their group centroid.
  • displaylegend(loc=2) — display the group legend at Matplotlib legend location loc.
  • removepatches() — remove loading arrows and hull polygons from the plot.
  • removescores() — remove score points from the plot.
  • removelabels() — remove text labels from the plot.
  • removecontours() — remove contour fills from the plot.

The keyword labels is a list of label names. If labels is None, all labels are plotted.

The keyword group is a Pandas Series with an index matching the data index.

The keyword palette is a dict mapping each unique group value to a colour.

Example

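import pycodamath as coda
import pandas as pd

# Load the example data
data = pd.read_csv('example/kilauea_iki_chem.csv')
# With default=True, loadings and scores are plotted immediately
mypca = coda.pca.Biplot(data)

# Clear the default loadings and labels, then replot the loadings clustered
mypca.removepatches()
mypca.removelabels()
mypca.plotloadings(cluster=True)
print(mypca.clusterlegend)

# Clear again and plot only the selected loadings
mypca.removepatches()
mypca.removelabels()
mypca.plotloadings(labels=['FeO', 'Al2O3', 'CaO'], cluster=False)
mypca.adjustloadinglabels()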
