
Compositional data (CoDa) analysis tools for Python


pyCoDaMath


pyCoDaMath provides compositional data (CoDa) analysis tools for Python.

Getting Started

This package extends the Pandas DataFrame object with various CoDa tools, exposed under the coda attribute (as in df.coda.clr()). It also provides a set of plotting functions for CoDa figures.

Installation

Clone the git repo to your local hard drive:

git clone https://bitbucket.org/genomicepidemiology/pycodamath.git

Enter the directory and install:

pip install .

Usage

The pyCoDaMath module is loaded as

import pycodamath

Once the module is imported, the CLR values of a Pandas DataFrame df are obtained with:

df.coda.clr()

Documentation

CLR transformation - point estimate

df.coda.clr()

Returns centered logratio coefficients. If the dataframe contains zeros, values will be replaced by the Aitchison mean point estimate.
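
As an illustration of what the point estimate computes when no zeros are present, the textbook CLR can be sketched in plain NumPy. The function below is a stand-alone illustration, not pyCoDaMath's implementation, and it does not perform the zero replacement described above.

```python
import numpy as np

def clr(x):
    """Centred log-ratio of each row; assumes strictly positive entries."""
    logx = np.log(np.asarray(x, dtype=float))
    # Subtract the row-wise mean of the logs (the log of the geometric mean).
    return logx - logx.mean(axis=-1, keepdims=True)

counts = np.array([[10.0, 30.0, 60.0],
                   [5.0, 5.0, 90.0]])
coords = clr(counts)
```

Each row of coords sums to zero, and multiplying a row of counts by a constant leaves its CLR values unchanged.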

CLR transformation - standard deviation

df.coda.clr_std(n_samples=5000)

Returns the standard deviation of n_samples random draws in CLR space.

Parameters

  • n_samples (int) - Number of random draws from a Dirichlet distribution.
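
The Monte Carlo idea behind this estimate can be sketched as follows. The sketch assumes that the draws come from a Dirichlet distribution with parameters counts + alpha (a posterior under a symmetric prior); the alpha and seed arguments and the function name are hypothetical and are not part of the pyCoDaMath API.

```python
import numpy as np

def clr_std_sketch(counts, n_samples=5000, alpha=1.0, seed=0):
    """Standard deviation of CLR values over Dirichlet draws for one sample."""
    rng = np.random.default_rng(seed)
    counts = np.asarray(counts, dtype=float)
    # Assumption: draws come from Dirichlet(counts + alpha).
    draws = rng.dirichlet(counts + alpha, size=n_samples)
    logd = np.log(draws)
    clr_draws = logd - logd.mean(axis=1, keepdims=True)
    # Spread of each part's CLR value across the draws.
    return clr_draws.std(axis=0)

std = clr_std_sketch([0, 3, 70], n_samples=2000)
```

Parts with low counts are less well determined, so their CLR standard deviation is larger than that of well-sampled parts.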

ALR transformation - point estimate

df.coda.alr(part=None)

Returns additive logratio values. If part is None, the last part of the composition is used as the denominator.

Parameters

  • part (str) - Name of the part to use as denominator.
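
For reference, the ALR itself can be written in a few lines of NumPy. This is a stand-alone sketch, not the package's code: it selects the denominator by column position (the hypothetical denom argument) rather than by part name, and it assumes strictly positive values.

```python
import numpy as np

def alr(x, denom=-1):
    """Additive log-ratio of each row relative to the column at index denom."""
    logx = np.log(np.asarray(x, dtype=float))
    # Log-ratio of every part to the chosen denominator part.
    ratios = logx - logx[..., [denom]]
    # Drop the denominator column, whose log-ratio is always zero.
    return np.delete(ratios, denom, axis=-1)

x = np.array([[1.0, 2.0, 4.0]])
last_as_denominator = alr(x)            # log(1/4), log(2/4)
first_as_denominator = alr(x, denom=0)  # log(2/1), log(4/1)
```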

ALR transformation - standard deviation

df.coda.alr_std(part=None, n_samples=5000)

Returns the standard deviation of n_samples random draws in ALR space.

Parameters

  • part (str) - Name of the part to use as denominator.
  • n_samples (int) - Number of random draws from a Dirichlet distribution.

ILR transformation - point estimate

df.coda.ilr(psi=None)

Returns isometric logratio values. If no basis is given, a default sequential binary partition basis is used.

Parameters

  • psi (array_like) - Orthonormal basis. If None, the default SBP basis is used.
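
To make the role of psi concrete, the sketch below builds one possible orthonormal basis from a sequential binary partition (part 1 against part 2, then parts 1 and 2 against part 3, and so on) and applies it. Whether this matches the default basis used by pyCoDaMath is not stated here, so treat sbp_basis as a hypothetical helper.

```python
import numpy as np

def sbp_basis(n_parts):
    """Orthonormal basis (one row per balance) for one particular SBP."""
    psi = np.zeros((n_parts - 1, n_parts))
    for i in range(1, n_parts):
        psi[i - 1, :i] = 1.0 / i      # first i parts in the numerator group
        psi[i - 1, i] = -1.0          # part i + 1 in the denominator group
        psi[i - 1] *= np.sqrt(i / (i + 1.0))  # normalise the row to unit length
    return psi

def ilr(x, psi):
    """Project the CLR values of each row onto the rows of psi."""
    logx = np.log(np.asarray(x, dtype=float))
    clr = logx - logx.mean(axis=-1, keepdims=True)
    return clr @ psi.T

psi = sbp_basis(4)
coords = ilr(np.array([[1.0, 2.0, 3.0, 4.0]]), psi)
```

A D-part composition yields D - 1 coordinates, and because the rows of psi are orthonormal the ILR vector has the same Euclidean norm as the CLR vector.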

ILR inverse transformation

df.coda.ilr_inv(psi=None)

Returns the composition corresponding to a set of ILR coordinates. The same basis used for the forward transform must be supplied.

Parameters

  • psi (array_like) - Orthonormal basis. If None, the default SBP basis is used.
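
The requirement to reuse the same basis can be seen in a small round trip. The sketch uses a hard-coded three-part basis PSI (an assumption chosen for illustration) and closes the result to 1; the closure constant used by the package may differ.

```python
import numpy as np

# Hard-coded orthonormal basis for three parts; each row sums to zero.
PSI = np.array([[1.0, -1.0, 0.0],
                [1.0, 1.0, -2.0]])
PSI /= np.linalg.norm(PSI, axis=1, keepdims=True)

def ilr(x, psi=PSI):
    # Rows of psi sum to zero, so log(x) @ psi.T equals clr(x) @ psi.T.
    return np.log(np.asarray(x, dtype=float)) @ psi.T

def ilr_inv(z, psi=PSI):
    # Map coordinates back to log space, exponentiate, and close to 1.
    y = np.exp(np.asarray(z, dtype=float) @ psi)
    return y / y.sum(axis=-1, keepdims=True)

comp = np.array([0.2, 0.3, 0.5])
back = ilr_inv(ilr(comp))                   # recovers comp
mismatched = ilr_inv(ilr(comp), PSI[::-1])  # different basis: wrong result
```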

Aitchison point estimate

df.coda.aitchison_mean(alpha=1.0)

Returns the Bayesian point estimate based on the Dirichlet concentration parameter alpha. Use values between 0.5 (sparse prior) and 1.0 (flat prior).

Parameters

  • alpha (float) - Dirichlet concentration parameter. Defaults to 1.0.
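
To show how alpha influences a Bayesian point estimate, the sketch below computes the ordinary Dirichlet posterior mean under a symmetric prior. This is an illustrative stand-in: the estimator used by aitchison_mean may be defined differently (for example, in log-ratio space), and the function name is hypothetical.

```python
import numpy as np

def dirichlet_posterior_mean(counts, alpha=1.0):
    """Posterior mean of the proportions under a symmetric Dirichlet prior."""
    post = np.asarray(counts, dtype=float) + alpha
    return post / post.sum(axis=-1, keepdims=True)

counts = np.array([0, 3, 7])
flat = dirichlet_posterior_mean(counts, alpha=1.0)    # (1, 4, 8) / 13
sparse = dirichlet_posterior_mean(counts, alpha=0.5)  # (0.5, 3.5, 7.5) / 11.5
```

A smaller alpha assigns less mass to the unobserved part, which is why 0.5 corresponds to the sparser prior.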

Bayesian zero replacement

df.coda.zero_replacement(n_samples=5000)

Returns a count table with zero values replaced by finite values using Bayesian inference.

Parameters

  • n_samples (int) - Number of random draws from a Dirichlet distribution.

Closure

df.coda.closure(N)

Closes the composition to the constant N, i.e. rescales each composition so that its parts sum to N.

Parameters

  • N (float) - Closure constant.
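
Closure is a simple rescaling. A minimal NumPy sketch, independent of the package (the total argument plays the role of N):

```python
import numpy as np

def closure(x, total=1.0):
    """Rescale each row so that its parts sum to total."""
    x = np.asarray(x, dtype=float)
    return x / x.sum(axis=-1, keepdims=True) * total

closed = closure([[2, 3, 5], [1, 1, 2]], total=100)
# rows become (20, 30, 50) and (25, 25, 50)
```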

Variance matrix

df.coda.varmatrix(nmp=False)

Returns the total variation matrix of a composition. For large datasets, variance is estimated from at most 500 rows.

Parameters

  • nmp (bool) - If True, return a numpy array instead of a DataFrame. Defaults to False.

Total variance

df.coda.totvar()

Returns the total variance of a set of compositions, computed as the sum of the variance matrix divided by twice the number of parts.
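
The definition above can be checked numerically. The sketch below computes the variation matrix as the variance of every pairwise log-ratio and derives the total variance from it; it uses all rows and the sample variance (ddof=1), both of which are assumptions that may differ from what varmatrix does.

```python
import numpy as np

def variation_matrix(x):
    """Entry (i, j) is the variance over rows of log(x_i / x_j)."""
    logx = np.log(np.asarray(x, dtype=float))
    # Pairwise log-ratios for every row: shape (n_rows, D, D).
    diff = logx[:, :, None] - logx[:, None, :]
    return diff.var(axis=0, ddof=1)

def total_variance(x):
    """Sum of the variation matrix divided by twice the number of parts."""
    t = variation_matrix(x)
    return t.sum() / (2 * t.shape[0])

rng = np.random.default_rng(1)
data = rng.uniform(1.0, 10.0, size=(20, 4))

# Equivalent form: the sum of the variances of the CLR values.
logx = np.log(data)
clr = logx - logx.mean(axis=1, keepdims=True)
totvar_from_clr = clr.var(axis=0, ddof=1).sum()
```

Both routes give the same number, which is a useful sanity check when comparing implementations.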

Geometric mean

df.coda.gmean()

Returns the geometric mean of a set of compositions as percentages.
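
A minimal sketch of this computation, assuming that "as percentages" means the part-wise geometric mean closed to 100 (the function name is hypothetical):

```python
import numpy as np

def gmean_percent(x):
    """Part-wise geometric mean over rows, closed to 100."""
    logx = np.log(np.asarray(x, dtype=float))
    g = np.exp(logx.mean(axis=0))  # geometric mean of each column
    return 100.0 * g / g.sum()

data = np.array([[1.0, 1.0, 2.0],
                 [1.0, 4.0, 8.0]])
centre = gmean_percent(data)  # geometric means (1, 2, 4), closed to 100
```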

Power transformation

df.coda.power(alpha)

Applies compositional scalar multiplication (power transformation).

Parameters

  • alpha (float) - Scalar multiplier.
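
In the Aitchison geometry, scalar multiplication raises each part to the power alpha and closes the result. A stand-alone sketch, assuming closure to 1:

```python
import numpy as np

def power(x, alpha):
    """Raise each part to alpha and close the result to 1."""
    y = np.asarray(x, dtype=float) ** alpha
    return y / y.sum(axis=-1, keepdims=True)

comp = np.array([0.1, 0.3, 0.6])
squared = power(comp, 2.0)
restored = power(squared, 0.5)  # powering by 1/alpha undoes the operation
neutral = power(comp, 0.0)      # alpha = 0 gives the uniform composition
```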

Perturbation

df.coda.perturbation(comp)

Applies a compositional perturbation (Aitchison addition) with another composition.

Parameters

  • comp (array_like) - Composition to perturb with.
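
Perturbation multiplies two compositions part by part and closes the result. A stand-alone sketch, assuming closure to 1:

```python
import numpy as np

def perturb(x, y):
    """Part-wise product of two compositions, closed to 1."""
    z = np.asarray(x, dtype=float) * np.asarray(y, dtype=float)
    return z / z.sum(axis=-1, keepdims=True)

comp = np.array([0.2, 0.3, 0.5])
shifted = perturb(comp, [3.0, 2.0, 1.0])
unchanged = perturb(comp, [1.0, 1.0, 1.0])  # uniform composition is neutral
neutral = perturb(comp, 1.0 / comp)         # perturbing by the inverse
```

The last line returns the uniform composition, which is the neutral element of the operation.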

Scaling

df.coda.scale()

Scales the composition by the reciprocal of the square root of the total variance.
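
Read as a power transformation with exponent 1/sqrt(total variance), the scaling can be sketched as below. This interpretation, the use of the sample variance (ddof=1), and the helper names are assumptions made for illustration.

```python
import numpy as np

def clr(x):
    logx = np.log(np.asarray(x, dtype=float))
    return logx - logx.mean(axis=1, keepdims=True)

def totvar(x):
    """Total variance as the sum of the CLR variances."""
    return clr(x).var(axis=0, ddof=1).sum()

def scale(x):
    """Power-transform by 1 / sqrt(total variance), then close to 1."""
    x = np.asarray(x, dtype=float)
    y = x ** (1.0 / np.sqrt(totvar(x)))
    return y / y.sum(axis=1, keepdims=True)

rng = np.random.default_rng(2)
data = rng.uniform(1.0, 10.0, size=(30, 4))
scaled = scale(data)
```

Because a power transformation multiplies every CLR value by the exponent, the scaled data set has a total variance of 1.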

Centering

df.coda.center()

Centers the composition by perturbing with the reciprocal of the geometric mean.
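
The centering step can be sketched in NumPy as follows, assuming closure to 1 after the perturbation (the function is a stand-alone illustration, not the package's code):

```python
import numpy as np

def center(x):
    """Perturb every row by the inverse of the part-wise geometric mean."""
    x = np.asarray(x, dtype=float)
    g = np.exp(np.log(x).mean(axis=0))  # geometric mean of each part
    y = x / g                           # perturbation by 1 / g
    return y / y.sum(axis=1, keepdims=True)

rng = np.random.default_rng(4)
data = rng.uniform(1.0, 10.0, size=(15, 3))
centred = center(data)

# Closed geometric mean of the centred data.
g_new = np.exp(np.log(centred).mean(axis=0))
g_new = g_new / g_new.sum()
```

After centering, the closed geometric mean is the uniform composition, so the data cloud sits at the barycentre of the simplex.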


Plotting functions

Ternary diagram

pycodamath.plot.ternary(data, descr=None, center=False, conf=False)

Plots a ternary diagram from a three-part composition closed to 100.

Parameters

  • data (DataFrame) - Three-part compositional data, closed to 100.
  • descr (Series) - Optional grouping variable; if provided, points are coloured by group.
  • center (bool) - If True, the composition is centred before plotting. Defaults to False.
  • conf (bool) - If True, a 95% confidence ellipse is overlaid. Defaults to False.

Scree plot

pycodamath.pca.scree_plot(axis, eig_val)

Plots a scree plot of explained variance from singular values.

Parameters

  • axis - A Matplotlib axes object.
  • eig_val (array_like) - Singular values from SVD.
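
In a PCA computed by SVD, the variance carried by each component is proportional to the square of its singular value. The sketch below computes the resulting fractions; whether scree_plot displays exactly these values is an assumption, and the function name is hypothetical.

```python
import numpy as np

def explained_variance_ratio(sing_val):
    """Fraction of the total variance carried by each component."""
    var = np.asarray(sing_val, dtype=float) ** 2
    return var / var.sum()

ratios = explained_variance_ratio([3.0, 2.0, 1.0])  # (9, 4, 1) / 14
```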

PCA biplot

class pycodamath.pca.Biplot(data, axis=None, default=True)

Creates a PCA biplot based on a centered log-ratio transformation of the data.

Parameters

  • data (DataFrame) - Compositional count data to analyse.
  • axis - A Matplotlib axes object. If None, a new figure is created.
  • default (bool) - If True, loadings and scores are plotted immediately. Defaults to True.
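
The numbers behind such a biplot can be sketched with NumPy alone: CLR-transform the data, centre the columns, and take an SVD. This is a generic CLR-PCA sketch under the assumption of strictly positive data; the scaling of scores and loadings used by Biplot may differ.

```python
import numpy as np

def clr_pca(x):
    """Return scores, loadings and singular values of a CLR-based PCA."""
    logx = np.log(np.asarray(x, dtype=float))
    clr = logx - logx.mean(axis=1, keepdims=True)
    centred = clr - clr.mean(axis=0)        # centre each part (column)
    u, s, vt = np.linalg.svd(centred, full_matrices=False)
    scores = u * s                          # sample coordinates
    loadings = vt.T                         # one row per part
    return scores, loadings, s

rng = np.random.default_rng(3)
data = rng.uniform(1.0, 10.0, size=(12, 5))
scores, loadings, sing_val = clr_pca(data)
```

Because CLR values sum to zero within each sample, the last singular value is numerically zero and at most D - 1 components carry variance.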

The following methods are available for customising the biplot:

  • plotloadings(cutoff=0, scale=None, labels=None, cluster=False) — plot loading arrows. Set cutoff (as a fraction of the maximum loading length) to suppress short loadings. Set cluster=True to reduce the number of loadings by hierarchical clustering; the resulting cluster legend is accessible as biplot.clusterlegend.
  • plotloadinglabels(labels=None, loadings=None, cutoff=0) — add text labels to loadings.
  • adjustloadinglabels() — shift loading labels to reduce overlap.
  • plotscores(group=None, palette=None, legend=True, labels=None) — plot sample scores as points, optionally coloured by group.
  • plotscorelabels(labels=None) — add text labels to the scores.
  • plotellipses(group, palette=None, legend=False) — plot 90% confidence ellipses for each group (requires at least 3 samples per group).
  • plotcentroids(group, palette=None, legend=False) — plot the centroid of each group.
  • plothulls(group, palette=None, legend=True) — plot convex hulls around each group (requires at least 3 samples per group).
  • plotcontours(group, palette=None, legend=True, plot_outliers=True, percent_outliers=0.1, linewidth=2.2) — plot kernel density contours for each group. Samples outside the outermost contour are optionally shown as individual points.
  • labeloutliers(group, conf=3.0) — label samples more than conf standard deviations from their group centroid.
  • displaylegend(loc=2) — display the group legend at Matplotlib legend location loc.
  • removepatches() — remove loading arrows and hull polygons from the plot.
  • removescores() — remove score points from the plot.
  • removelabels() — remove text labels from the plot.
  • removecontours() — remove contour fills from the plot.

The keyword labels is a list of label names. If labels is None, all labels are plotted.

The keyword group is a Pandas Series with an index matching the data index.

The keyword palette is a dict mapping each unique group value to a colour.

Example

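import pandas as pd
import pycodamath as coda

# Load the example data set.
data = pd.read_csv('example/kilauea_iki_chem.csv')

# Create the biplot; with default=True, loadings and scores are plotted immediately.
mypca = coda.pca.Biplot(data)

# Remove the text labels, plot clustered loadings, and print the cluster legend.
mypca.removelabels()
mypca.plotloadings(cluster=True)
print(mypca.clusterlegend)

# Remove the labels again, plot loadings labelled for three selected parts, and reduce label overlap.
mypca.removelabels()
mypca.plotloadings(labels=['FeO', 'Al2O3', 'CaO'], cluster=False)
mypca.adjustloadinglabels()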
