Compositional data (CoDa) analysis tools for Python
pyCoDaMath
pyCoDaMath provides compositional data (CoDa) analysis tools for Python
- Source code: https://bitbucket.org/genomicepidemiology/pycodamath
Getting Started
This package extends the Pandas dataframe object with various CoDa tools. It also provides a set of plotting functions for CoDa figures.
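pandas lets a package attach a namespace to every DataFrame through `pandas.api.extensions.register_dataframe_accessor`, and the `df.coda` syntax used throughout this page is consistent with that mechanism (an assumption about pycodamath's internals, not something stated here). A minimal sketch, using a hypothetical accessor name `coda_demo` so it cannot clash with the real `coda` accessor, that implements only a CLR method:

```python
import numpy as np
import pandas as pd

# Hypothetical accessor name "coda_demo"; pycodamath itself registers "coda".
@pd.api.extensions.register_dataframe_accessor("coda_demo")
class CodaDemoAccessor:
    def __init__(self, pandas_obj: pd.DataFrame):
        # pandas passes in the DataFrame the accessor was called on
        self._obj = pandas_obj

    def clr(self) -> pd.DataFrame:
        # Row-wise centred log-ratio: log of each part minus the row mean of the logs
        logs = np.log(self._obj)
        return logs.sub(logs.mean(axis=1), axis=0)

df = pd.DataFrame({"a": [1.0, 2.0], "b": [2.0, 2.0], "c": [4.0, 2.0]})
print(df.coda_demo.clr())
```

Every method defined on the accessor class becomes available as `df.<name>.<method>()` once the decorator has run, which is why a single import is enough to enable the tools.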
Installation
Clone the git repo to your local hard drive:
git clone https://bitbucket.org/genomicepidemiology/pycodamath.git
Enter the directory and install:
pip install .
Usage
The pyCoDaMath module is loaded as
import pycodamath
Once the module is imported, CLR values for a Pandas DataFrame df are obtained with
df.coda.clr()
Documentation
CLR transformation - point estimate
df.coda.clr()
Returns centered logratio coefficients. If the dataframe contains zeros, values will be replaced by the Aitchison mean point estimate.
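The CLR coefficient of each part is its log minus the mean log of its row, which is the same as dividing by the row's geometric mean. A minimal NumPy sketch of the formula (not pycodamath's code), assuming rows are samples and all values are already strictly positive, i.e. zeros have been replaced beforehand:

```python
import numpy as np

def clr(x: np.ndarray) -> np.ndarray:
    """Centred log-ratio of each row of a strictly positive array."""
    logx = np.log(x)
    # Subtracting the row mean of the logs equals dividing by the geometric mean
    return logx - logx.mean(axis=1, keepdims=True)

counts = np.array([[10.0, 20.0, 70.0],
                   [ 5.0,  5.0, 90.0]])
coef = clr(counts)
print(coef)
```

Each row of CLR coefficients sums to zero, and multiplying a row by a constant leaves its coefficients unchanged, so raw counts and proportions give the same result.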
CLR transformation - standard deviation
df.coda.clr_std(n_samples=5000)
Returns the standard deviation of n_samples random draws in CLR space.
Parameters
- n_samples (int) - Number of random draws from a Dirichlet distribution.
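The standard deviation can be pictured as the spread of CLR values over random compositions drawn from a Dirichlet posterior. A minimal sketch, assuming each row is modelled as Dirichlet(counts + alpha) with a hypothetical prior `alpha=1.0` and a fixed `seed`; the prior pycodamath actually uses is not stated here. An ALR version (alr_std) would follow the same pattern with an ALR transform in place of CLR.

```python
import numpy as np

def clr(x):
    logx = np.log(x)
    return logx - logx.mean(axis=-1, keepdims=True)

def clr_std(counts, n_samples=5000, alpha=1.0, seed=0):
    """Per-part standard deviation of CLR values over Dirichlet draws.

    Assumption: each row is Dirichlet(counts + alpha); `alpha` and `seed`
    are illustrative choices, not documented pycodamath defaults.
    """
    rng = np.random.default_rng(seed)
    out = []
    for row in np.asarray(counts, dtype=float):
        draws = rng.dirichlet(row + alpha, size=n_samples)  # shape (n_samples, D)
        out.append(clr(draws).std(axis=0))
    return np.array(out)

counts = np.array([[10, 20, 70], [0, 3, 97]])
sd = clr_std(counts, n_samples=2000)
print(sd)
```

Parts with small counts (such as the zero in the second row) get a much larger standard deviation, which is the uncertainty a single point estimate hides.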
ALR transformation - point estimate
df.coda.alr(part=None)
Returns additive logratio values. If part is None, the last part of the composition is used as the denominator.
Parameters
- part (str) - Name of the part to use as denominator.
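ALR values are logs of every part divided by one chosen denominator part, giving D-1 coordinates. A minimal NumPy sketch, assuming the denominator is chosen by column index (pycodamath takes a part name instead) and defaults to the last column as described above:

```python
import numpy as np

def alr(x, denom=-1):
    """Additive log-ratio: log of each other part over the denominator part.

    `denom` is a column index (default: last part); this is an illustrative
    stand-in for the `part` name used by pycodamath.
    """
    x = np.asarray(x, dtype=float)
    logx = np.log(x)
    ratios = logx - logx[:, [denom]]                       # log(x_i / x_denom)
    return np.delete(ratios, denom % x.shape[1], axis=1)   # drop the all-zero column

comp = np.array([[0.2, 0.3, 0.5],
                 [0.1, 0.1, 0.8]])
a = alr(comp)
print(a)
```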
ALR transformation - standard deviation
df.coda.alr_std(part=None, n_samples=5000)
Returns the standard deviation of n_samples random draws in ALR space.
Parameters
- part (str) - Name of the part to use as denominator.
- n_samples (int) - Number of random draws from a Dirichlet distribution.
ILR transformation - point estimate
df.coda.ilr(psi=None)
Returns isometric logratio values. If no basis is given, a default sequential binary partition basis is used.
Parameters
- psi (array_like) - Orthonormal basis. If None, the default SBP basis is used.
ILR inverse transformation
df.coda.ilr_inv(psi=None)
Returns the composition corresponding to a set of ILR coordinates. The same basis used for the forward transform must be supplied.
Parameters
- psi (array_like) - Orthonormal basis. If None, the default SBP basis is used.
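The forward and inverse ILR transforms are a projection of CLR coefficients onto an orthonormal basis and back. A minimal NumPy sketch, assuming one common sequential binary partition in which balance k contrasts part k with all later parts; pycodamath's default basis may order the partitions differently, so coordinates from this sketch need not match the library's.

```python
import numpy as np

def sbp_basis(D):
    """Orthonormal (D-1) x D basis; balance k contrasts part k with parts k+1..D-1.

    Assumption: an illustrative partition, not necessarily pycodamath's default.
    """
    psi = np.zeros((D - 1, D))
    for k in range(D - 1):
        s = D - k - 1                       # number of parts in the denominator group
        psi[k, k] = np.sqrt(s / (s + 1))
        psi[k, k + 1:] = -1.0 / np.sqrt(s * (s + 1))
    return psi

def clr(x):
    logx = np.log(x)
    return logx - logx.mean(axis=1, keepdims=True)

def ilr(x, psi):
    # Project CLR coefficients onto the basis rows
    return clr(x) @ psi.T

def ilr_inv(z, psi):
    # Back to CLR space, exponentiate, and close each row to 1
    y = np.exp(z @ psi)
    return y / y.sum(axis=1, keepdims=True)

comp = np.array([[0.2, 0.3, 0.5],
                 [0.1, 0.6, 0.3]])
psi = sbp_basis(3)
z = ilr(comp, psi)
back = ilr_inv(z, psi)
print(z)
print(back)
```

The round trip only recovers the original composition when the same psi is used in both directions, which is why the inverse transform needs the forward basis.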
Aitchison point estimate
df.coda.aitchison_mean(alpha=1.0)
Returns the Bayesian point estimate based on the Dirichlet concentration parameter alpha. Use values between 0.5 (sparse prior) and 1.0 (flat prior).
Parameters
- alpha (float) - Dirichlet concentration parameter. Defaults to 1.0.
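One common Bayesian point estimate for count data is the posterior mean under a symmetric Dirichlet(alpha) prior, which adds alpha to every count before closing. A minimal sketch of that estimator, assuming (not confirmed by this page) that it matches what aitchison_mean computes:

```python
import numpy as np

def aitchison_mean(counts, alpha=1.0):
    """Posterior-mean proportions under a symmetric Dirichlet(alpha) prior.

    Assumption: an illustrative estimator; pycodamath's exact formula is not
    given in this documentation.
    """
    post = np.asarray(counts, dtype=float) + alpha   # Dirichlet posterior parameters
    return post / post.sum(axis=1, keepdims=True)

counts = np.array([[0, 3, 7],
                   [5, 0, 5]])
est = aitchison_mean(counts, alpha=0.5)
print(est)
```

Because every posterior parameter is positive, the estimate has no zeros and can be passed straight to a log-ratio transform.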
Bayesian zero replacement
df.coda.zero_replacement(n_samples=5000)
Returns a count table with zero values replaced by finite values using Bayesian inference.
Parameters
- n_samples (int) - Number of random draws from a Dirichlet distribution.
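A Monte Carlo flavour of Bayesian zero replacement draws compositions from a Dirichlet posterior, averages them, and uses the expected counts only where the observed count is zero. A minimal sketch under loudly stated assumptions: the prior `alpha=0.5`, the use of the posterior mean, and leaving non-zero counts untouched are all illustrative choices, not pycodamath's documented method.

```python
import numpy as np

def zero_replacement(counts, n_samples=5000, alpha=0.5, seed=0):
    """Replace zero counts with Monte Carlo posterior means (a sketch).

    Assumptions: each row is Dirichlet(counts + alpha); zeros become the mean
    simulated proportion times the row total; non-zero counts are unchanged.
    """
    counts = np.asarray(counts, dtype=float)
    rng = np.random.default_rng(seed)
    out = counts.copy()
    for i, row in enumerate(counts):
        draws = rng.dirichlet(row + alpha, size=n_samples)
        expected = draws.mean(axis=0) * row.sum()
        zeros = row == 0
        out[i, zeros] = expected[zeros]
    return out

counts = np.array([[0, 12, 30], [4, 0, 0]])
filled = zero_replacement(counts, n_samples=2000)
print(filled)
```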
Closure
df.coda.closure(N)
Closes the composition to the constant N, i.e. rescales each row so that its parts sum to N.
Parameters
- N (float) - Closure constant.
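Closure divides each row by its total and multiplies by N. A minimal NumPy sketch of the operation (not the library's code), assuming rows are samples:

```python
import numpy as np

def closure(x, N=1.0):
    """Rescale each row so its parts sum to N."""
    x = np.asarray(x, dtype=float)
    return N * x / x.sum(axis=1, keepdims=True)

closed = closure([[1, 1, 2], [3, 1, 0]], N=100)
print(closed)   # rows: [25, 25, 50] and [75, 25, 0]
```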
Variance matrix
df.coda.varmatrix(nmp=False)
Returns the total variation matrix of a composition. For large datasets, variance is estimated from at most 500 rows.
Parameters
- nmp (bool) - If True, return a numpy array instead of a DataFrame. Defaults to False.
Total variance
df.coda.totvar()
Returns the total variance of a set of compositions, computed as the sum of the variance matrix divided by twice the number of parts.
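The variation matrix holds var(log(x_i / x_j)) for every pair of parts, and the total variance is its sum divided by 2D. A minimal NumPy sketch of both, assuming the sample variance (ddof=1) and omitting the 500-row subsampling mentioned above; the library's variance convention may differ.

```python
import numpy as np

def varmatrix(x):
    """Variation matrix: T[i, j] = var(log(x_i / x_j)) across rows (ddof=1 assumed)."""
    logx = np.log(np.asarray(x, dtype=float))
    D = logx.shape[1]
    T = np.empty((D, D))
    for i in range(D):
        for j in range(D):
            T[i, j] = np.var(logx[:, i] - logx[:, j], ddof=1)
    return T

def totvar(x):
    """Total variance: sum of the variation matrix divided by 2 * D."""
    T = varmatrix(x)
    return T.sum() / (2 * T.shape[0])

rng = np.random.default_rng(1)
comp = rng.dirichlet([2.0, 3.0, 5.0, 4.0], size=50)
tv = totvar(comp)
print(tv)
```

The same number equals the sum of the variances of the CLR columns, which is a convenient cross-check and a cheaper way to compute it for many parts.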
Geometric mean
df.coda.gmean()
Returns the geometric mean of a set of compositions as percentages.
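The compositional centre is the part-wise geometric mean across samples, closed so that it sums to 100. A minimal NumPy sketch of that definition, assuming rows are samples:

```python
import numpy as np

def gmean(x):
    """Per-part geometric mean across rows, closed to 100."""
    logx = np.log(np.asarray(x, dtype=float))
    g = np.exp(logx.mean(axis=0))
    return 100 * g / g.sum()

comp = np.array([[10.0, 30.0, 60.0],
                 [40.0, 30.0, 30.0]])
centre = gmean(comp)
print(centre)
```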
Power transformation
df.coda.power(alpha)
Applies compositional scalar multiplication (power transformation).
Parameters
- alpha (float) - Scalar multiplier.
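In the Aitchison geometry, scalar multiplication raises every part to the power alpha and re-closes the result. A minimal NumPy sketch of the operation for a single composition (not pycodamath's code):

```python
import numpy as np

def closure(x):
    return x / x.sum(axis=-1, keepdims=True)

def power(x, alpha):
    """Aitchison scalar multiplication: raise each part to alpha, then re-close."""
    return closure(np.asarray(x, dtype=float) ** alpha)

comp = np.array([0.2, 0.3, 0.5])
print(power(comp, 2.0))   # proportional to [0.04, 0.09, 0.25]
```

With alpha = 0 every composition collapses to the neutral element (1/D, ..., 1/D), just as multiplying a vector by zero gives the origin.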
Perturbation
df.coda.perturbation(comp)
Applies a compositional perturbation (Aitchison addition) with another composition.
Parameters
- comp (array_like) - Composition to perturb with.
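Perturbation multiplies two compositions part by part and re-closes, playing the role of vector addition. A minimal NumPy sketch of the operation (not pycodamath's code):

```python
import numpy as np

def closure(x):
    return x / x.sum(axis=-1, keepdims=True)

def perturbation(x, comp):
    """Aitchison addition: multiply part-wise by `comp`, then re-close."""
    return closure(np.asarray(x, dtype=float) * np.asarray(comp, dtype=float))

x = np.array([0.2, 0.3, 0.5])
y = np.array([0.5, 0.25, 0.25])
print(perturbation(x, y))
```

Perturbing by a constant vector such as (1, 1, 1) leaves a closed composition unchanged, so it acts as the zero element.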
Scaling
df.coda.scale()
Scales the composition by the reciprocal of the square root of the total variance.
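Raising a composition to the power a multiplies its CLR coefficients by a, so every variance is multiplied by a squared; choosing a = 1/sqrt(totvar) therefore gives a total variance of 1. A minimal NumPy sketch, assuming sample variances (ddof=1) and computing the total variance as the summed CLR variances:

```python
import numpy as np

def clr(x):
    logx = np.log(x)
    return logx - logx.mean(axis=1, keepdims=True)

def totvar(x):
    # Equivalent to sum(variation matrix) / (2 * D): the summed CLR variances
    return clr(x).var(axis=0, ddof=1).sum()

def scale(x):
    """Power-transform by 1 / sqrt(totvar) so the result has total variance 1."""
    a = 1.0 / np.sqrt(totvar(x))
    y = x ** a
    return y / y.sum(axis=1, keepdims=True)

rng = np.random.default_rng(2)
comp = rng.dirichlet([1.0, 2.0, 3.0], size=40)
scaled = scale(comp)
print(totvar(scaled))   # 1.0 up to rounding
```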
Centering
df.coda.center()
Centers the composition by perturbing with the reciprocal of the geometric mean.
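Perturbing each sample by the reciprocal of the geometric mean moves the data's centre to the neutral element (1/D, ..., 1/D). A minimal NumPy sketch of that operation, assuming rows are samples:

```python
import numpy as np

def closure(x):
    return x / x.sum(axis=-1, keepdims=True)

def center(x):
    """Perturb every row by the reciprocal of the column-wise geometric mean."""
    x = np.asarray(x, dtype=float)
    g = np.exp(np.log(x).mean(axis=0))
    return closure(x / g)

comp = np.array([[0.1, 0.3, 0.6],
                 [0.5, 0.2, 0.3],
                 [0.2, 0.2, 0.6]])
centred = center(comp)
# After centring, the closed geometric mean is the neutral element (1/D, ..., 1/D)
g_new = np.exp(np.log(centred).mean(axis=0))
print(closure(g_new))
```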
Plotting functions
Ternary diagram
pycodamath.plot.ternary(data, descr=None, center=False, conf=False)
Plots a ternary diagram from a three-part composition closed to 100.
Parameters
- data (DataFrame) - Three-part compositional data, closed to 100.
- descr (Series) - Optional grouping variable; if provided, points are coloured by group.
- center (bool) - If True, the composition is centred before plotting. Defaults to False.
- conf (bool) - If True, a 95% confidence ellipse is overlaid. Defaults to False.
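Before plotting, each three-part composition has to be mapped to a point inside an equilateral triangle. A minimal NumPy sketch of the usual barycentric-to-Cartesian conversion, assuming a vertex layout (part 1 bottom left, part 2 bottom right, part 3 at the top) that pycodamath may not share; no plotting library is used here.

```python
import numpy as np

def ternary_xy(comp):
    """Map 3-part compositions to 2-D points in an equilateral triangle.

    Assumed vertex layout: part 1 at (0, 0), part 2 at (1, 0),
    part 3 at (0.5, sqrt(3)/2).
    """
    comp = np.asarray(comp, dtype=float)
    p = comp / comp.sum(axis=1, keepdims=True)   # works for data closed to 100 or 1
    x = p[:, 1] + 0.5 * p[:, 2]
    y = (np.sqrt(3) / 2) * p[:, 2]
    return np.column_stack([x, y])

pts = ternary_xy([[100, 0, 0], [0, 100, 0], [0, 0, 100], [100 / 3, 100 / 3, 100 / 3]])
print(pts)
```

The pure compositions land on the three vertices and the equal mixture lands on the centroid of the triangle.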
Scree plot
pycodamath.pca.scree_plot(axis, eig_val)
Plots a scree plot of explained variance from singular values.
Parameters
- axis - A Matplotlib axes object.
- eig_val (array_like) - Singular values from SVD.
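The variance captured by each principal component is proportional to the square of its singular value. A minimal NumPy sketch of the quantities a scree plot usually displays, assuming (not confirmed here) that scree_plot squares and normalises the values it receives:

```python
import numpy as np

def explained_variance(sing_vals):
    """Fraction of variance per component: squared singular values, normalised."""
    s2 = np.asarray(sing_vals, dtype=float) ** 2
    return s2 / s2.sum()

s = np.array([3.0, 2.0, 1.0])
frac = explained_variance(s)
print(frac)             # proportional to [9, 4, 1]
print(np.cumsum(frac))  # cumulative curve often drawn on a scree plot
```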
PCA biplot
class pycodamath.pca.Biplot(data, axis=None, default=True)
Creates a PCA biplot based on a centered log-ratio transformation of the data.
Parameters
- data (DataFrame) - Compositional count data to analyse.
- axis - A Matplotlib axes object. If None, a new figure is created.
- default (bool) - If True, loadings and scores are plotted immediately. Defaults to True.
The following methods are available for customising the biplot:
- plotloadings(cutoff=0, scale=None, labels=None, cluster=False): plot loading arrows. Set cutoff (as a fraction of the maximum loading length) to suppress short loadings. Set cluster=True to reduce the number of loadings by hierarchical clustering; the resulting cluster legend is accessible as biplot.clusterlegend.
- plotloadinglabels(labels=None, loadings=None, cutoff=0): add text labels to loadings.
- adjustloadinglabels(): shift loading labels to reduce overlap.
- plotscores(group=None, palette=None, legend=True, labels=None): plot sample scores as points, optionally coloured by group.
- plotscorelabels(labels=None): add text labels to the scores.
- plotellipses(group, palette=None, legend=False): plot 90% confidence ellipses for each group (requires at least 3 samples per group).
- plotcentroids(group, palette=None, legend=False): plot the centroid of each group.
- plothulls(group, palette=None, legend=True): plot convex hulls around each group (requires at least 3 samples per group).
- plotcontours(group, palette=None, legend=True, plot_outliers=True, percent_outliers=0.1, linewidth=2.2): plot kernel density contours for each group. Samples outside the outermost contour are optionally shown as individual points.
- labeloutliers(group, conf=3.0): label samples more than conf standard deviations from their group centroid.
- displaylegend(loc=2): display the group legend at Matplotlib legend location loc.
- removepatches(): remove loading arrows and hull polygons from the plot.
- removescores(): remove score points from the plot.
- removelabels(): remove text labels from the plot.
- removecontours(): remove contour fills from the plot.
The keyword labels is a list of label names. If labels is None, all labels are plotted.
The keyword group is a Pandas Series with an index matching the data index.
The keyword palette is a dict mapping each unique group value to a colour.
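The scores and loadings behind a CLR biplot come from a singular value decomposition of column-centred CLR data. A minimal NumPy sketch of that computation plus the idea behind the cutoff argument of plotloadings, assuming scores = U * S and loadings = V (a form biplot) and measuring arrow length in the first two components; pycodamath may scale scores and loadings differently, and the helper keep_loadings is hypothetical.

```python
import numpy as np

def clr_pca(x):
    """Scores and loadings for a CLR-based PCA biplot (a form-biplot sketch).

    Assumptions: rows are samples, data are strictly positive (zeros already
    replaced), scores = U * S and loadings = V.
    """
    logx = np.log(np.asarray(x, dtype=float))
    clr = logx - logx.mean(axis=1, keepdims=True)
    centred = clr - clr.mean(axis=0)                 # centre each CLR column
    U, S, Vt = np.linalg.svd(centred, full_matrices=False)
    scores = U * S                                   # sample coordinates
    loadings = Vt.T                                  # one arrow per part
    return scores, loadings, S, centred

def keep_loadings(loadings, cutoff=0.0):
    """Hypothetical helper: keep arrows at least cutoff * longest arrow (2-D length)."""
    length = np.linalg.norm(loadings[:, :2], axis=1)
    return length >= cutoff * length.max()

rng = np.random.default_rng(3)
data = rng.dirichlet([4.0, 2.0, 1.0, 3.0, 5.0], size=30)
scores, loadings, S, centred = clr_pca(data)
mask = keep_loadings(loadings, cutoff=0.5)
print(scores[:, :2].shape, mask)
```

Plotting the first two columns of scores as points and the first two columns of loadings as arrows gives the basic biplot; the singular values S are also what a scree plot summarises.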
Example
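import pycodamath as coda
import pandas as pd
import matplotlib.pyplot as plt
# Load the example dataset shipped in the repository (path relative to the repo root)
data = pd.read_csv('example/kilauea_iki_chem.csv')
# With default=True, loadings and scores are drawn immediately
mypca = coda.pca.Biplot(data)
# Remove the default text labels, then draw clustered loadings and inspect the clusters
mypca.removelabels()
mypca.plotloadings(cluster=True)
print(mypca.clusterlegend)
# Remove labels again, draw only selected loadings and spread their labels to reduce overlap
mypca.removelabels()
mypca.plotloadings(labels=['FeO', 'Al2O3', 'CaO'], cluster=False)
mypca.adjustloadinglabels()
plt.show()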
Download files
File details
Details for the file pycodamath-1.1.1.tar.gz.
File metadata
- Download URL: pycodamath-1.1.1.tar.gz
- Upload date:
- Size: 17.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 7b8d4ab92bbdd01fd38c153a998f9915a9c9026654105fd3d5b1739eb656bdbf |
| MD5 | 73f4a3bf0dc134e80166d229464c1a2c |
| BLAKE2b-256 | f45465f4c07a492efcfa34d28724e3529fe7d9fa898d8cc5cd19ad979b0031ed |
File details
Details for the file pycodamath-1.1.1-py3-none-any.whl.
File metadata
- Download URL: pycodamath-1.1.1-py3-none-any.whl
- Upload date:
- Size: 16.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 0c12e5267c2d39e92a4d145bf04d2cf18b914dd14ef1a457fbe9da6e0fc876da |
| MD5 | ab12073a99d4df1f7b847ba8ef6ac29b |
| BLAKE2b-256 | ab9dde129ff22e5ec4dafb539a77a4b7d5d6701c653efe6fa1630aff275ca23f |