Compositional data (CoDa) analysis tools for Python
pyCoDaMath
pyCoDaMath provides compositional data (CoDa) analysis tools for Python.
- Source code: https://bitbucket.org/genomicepidemiology/pycoda
Getting Started
This package extends the Pandas dataframe object with various CoDa tools. It also provides a set of plotting functions for CoDa figures.
Installation
Clone the git repo to your local hard drive:
git clone https://brinch@bitbucket.org/genomicepidemiology/pycoda.git
Enter the pycoda directory and run
pip install ./
Usage
The pyCoDaMath module is loaded as
import pycodamath
Once the module is imported, CLR values can be obtained from a Pandas DataFrame df with
df.coda.clr()
Documentation
CLR transformation - point estimate
df.coda.clr()
Returns centered logratio coefficients. If the data frame contains zeros, the zeros are replaced by the Aitchison mean point estimate.
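The arithmetic behind a CLR point estimate can be sketched in plain Python for a single composition without zeros. This is only an illustration of the transform itself, not pyCoDaMath's implementation; the function name and the list-based input are assumptions made for the example.

```python
import math

def clr(composition):
    """Centered logratio: log of each part minus the mean of the logs."""
    logs = [math.log(x) for x in composition]  # requires strictly positive parts
    mean_log = sum(logs) / len(logs)           # log of the geometric mean
    return [v - mean_log for v in logs]

coeffs = clr([10.0, 30.0, 60.0])
```

CLR coefficients always sum to zero and do not change when the composition is rescaled, which is why counts and proportions give the same result.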
CLR transformation - standard deviation
df.coda.clr_std(n_samples=5000)
Returns the standard deviation of n_samples random draws in CLR space.
Parameters
- n_samples (int) - Number of random draws from a Dirichlet distribution.
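One way to picture this estimate is as a Monte Carlo loop: draw compositions from a Dirichlet distribution, transform each draw to CLR space, and take the standard deviation per part. The sketch below assumes Dirichlet parameters of counts plus a prior of 0.5; that prior, the seed argument, and the function names are illustrative assumptions, and the package may parameterize the draws differently.

```python
import math
import random
import statistics

def clr(composition):
    logs = [math.log(x) for x in composition]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]

def dirichlet_draw(alpha, rng):
    """One Dirichlet draw, built from normalized Gamma variates."""
    gammas = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(gammas)
    return [g / total for g in gammas]

def clr_std_sketch(counts, n_samples=5000, prior=0.5, seed=0):
    rng = random.Random(seed)
    alpha = [c + prior for c in counts]  # assumed prior, not taken from the package
    draws = [clr(dirichlet_draw(alpha, rng)) for _ in range(n_samples)]
    # Standard deviation of each part across all draws
    return [statistics.stdev(column) for column in zip(*draws)]

stds = clr_std_sketch([0, 12, 250, 38], n_samples=2000)
```

Parts with low counts end up with a larger spread than well-sampled parts, which is the uncertainty this method is meant to expose.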
ALR transformation - point estimate
df.coda.alr(part=None)
Same as clr(), but returns additive logratio values. If part is None, the last part of the composition is used as the denominator; otherwise part is used.
Parameters
- part (str) - Name of the part to be used as denominator.
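The denominator rule can be illustrated with a small plain-Python sketch. It represents one composition as a dict from part name to value, which is an assumption made for the example rather than the package's data layout.

```python
import math

def alr(composition, part=None):
    """Additive logratio of a dict {part name: positive value}."""
    names = list(composition)
    denominator = names[-1] if part is None else part  # default: last part
    return {
        name: math.log(composition[name] / composition[denominator])
        for name in names
        if name != denominator
    }

sample = {"SiO2": 48.0, "MgO": 12.0, "FeO": 6.0}
default_alr = alr(sample)          # FeO is the denominator
mgo_alr = alr(sample, part="MgO")  # MgO is the denominator
```

The result has one value fewer than the composition has parts, because the denominator part drops out.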
ALR transformation - standard deviation
df.coda.alr_std(part=None, n_samples=5000)
Same as clr_std, but in ALR space.
Parameters
- part (str) - Name of the part to be used as denominator.
- n_samples (int) - Number of random draws from a Dirichlet distribution.
ILR transformation - point estimate
df.coda.ilr(psi=None)
Same as clr(), but for the isometric logratio transform. An orthonormal basis can be provided as psi. If no basis is given, a default sequential binary partition basis is used.
Parameters
- psi (array_like) - Orthonormal basis.
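As a rough illustration of what an orthonormal basis does here, the sketch below builds one common sequential binary partition basis (parts 1 to i contrasted against part i+1) and projects the log composition onto it. Whether this matches the package's default basis is an assumption; the helper names are hypothetical.

```python
import math

def sbp_basis(n_parts):
    """Rows are orthonormal contrasts: parts 1..i against part i+1."""
    basis = []
    for i in range(1, n_parts):
        scale = math.sqrt(i / (i + 1))
        row = [scale / i] * i + [-scale] + [0.0] * (n_parts - i - 1)
        basis.append(row)
    return basis

def ilr(composition, psi=None):
    """Project the log composition onto the rows of psi."""
    psi = sbp_basis(len(composition)) if psi is None else psi
    logs = [math.log(x) for x in composition]
    return [sum(w * v for w, v in zip(row, logs)) for row in psi]

coords = ilr([10.0, 30.0, 60.0])
```

A composition with D parts yields D-1 ILR coordinates, and their Euclidean norm equals that of the CLR coefficients, which is the isometry the name refers to.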
ILR transformation - standard deviation
df.coda.ilr_std(psi=None, n_samples=5000)
This method does not exist (yet).
Bayesian zero replacement
df.coda.zero_replacement(n_samples=5000)
Returns a count table in which zero values are replaced by positive values estimated using Bayesian inference.
Parameters
- n_samples (int) - Number of random draws from a Dirichlet distribution.
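The idea can be sketched as follows, under stated assumptions: each sample's counts plus a prior of 0.5 define a Dirichlet distribution, the geometric mean of the draws gives a strictly positive estimate for every part, and the result is rescaled to the original total. The prior, the use of the geometric mean, and the rescaling are illustrative choices, not a description of the package's internals.

```python
import math
import random

def zero_replacement_sketch(counts, n_samples=5000, prior=0.5, seed=0):
    rng = random.Random(seed)
    alpha = [c + prior for c in counts]  # assumed prior
    log_sums = [0.0] * len(counts)
    for _ in range(n_samples):
        gammas = [rng.gammavariate(a, 1.0) for a in alpha]
        total = sum(gammas)
        for j, g in enumerate(gammas):
            log_sums[j] += math.log(g / total)
    # Geometric mean of the draws, part by part
    gmeans = [math.exp(s / n_samples) for s in log_sums]
    # Rescale so the replaced sample keeps the original total count
    scale = sum(counts) / sum(gmeans)
    return [m * scale for m in gmeans]

replaced = zero_replacement_sketch([0, 12, 250, 38], n_samples=2000)
```

The zero becomes a small positive value, while the nonzero parts stay close to their observed counts.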
Closure
df.coda.closure(N)
Applies closure to the composition, rescaling it so that its parts sum to the constant N.
Parameters
- N (int) - Closure constant.
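Closure is a simple rescaling, shown here in plain Python for one composition. The function below is a standalone sketch with an assumed list input, not the package method.

```python
def closure(composition, N=1.0):
    """Rescale the parts so that they sum to N."""
    total = sum(composition)
    return [N * x / total for x in composition]

closed = closure([2.0, 3.0, 5.0], N=100)
```

With N=100 the parts read as percentages; with N=1 they read as proportions.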
Total variance
df.coda.totvar()
Calculates the total variance of a set of compositions.
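A common definition of total variance is the sum of the per-part variances of the CLR coefficients across samples. The sketch below follows that definition and uses the sample variance (n-1 in the denominator); both choices are assumptions, since the text above does not say which convention the package uses.

```python
import math
import statistics

def clr(composition):
    logs = [math.log(x) for x in composition]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]

def total_variance(compositions):
    """Sum of per-part CLR variances across samples (rows)."""
    rows = [clr(c) for c in compositions]
    return sum(statistics.variance(column) for column in zip(*rows))

data = [
    [10.0, 30.0, 60.0],
    [20.0, 30.0, 50.0],
    [15.0, 25.0, 60.0],
]
tv = total_variance(data)
```

Samples that differ only by a scale factor are the same composition, so a set of such samples has a total variance of zero.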
Geometric mean
df.coda.gmean()
Calculates the geometric mean of a set of compositions.
Centering
df.coda.center()
Centers (and scales) the composition by dividing by the geometric mean and powering by the reciprocal variance.
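Read literally, the description corresponds to the steps below: divide each sample by the part-wise geometric mean across samples, raise the result to the power 1/(total variance), and close again. This is a sketch of that reading only; the exponent follows the wording above, and the package may define the scaling differently. It needs at least two samples with nonzero total variance.

```python
import math
import statistics

def closure(composition, N=1.0):
    total = sum(composition)
    return [N * x / total for x in composition]

def clr(composition):
    logs = [math.log(x) for x in composition]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]

def center(compositions, scale=True):
    n = len(compositions)
    # Part-wise geometric mean across samples (the compositional center)
    g = [math.exp(sum(math.log(x) for x in col) / n) for col in zip(*compositions)]
    centered = [[x / gj for x, gj in zip(row, g)] for row in compositions]
    power = 1.0
    if scale:
        clr_rows = [clr(row) for row in centered]
        totvar = sum(statistics.variance(col) for col in zip(*clr_rows))
        power = 1.0 / totvar  # assumed reading of "reciprocal variance"
    return [closure([x ** power for x in row]) for row in centered]

data = [
    [10.0, 30.0, 60.0],
    [20.0, 30.0, 50.0],
    [15.0, 25.0, 60.0],
]
centered = center(data)
```

After centering, the geometric mean of the data set is the same for every part, so the samples are spread around the middle of the simplex.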
Plotting functions
PCA biplot
class pycoda.pca.Biplot(data, default=True)
Plots a PCA biplot. Set default to False for an empty plot. The parameter data (DataFrame) is the data to be analyzed. Use counts, not CLR values.
A number of methods are available for customizing the biplot:
- plotloadings(cutoff=0, scale=None, labels=None)
- plotloadinglabels(labels=None)
- plotscores(group=None, palette=None, legend=True, labels=None)
- plotscorelables(labels=None)
- plotellipses(group=None, palette=None)
- plotcentroids(group=None, palette=None)
- plothulls(group=None, palette=None)
- plotcontours(group=None, palette=None, size=None, levels=None)
- removepatches()
- removescores()
- removelabels()
The keyword labels is a list of label names. If labels is None, all labels are plotted. Use labels=[] for no labels.
The keyword group is a Pandas dataframe with index equal to the index of data.
The keyword palette is a dict mapping each unique member of group to the color to use for it.
Example
import pycoda as coda
import pandas as pd
data = pd.read_csv('example/kilauea_iki_chem.csv')
mypca = coda.pca.Biplot(data)
mypca.plothulls()
mypca.removelabels()
mypca.plotloadinglabels(['FeO'])
Ternary diagram
pycoda.plot.ternary()