PySurvey is a Python package for interactive analysis of survey data, that is, counts of how often different categories occur in a collection of samples. It was developed in the context of genomic surveys, such as 16S surveys, where one studies the occurrence of OTUs (operational taxonomic units) across samples. Much of PySurvey’s functionality is not unique to survey data, and equivalent features are implemented in many other packages. PySurvey is nonetheless intended to serve as a ‘one-stop shop’: it attempts to include all the methods commonly used in the analysis of genomic survey data (often by wrapping other packages), with sensible default parameters (e.g. distance metrics).
PySurvey is built on the pandas package, which offers rich data structures tailored and optimized for interactive analysis of large data tables.
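To make the data model concrete, here is a minimal sketch of a counts table held in a plain pandas `DataFrame`, with simple filtering of samples and components. It uses only pandas, not PySurvey's own API. The orientation (rows as samples, columns as OTUs), the toy values, the thresholds, and the variable names are all assumptions made for illustration.

```python
import pandas as pd

# Hypothetical toy counts: rows are samples, columns are OTUs (assumed layout).
counts = pd.DataFrame(
    {"otu_1": [10, 0, 3], "otu_2": [5, 8, 0], "otu_3": [0, 1, 0]},
    index=["sample_a", "sample_b", "sample_c"],
)

# Keep samples with at least 5 reads in total (threshold chosen arbitrarily).
deep_enough = counts.loc[counts.sum(axis=1) >= 5]

# Keep OTUs that are present (count > 0) in at least 2 samples.
prevalent = counts.loc[:, (counts > 0).sum(axis=0) >= 2]
```

Because the table is an ordinary `DataFrame`, the usual pandas tools for selection, grouping, and joining with metadata apply to it directly.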
- General utility:
  - Metadata support.
  - Filtering of samples/components.
  - Maximum-likelihood (ML) and Bayesian estimation of component fractions.
- Exploratory analysis:
  - Dimension reduction: PCoA.
  - Clustering: hierarchical, Gaussian mixture models (GMM).
  - Compositional correlations via SparCC.
  - Plotting: sorted heatmaps, stacked plots, …
- Ecological theory:
  - Sample diversities (alpha diversity).
  - Rank abundance plots.
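As an illustration of what some of these features compute, the sketch below estimates component fractions and a per-sample alpha diversity using only numpy and pandas. It is not PySurvey's implementation. The function names, the toy counts, the symmetric Dirichlet prior with a pseudocount of 1, and the choice of Shannon entropy as the diversity index are all assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical toy counts: rows are samples, columns are OTUs (assumed layout).
counts = pd.DataFrame(
    {"otu_1": [10, 0, 3], "otu_2": [5, 8, 0], "otu_3": [0, 1, 0]},
    index=["sample_a", "sample_b", "sample_c"],
)

def ml_fractions(counts):
    """Maximum-likelihood estimate: each count divided by its sample total."""
    return counts.div(counts.sum(axis=1), axis=0)

def bayes_fractions(counts, pseudocount=1.0):
    """Posterior mean of the fractions under a symmetric Dirichlet prior
    (an assumed prior; equivalent to adding a pseudocount to every cell)."""
    smoothed = counts + pseudocount
    return smoothed.div(smoothed.sum(axis=1), axis=0)

def shannon(fractions):
    """Shannon entropy per sample, in nats; zero fractions contribute 0."""
    f = fractions.to_numpy(dtype=float)
    logs = np.log(f, where=f > 0, out=np.zeros_like(f))
    return pd.Series(-(f * logs).sum(axis=1), index=fractions.index)

ml = ml_fractions(counts)
bayes = bayes_fractions(counts)
alpha = shannon(ml)

# Data behind a rank abundance plot for one sample: fractions sorted
# from most to least abundant.
rank_abundance = ml.loc["sample_a"].sort_values(ascending=False)
```

The ML estimate assigns a fraction of exactly zero to any unobserved component, whereas the Bayesian estimate keeps every fraction strictly positive. That matters for downstream steps that take logarithms of fractions, as compositional methods typically do.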