A Python API for estimating statistical high-order epistasis in genotype-phenotype maps.
Epistasis
All models follow a scikit-learn interface and thus plug seamlessly into the PyData ecosystem. For more information about the types of models included in this package, read our docs. You can also read more about the theory behind these models in our paper.
Finally, if you'd like to try this package without installing anything, use these Jupyter notebooks (thank you, Binder!).
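As a rough illustration of what a scikit-learn style interface means, here is a minimal sketch of the fit/predict/score convention. The class `MeanPhenotypeModel` and the data are hypothetical and are not part of this package; the package's own example attaches data with `add_gpm` and then calls `fit()` without arguments.

```python
class MeanPhenotypeModel:
    """Hypothetical toy estimator: always predicts the mean training phenotype."""

    def fit(self, X, y):
        # Learned attributes carry a trailing underscore by scikit-learn convention.
        self.mean_ = sum(y) / len(y)
        return self  # returning self allows chaining, e.g. Model().fit(X, y)

    def predict(self, X):
        # One prediction per row of X.
        return [self.mean_ for _ in X]

    def score(self, X, y):
        # Coefficient of determination (R^2), the usual score for regressors.
        predictions = self.predict(X)
        mean_y = sum(y) / len(y)
        ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, predictions))
        ss_tot = sum((yi - mean_y) ** 2 for yi in y)
        return 1.0 - ss_res / ss_tot


# Illustrative data only: binary-encoded genotypes and made-up phenotypes.
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0.1, 0.2, 0.4, 0.8]

model = MeanPhenotypeModel().fit(X, y)
print(model.predict([[1, 1]]))
print(model.score(X, y))
```

Any object that exposes these three methods can be swapped into code written against the same convention, which is the practical benefit of following the interface.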
Examples
The Epistasis package works best in combination with GPMap, an API for managing genotype-phenotype map data. Construct a GenotypePhenotypeMap object and pass it directly to an epistasis model.
# Import the genotype-phenotype map class, a model, and the plotting function
from gpmap import GenotypePhenotypeMap
from epistasis.models import EpistasisLinearRegression
from epistasis.pyplot import plot_coefs
# Genotype-phenotype map data.
wildtype = "AAA"
genotypes = ["AAA", "AAT", "ATA", "TAA", "ATT", "TAT", "TTA", "TTT"]
phenotypes = [0.1, 0.2, 0.4, 0.3, 0.3, 0.6, 0.8, 1.0]
# Create genotype-phenotype map object.
gpm = GenotypePhenotypeMap(wildtype=wildtype,
                           genotypes=genotypes,
                           phenotypes=phenotypes)
# Initialize an epistasis model.
model = EpistasisLinearRegression(order=3)
# Add the genotype phenotype map.
model.add_gpm(gpm)
# Fit model to given genotype-phenotype map.
model.fit()
# Plot coefficients (powered by matplotlib).
plot_coefs(model, figsize=(3,5))
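To give a feel for what a third-order decomposition computes, the sketch below works through a complete three-site, two-letter map with wildtype AAA using only the standard library. It assumes a wildtype-referenced (0/1) encoding, where each coefficient is an inclusion-exclusion sum over sub-genotypes. This is an illustration of the idea, not the package's implementation, and the coefficients the package reports may use a different encoding. The helper names (`mutated_sites`, `coefficient`, `predict`) are hypothetical.

```python
from itertools import combinations

wildtype = "AAA"
phenotypes = {
    "AAA": 0.1, "AAT": 0.2, "ATA": 0.4, "TAA": 0.3,
    "ATT": 0.3, "TAT": 0.6, "TTA": 0.8, "TTT": 1.0,
}


def mutated_sites(genotype):
    """Return the sorted tuple of site indices that differ from the wildtype."""
    return tuple(i for i, (g, w) in enumerate(zip(genotype, wildtype)) if g != w)


# Index each phenotype by the set of sites mutated in its genotype.
by_sites = {mutated_sites(g): p for g, p in phenotypes.items()}


def subsets(sites):
    """Yield every subset of `sites` as a sorted tuple, including the empty one."""
    for k in range(len(sites) + 1):
        yield from combinations(sites, k)


def coefficient(sites):
    """Signed (inclusion-exclusion) sum of phenotypes over all subsets of `sites`."""
    return sum(
        (-1) ** (len(sites) - len(sub)) * by_sites[sub] for sub in subsets(sites)
    )


# One coefficient per set of sites: order 0 (intercept) up to order 3.
coefs = {sites: coefficient(sites) for sites in by_sites}


def predict(genotype):
    """Rebuild a phenotype by summing every coefficient the genotype contains."""
    return sum(coefs[sub] for sub in subsets(mutated_sites(genotype)))


for sites in sorted(coefs, key=lambda s: (len(s), s)):
    print(sites, round(coefs[sites], 3))
```

With eight genotypes and eight coefficients, the decomposition is exact: `predict` reproduces every input phenotype, and the single third-order term captures whatever the lower-order terms cannot explain.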
More examples can be found in these Binder notebooks.
Installation
Epistasis works in Python 3+ (we do not guarantee that it will work in Python 2).
To install the most recent release from PyPI:
pip install epistasis
To install from source, clone this repo and run:
pip install -e .
Documentation
Documentation and API reference can be viewed here.
Dependencies
- gpmap: Module for constructing powerful genotype-phenotype map data structures in Python.
- scikit-learn: Simple-to-use machine-learning algorithms.
- NumPy: Python's array manipulation package.
- SciPy: Efficient scientific array manipulations and fitting.
- lmfit: Non-linear least-squares minimization and curve fitting in Python.
Optional dependencies
- matplotlib: Python plotting API.
- IPython: Interactive Python kernel.
- Jupyter Notebook: Interactive notebook application for running Python kernels.
- ipywidgets: Interactive widgets in Python.
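A quick way to see which of the packages listed above are importable in the current environment is sketched below. It uses only the standard library; the mapping from distribution names to import names (for example, scikit-learn imports as `sklearn`) is written out by hand, and the helper `missing` is hypothetical.

```python
from importlib.util import find_spec

# Distribution name -> top-level import name.
required = {
    "gpmap": "gpmap",
    "scikit-learn": "sklearn",
    "numpy": "numpy",
    "scipy": "scipy",
    "lmfit": "lmfit",
}
optional = {
    "matplotlib": "matplotlib",
    "ipython": "IPython",
    "jupyter notebook": "notebook",
    "ipywidgets": "ipywidgets",
}


def missing(packages):
    """Return the distribution names whose import name cannot be found."""
    return [name for name, module in packages.items() if find_spec(module) is None]


print("Missing required packages:", missing(required))
print("Missing optional packages:", missing(optional))
```

An empty list for the required group means the core dependencies are in place; the optional group only matters for plotting and notebook use.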
Development
We welcome pull requests! If you find a bug, we'd love to have you fix it. If there is a feature you'd like to add, feel free to submit a pull request with a description of the addition. We also ask that you write the appropriate unit tests for the new feature and add documentation to our Sphinx docs.
To run the tests on this package, make sure you have pytest installed and run the following from the base directory:
pytest
Citing
If you use this API for research, please cite this paper.
You can also cite the software directly:
@misc{zachary_sailer_2017_252927,
author = {Zachary Sailer and Mike Harms},
title = {harmslab/epistasis: Genetics paper release},
month = jan,
year = 2017,
doi = {10.5281/zenodo.1215853},
url = {https://doi.org/10.5281/zenodo.1215853}
}