Skip to main content

Python API to infer missing data in sparsely sampled genotype-phenotype maps.

Project description

GPSeer

Simple software for inferring missing data in sparsely measured genotype-phenotype maps

GPSeer tests

Basic Usage

Install gpseer using pip:

pip install gpseer

To use as a command line, call gpseer on an input .csv file containing genotype-phenotype data.

The API Demo.ipynb demonstrates how to use GPSeer in a Jupyter notebook.

Downloading the example

To get started, use GPSeer's fetch-example command to download an example from its Github repo.

Download the gpseer example and explore the example input data:

# fetch data from Github page.
> gpseer fetch-example

[GPSeer] Downloading files to /examples...
[GPSeer] └──>: 100%|██████████████████| 3/3 [00:00<00:00,  9.16it/s]
[GPSeer] └──> Done!

# Change into the example directory and checkout the files that were downloaded
> cd examples/
> ls

API Demo.ipynb
example-full.csv
example-test.csv
example-train.csv
Generate Dataset.ipynb
genotypes.txt
pfcrt-raw-data.csv

Predicting missing data using ML model.

Estimate the maximum likelihood additive model on the training set and predict all missing genotypes. The predictions will be written to a file named "example-train_predictions.csv".

> gpseer estimate-ml example-train.csv

[GPSeer] Reading data from example-train.csv...
[GPSeer] └──> Done reading data.
[GPSeer] Constructing a model...
[GPSeer] └──> Done constructing model.
[GPSeer] Fitting data...
[GPSeer] └──> Done fitting data.
[GPSeer] Predicting missing data...
[GPSeer] └──> Done predicting.
[GPSeer] Calculating fit statistics...
[GPSeer]

Fit statistics:
---------------

              parameter     value
0         num_genotypes       128
1  num_unique_mutations         8
2   explained_variation  0.985186
3        num_parameters         9
4   num_obs_to_converge   2.82714
5             threshold      None
6          spline_order      None
7     spline_smoothness      None
8       epistasis_order         1


[GPSeer]

Convergence:
------------

  mutation  num_obs  num_obs_above  fold_target  converged
0      F0K       64             64    22.637735       True
1      S1Y       69             69    24.406308       True
2      Q2T       63             63    22.284020       True
3      R3V       70             70    24.760023       True
4      N4D       62             62    21.930306       True
5      A5C       69             69    24.406308       True
6      C6D       65             65    22.991450       True
7      C7A       64             64    22.637735       True


[GPSeer] └──> Done.
[GPSeer] Writing phenotypes to example-train_predictions.csv...
[GPSeer] └──> Done writing predictions!
[GPSeer] Writing plots...
[GPSeer] Writing example-train_correlation-plot.pdf...
[GPSeer] Writing example-train_phenotype-histograms.pdf...
[GPSeer] └──> Done plotting!
[GPSeer] GPSeer finished!

Compute the predictive power of the model by cross-validation

Estimate how well your model is predicting data using the "cross-validate" subcommand. Try the example below where we generate 100 subsets from the data and compute your prediction scores.

> gpseer cross-fit example-test.csv

[GPSeer] Reading data from example-train.csv...
[GPSeer] └──> Done reading data.
[GPSeer] Fitting all data data...
[GPSeer] └──> Done fitting data.
[GPSeer] Sampling the data...
[GPSeer] └──>: 100%|████████████████████| 100/100 [00:03<00:00, 25.90it/s]
[GPSeer] └──> Done sampling data.
[GPSeer] Plotting example-train_cross-validation-plot.pdf...
[GPSeer] └──> Done writing data.
[GPSeer] Writing scores to example-train_cross-validation-scores.csv...
[GPSeer] └──> Done writing data

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Files for gpseer, version 0.3.3
Filename, size File type Python version Upload date Hashes
Filename, size gpseer-0.3.3.tar.gz (15.9 kB) File type Source Python version None Upload date Hashes View

Supported by

AWS AWS Cloud computing Datadog Datadog Monitoring DigiCert DigiCert EV certificate Facebook / Instagram Facebook / Instagram PSF Sponsor Fastly Fastly CDN Google Google Object Storage and Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Salesforce Salesforce PSF Sponsor Sentry Sentry Error logging StatusPage StatusPage Status page