Generating dense embeddings for proteins using kernel PCA

This tool generates low-dimensional, continuous, distributed vector representations for non-numeric entities such as text or biological sequences (e.g. DNA or proteins) via kernel PCA with rational kernels.
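
The idea can be illustrated with a minimal, standard-library-only sketch. This is not RatVec's implementation. It assumes a simple cosine-normalised n-gram similarity as a stand-in for the rational kernels, and uses power iteration with deflation instead of a full eigendecomposition (in practice a routine such as `numpy.linalg.eigh` would be the usual choice). All function names below are hypothetical.

```python
import math
from collections import Counter


def ngram_counts(s, n=2):
    """Count the overlapping character n-grams of a string."""
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def ngram_kernel(a, b, n=2):
    """Cosine-normalised n-gram similarity (illustrative stand-in kernel)."""
    ca, cb = ngram_counts(a, n), ngram_counts(b, n)
    dot = sum(ca[g] * cb[g] for g in ca.keys() & cb.keys())
    norm = math.sqrt(sum(v * v for v in ca.values()) * sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


def center(K):
    """Double-centre a symmetric kernel matrix (centring in feature space)."""
    m = len(K)
    row = [sum(r) / m for r in K]
    total = sum(row) / m
    return [[K[i][j] - row[i] - row[j] + total for j in range(m)] for i in range(m)]


def matvec(K, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(k * x for k, x in zip(row, v)) for row in K]


def top_eigenpairs(K, k, iters=500):
    """Leading k eigenpairs of a symmetric PSD matrix via power iteration with deflation."""
    m = len(K)
    K = [row[:] for row in K]  # work on a copy
    pairs = []
    for c in range(k):
        v = [math.sin(i + 1 + c) for i in range(m)]  # deterministic start vector
        scale = math.sqrt(sum(x * x for x in v))
        v = [x / scale for x in v]
        for _ in range(iters):
            w = matvec(K, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # remaining spectrum is numerically zero
                break
            v = [x / norm for x in w]
        lam = sum(x * y for x, y in zip(v, matvec(K, v)))  # Rayleigh quotient
        pairs.append((lam, v))
        # Deflate: subtract the component that was just found.
        K = [[K[i][j] - lam * v[i] * v[j] for j in range(m)] for i in range(m)]
    return pairs


def kpca_embed(strings, dim=2, n=2):
    """Embed each string as its projection onto the top `dim` kernel principal components."""
    K = center([[ngram_kernel(a, b, n) for b in strings] for a in strings])
    pairs = top_eigenpairs(K, dim)
    return [[math.sqrt(max(lam, 0.0)) * v[i] for lam, v in pairs]
            for i in range(len(strings))]


# Toy input: two pairs of near-identical made-up sequences.
sequences = ["MKTAYIAK", "MKTAYIAR", "GAVLIPFW", "GAVLIPFY"]
for seq, vec in zip(sequences, kpca_embed(sequences, dim=2)):
    print(seq, [round(x, 3) for x in vec])
```

Strings that share many n-grams end up close together in the embedding space, while unrelated strings are separated along the leading component.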

The current implementation accepts any input dataset that can be read as a list of strings.
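
As a rough illustration of "a list of strings", the sketch below reads a plain-text file with one entry per line. The one-entry-per-line layout and the `read_strings` helper are assumptions made for this example, not RatVec's documented input format; `ratvec train --help` describes what the tool actually expects.

```python
from pathlib import Path


def read_strings(path):
    """Read a text file into a list of strings, one entry per non-empty line."""
    text = Path(path).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]
```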

Installation

RatVec is distributed under the Apache 2.0 License.

RatVec requires Python 3.6+ and can be installed from PyPI by running the following in a terminal:

$ pip install ratvec

or from the latest code on GitHub with:

$ pip install git+https://github.com/ratvec/ratvec.git

It can be installed in development mode with:

$ git clone https://github.com/ratvec/ratvec.git
$ cd ratvec
$ pip install -e .

The -e flag installs the package in editable mode: it links the code in the git repository into Python's site-packages, so your changes take effect immediately.

How to Use

Installing RatVec also installs a command line interface. Check it out with:

$ ratvec --help

RatVec has three main commands: generate, train, and evaluate.

  1. Generate. Downloads and prepares the SwissProt data set that is showcased in the RatVec paper.

$ ratvec generate

  2. Train. Computes KPCA embeddings on a given data set. Run the following command to see the arguments:

$ ratvec train --help

  3. Evaluate. Evaluates and optimizes KPCA embeddings. Run the following command to see the arguments:

$ ratvec evaluate --help

Showcase Dataset

The data set behind the application presented in the paper (the SwissProt dataset [1] used by Boutet et al. [2]) can be downloaded directly or obtained by running the following command:

$ ratvec generate

References

