
Generating dense embeddings for proteins using kernel PCA

Project description

This tool generates low-dimensional, continuous, distributed vector representations for non-numeric entities such as text or biological sequences (e.g. DNA or proteins) via kernel PCA with rational kernels.
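Rational kernels are a family of string kernels, and kernels that count shared n-grams (contiguous substrings of length n) are among the simplest members. The minimal sketch below computes such an n-gram ("spectrum") kernel in plain Python. The names `ngram_counts` and `ngram_kernel`, the default `n=3`, and the cosine normalization are illustrative assumptions, not necessarily what ratvec itself uses.

```python
from collections import Counter
from math import sqrt


def ngram_counts(s: str, n: int) -> Counter:
    """Count every contiguous substring of length n in s (hypothetical helper)."""
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def ngram_kernel(a: str, b: str, n: int = 3, normalize: bool = True) -> float:
    """n-gram (spectrum) kernel: inner product of the n-gram count vectors.

    With normalize=True the value is divided by the product of the vector
    norms, so identical strings score 1.0 and strings sharing no n-gram score 0.0.
    """
    ca, cb = ngram_counts(a, n), ngram_counts(b, n)
    # Counter returns 0 for missing keys, so absent n-grams contribute nothing
    dot = sum(count * cb[g] for g, count in ca.items())
    if not normalize:
        return float(dot)
    norm = sqrt(sum(v * v for v in ca.values()) * sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0
```

For example, `ngram_kernel("ABCD", "BCDE")` is 0.5: the two strings share one of their two 3-grams each.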

The current implementation accepts any input dataset that can be read as a list of strings.
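Because the input is just a list of strings, kernel PCA only needs a function that scores pairs of strings. The following is a minimal NumPy sketch of the standard kernel PCA steps (build the Gram matrix, center it in feature space, keep the top eigenvectors) under that assumption. The function `kpca_embed` and its parameters are hypothetical and are not ratvec's API.

```python
from typing import Callable, Sequence

import numpy as np


def kpca_embed(strings: Sequence[str],
               kernel: Callable[[str, str], float],
               dim: int = 2) -> np.ndarray:
    """Embed strings with kernel PCA; returns an (n, dim) array."""
    n = len(strings)
    # Gram matrix of pairwise kernel values (kernel assumed symmetric and PSD)
    K = np.array([[kernel(a, b) for b in strings] for a in strings], dtype=float)
    # Center the data in feature space: Kc = H K H with H = I - 1/n
    one = np.full((n, n), 1.0 / n)
    Kc = K - one @ K - K @ one + one @ K @ one
    # eigh returns eigenvalues in ascending order; keep the `dim` largest
    eigvals, eigvecs = np.linalg.eigh(Kc)
    idx = np.argsort(eigvals)[::-1][:dim]
    eigvals, eigvecs = eigvals[idx], eigvecs[:, idx]
    # Coordinates of the training points: unit eigenvector * sqrt(eigenvalue)
    return eigvecs * np.sqrt(np.clip(eigvals, 0.0, None))


def bag_of_chars(a: str, b: str) -> float:
    """Stand-in kernel: dot product of character-count vectors."""
    return float(sum(a.count(c) * b.count(c) for c in set(a) | set(b)))


embeddings = kpca_embed(["AAB", "ABB", "BBB", "AAA", "A"], bag_of_chars, dim=2)
```

Any string kernel with the same `(str, str) -> float` signature can be swapped in for `bag_of_chars`, for example an n-gram kernel.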

Installation

Install directly from the source with:

$ pip install git+https://github.com/ratvec/ratvec.git

Install in development mode with:

$ git clone https://github.com/ratvec/ratvec.git
$ cd ratvec
$ pip install -e .

The -e flag installs the package in editable mode: the code in the git repository is linked into Python's site-packages, so your changes take effect immediately without reinstalling.

How to Use

Installing ratvec also installs a command-line interface. Check it out with:

$ ratvec --help

RatVec has three main commands: generate, train, and evaluate.

  1. Generate. Downloads and prepares the SwissProt dataset that is showcased in the RatVec paper.

$ ratvec generate

  2. Train. Computes KPCA embeddings on a given dataset. Run the following command to see the arguments:

$ ratvec train --help

  3. Evaluate. Evaluates and optimizes KPCA embeddings. Run the following command to see the arguments:

$ ratvec evaluate --help
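The exact evaluation protocol of `ratvec evaluate` is described by its `--help` output. As a hedged illustration only, one common way to score embeddings of labeled sequences is leave-one-out k-nearest-neighbour accuracy, sketched below with NumPy. The function `loo_knn_accuracy` is hypothetical and is not part of ratvec. Settings such as the n-gram length or the embedding dimension could then be tuned by picking the values that maximize such a score.

```python
from collections import Counter

import numpy as np


def loo_knn_accuracy(X, labels, k: int = 1) -> float:
    """Leave-one-out k-NN accuracy of embeddings X (n x d) against labels."""
    X = np.asarray(X, dtype=float)
    # Pairwise squared Euclidean distances via |x|^2 + |y|^2 - 2 x.y
    sq = np.sum(X ** 2, axis=1)
    D = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    np.fill_diagonal(D, np.inf)  # a point may not be its own neighbour
    correct = 0
    for i in range(len(X)):
        nn = np.argsort(D[i])[:k]
        # Majority vote; ties go to the label appearing first among the sorted neighbours
        vote = Counter(labels[j] for j in nn).most_common(1)[0][0]
        correct += vote == labels[i]
    return correct / len(X)
```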

Showcase Dataset

The application presented in the paper (the SwissProt dataset [1] used by Boutet et al. [2]) can be downloaded directly from here or by running the following command:

$ ratvec generate
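Since ratvec accepts any dataset that can be read as a list of strings, protein sequences you already have in FASTA format (a format UniProt offers for SwissProt downloads; this is not necessarily the format `ratvec generate` writes) can be turned into such a list with a small parser. The sketch below is a hypothetical helper, `read_fasta`, not part of ratvec.

```python
from typing import Iterable, List, Optional, Tuple


def read_fasta(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Parse FASTA lines into (header, sequence) pairs."""
    records: List[Tuple[str, str]] = []
    header: Optional[str] = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new record starts: flush the previous one, if any
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequences may be wrapped over several lines
    if header is not None:
        records.append((header, "".join(chunks)))
    return records
```

With an open file handle `fh`, `[seq for _, seq in read_fasta(fh)]` yields the plain list of sequence strings.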

References



Download files


Source Distribution

ratvec-0.1.1.tar.gz (22.2 kB)

Uploaded Source

Built Distribution

ratvec-0.1.1-py3-none-any.whl (23.8 kB)

Uploaded Python 3

File details

Details for the file ratvec-0.1.1.tar.gz.

File metadata

  • Download URL: ratvec-0.1.1.tar.gz
  • Upload date:
  • Size: 22.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/41.0.0 requests-toolbelt/0.9.1 tqdm/4.31.1 CPython/3.7.2

File hashes

Hashes for ratvec-0.1.1.tar.gz
Algorithm Hash digest
SHA256 4602778525c84079221ef87b1080b5efb051877dcc9998ff11490bc9904f5773
MD5 538198c62acdf628f15aeb4f515f5192
BLAKE2b-256 b5356aece06611f0530799d7cc75590532c7ec67f1df01734772821afd8a1a47


File details

Details for the file ratvec-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: ratvec-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 23.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/41.0.0 requests-toolbelt/0.9.1 tqdm/4.31.1 CPython/3.7.2

File hashes

Hashes for ratvec-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 764a9aa7b8be54d4354e8717e19f5a068fec3916ec07fbd7cb256c0f33445dd4
MD5 421f22819e83801996dcf104587e3782
BLAKE2b-256 b915deb5965c559f133f30c0fa21a5de32d60944a6ae513c5350df7379a205cb
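To check a downloaded file against the SHA256 digests listed above, a minimal sketch with Python's standard `hashlib` follows. The helper names `sha256_of` and `verify` are illustrative. pip's hash-checking mode (`--require-hashes`) is an alternative that performs this check during installation.

```python
import hashlib
from pathlib import Path

# SHA256 digests copied from the file details above
EXPECTED = {
    "ratvec-0.1.1.tar.gz": "4602778525c84079221ef87b1080b5efb051877dcc9998ff11490bc9904f5773",
    "ratvec-0.1.1-py3-none-any.whl": "764a9aa7b8be54d4354e8717e19f5a068fec3916ec07fbd7cb256c0f33445dd4",
}


def sha256_of(path, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path) -> bool:
    """True only if the file's name is known and its digest matches."""
    p = Path(path)
    return sha256_of(p) == EXPECTED.get(p.name)
```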

