A package for storing and querying knowledge graph embeddings

This package provides a database schema and Python wrapper for storing the embeddings generated through various representation learning packages.

Currently, this package uses a SQL database through SQLAlchemy; it might later be extended to support a NoSQL database as an alternative.

Installation

Install embeddingdb directly from GitHub with:

$ pip install git+https://github.com/cthoyt/embeddingdb

Set the environment variable EMBEDDINGDB_CONNECTION to a valid SQLAlchemy connection string for a PostgreSQL instance, as this package uses the PostgreSQL-specific ARRAY type.
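
As a minimal sketch of an early sanity check, the snippet below reads EMBEDDINGDB_CONNECTION and verifies that its dialect is PostgreSQL before any database code runs. The helper names is_postgresql_url and get_connection_string are hypothetical and not part of embeddingdb; the check assumes the usual SQLAlchemy URL shape dialect+driver://user:password@host:port/database.

```python
import os
from urllib.parse import urlsplit

ENV_VAR = "EMBEDDINGDB_CONNECTION"  # variable name taken from the text above


def is_postgresql_url(url):
    """Return True if the dialect part of the URL scheme is 'postgresql'."""
    scheme = urlsplit(url).scheme  # e.g. "postgresql+psycopg2"
    dialect = scheme.split("+", 1)[0]  # drop the optional "+driver" suffix
    return dialect == "postgresql"


def get_connection_string(environ=os.environ):
    """Hypothetical helper: read the variable and fail early on a bad value."""
    url = environ.get(ENV_VAR)
    if not url:
        raise RuntimeError(f"{ENV_VAR} is not set")
    if not is_postgresql_url(url):
        # Report only the scheme so credentials never end up in the message.
        raise ValueError(f"{ENV_VAR} must use PostgreSQL, got {urlsplit(url).scheme!r}")
    return url
```

Checking only the scheme keeps the sketch free of third-party imports; it does not prove that the server is reachable or that the credentials are valid.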

Command Line Interface

This package installs an entrypoint embeddingdb that can be used directly from the shell.

Uploading Entity Embeddings

Entity embeddings produced by various types of representation learning, including network representation learning, knowledge graph embedding, and textual learning, can be uploaded and stored.

Upload embeddings generated by word2vec by specifying the file path with:

$ embeddingdb upload --fmt word2vec --path ~/path/to/file.txt
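
For orientation, the sketch below parses the common word2vec text layout: a header line with the number of vectors and their dimension, then one line per entity holding a token followed by its values. Whether embeddingdb expects exactly this layout (in particular the header line) is an assumption, and read_word2vec_text is a hypothetical helper, not part of the package.

```python
import io


def read_word2vec_text(handle):
    """Parse word2vec text format into a dict mapping token -> list of floats."""
    header = handle.readline().split()
    count, dim = int(header[0]), int(header[1])
    vectors = {}
    for line in handle:
        parts = line.split()
        if not parts:
            continue  # tolerate blank lines, e.g. at the end of the file
        if len(parts) != dim + 1:
            raise ValueError(f"expected {dim} values for {parts[0]!r}, got {len(parts) - 1}")
        vectors[parts[0]] = [float(value) for value in parts[1:]]
    if len(vectors) != count:
        raise ValueError(f"header announced {count} vectors, found {len(vectors)}")
    return vectors


# Made-up sample data: two entities with three-dimensional embeddings.
SAMPLE = "2 3\nP12345 0.1 0.2 0.3\nQ67890 -1.0 0.0 1.0\n"
sample_vectors = read_word2vec_text(io.StringIO(SAMPLE))
```

Running a check like this before an upload catches truncated files and rows of the wrong length early.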

Upload embeddings generated by pykeen by specifying the output directory with:

$ embeddingdb upload --fmt keen --path ~/path/to/directory/

Listing Entity Embeddings

After uploading, the collections can be listed with:

$ embeddingdb ls

Analyzing Entity Embeddings’ Correlations

One motivation for building this repository was to make it convenient to compare the embeddings of the same entities generated through orthogonal embedding techniques. For example, we wanted to know to what extent the embeddings for proteins generated from their sequences with ratvec contain the same information as the embeddings generated from protein-protein interaction networks with pykeen or nrl.

The analyze command takes two positional arguments, which are the identifiers of the two collections in the database to compare:

$ embeddingdb analyze 1 2
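
The text does not say which statistic the analyze command computes. As one illustration of how two collections of different dimensionality could be compared, the sketch below correlates the pairwise cosine distances of the entities that both collections share. This is a swapped-in technique chosen for illustration, not necessarily what embeddingdb does, and all function names are hypothetical.

```python
import math
from itertools import combinations


def cosine_distance(u, v):
    """One minus the cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norms


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length, non-constant sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def pairwise_distance_correlation(first, second):
    """Correlate within-collection distances over the entities both collections share.

    Each argument maps an entity name to its embedding; the two collections
    may use different dimensions because distances are computed separately.
    """
    shared = sorted(set(first) & set(second))
    if len(shared) < 3:
        raise ValueError("need at least three shared entities")
    pairs = list(combinations(shared, 2))
    first_distances = [cosine_distance(first[a], first[b]) for a, b in pairs]
    second_distances = [cosine_distance(second[a], second[b]) for a, b in pairs]
    return pearson(first_distances, second_distances)
```

A value near 1 suggests that the two techniques arrange the shared entities similarly, while a value near 0 suggests that they capture largely different information.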

Running with Docker

After installing Docker, the entire web application can be instantiated with:

$ docker-compose up

Send a GET request to the /test endpoint to instantiate the database and add a test collection.


