A package for storing and querying knowledge graph embeddings
This package provides a database schema and Python wrapper for storing the embeddings generated through various representation learning packages.
Currently, this package focuses on using a SQL database with SQLAlchemy, but might be extended to use a NoSQL database as an alternative.
Installation
Install embeddingdb directly from GitHub with:
$ pip install git+https://github.com/cthoyt/embeddingdb
Set the environment variable EMBEDDINGDB_CONNECTION to a valid SQLAlchemy connection string for a PostgreSQL instance, as this package uses the PostgreSQL-specific ARRAY type.
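As a minimal sketch of the step above, the snippet below composes a PostgreSQL connection string and exports it as EMBEDDINGDB_CONNECTION for the current process. The user name, password, host, port, and database name are placeholders, and the helper functions are hypothetical illustrations that are not part of embeddingdb.

```python
import os
from urllib.parse import quote_plus, urlsplit


def build_postgres_url(user, password, host, port, database):
    """Compose a SQLAlchemy-style PostgreSQL connection string.

    The user and password are percent-encoded so that characters such as
    "/" or "@" do not break the URL.
    """
    return (
        f"postgresql://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )


def check_postgres_url(url):
    """Return True if the URL names a PostgreSQL dialect.

    Accepts both "postgresql://" and driver-qualified forms such as
    "postgresql+psycopg2://".
    """
    scheme = urlsplit(url).scheme
    return scheme.split("+")[0] == "postgresql"


# Placeholder credentials, for illustration only.
url = build_postgres_url("embeddingdb", "s3cret/pw", "localhost", 5432, "embeddingdb")
if not check_postgres_url(url):
    raise ValueError("EMBEDDINGDB_CONNECTION must point at PostgreSQL")

# Only affects this process and its children; use `export` in the shell
# to make the variable visible to the embeddingdb command line tool.
os.environ["EMBEDDINGDB_CONNECTION"] = url
```

Checking the scheme up front is a cheap way to catch a SQLite or MySQL URL before the PostgreSQL-specific ARRAY type causes a less obvious failure later.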
Command Line Interface
This package installs an entrypoint embeddingdb that can be used directly from the shell.
Uploading Entity Embeddings
Entity embeddings can come from various types of representation learning, including network representation learning, knowledge graph embedding, and textual learning.
Upload embeddings generated by word2vec by specifying the file path with:
$ embeddingdb upload --fmt word2vec --path ~/path/to/file.txt
Upload embeddings generated by pykeen by specifying the output directory with:
$ embeddingdb upload --fmt keen --path ~/path/to/directory/
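For reference, the word2vec text format usually starts with a header line giving the number of vectors and their dimension, followed by one line per entity holding its label and its space-separated vector components. The reader below is a rough sketch of that layout for checking a file before uploading it. It assumes this common layout, the function name is hypothetical, and it is not necessarily how embeddingdb parses the file.

```python
import io


def read_word2vec_text(handle):
    """Read word2vec text-format embeddings from an open text file.

    Returns a dict mapping each entity label to a list of floats.
    Raises ValueError if a row or the total count disagrees with the header.
    """
    header = handle.readline().split()
    if len(header) != 2:
        raise ValueError("expected a header line with '<count> <dimension>'")
    count, dimension = int(header[0]), int(header[1])

    vectors = {}
    for line_number, line in enumerate(handle, start=2):
        parts = line.split()
        if not parts:
            continue  # tolerate blank lines, e.g. at the end of the file
        if len(parts) != dimension + 1:
            raise ValueError(f"line {line_number}: expected {dimension} values")
        vectors[parts[0]] = [float(value) for value in parts[1:]]

    if len(vectors) != count:
        raise ValueError(f"header promised {count} vectors, found {len(vectors)}")
    return vectors


# A tiny in-memory example with two 3-dimensional vectors.
sample = io.StringIO("2 3\nP12345 0.1 0.2 0.3\nQ67890 1 2 3\n")
embeddings = read_word2vec_text(sample)
```

The same function works on a real file opened with `open(path)`.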
Listing Entity Embeddings
After uploading, the collections can be listed with:
$ embeddingdb ls
Analyzing Entity Embeddings’ Correlations
One of the motivations for building this repository was to provide a convenient way to compare the embeddings for entities generated through orthogonal embedding techniques. For example, we wanted to know to what extent the embeddings for proteins generated from their sequences with ratvec contained the same information as the embeddings generated from protein-protein interaction networks with pykeen or nrl.
Compare two collections with the analyze command, whose two positional arguments are the identifiers of the collections in the database:
$ embeddingdb analyze 1 2
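This document does not describe how the analyze command compares two collections. Purely as an illustration of the general idea, the sketch below uses one common approach: for the entities shared by two collections, compute every pairwise cosine similarity within each embedding space, then take the Pearson correlation between the two lists of similarities. All function names are hypothetical, the data are made up, and this should not be read as the package's actual method.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


def similarity_correlation(first, second):
    """Correlate pairwise similarities of the entities shared by two collections.

    Each argument maps an entity label to its vector. The two collections
    may have different dimensions, since vectors are only compared within
    their own space.
    """
    shared = sorted(set(first) & set(second))
    if len(shared) < 3:
        raise ValueError("need at least three shared entities")
    pairs = list(combinations(shared, 2))
    sims_first = [cosine(first[a], first[b]) for a, b in pairs]
    sims_second = [cosine(second[a], second[b]) for a, b in pairs]
    return pearson(sims_first, sims_second)


# Made-up data: the second collection is the first one scaled by two,
# so the pairwise cosine similarities are the same in both spaces.
collection_1 = {"A": [1.0, 0.0], "B": [0.0, 1.0], "C": [1.0, 1.0]}
collection_2 = {"A": [2.0, 0.0], "B": [0.0, 2.0], "C": [2.0, 2.0], "D": [5.0, 1.0]}
score = similarity_correlation(collection_1, collection_2)
```

A score close to 1 suggests the two techniques place the shared entities in similar relative positions, while a score near 0 suggests they capture largely different information.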
Running with Docker
After installing Docker, the entire web application can be started with:
$ docker-compose up
Send a GET request to the /test endpoint to initialize the database and add a test collection.
File details
Details for the file embeddingdb-0.0.1.tar.gz.
File metadata
- Download URL: embeddingdb-0.0.1.tar.gz
- Upload date:
- Size: 11.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.0.1 requests-toolbelt/0.9.1 tqdm/4.32.2 CPython/3.7.3
File hashes
Algorithm | Hash digest
---|---
SHA256 | 56bfa60b2b7907b3db5646c79f37719cff99be92f5f2bbc8c49834c97af051b3
MD5 | ad4482ecd797c1e111eb1b9554e37cce
BLAKE2b-256 | efdbddf211ef47ead4b2002b37cb10b75b2f01c07255b79e21386203d8264911
File details
Details for the file embeddingdb-0.0.1-py3-none-any.whl.
File metadata
- Download URL: embeddingdb-0.0.1-py3-none-any.whl
- Upload date:
- Size: 14.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.0.1 requests-toolbelt/0.9.1 tqdm/4.32.2 CPython/3.7.3
File hashes
Algorithm | Hash digest
---|---
SHA256 | 42d77278149dabad4c5771c4764654e3a61d28e770da6da85ff769d9cfffc6bc
MD5 | 213e812a9081410e1c49449ddd816bee
BLAKE2b-256 | 265ae438bd17b1d42760824f5b3aab6b53457be6de25607824c4dfda2a2b44d0