
A python library for Knowledge Graph Embedding


Python Knowledge Graph Embedding Library

This library is the outcome of a bold and optimistic attempt to bring all the state-of-the-art knowledge graph embedding algorithms together in a single Python library.

Implemented Methods

We aim to implement all the latest state-of-the-art knowledge graph embedding algorithms. So far, the following algorithms are implemented:

  • TransE: TransE is an energy-based model that represents relationships as translations in the embedding space. If the triple (h,l,t) holds, the embedding of the tail entity 't' should be close to the embedding of the head entity 'h' plus a vector that depends on the relationship 'l'. Both entities and relations are vectors in the same space. [1]
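
As a minimal sketch of the idea above (not the library's implementation), the dissimilarity of a triple can be computed as the L1 or L2 distance between h + l and t, and training typically minimizes the margin-based ranking loss described in [1]. The function names are hypothetical, and plain Python lists stand in for embedding vectors.

```python
import math


def transe_score(h, l, t, l1=True):
    """Distance between (h + l) and t; smaller means the triple is more plausible."""
    diffs = [hi + li - ti for hi, li, ti in zip(h, l, t)]
    if l1:
        return sum(abs(d) for d in diffs)
    return math.sqrt(sum(d * d for d in diffs))


def margin_loss(pos_score, neg_score, margin=1.0):
    """Margin-based ranking loss: zero once the corrupted triple scores
    at least `margin` worse than the true one."""
    return max(0.0, margin + pos_score - neg_score)


# Toy 2-dimensional embeddings (illustrative values only)
h = [1.0, 0.0]
l = [0.0, 1.0]
t_true = [1.0, 1.0]   # exactly h + l
t_false = [3.0, 1.0]  # far from h + l

pos = transe_score(h, l, t_true)    # 0.0
neg = transe_score(h, l, t_false)   # 2.0
print(margin_loss(pos, neg))        # 0.0
```

The loss is already zero here because the corrupted triple is more than one margin away from the true one; a closer negative would produce a positive loss and a gradient.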

Datasets

We intend to provide the tools needed to test knowledge graph embedding algorithms against the well-known datasets available online. So far, the library works with the following datasets:

  • Freebase: Freebase is a large collaborative knowledge base whose data was contributed mainly by its community members. It is an online collection of structured data harvested from many sources, including individual, user-submitted wiki contributions [2].
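
Embedding models operate on integer ids rather than raw strings, so a dataset of (head, relation, tail) triples is usually indexed first. The sketch below assumes tab-separated triples in head, relation, tail order; the function name and the sample identifiers are hypothetical and do not describe the library's actual data handler.

```python
def index_triples(lines):
    """Map string triples to integer ids.

    Returns (triples_ids, entity2id, relation2id).
    """
    entity2id, relation2id, triples_ids = {}, {}, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        head, relation, tail = line.split("\t")
        # setdefault assigns the next free id the first time a name is seen
        h = entity2id.setdefault(head, len(entity2id))
        t = entity2id.setdefault(tail, len(entity2id))
        r = relation2id.setdefault(relation, len(relation2id))
        triples_ids.append((h, r, t))
    return triples_ids, entity2id, relation2id


# Placeholder triples (not real Freebase identifiers)
sample = [
    "entity_a\trelation_x\tentity_b",
    "entity_b\trelation_y\tentity_c",
    "entity_a\trelation_x\tentity_c",
]
triples, entities, relations = index_triples(sample)
print(triples)  # [(0, 0, 1), (1, 1, 2), (0, 0, 2)]
```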

Repository Structure

  • pyKG2Vec/config: This folder consists of the configuration module. It provides the necessary configuration to parse the datasets, and also consists of the baseline hyperparameters for the knowledge graph embedding algorithms.
  • pyKG2Vec/core: This folder consists of the core code of the knowledge graph embedding algorithms. Inside this folder, each algorithm is implemented as a separate Python module.
  • pyKG2Vec/utils: This folder consists of modules providing various utilities, such as data preparation, data visualization, and evaluation of the algorithms.

Dependencies

The goal of this library is to minimize the dependency on other libraries as far as possible, so that the algorithms can be tested rapidly against different datasets. We emphasize that, in the beginning, we will not focus on run-time performance. However, in the future, we may provide faster implementations of each of the algorithms.

  • h5py==2.9.0
  • Keras-Applications==1.0.7
  • Keras-Preprocessing==1.0.9
  • matplotlib==3.0.3
  • networkx==2.2
  • numpy==1.16.2
  • pandas==0.24.2
  • progressbar2==3.39.2
  • protobuf==3.7.0
  • requests==2.21.0
  • requests-toolbelt==0.9.1
  • scikit-learn==0.20.3
  • scipy==1.2.1
  • seaborn==0.9.0
  • six==1.12.0
  • sklearn==0.0
  • tensorboard==1.12.2
  • tensorflow-gpu==1.12.0
  • tqdm==4.31.1
  • urllib3==1.24.1

Install

For best performance, we encourage users to create a virtual environment and set up the necessary dependencies for running the algorithms.

Prepare your environment:

```bash
   sudo apt update
   sudo apt install python3-dev python3-pip
   sudo pip3 install -U virtualenv     
```

Create a virtual environment:

```bash
   virtualenv --system-site-packages -p python3 ./venv
```

Activate the virtual environment using a shell-specific command:

```bash
   source ./venv/bin/activate
``` 

Upgrade pip:

```bash
   pip install --upgrade pip
```

Install pyKG2Vec:

```bash
   (venv) $ pip install --upgrade tensorflow
   (venv) $ pip install pykg2vec
``` 

Usage Example

```python
# Standard-library and TensorFlow 1.x imports needed by the training loop
import timeit

import tensorflow as tf

# Import the library
import pyKG2Vec as pkv

# EvaluationTransE must also be imported from the library's evaluation
# utilities (see pyKG2Vec/utils); its module path is not shown in this example.

# Provide the configuration
config = pkv.config.config.TransEConfig(learning_rate=0.001,
                                        batch_size=128,
                                        epochs=100,
                                        test_step=10,
                                        test_num=100)
model = pkv.core.TransE(config=config)
model.summary()

evaluate = EvaluationTransE(model, 'test')
loss, op_train, loss_every, norm_entity = model.train()

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())

    # L2-normalize the relation and entity embeddings before training
    norm_rel = sess.run(tf.nn.l2_normalize(model.rel_embeddings, axis=1))
    sess.run(tf.assign(model.rel_embeddings, norm_rel))

    norm_ent = sess.run(tf.nn.l2_normalize(model.ent_embeddings, axis=1))
    sess.run(tf.assign(model.ent_embeddings, norm_ent))

    gen_train = model.data_handler.batch_generator_train(batch=model.config.batch_size)

    # Optionally restore a previously saved model
    if model.config.loadFromData:
        saver = tf.train.Saver()
        saver.restore(sess, '../intermediate/TransEModel.vec')

    if not model.config.testFlag:

        for n_iter in range(model.config.epochs):
            acc_loss = 0
            batch = 0
            num_batch = len(model.data_handler.train_triples_ids) // model.config.batch_size
            start_time = timeit.default_timer()

            for i in range(num_batch):
                # Positive triples and their corrupted (negative) counterparts
                ph, pt, pr, nh, nt, nr = list(next(gen_train))

                feed_dict = {
                    model.pos_h: ph,
                    model.pos_t: pt,
                    model.pos_r: pr,
                    model.neg_h: nh,
                    model.neg_t: nt,
                    model.neg_r: nr
                }

                l_val, _, l_every, n_entity = sess.run([loss, op_train, loss_every, norm_entity],
                                                       feed_dict)

                acc_loss += l_val
                batch += 1
                print('[%.2f sec](%d/%d): -- loss: %.5f' % (timeit.default_timer() - start_time,
                                                            batch,
                                                            num_batch,
                                                            l_val), end='\r')
            print('iter[%d] ---Train Loss: %.5f ---time: %.2f' % (
                n_iter, acc_loss, timeit.default_timer() - start_time))

            # Evaluate every test_step epochs, and on the first and last epoch
            if n_iter % model.config.test_step == 0 or n_iter == 0 or n_iter == model.config.epochs - 1:
                evaluate.test(sess, n_iter)
                evaluate.print_test_summary(n_iter)

    model.save_model(sess)
    model.summary()

    # Display a few validation triples
    triples = model.data_handler.validation_triples_ids[:model.config.disp_triple_num]
    model.display(triples, sess)
```  
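
The training loop above feeds positive triples (`pos_h`, `pos_t`, `pos_r`) together with corrupted negatives (`neg_h`, `neg_t`, `neg_r`). One common way to build such batches, following [1], is to replace either the head or the tail of each true triple with a random entity. The generator below is a sketch under that assumption; its name and signature are hypothetical, and it is not the library's `batch_generator_train`.

```python
import random


def corrupt_batches(triples, num_entities, batch_size, seed=0):
    """Yield (ph, pt, pr, nh, nt, nr) lists, one batch at a time, indefinitely.

    `triples` is a list of (head, relation, tail) integer ids.
    """
    if batch_size > len(triples):
        raise ValueError("batch_size exceeds the number of triples")
    rng = random.Random(seed)
    known = set(triples)   # true triples must never be used as negatives
    order = list(triples)  # copy, so the caller's list is not shuffled
    while True:
        rng.shuffle(order)
        # Only full batches are produced; a trailing partial batch is dropped
        for start in range(0, len(order) - batch_size + 1, batch_size):
            ph, pt, pr, nh, nt, nr = [], [], [], [], [], []
            for h, r, t in order[start:start + batch_size]:
                # Resample until the corrupted triple is not a known true one
                # (assumes at least one such corruption exists)
                while True:
                    e = rng.randrange(num_entities)
                    neg = (e, r, t) if rng.random() < 0.5 else (h, r, e)
                    if neg not in known:
                        break
                ph.append(h)
                pt.append(t)
                pr.append(r)
                nh.append(neg[0])
                nr.append(neg[1])
                nt.append(neg[2])
            yield ph, pt, pr, nh, nt, nr


train = [(0, 0, 1), (1, 0, 2), (2, 1, 0), (3, 1, 1)]
gen = corrupt_batches(train, num_entities=4, batch_size=2)
ph, pt, pr, nh, nt, nr = next(gen)
print(len(ph), pr == nr)  # 2 True
```

Each negative keeps the relation of its positive and differs from it in exactly one position, either the head or the tail.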

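A sample run (here with epochs = 10, test_step = 5, test_num = 5, and learning_rate = 0.01, as listed in the summary) produces output like the following: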

```text
    Number of batches: 461
    iter[0] ---Train Loss: 53589.23212 ---time: 3.06
    ---------------Test Results: iter0------------------
    iter:0 --mean rank: 11484.90 --hit@10: 0.10
    iter:0 --filter mean rank: 11484.90 --filter hit@10: 0.10
    iter:0 --norm mean rank: 11413.10 --norm hit@10: 0.10
    iter:0 --norm filter mean rank: 11413.10 --norm filter hit@10: 0.10
    -----------------------------------------------------
    iter[1] ---Train Loss: 45554.93086 ---time: 2.91
    iter[2] ---Train Loss: 42788.54883 ---time: 2.80
    iter[3] ---Train Loss: 40905.94272 ---time: 2.76
    iter[4] ---Train Loss: 39643.85038 ---time: 2.68
    iter[5] ---Train Loss: 39153.04682 ---time: 2.87
    iter:5 --mean rank: 19915.50 --hit@10: 0.10
    iter:5 --filter mean rank: 19915.50 --filter hit@10: 0.10
    iter:5 --norm mean rank: 19856.20 --norm hit@10: 0.20
    iter:5 --norm filter mean rank: 19856.20 --norm filter hit@10: 0.20
    iter[6] ---Train Loss: 38518.77916 ---time: 2.66
    iter[7] ---Train Loss: 38320.69923 ---time: 2.60
    iter[8] ---Train Loss: 37865.60836 ---time: 2.58
    iter[9] ---Train Loss: 37619.98050 ---time: 2.51
    iter:9 --mean rank: 27790.10 --hit@10: 0.10
    iter:9 --filter mean rank: 27790.10 --filter hit@10: 0.10
    iter:9 --norm mean rank: 28324.30 --norm hit@10: 0.20
    iter:9 --norm filter mean rank: 28324.30 --norm filter hit@10: 0.20

    ----------SUMMARY----------
           margin : 1.0
           epochs : 10
     loadFromData : False
    disp_triple_num : 5
         test_num : 5
         testFlag : False
        test_step : 5
        optimizer : gradient
          L1_flag : True
       batch_size : 128
    learning_rate : 0.01
             data : Freebase
      hidden_size : 100
    ---------------------------
         reducing dimension to 2 using TSNE!
    dimension self.h_emb (5, 100)
    dimension self.r_emb (5, 100)
    dimension self.t_emb (5, 100)
    dimension self.h_emb (5, 2)
    dimension self.r_emb (5, 2)
    dimension self.t_emb (5, 2)
         drawing figure!
```
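
The log above reports mean rank and hit@10 in raw and filtered variants. As a sketch of how such metrics are commonly computed for tail prediction (the evaluation protocol of [1]), the true tail of each test triple is ranked among all candidate entities by score, and the filtered variant skips candidates that form other known true triples. The function names and the `score` callback are hypothetical; the library's evaluation code may differ.

```python
def tail_rank(h, r, t, num_entities, score, known=None):
    """1-based rank of the true tail t among all candidate tails.

    Lower scores are better. If `known` (a set of true triples) is given,
    candidates that form another known triple are skipped, which yields
    the filtered rank.
    """
    true_score = score(h, r, t)
    rank = 1
    for e in range(num_entities):
        if e == t:
            continue
        if known is not None and (h, r, e) in known:
            continue
        if score(h, r, e) < true_score:
            rank += 1
    return rank


def mean_rank(ranks):
    """Average rank over all test triples (lower is better)."""
    return sum(ranks) / len(ranks)


def hits_at_k(ranks, k=10):
    """Fraction of test triples whose rank is at most k (higher is better)."""
    return sum(1 for rank in ranks if rank <= k) / len(ranks)


def toy_score(h, r, e):
    # One-dimensional stand-in for a translation-based distance
    return abs(h + r - e)


known = {(0, 1, 1), (0, 1, 2)}
raw = tail_rank(0, 1, 2, num_entities=4, score=toy_score)
filtered = tail_rank(0, 1, 2, num_entities=4, score=toy_score, known=known)
print(raw, filtered)  # 2 1
print(mean_rank([raw, filtered]), hits_at_k([raw, filtered], k=1))  # 1.5 0.5
```

In the toy example the candidate tail 1 outranks the true tail 2, but because (0, 1, 1) is itself a known triple, the filtered rank does not count it against the model.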

In the resulting figure, the red nodes represent head entities, the green nodes represent relations, and the blue nodes represent tail entities.

Cite

[1] Bordes, A., Usunier, N., Garcia-Duran, A., Weston, J., & Yakhnenko, O. (2013). Translating embeddings for modeling multi-relational data. In Advances in neural information processing systems.

@inproceedings{bordes2013translating,
title={Translating embeddings for modeling multi-relational data},
author={Bordes, Antoine and Usunier, Nicolas and Garcia-Duran, Alberto and Weston, Jason and Yakhnenko, Oksana},
booktitle={Advances in neural information processing systems},
pages={2787--2795},
year={2013}
}
