A library for Knowledge Graph Completion using Wasserstein GANs

Wasserstein GAN for Knowledge Graph Completion

semantic-gan is a Python implementation of a Wasserstein GAN architecture for knowledge graph completion.

Installation

The package can be installed from PyPI:

pip install semantic-gan

Or install from source:

git clone https://github.com/erdemonal/SemanticGAN.git
cd SemanticGAN
pip install -e .

Usage

The following example demonstrates usage with a generic knowledge graph dataset:

from semanticgan import KnowledgeGraphDataset, Generator, Discriminator
import torch
from torch.utils.data import DataLoader

# 1. Load a generic knowledge graph dataset
# Format: head_id [tab] relation_id [tab] tail_id
dataset = KnowledgeGraphDataset(
    triples_path="my_custom_data.txt", 
    sep='\t', 
    names=['h', 'r', 't']
)

# 2. Initialize Models
G = Generator(
    embedding_dim=256, 
    hidden_dim=512, 
    num_relations=dataset.num_relations
)
D = Discriminator(
    num_entities=dataset.num_entities,
    num_relations=dataset.num_relations,
    embedding_dim=256,
    hidden_dim=512
)

# 3. Create a data loader for training (the training loop itself is not shown)
dataloader = DataLoader(dataset, batch_size=1024, shuffle=True)
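
To make the expected input layout concrete, the sketch below parses a few triples in the head_id, relation_id, tail_id tab-separated format using only the standard library. It does not use the package's loader: the helper names read_triples and summarize are hypothetical, and integer IDs are an assumption based on the column names.

```python
import csv
import io

# Three example triples in the documented layout: head_id <TAB> relation_id <TAB> tail_id.
# The integer IDs are made up for illustration.
SAMPLE = "0\t0\t1\n2\t0\t1\n1\t1\t3\n"


def read_triples(handle):
    """Parse tab-separated (head, relation, tail) integer IDs from a text stream."""
    reader = csv.reader(handle, delimiter="\t")
    return [(int(h), int(r), int(t)) for h, r, t in reader]


def summarize(triples):
    """Count distinct entities (heads and tails together) and distinct relations."""
    entities = {h for h, _, _ in triples} | {t for _, _, t in triples}
    relations = {r for _, r, _ in triples}
    return len(entities), len(relations)


triples = read_triples(io.StringIO(SAMPLE))
num_entities, num_relations = summarize(triples)
print(num_entities, num_relations)  # prints "4 2"
```

Counts like these are the kind of values the example above reads from dataset.num_entities and dataset.num_relations, although how the package computes them is not shown here.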

Technical Report: DBLP Case Study

This repository accompanies a technical report entitled "Knowledge Graph Completion and RDF Triple Generation with a Wasserstein GAN", presenting an experimental study on the DBLP Computer Science Bibliography.

Technical Report

A detailed description of the model architecture, training procedure, and evaluation protocol is provided in the technical report:

paper/knowledge-graph-completion-wasserstein-gan.pdf

The LaTeX source is available in paper/main.tex

Results

Training artifacts and generated RDF triples are available at: https://erdemonal.github.io/SemanticGAN

Methodology

The preprocessing pipeline parses the DBLP XML dump from https://dblp.uni-trier.de/xml to extract a knowledge graph with entity types Publication, Author, Venue, and Year. Relations include dblp:wrote, dblp:hasAuthor, dblp:publishedIn, and dblp:inYear.

The preprocessing script scripts/prepare_dblp_kg.py reads the XML file incrementally and produces RDF triples in tab-separated format. The preprocessed 1M-triple dataset is versioned and maintained on the Hugging Face Dataset Hub.
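
The incremental parsing described above can be sketched with the standard library's xml.etree.ElementTree.iterparse. This is not the code in scripts/prepare_dblp_kg.py. The sample XML, the function name iter_triples, the choice of record tags, the direction of each relation, and the use of raw strings instead of integer IDs are all assumptions made for illustration.

```python
import io
import xml.etree.ElementTree as ET

# A tiny stand-in for the DBLP dump. The real file is far larger and relies on a
# DTD with character entities, which this sketch does not handle.
SAMPLE_XML = b"""<dblp>
<article key="journals/x/Doe20">
<author>Jane Doe</author>
<author>John Roe</author>
<title>An Example Paper.</title>
<journal>Example Journal</journal>
<year>2020</year>
</article>
<inproceedings key="conf/y/Roe21">
<author>John Roe</author>
<title>Another Example.</title>
<booktitle>Example Conf</booktitle>
<year>2021</year>
</inproceedings>
</dblp>"""

RECORD_TAGS = {"article", "inproceedings"}  # assumed subset of DBLP record types


def iter_triples(source):
    """Yield (head, relation, tail) string triples, one publication at a time."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag not in RECORD_TAGS:
            continue  # child elements (author, year, ...) are read via their parent
        pub = elem.get("key")
        for author in elem.findall("author"):
            yield (author.text, "dblp:wrote", pub)
            yield (pub, "dblp:hasAuthor", author.text)
        venue = elem.findtext("journal") or elem.findtext("booktitle")
        if venue:
            yield (pub, "dblp:publishedIn", venue)
        year = elem.findtext("year")
        if year:
            yield (pub, "dblp:inYear", year)
        elem.clear()  # drop the finished record's children to limit memory use


triples = list(iter_triples(io.BytesIO(SAMPLE_XML)))
lines = ["\t".join(t) for t in triples]
print("\n".join(lines))
```

Because iterparse emits each record as soon as its closing tag is read, the full document never has to be held in memory at once, which matters for a dump the size of DBLP.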

The WGAN model consists of a Generator that produces tail entity embeddings from noise and relation embeddings, and a Discriminator (critic) that assigns each triple a scalar score; the difference between the expected scores of real and generated triples estimates the Wasserstein distance. Training uses RMSprop with gradient clipping to enforce the Lipschitz constraint.
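
For reference, the standard WGAN objective can be written in triple notation as below. This is the generic formulation, not a transcription of the technical report. The symbols are assumptions: D is the Discriminator score, G(z, r) is the generated tail embedding for noise z and relation r, and P_data is the distribution of observed triples. How a generated embedding is passed to D is abstracted away.

```latex
\begin{aligned}
\mathcal{L}_D &= \mathbb{E}_{(h,r),\; z \sim p(z)}\big[D(h, r, G(z, r))\big]
              - \mathbb{E}_{(h,r,t) \sim P_{\text{data}}}\big[D(h, r, t)\big], \\
\mathcal{L}_G &= -\,\mathbb{E}_{(h,r),\; z \sim p(z)}\big[D(h, r, G(z, r))\big].
\end{aligned}
```

When D is restricted to K-Lipschitz functions, the quantity -L_D at the optimal D equals the Wasserstein-1 distance between real and generated triples up to the factor K, which is why the constraint has to be enforced during training.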

Training and synchronization are automated via a continuous integration workflow. Training is executed on external compute infrastructure, and the resulting artifacts are synchronized after each run.

Model Storage and Data Decoupling

Model weights and processed knowledge graph artifacts are hosted on the Hugging Face Hub across two repositories:

Model Hub: erdemonal/SemanticGAN stores the persistent WGAN checkpoints.

Dataset Hub: erdemonal/SemanticGAN-Dataset contains the processed DBLP triples and ID mappings.

The automated training workflow fetches processed data from the Dataset Hub and restores model states from the Model Hub before each training run.
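
A minimal sketch of the fetch step, assuming the huggingface_hub client and its hf_hub_download function. The helper names fetch_from_hub and fetch_run_inputs are hypothetical, and the file names are left to the caller because the names stored in the two repositories are not stated here. The project's actual workflow may differ.

```python
MODEL_REPO = "erdemonal/SemanticGAN"
DATASET_REPO = "erdemonal/SemanticGAN-Dataset"


def fetch_from_hub(repo_id, filename, repo_type):
    """Download one file from the Hugging Face Hub and return its local cache path."""
    # Imported here so this module can be loaded without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(repo_id=repo_id, filename=filename, repo_type=repo_type)


def fetch_run_inputs(triples_file, checkpoint_file):
    """Fetch the processed triples and a checkpoint before a training run.

    Both file names are supplied by the caller (hypothetical parameters).
    """
    triples_path = fetch_from_hub(DATASET_REPO, triples_file, "dataset")
    checkpoint_path = fetch_from_hub(MODEL_REPO, checkpoint_file, "model")
    return triples_path, checkpoint_path
```

Keeping data and weights in separate repositories lets each be versioned independently, so a training run can combine a given dataset revision with a given checkpoint.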

Data Availability

The DBLP dataset is publicly available from https://dblp.uni-trier.de/xml

Documentation is available at https://dblp.org/xml/docu/dblpxml.pdf
