A library for Knowledge Graph Completion using Wasserstein GANs

Wasserstein GAN for Knowledge Graph Completion

This repository contains an experimental research system for knowledge graph completion on the DBLP Computer Science Bibliography using Wasserstein GANs.

The system uses a Wasserstein GAN to generate candidate RDF triples from an evolving publication graph. Model training is executed periodically using an automated workflow.

Technical Report

A detailed description of the model, training procedure, and evaluation is provided in the technical report:

paper/knowledge-graph-completion-wasserstein-gan.pdf

The LaTeX source is available in paper/main.tex

Results

Training outputs and generated RDF triples are available at: https://erdemonal.github.io/SemanticGAN

Methodology

The system processes the DBLP XML dump from https://dblp.uni-trier.de/xml to extract a knowledge graph with entity types Publication, Author, Venue, and Year. Relations include dblp:wrote, dblp:hasAuthor, dblp:publishedIn, and dblp:inYear.

The preprocessing script scripts/prepare_dblp_kg.py reads the XML file incrementally and writes RDF triples in tab-separated format. The preprocessed 1M-triple dataset is versioned and maintained on the Hugging Face Dataset Hub.
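
The incremental read described above can be sketched with the standard library's streaming XML parser. This is a minimal illustration, not the contents of scripts/prepare_dblp_kg.py: the function names, the choice of record tags, and the direction of each relation (author to publication for dblp:wrote, publication to author for dblp:hasAuthor) are assumptions. The real DBLP dump also relies on character entities declared in its DTD, which this sketch does not handle.

```python
import io
import xml.etree.ElementTree as ET

# Record-level DBLP elements handled by this sketch (an assumed subset).
RECORD_TAGS = {"article", "inproceedings"}


def iter_triples(xml_source):
    """Yield (head, relation, tail) triples while streaming through the XML."""
    for _event, elem in ET.iterparse(xml_source, events=("end",)):
        if elem.tag not in RECORD_TAGS:
            continue  # child elements are read when their parent record ends
        pub = elem.get("key")
        for author in elem.findall("author"):
            yield (author.text, "dblp:wrote", pub)
            yield (pub, "dblp:hasAuthor", author.text)
        venue = elem.findtext("journal") or elem.findtext("booktitle")
        if venue:
            yield (pub, "dblp:publishedIn", venue)
        year = elem.findtext("year")
        if year:
            yield (pub, "dblp:inYear", year)
        elem.clear()  # drop the finished record's children to limit memory use


def write_tsv(triples, out):
    """Write one tab-separated triple per line to a text stream."""
    for head, relation, tail in triples:
        out.write(f"{head}\t{relation}\t{tail}\n")


# Tiny in-memory stand-in for the DBLP dump (made-up record).
SAMPLE = b"""<dblp>
<article key="journals/x/Doe20">
<author>Jane Doe</author>
<title>A Title</title>
<journal>J. Examples</journal>
<year>2020</year>
</article>
</dblp>"""

triples = list(iter_triples(io.BytesIO(SAMPLE)))
buf = io.StringIO()
write_tsv(triples, buf)
```

Because iter_triples is a generator, the triples for one record are emitted before the next record is parsed, so the full dump never needs to be held in memory.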

The WGAN model consists of a Generator that produces tail entity embeddings from noise and relation embeddings, and a Discriminator (critic) that assigns each triple a scalar score; the difference between the mean scores of real and generated triples estimates the Wasserstein distance. Training uses RMSprop with weight clipping on the Discriminator to enforce the Lipschitz constraint.
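
As a rough illustration of the critic objective, the sketch below uses plain Python and a linear critic rather than a neural network. It shows the two ingredients of the original WGAN formulation: a loss built from the difference of mean scores, and clamping of the critic's weights to a small interval to keep it approximately Lipschitz. All function names are hypothetical, and the clamp value c = 0.01 is the default from the original WGAN paper, assumed here; the repository's actual implementation may differ.

```python
def critic_score(weights, triple_vec):
    """Linear critic: one unbounded scalar score per triple embedding."""
    return sum(w * x for w, x in zip(weights, triple_vec))


def critic_loss(weights, real_batch, fake_batch):
    """Mean score on generated triples minus mean score on real triples.

    Minimising this pushes real scores up and generated scores down; the
    negated loss is the critic's estimate of the Wasserstein distance.
    """
    real = sum(critic_score(weights, v) for v in real_batch) / len(real_batch)
    fake = sum(critic_score(weights, v) for v in fake_batch) / len(fake_batch)
    return fake - real


def clip_weights(weights, c=0.01):
    """Clamp every critic weight to [-c, c] after an optimiser step."""
    return [max(-c, min(c, w)) for w in weights]


# Made-up three-dimensional "embeddings" for one real and one generated triple.
raw_weights = [0.5, -0.2, 0.005]
clipped = clip_weights(raw_weights)
loss = critic_loss(clipped, real_batch=[[1.0, 0.0, 2.0]], fake_batch=[[0.0, 1.0, 0.0]])
```

In a full training loop the clamp would be applied after every critic update, and the generator would be trained to raise the critic's score on its outputs.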

An automated training workflow is orchestrated via GitHub Actions. Training is executed on external compute infrastructure, and the resulting artifacts are synchronized after each run.

Model Storage and Data Decoupling

Model weights and processed knowledge graph artifacts are hosted on the Hugging Face Hub across two repositories:

Model Hub: erdemonal/SemanticGAN stores the persistent WGAN checkpoints.

Dataset Hub: erdemonal/SemanticGAN-Dataset contains the processed DBLP triples and ID mappings.
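
The ID mappings mentioned above are what turn string triples into the integer indices that embedding tables expect. The sketch below shows one common way to build them; the function name and the first-seen numbering scheme are assumptions, not the format of the published dataset.

```python
def build_id_maps(triples):
    """Assign consecutive integer IDs to entities and relations in first-seen order."""
    entity2id, relation2id = {}, {}
    encoded = []
    for head, relation, tail in triples:
        # setdefault returns the existing ID, or stores the next free one.
        h_id = entity2id.setdefault(head, len(entity2id))
        r_id = relation2id.setdefault(relation, len(relation2id))
        t_id = entity2id.setdefault(tail, len(entity2id))
        encoded.append((h_id, r_id, t_id))
    return entity2id, relation2id, encoded


# Made-up triples using the relation names from the Methodology section.
example_triples = [
    ("p1", "dblp:hasAuthor", "a1"),
    ("p1", "dblp:inYear", "2020"),
    ("a1", "dblp:wrote", "p1"),
]
entity2id, relation2id, encoded = build_id_maps(example_triples)
```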

The automated training workflow fetches processed data from the Dataset Hub and restores model states from the Model Hub before each training run.
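
A fetch step of this kind could look roughly like the following, using hf_hub_download from the third-party huggingface_hub package. The repository IDs come from the list above; the file names triples.tsv and checkpoint.pt are hypothetical placeholders, since the actual file layout of the two repositories is not described here.

```python
MODEL_REPO = "erdemonal/SemanticGAN"
DATASET_REPO = "erdemonal/SemanticGAN-Dataset"


def build_fetch_plan(dataset_files, checkpoint_files):
    """List the downloads to perform, as keyword arguments for hf_hub_download."""
    plan = [
        {"repo_id": DATASET_REPO, "repo_type": "dataset", "filename": name}
        for name in dataset_files
    ]
    plan += [
        {"repo_id": MODEL_REPO, "repo_type": "model", "filename": name}
        for name in checkpoint_files
    ]
    return plan


def fetch(plan):
    """Download every planned file and return the local cache paths."""
    # Imported lazily so the plan can be built without the package installed.
    from huggingface_hub import hf_hub_download

    return [hf_hub_download(**item) for item in plan]


# Hypothetical file names; replace with the files actually present in the repos.
plan = build_fetch_plan(["triples.tsv"], ["checkpoint.pt"])
```

Calling fetch(plan) requires network access and, for private repositories, an authentication token configured for huggingface_hub.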

Data Availability

The DBLP dataset is publicly available from https://dblp.uni-trier.de/xml

Documentation is available at https://dblp.org/xml/docu/dblpxml.pdf

Download files

Source Distribution

semantic_gan-0.1.0.tar.gz (18.0 kB)

Uploaded Source

Built Distribution

semantic_gan-0.1.0-py3-none-any.whl (18.6 kB)

Uploaded Python 3

File details

Details for the file semantic_gan-0.1.0.tar.gz.

File metadata

  • Download URL: semantic_gan-0.1.0.tar.gz
  • Size: 18.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.1

File hashes

Hashes for semantic_gan-0.1.0.tar.gz
Algorithm Hash digest
SHA256 2cee1e5a29022fc4071ec599a2ecc4346b23e0f97c651a533eab074d84dfffa2
MD5 eff8fd670b0ecab20afba3a09883404d
BLAKE2b-256 27799cbfc175f9fc715443cae0abe6372cd685b75ad80190d44883916907dd32

File details

Details for the file semantic_gan-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: semantic_gan-0.1.0-py3-none-any.whl
  • Size: 18.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.1

File hashes

Hashes for semantic_gan-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 88fca492d66b6e9cf7d7ad2ccbecbccf680241301cd0cb598700580585a30ee7
MD5 72234ce07eb9180f227d26e80109403c
BLAKE2b-256 63e0979d45e7fff0dcbd2a98c072d4a37f08ad27d22c889f6ec76d63c877f362
