
A simple library to generate synthetic Knowledge Graphs using random triples.


IntelliGraphs is a Python package that generates a collection of benchmark datasets for evaluating machine learning models in the transductive setting. It can also serve as a testbed for developing new generative models. The library is designed to be extensible, so that new synthetic datasets can be created from custom First-Order Logic (FOL) rules.

TODO

  • Ask Paul: Do we want to register this with Zenodo. If so, add DOI badge here.
  • Ask Paul: How to generate dataset metadata? Is it needed?
  • Ask Paul: When to register dataset? Before or after publication?
  • Ask Paul: Where to put the dataset so that it lasts? (Zenodo, GitHub, etc.)
  • Ask Peter/Paul: Do we want to make it available on PyPI? If so, add badge here.
  • Make GitHub repo anonymous before submission

Installation

To install IntelliGraphs from PyPI, run:

pip install intelligraphs

Advantages

  • Easy to use: Generate and manipulate Knowledge Graphs with a simple and clean Python API.
  • Flexible: Customize the number of graphs, triples, and data splits.
  • Extendable: Create more graphs according to custom FOL rules.
  • Efficient: Fast and memory-efficient graph generation and manipulation using native Python data structures.
  • Visualization: Visualize Knowledge Graphs.

Usage

Here's a brief example of how to use various features of the IntelliGraphs library:

from intelligraphs import IntelliGraphs

# Create an instance of IntelliGraphs with 10 graphs, a variable number of triples per graph (2 to 5), and a random seed of 42
intelligraph = IntelliGraphs(random_seed=42, num_graphs=10, var_length=True, min_triples=2, max_triples=5)

# Manually generate the graphs
intelligraph.generate_graphs()

# Get the list of graphs
graphs = intelligraph.get_graphs()

# Print the first graph
intelligraph.print_graph(graphs[0])

# Visualize the first graph
intelligraph.visualize_graph(graphs[0])

# Get the natural language sentences for the triples
all_sentences = intelligraph.to_natural_language()

# Print the sentences for each graph
for i, sentences in enumerate(all_sentences):
    print(f"Graph {i + 1}:")
    for sentence in sentences:
        print(sentence)
    print()

# Manually trigger splitting the data into train, valid, and test sets
intelligraph.split_data(split_ratio=(0.6, 0.3, 0.1))

# Get the data splits
splits = intelligraph.get_splits()

# Print the data splits
for split_name, data in splits.items():
    print(f"{split_name.capitalize()} Data:")
    for graph in data:
        print(graph)
    print()

# Save the graphs and splits to text files
intelligraph.save_graphs(filename='example', file_path='output', zip_compression=False)
intelligraph.save_splits(filename='example', file_path='output', zip_compression=False)

# Save the graphs and splits to zip compressed text files
intelligraph.save_graphs(filename='example', file_path='output', zip_compression=True)
intelligraph.save_splits(filename='example', file_path='output', zip_compression=True)
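
For intuition about what a split ratio of (0.6, 0.3, 0.1) means, the sketch below partitions a list of graphs by those fractions. It is not the library's implementation: `split_by_ratio` is a hypothetical helper, and the shuffling and rounding choices are assumptions made for illustration.

```python
import random


def split_by_ratio(graphs, split_ratio=(0.6, 0.3, 0.1), seed=42):
    """Hypothetical helper: shuffle graphs and cut them into train/valid/test."""
    if abs(sum(split_ratio) - 1.0) > 1e-9:
        raise ValueError("split_ratio must sum to 1")
    shuffled = list(graphs)
    # Use a local RNG so the global random state is left untouched.
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_train = int(n * split_ratio[0])
    n_valid = int(n * split_ratio[1])
    return {
        "train": shuffled[:n_train],
        "valid": shuffled[n_train:n_train + n_valid],
        # The test split takes the remainder, so rounding never drops a graph.
        "test": shuffled[n_train + n_valid:],
    }


# Ten placeholder graphs, each a list of (subject, predicate, object) triples.
toy_graphs = [[(f"s{i}", "rel", f"o{i}")] for i in range(10)]
toy_splits = split_by_ratio(toy_graphs)
print({name: len(part) for name, part in toy_splits.items()})
```

Giving the remainder to the test split keeps the three parts exhaustive and non-overlapping even when the fractions do not divide the number of graphs evenly.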

Datasets

Here is a description of the datasets:

Dataset     | Rules | # Nodes | # Edges | # Relations | # Classes | # Train | # Valid | # Test
syn-paths   | -     | -       | -       | -           | -         | -       | -       | -
syn-tipr    | -     | -       | -       | -           | -         | -       | -       | -
syn-types   | -     | -       | -       | -           | -         | -       | -       | -
syn-nl      | -     | -       | -       | -           | -         | -       | -       | -
wd-movies   | -     | -       | -       | -           | -         | -       | -       | -
wd-articles | -     | -       | -       | -           | -         | -       | -       | -

Example

Example Knowledge Graphs for syn-paths, syn-tipr, syn-types, wd-movies, and wd-articles (images not reproduced here).

First-Order Logic

First-order logic (FOL) is a formal language for making precise statements about objects and the relationships between them.

An atomic FOL statement applies a predicate to one or more terms (constants or variables). For example, the statement "John is a student" can be written as Student(John), where "John" is a constant and "Student" is the predicate. Atomic statements are combined with the connectives ¬, ∧, ∨ and →, and variables are bound by the quantifiers ∀ ("for all") and ∃ ("there exists").
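
To make this concrete, the sketch below models predicates as sets of argument tuples and evaluates a quantified statement over a small domain. It is independent of IntelliGraphs; the names `facts`, `holds`, `domain`, and the `Teaches` predicate are illustrative assumptions.

```python
# Each predicate name maps to the set of argument tuples for which it holds.
facts = {
    "Student": {("John",), ("Mary",)},
    "Teaches": {("Alice", "John")},
}

# The finite set of constants that quantifiers range over.
domain = {"John", "Mary", "Alice"}


def holds(predicate, *args):
    """Return True if predicate(args) is among the known facts."""
    return args in facts.get(predicate, set())


# Atomic statements
john_is_student = holds("Student", "John")    # Student(John)
alice_is_student = holds("Student", "Alice")  # Student(Alice)

# Universally quantified statement: ∀x ∀y (Teaches(y, x) → Student(x)).
# The implication A → B is evaluated as (not A) or B.
all_taught_are_students = all(
    (not holds("Teaches", y, x)) or holds("Student", x)
    for x in domain
    for y in domain
)

print(john_is_student, alice_is_student, all_taught_are_students)
```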

FOL statements can be written in a text file. For example, the FOL statements for the syn-paths dataset can be expressed as:

Connected(x) → (Subject(x, y) ∨ Object(x, z))
∀x ∀y (¬Root(x) ∨ ¬Object(y, x))
∀x ∀y (¬Leaf(x) ∨ ¬Subject(y, x))

This can be parsed by the IntelliGraphs library using:

intelligraph.parse_fol_rules('path/to/rules.txt')
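
As an illustration of what such rules constrain, the sketch below checks a set of triples against one possible reading of the three statements above: a connected node occurs in at least one triple, a root never occurs as an object, and a leaf never occurs as a subject. This reading, the function `check_path_rules`, and its arguments are assumptions for illustration. This is not how the library parses or enforces rules.

```python
def check_path_rules(triples, nodes, roots, leaves):
    """Hypothetical checker: return a list of rule violations (empty if none)."""
    subjects = {s for s, _, _ in triples}
    objects = {o for _, _, o in triples}
    violations = []

    # Assumed reading of rule 1: every node is the subject or object of some triple.
    for node in sorted(nodes):
        if node not in subjects and node not in objects:
            violations.append(f"node {node!r} is not connected")

    # Assumed reading of rule 2: a root has no incoming edge.
    for node in sorted(roots):
        if node in objects:
            violations.append(f"root {node!r} appears as an object")

    # Assumed reading of rule 3: a leaf has no outgoing edge.
    for node in sorted(leaves):
        if node in subjects:
            violations.append(f"leaf {node!r} appears as a subject")

    return violations


# A valid path a -> b -> c
path = [("a", "edge", "b"), ("b", "edge", "c")]
valid_result = check_path_rules(path, nodes={"a", "b", "c"}, roots={"a"}, leaves={"c"})

# Adding c -> a breaks both the root rule and the leaf rule
cycle = path + [("c", "edge", "a")]
invalid_result = check_path_rules(cycle, nodes={"a", "b", "c"}, roots={"a"}, leaves={"c"})

print(valid_result)
print(invalid_result)
```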

Future Work

Inductive setting: It would be useful to split the data in a way that also supports evaluation in the inductive setting.
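
One common reading of "inductive" for Knowledge Graphs is that the evaluation graphs contain only entities that never occur in the training graphs. Under that assumption, the sketch below checks whether a given split qualifies. The helpers `entities` and `is_inductive_split` are hypothetical and not part of the library.

```python
def entities(graphs):
    """Collect every subject and object that occurs in a list of graphs."""
    return {node for graph in graphs for s, _, o in graph for node in (s, o)}


def is_inductive_split(train_graphs, test_graphs):
    """Assumed criterion: no entity is shared between train and test."""
    return entities(train_graphs).isdisjoint(entities(test_graphs))


train_graphs = [[("a", "rel", "b")], [("b", "rel", "c")]]
unseen_test = [[("x", "rel", "y")]]    # only new entities
leaking_test = [[("a", "rel", "z")]]   # reuses the training entity "a"

print(is_inductive_split(train_graphs, unseen_test))
print(is_inductive_split(train_graphs, leaking_test))
```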

How to Cite

If you use IntelliGraphs in your research, please cite the following paper:


License

IntelliGraphs is licensed under the MIT License. See LICENSE for more information.

Copyright (c) 2023 Thiviyan Thanapalasingam
