A simple library to generate synthetic Knowledge Graphs using random triples.
Project description
IntelliGraphs is a Python package that generates a collection of benchmark datasets. These datasets are intended for benchmarking machine learning models in the transductive setting. The package can also serve as a testbed for developing new generative models. It is designed to be extended with new synthetic datasets defined by custom first-order logic (FOL) rules.
TODO
- Ask Paul: Do we want to register this with Zenodo. If so, add DOI badge here.
- Ask Paul: How to generate dataset metadata? Is it needed?
- Ask Paul: When to register dataset? Before or after publication?
- Ask Paul: Where to put the dataset so that it lasts? (Zenodo, GitHub, etc.)
- Ask Peter/Paul: Do we want to make it available on PyPI? If so, add badge here.
- Make GitHub repo anonymous before submission
Installation
To install IntelliGraphs with pip, run:
pip install intelligraphs
Advantages
- Easy to use: Generate and manipulate Knowledge Graphs with a simple and clean Python API.
- Flexible: Customize the number of graphs, triples, and data splits.
- Extendable: Create more graphs according to custom FOL rules.
- Efficient: Fast and memory-efficient graph generation and manipulation using native Python data structures.
- Visualization: Visualize Knowledge Graphs.
Usage
Here's a brief example of how to use various features of the IntelliGraphs library:
from intelligraphs import IntelliGraphs
# Create an instance of IntelliGraphs with 10 graphs, variable length triples, and a random seed of 42
intelligraph = IntelliGraphs(random_seed=42, num_graphs=10, var_length=True, min_triples=2, max_triples=5)
# Manually generate the graphs
intelligraph.generate_graphs()
# Get the list of graphs
graphs = intelligraph.get_graphs()
# Print the first graph
intelligraph.print_graph(graphs[0])
# Visualize the first graph
intelligraph.visualize_graph(graphs[0])
# Get the natural language sentences for the triples
all_sentences = intelligraph.to_natural_language()
# Print the sentences for each graph
for i, sentences in enumerate(all_sentences):
    print(f"Graph {i + 1}:")
    for sentence in sentences:
        print(sentence)
    print()
# Manually trigger splitting the data into train, valid, and test sets
intelligraph.split_data(split_ratio=(0.6, 0.3, 0.1))
# Get the data splits
splits = intelligraph.get_splits()
# Print the data splits
for split_name, data in splits.items():
    print(f"{split_name.capitalize()} Data:")
    for graph in data:
        print(graph)
    print()
# Save the graphs and splits to text files
intelligraph.save_graphs(filename='example', file_path='output', zip_compression=False)
intelligraph.save_splits(filename='example', file_path='output', zip_compression=False)
# Save the graphs and splits to zip compressed text files
intelligraph.save_graphs(filename='example', file_path='output', zip_compression=True)
intelligraph.save_splits(filename='example', file_path='output', zip_compression=True)
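To make the generation and splitting parameters above more concrete, here is a minimal standalone sketch in plain Python. It does not use IntelliGraphs and is not its implementation: the function names `generate_random_graphs` and `split_graphs`, the entity and relation labels, and the choice to give any rounding remainder to the test split are all assumptions made for illustration.

```python
import random


def generate_random_graphs(num_graphs, min_triples, max_triples,
                           entities, relations, seed):
    """Return a list of graphs; each graph is a list of (s, p, o) triples.

    Hypothetical helper, not part of the IntelliGraphs API.
    """
    rng = random.Random(seed)  # local RNG so the output is reproducible
    graphs = []
    for _ in range(num_graphs):
        # Variable-length graphs: pick a size between the inclusive bounds.
        size = rng.randint(min_triples, max_triples)
        graph = [
            (rng.choice(entities), rng.choice(relations), rng.choice(entities))
            for _ in range(size)
        ]
        graphs.append(graph)
    return graphs


def split_graphs(graphs, split_ratio=(0.6, 0.3, 0.1)):
    """Split graphs into train/valid/test by ratio (assumed behaviour).

    Train and valid sizes are rounded down; the remainder goes to test.
    """
    if abs(sum(split_ratio) - 1.0) > 1e-9:
        raise ValueError("split_ratio must sum to 1")
    n_train = int(len(graphs) * split_ratio[0])
    n_valid = int(len(graphs) * split_ratio[1])
    return {
        "train": graphs[:n_train],
        "valid": graphs[n_train:n_train + n_valid],
        "test": graphs[n_train + n_valid:],
    }


graphs = generate_random_graphs(
    num_graphs=10, min_triples=2, max_triples=5,
    entities=["e0", "e1", "e2", "e3"], relations=["r0", "r1"], seed=42,
)
splits = split_graphs(graphs)
print({name: len(part) for name, part in splits.items()})
```

With 10 graphs and a ratio of (0.6, 0.3, 0.1), this sketch places 6 graphs in train, 3 in valid, and 1 in test. Using a seeded `random.Random` instance rather than the module-level functions keeps the result reproducible without touching global state.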
Datasets
Here is a description of the datasets:
Dataset | Rules | # Nodes | # Edges | # Relations | # Classes | # Train | # Valid | # Test |
---|---|---|---|---|---|---|---|---|
syn-paths | - | - | - | - | - | - | - | - |
syn-tipr | - | - | - | - | - | - | - | - |
syn-types | - | - | - | - | - | - | - | - |
syn-nl | - | - | - | - | - | - | - | - |
wd-movies | - | - | - | - | - | - | - | - |
wd-articles | - | - | - | - | - | - | - | - |
Example
Dataset | Knowledge Graph |
---|---|
syn-paths | |
syn-tipr | |
syn-types | |
wd-movies | |
wd-articles | |
First-Order Logic
First-order logic (FOL) is a formal language for making precise statements about objects, their properties, and the relations between them. It combines predicates and variables with logical connectives (¬, ∧, ∨, →) and quantifiers (∀, ∃).
An atomic FOL statement applies a predicate to one or more terms. For example, the statement "John is a student" applies the predicate Student to the constant John, written Student(John). Quantifiers generalize such statements over variables: ∀x (Student(x) → Person(x)) says that every student is a person.
FOL statements can be written in a text file. For example, the FOL statements for the syn-paths dataset can be expressed as:
Connected(x) → (Subject(x, y) ∨ Object(x, z))
∀x ∀y (¬Root(x) ∨ ¬Object(y, x))
∀x ∀y (¬Leaf(x) ∨ ¬Subject(y, x))
A rules file like this can be parsed by the IntelliGraphs library using:
intelligraph.parse_fol_rules('path/to/rules.txt')
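As an illustration of what the two quantified rules above constrain, the following standalone sketch checks them against a list of triples. It is not how `parse_fol_rules` works, which is not described here. It assumes one possible reading of the predicates: Object(y, x) holds when entity x is the object of triple y, and Subject(y, x) holds when entity x is the subject of triple y. Under that reading, a root never appears as an object and a leaf never appears as a subject. The function name `find_violations` and the example entities are hypothetical.

```python
def find_violations(triples, roots, leaves):
    """Return messages for triples that break the root/leaf rules.

    Assumed reading of the rules (not confirmed by the library):
      - a root entity must never be the object of a triple
      - a leaf entity must never be the subject of a triple
    """
    problems = []
    for s, p, o in triples:
        if o in roots:
            problems.append(f"root {o!r} appears as object in {(s, p, o)}")
        if s in leaves:
            problems.append(f"leaf {s!r} appears as subject in {(s, p, o)}")
    return problems


# A simple path a -> b -> c with root "a" and leaf "c" satisfies both rules.
path = [("a", "edge", "b"), ("b", "edge", "c")]
print(find_violations(path, roots={"a"}, leaves={"c"}))

# Adding an edge from the leaf back to the root breaks both rules at once.
cyclic = path + [("c", "edge", "a")]
for message in find_violations(cyclic, roots={"a"}, leaves={"c"}):
    print(message)
```

The first rule, Connected(x) → (Subject(x, y) ∨ Object(x, z)), is left out of this sketch because its argument order differs from the other two and its intended reading is not stated.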
Future Work
Inductive setting: It would be useful to split the data in a way that also supports the inductive setting.
How to Cite
If you use IntelliGraphs in your research, please cite the following paper:
License
IntelliGraphs is licensed under the MIT License. See LICENSE for more information.
Copyright (c) 2023 Thiviyan Thanapalasingam