RelBench: Relational Deep Learning Benchmark


Get Started: loading data, training model.

Website | Vision Paper | Benchmark Paper | Mailing List

Overview

Relational Deep Learning is a new approach for end-to-end representation learning on data spread across multiple tables, such as in a relational database (see our vision paper). Relational databases are the world's most widely used type of database management system, and are used for industrial and scientific purposes across many domains. RelBench is a benchmark designed to facilitate efficient, robust and reproducible research in end-to-end deep learning on relational databases. RelBench contains 7 realistic, large-scale, and diverse relational databases spanning domains including medical, social networks, e-commerce and sport. Each database has multiple predictive tasks defined (29 in total), each carefully scoped to be both challenging and of domain-specific importance. RelBench provides full support for data downloading, task specification and standardized evaluation in an ML-framework-agnostic manner.

Additionally, RelBench provides a first open-source implementation of a Graph Neural Network based approach to relational deep learning. This implementation uses PyTorch Geometric to load the data as a graph and train GNN models, and PyTorch Frame to encode the various types of table columns. Finally, there is an open leaderboard for tracking progress.

Key Papers

RelBench Paper [RelBench: A Benchmark for Deep Learning on Relational Databases.]

This paper details our approach to designing the RelBench benchmark. It also includes a key user study showing that relational deep learning can produce performant models with a fraction of the manual human effort required by typical data science pipelines. This paper is useful for a detailed understanding of RelBench and our initial benchmarking results. If you just want to quickly familiarize yourself with the data and tasks, the website is a better place to start.

Vision Paper [Relational Deep Learning: Graph Representation Learning on Relational Databases.]

This paper outlines our proposal for how to do end-to-end deep learning on relational databases by combining graph neural networks with deep tabular models. We recommend reading this paper if you want to think about new methods for end-to-end deep learning on relational databases. The paper includes a section on possible directions for future research, giving a snapshot of some of the research possibilities in this area.

Design of RelBench

RelBench has the following main components:

  1. 7 databases, each automatically downloadable for ease of use (with the exception of H&M, for which RelBench gives other instructions)
  2. Easy 1-line loading of data, including loading the raw tables, and also code for constructing a graph from pkey-fkey links
  3. Your own model, which can use any deep learning stack since RelBench is framework-agnostic. We provide a first model implementation using PyTorch Geometric and PyTorch Frame.
  4. Standardized evaluators - all you need to do is produce a list of predictions for test samples, and RelBench computes metrics to ensure standardized evaluation
  5. A leaderboard you can upload your results to, to track SOTA progress.
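
As a rough illustration of what "constructing a graph from pkey-fkey links" means in component 2, here is a minimal standard-library sketch. It is not RelBench's implementation: the toy tables, the `pkeys`/`fkeys` dictionaries, and the `build_edges` helper are all hypothetical names invented for this example. Each row becomes a node, and each foreign-key value that matches a primary key becomes an edge.

```python
from collections import defaultdict

# Hypothetical toy database: each table is a list of row dicts.
# Table names, column names, and this schema layout are illustrative only.
tables = {
    "users": [{"user_id": 10}, {"user_id": 11}],
    "posts": [
        {"post_id": 100, "owner_id": 10},
        {"post_id": 101, "owner_id": 11},
        {"post_id": 102, "owner_id": 10},
    ],
}
pkeys = {"users": "user_id", "posts": "post_id"}
# (child table, foreign-key column) -> parent table
fkeys = {("posts", "owner_id"): "users"}


def build_edges(tables, pkeys, fkeys):
    """Return {(child, fkey_col, parent): [(child_idx, parent_idx), ...]}."""
    # Map each primary-key value to the row's position (its node index).
    index = {
        name: {row[pkeys[name]]: i for i, row in enumerate(rows)}
        for name, rows in tables.items()
    }
    edges = defaultdict(list)
    for (child, col), parent in fkeys.items():
        for child_idx, row in enumerate(tables[child]):
            parent_idx = index[parent].get(row.get(col))
            if parent_idx is not None:  # skip null or dangling keys
                edges[(child, col, parent)].append((child_idx, parent_idx))
    return dict(edges)


edges = build_edges(tables, pkeys, fkeys)
print(edges)
```

Keying the edge lists by (child table, column, parent table) keeps one edge type per foreign-key relationship, which is the shape a heterogeneous graph library would typically expect.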

Installation

You can install RelBench using pip:

pip install relbench

This will allow usage of the RelBench data and task loading functionality. To additionally use the example GNN scripts in the examples directory, and the graph-related helper functions found in relbench/modeling, it is also necessary to install PyTorch Geometric and PyTorch Frame. PyTorch Frame can simply be installed with

pip install pytorch_frame

and the PyTorch Geometric installation instructions can be found here. Note that as well as torch_geometric, you will also need to install the optional dependencies pyg_lib, torch_scatter, torch_sparse.
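
To check which of these optional packages are already importable in your environment, a small standard-library sketch like the following may help. The import names listed (`torch_geometric`, `torch_frame`, `pyg_lib`, `torch_scatter`, `torch_sparse`) are assumed to be the module names of the packages mentioned above, and `check_optional_dependencies` is a hypothetical helper, not part of RelBench.

```python
from importlib.util import find_spec

# Assumed import names for the optional packages mentioned above.
OPTIONAL_MODULES = [
    "torch_geometric",
    "torch_frame",
    "pyg_lib",
    "torch_scatter",
    "torch_sparse",
]


def check_optional_dependencies(modules=OPTIONAL_MODULES):
    """Return {module name: True if it can be located, else False}."""
    status = {}
    for name in modules:
        try:
            status[name] = find_spec(name) is not None
        except (ImportError, ValueError):
            # Treat any lookup failure as "not available".
            status[name] = False
    return status


status = check_optional_dependencies()
for name, available in status.items():
    print(f"{name}: {'found' if available else 'missing'}")
```

The check only locates each module without importing it, so it will not catch a package that is installed but broken (for example, built against a different PyTorch version).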

Package Usage

Here we describe key functions of RelBench. RelBench provides a collection of APIs for easy access to machine-learning-ready relational databases.

To see all available datasets:

from relbench.datasets import dataset_names
print(dataset_names)

For a concrete example, to obtain the rel-stack relational database, a database of questions and answers from Stack Exchange, do:

from relbench.datasets import get_dataset
dataset = get_dataset(name="rel-stack")

To see the tasks available for this dataset:

print(dataset.task_names)

Next, to retrieve the post-votes predictive task, which is to predict the number of upvotes a post will receive in the next 2 years, simply do:

task = dataset.get_task("post-votes")
task.train_table, task.val_table, task.test_table # training/validation/testing tables

The training/validation/testing tables are automatically generated using a pre-defined, standardized temporal split. You can then build your favorite relational deep learning model on top of them. After training and validation, you can make predictions with your model on task.test_table. If your predictions test_pred are an array following the order of task.test_table, you can call the following to retrieve the unified evaluation metrics:

task.evaluate(test_pred)

Additionally, you can evaluate validation (or training) predictions as follows:

task.evaluate(val_pred, task.val_table)
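
To make the workflow above concrete without downloading any data, the following standard-library sketch mimics the two ideas involved: a temporal split by cutoff timestamps, and evaluation of predictions that follow the row order of a table. Everything in it is an assumption made for illustration: the toy rows, the cutoff dates, the `temporal_split` and `mean_absolute_error` helpers, and the choice of mean absolute error as the metric. RelBench defines its own split timestamps and metrics per task.

```python
from datetime import date

# Hypothetical labelled rows: (timestamp, entity id, target value).
rows = [
    (date(2019, 1, 1), 1, 3.0),
    (date(2019, 6, 1), 2, 1.0),
    (date(2020, 1, 1), 1, 4.0),
    (date(2020, 6, 1), 2, 0.0),
    (date(2021, 1, 1), 1, 2.0),
]
# Assumed cutoffs: rows before VAL_START train, rows from TEST_START test.
VAL_START = date(2020, 1, 1)
TEST_START = date(2021, 1, 1)


def temporal_split(rows, val_start, test_start):
    """Split rows by timestamp so no later row leaks into an earlier split."""
    train = [r for r in rows if r[0] < val_start]
    val = [r for r in rows if val_start <= r[0] < test_start]
    test = [r for r in rows if r[0] >= test_start]
    return train, val, test


def mean_absolute_error(pred, table):
    """Compare predictions to targets, matching them by row position."""
    if len(pred) != len(table):
        raise ValueError("predictions must follow the table's row order")
    return sum(abs(p - row[2]) for p, row in zip(pred, table)) / len(table)


train, val, test = temporal_split(rows, VAL_START, TEST_START)

# Trivial baseline: always predict the mean training target.
train_mean = sum(r[2] for r in train) / len(train)
val_pred = [train_mean] * len(val)
test_pred = [train_mean] * len(test)

val_mae = mean_absolute_error(val_pred, val)
test_mae = mean_absolute_error(test_pred, test)
print(f"val MAE: {val_mae}, test MAE: {test_mae}")
```

Because predictions are matched to targets purely by position, reordering or filtering the table before predicting would silently misalign the two, which is why the predictions must follow the table's original order.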

Tutorials

To get started with RelBench, we provide some helpful Colab notebook tutorials. For now these tutorials cover (i) how to load data using RelBench, focusing on giving users the understanding of RelBench data logic needed to use RelBench data freely with any desired ML model, and (ii) training a GNN predictive model to solve any task in RelBench.

Name              Description
Loading Data      How to load and explore RelBench data.
Training models   Train your first GNN-based model on RelBench.

Cite RelBench

If you use RelBench in your work, please cite our position paper and benchmark paper:

@article{relationaldeeplearning,
  title={Relational Deep Learning: Graph Representation Learning on Relational Databases},
  author={Matthias Fey and Weihua Hu and Kexin Huang and Jan Eric Lenssen and Rishabh Ranjan and Joshua Robinson and Rex Ying and Jiaxuan You and Jure Leskovec},
  journal={ICML Position Paper},
  year={2024}
}
@article{relbench,
  title={RelBench: A Benchmark for Deep Learning on Relational Databases},
  author={Joshua Robinson and Rishabh Ranjan and Weihua Hu and Kexin Huang and Jiaqi Han and Alejandro Dobles and Matthias Fey and Jan Eric Lenssen and Yiwen Yuan and Zecheng Zhang and Xinwei He and Jure Leskovec},
  year={2024}
}

