syntheseus-graphium

Fork of the graphium library

These details have not been verified by PyPI

Project links

Repository

Project description

Scaling molecular GNNs to infinity

A deep learning library focused on graph representation learning for real-world chemical tasks.

✅ State-of-the-art GNN architectures.
🐍 Extensible API: build your own GNN model and train it with ease.
⚗️ Rich featurization: powerful and flexible built-in molecular featurization.
🧠 Pretrained models: for fast and easy inference or transfer learning.
⮔ Ready-to-use training loop based on PyTorch Lightning.
🔌 Have a new dataset? Graphium provides a simple plug-and-play interface. Change the path, the name of the columns to predict, the atomic featurization, and you’re ready to play!

Documentation

Visit https://graphium-docs.datamol.io/.

Installation for developers

For CPU and GPU developers

Use mamba, a faster and better alternative to conda.

If you are using a GPU, we recommend enforcing the CUDA version that you need with CONDA_OVERRIDE_CUDA=XX.X.

# Install Graphium's dependencies in a new environment named `graphium`
mamba env create -f env.yml -n graphium

# To force the CUDA version to 11.2, or any other version you prefer, use the following command:
# CONDA_OVERRIDE_CUDA=11.2 mamba env create -f env.yml -n graphium

# Install Graphium in dev mode
mamba activate graphium
pip install --no-deps -e .
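
After installing, a quick check from Python can confirm that the environment sees the package. The sketch below uses only the standard library. Both names in the `__main__` section are assumptions: `syntheseus-graphium` is the name of this PyPI project (a dev-mode install from a checkout may register a different name), and `graphium` is assumed to be the importable module name, based on the upstream library.

```python
import importlib.util
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def module_available(module_name: str) -> bool:
    """Return True if a top-level module can be found on the import path."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Both names are assumptions; adjust them to match your installation.
    print("distribution version:", installed_version("syntheseus-graphium"))
    print("module importable:", module_available("graphium"))
```

If the first line prints `None` while the module is importable, the distribution is probably registered under another name.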

For IPU developers

# Install Graphcore's SDK and Graphium dependencies in a new environment called `.graphium_ipu`
./install_ipu.sh .graphium_ipu

The above step needs to be done once. After that, enable the SDK and the environment as follows:

source enable_ipu.sh .graphium_ipu

Training a model

To learn how to train a model, we invite you to look at the documentation or the Jupyter notebooks available here.

If you are not familiar with PyTorch or PyTorch Lightning, we highly recommend going through their tutorials first.

Running an experiment

We have set up Graphium with Hydra for managing config files. To run an experiment, go to the expts/ folder. For example, to benchmark a GCN on the ToyMix dataset, run

graphium-train architecture=toymix tasks=toymix training=toymix model=gcn

To change parameters specific to this experiment, such as switching from fp16 to fp32 precision, you can either override them directly in the CLI via

graphium-train architecture=toymix tasks=toymix training=toymix model=gcn trainer.trainer.precision=32

or change them permanently in the dedicated experiment config under expts/hydra-configs/toymix_gcn.yaml. Integrating Hydra also allows you to quickly switch between accelerators. For example, running

graphium-train architecture=toymix tasks=toymix training=toymix model=gcn accelerator=gpu

automatically selects the correct configs to run the experiment on GPU. Finally, you can also run a fine-tuning loop:

graphium-train +finetuning=admet

To use a config file you built from scratch, you can run

graphium-train --config-path [PATH] --config-name [CONFIG]

Thanks to the modular nature of Hydra, you can reuse many of our config settings for your own experiments with Graphium.
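
To make the override syntax used in this section more concrete, here is a minimal sketch of how a dotted key such as `trainer.trainer.precision=32` addresses a value in a nested config. It is a plain-Python illustration, not Hydra's implementation: the `base` dictionary is a hypothetical fragment (including the starting value `16`), and the sketch does not handle Hydra's `+` and `++` prefixes or its richer value grammar.

```python
from copy import deepcopy
from typing import Any, Dict, List


def parse_value(text: str) -> Any:
    """Interpret simple literals: booleans, ints, floats, else plain strings."""
    lowered = text.lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            continue
    return text


def apply_overrides(config: Dict[str, Any], overrides: List[str]) -> Dict[str, Any]:
    """Return a copy of `config` with `a.b.c=value` overrides applied."""
    result = deepcopy(config)
    for item in overrides:
        dotted_key, _, raw_value = item.partition("=")
        *parents, leaf = dotted_key.split(".")
        node = result
        for name in parents:
            # Walk down the nested dicts, creating levels that do not exist yet.
            node = node.setdefault(name, {})
        node[leaf] = parse_value(raw_value)
    return result


# Hypothetical config fragment; the real experiment configs contain more keys.
base = {"trainer": {"trainer": {"precision": 16}}}
updated = apply_overrides(base, ["trainer.trainer.precision=32"])
print(updated)  # {'trainer': {'trainer': {'precision': 32}}}
```

The original `base` dictionary is left untouched, which mirrors the idea that a CLI override affects only the current run while the YAML file on disk keeps its value.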

Preparing the data in advance

Data preparation, including featurization (e.g., converting molecules from SMILES to a PyG-compatible format), is embedded in the pipeline and is performed when executing graphium-train [...].

However, when working with larger datasets, it is recommended to perform data preparation in advance using a machine with sufficient allocated memory (e.g., ~400 GB in the case of LargeMix). Preparing data in advance is also beneficial when running many concurrent jobs with identical molecular featurization, so that resources aren't wasted and processes don't conflict when reading from and writing to the same directory.

The following commands prepare the data and cache it, then use the cached data to train a model.

# First prepare the data and cache it in `path_to_cached_data`
graphium data prepare ++datamodule.args.processed_graph_data_path=[path_to_cached_data]

# Then train the model on the prepared data
graphium-train [...] datamodule.args.processed_graph_data_path=[path_to_cached_data]

Note that datamodule.args.processed_graph_data_path can also be specified in the configs under expts/hydra-configs/.

Note that every time the configs of datamodule.args.featurization change, you will need to run a new data preparation, which will automatically be saved in a separate directory named with a hash unique to those configs.
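
The idea of keying a cache directory on a hash of the featurization config can be sketched as follows. This is an illustration of the general technique only, not Graphium's actual hashing scheme: the function names and the featurization keys are hypothetical, and the config is assumed to be JSON-serializable.

```python
import hashlib
import json
from pathlib import Path
from typing import Any, Dict


def config_hash(config: Dict[str, Any], length: int = 16) -> str:
    """Return a short, key-order-independent hash of a JSON-serializable config."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:length]


def cache_dir_for(root: str, featurization: Dict[str, Any]) -> Path:
    """Build a cache directory path that changes whenever the config changes."""
    return Path(root) / config_hash(featurization)


# Hypothetical featurization settings, used only to illustrate the idea.
featurization_a = {"atom_features": ["atomic-number", "degree"], "add_self_loop": False}
featurization_b = {"add_self_loop": False, "atom_features": ["atomic-number", "degree"]}
featurization_c = {"atom_features": ["atomic-number"], "add_self_loop": False}

# Same settings in a different key order map to the same directory,
# while a changed setting maps to a different one.
print(cache_dir_for("path_to_cached_data", featurization_a))
print(cache_dir_for("path_to_cached_data", featurization_b))
print(cache_dir_for("path_to_cached_data", featurization_c))
```

Serializing with sorted keys before hashing is what makes the result independent of key order, so concurrent jobs with identical featurization settings would resolve to the same cached data.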

License

Released under the Apache-2.0 license. See LICENSE.

Documentation

Diagram for data processing in Graphium.

Diagram for Multi-task network in Graphium.

Project details

Release history

This version: 0.1.0 (Nov 4, 2025)

Download files

Download the file for your platform.

Source Distribution

syntheseus_graphium-0.1.0.tar.gz (5.3 MB)

Uploaded Nov 4, 2025 Source

Built Distribution

syntheseus_graphium-0.1.0-py3-none-any.whl (1.1 MB)

Uploaded Nov 4, 2025 Python 3

File details

Details for the file syntheseus_graphium-0.1.0.tar.gz.

File metadata

Download URL: syntheseus_graphium-0.1.0.tar.gz
Upload date: Nov 4, 2025
Size: 5.3 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.9.7

File hashes

Hashes for syntheseus_graphium-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`85db8dfcdf455673e52917f5cc0c1623877c3c366143be5bb324fcbd5d0d636a`
MD5	`445142733fc6a248bd62cdb5be7f9e4f`
BLAKE2b-256	`a95009bbb9f28e904b8702418570882b520fbd13110e278b99b2a5b10fbcc4e9`

File details

Details for the file syntheseus_graphium-0.1.0-py3-none-any.whl.

File metadata

Download URL: syntheseus_graphium-0.1.0-py3-none-any.whl
Upload date: Nov 4, 2025
Size: 1.1 MB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.9.7

File hashes

Hashes for syntheseus_graphium-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`82e101e79a9178fca0552d28307919b2b3cfbec371b2ff846b4185726fe7d27c`
MD5	`31f79bdfbe69982cd99d017e7db3b237`
BLAKE2b-256	`b894206072b70275038864fe0fbd6c67d5471ddfe034d56c22d94bda53e8bca7`
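
A downloaded file can be checked against the published digests with Python's standard `hashlib` module. The sketch below is one way to do it; the SHA256 values are copied from the "File hashes" tables above, while the function names and the assumption that the file keeps its original name after download are illustrative.

```python
import hashlib
from pathlib import Path

# SHA256 digests copied from the "File hashes" tables above.
EXPECTED_SHA256 = {
    "syntheseus_graphium-0.1.0.tar.gz":
        "85db8dfcdf455673e52917f5cc0c1623877c3c366143be5bb324fcbd5d0d636a",
    "syntheseus_graphium-0.1.0-py3-none-any.whl":
        "82e101e79a9178fca0552d28307919b2b3cfbec371b2ff846b4185726fe7d27c",
}


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: Path) -> bool:
    """Return True if the file's digest equals the published one for its name."""
    path = Path(path)
    expected = EXPECTED_SHA256.get(path.name)
    return expected is not None and sha256_of(path) == expected
```

For example, after downloading the source distribution into the current directory, `matches_published_hash(Path("syntheseus_graphium-0.1.0.tar.gz"))` should return `True` for an intact file. Files with names not listed in `EXPECTED_SHA256` are treated as unverified and return `False`.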
