GATO: Gradient-based cATegorization Optimizer for High-Energy Physics
We present gato-hep: the Gradient-based cATegorization Optimizer for High-Energy Physics.
gato-hep learns category boundaries in N-dimensional discriminants that maximize the signal significance of binned likelihood fits. It optimizes a differentiable approximation of the signal significance with gradient descent, implemented in TensorFlow.
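As a rough illustration of the figure of merit involved, the sketch below computes the standard Asimov significance per bin and combines bins in quadrature. It is a dependency-free sketch, not gato-hep's implementation: the function names `asimov_significance` and `combined_significance`, the toy yields, and the assumption that bins are statistically independent are all introduced here for illustration.

```python
import math


def asimov_significance(s, b):
    """Asimov significance of one bin with expected signal s and background b."""
    if b <= 0:
        raise ValueError("background yield must be positive")
    # Z^2 = 2 * [(s + b) * ln(1 + s/b) - s]; clamp guards against rounding below zero
    z_squared = 2.0 * ((s + b) * math.log1p(s / b) - s)
    return math.sqrt(max(0.0, z_squared))


def combined_significance(per_bin):
    """Combine independent per-bin significances in quadrature."""
    return math.sqrt(sum(z * z for z in per_bin))


# Hypothetical (signal, background) yields for three categories
yields = [(2.0, 200.0), (5.0, 50.0), (8.0, 10.0)]
per_bin = [asimov_significance(s, b) for s, b in yields]
print(per_bin, combined_significance(per_bin))
```

For small s/b the per-bin value approaches s/sqrt(b), which is why moving boundaries to isolate high-purity regions raises the combined significance.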
- 🐙 GitHub: https://github.com/FloMau/gato-hep
- 📘 Documentation: https://gato-hep.readthedocs.io/
- 📦 PyPI: https://pypi.org/project/gato-hep/
- 🧪 Examples: see the examples/ directory in this repository
This repository contains the code for the GATO approach presented in "Learning to bin: differentiable and Bayesian optimization for multi-dimensional discriminants in high-energy physics" (arXiv:2601.07756). If you use the package in your work, please cite:
@article{Erdmann:2026opi,
    author = "Erdmann, Johannes and Kasaraguppe, Nitish Kumar and Mausolf, Florian",
    title = "{Learning to bin: differentiable and Bayesian optimization for multi-dimensional discriminants in high-energy physics}",
    eprint = "2601.07756",
    archivePrefix = "arXiv",
    primaryClass = "physics.data-an",
    month = "1",
    year = "2026"
}
Key Features
- Optimize categorizations in multi-dimensional spaces using Gaussian Mixture Models (GMM) or 1D sigmoid-based models
- Set the range of the discriminant dimensions as needed for your analysis
- Penalize low-yield or high-uncertainty categories to keep optimizations analysis-friendly
- Built-in annealing schedules for the temperature / steepness (which sets how closely the differentiable approximation approaches hard category boundaries) and for the learning rate, to stabilize training
- Ready-to-run toy workflows that mirror real HEP analysis patterns
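To make the role of the steepness parameter concrete, the sketch below assigns a single 1D score to bins with smooth sigmoid steps instead of hard cuts. It is a minimal, dependency-free illustration and not the package's code: the names `sigmoid` and `soft_memberships`, the boundary values, and the steepness values are assumptions chosen for the example.

```python
import math


def sigmoid(x):
    """Logistic function written via tanh so large |x| cannot overflow."""
    return 0.5 * (1.0 + math.tanh(0.5 * x))


def soft_memberships(score, boundaries, steepness):
    """Soft weights of one score in each of the len(boundaries) + 1 bins.

    `boundaries` must be sorted in increasing order.
    """
    # Smooth step at each boundary: near 0 below it, near 1 above it
    steps = [sigmoid(steepness * (score - b)) for b in boundaries]
    # Pad so that the first bin starts at 1 and the last bin ends at 0
    edges = [1.0] + steps + [0.0]
    # Bin i receives the difference of adjacent steps; the weights sum to 1
    return [edges[i] - edges[i + 1] for i in range(len(edges) - 1)]


boundaries = [0.3, 0.7]  # hypothetical boundaries defining three bins
soft = soft_memberships(0.55, boundaries, steepness=5.0)
hard = soft_memberships(0.55, boundaries, steepness=200.0)
print(soft)  # weight spread over neighbouring bins
print(hard)  # almost all weight in the middle bin
```

A low steepness gives smooth gradients with respect to the boundaries but blurs the bins, while a high steepness approaches the hard assignment used in the final analysis. Annealing from the former to the latter during training trades off the two.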
Installation
Latest release (PyPI)
pip install gato-hep
The base install targets CPU execution and pulls the tested TensorFlow stack automatically. Optional extras:
pip install "gato-hep[gpu]" # CUDA-enabled TensorFlow wheels
For the GPU extra you still need NVIDIA drivers and CUDA libraries that match the selected TensorFlow build.
From source
git clone https://github.com/FloMau/gato-hep.git
cd gato-hep
python -m venv .venv # or use micromamba/conda
source .venv/bin/activate
pip install -e ".[dev]"
Requirements: Python ≥ 3.10. See pyproject.toml for the authoritative dependency pins.
Quickstart
The snippet below mirrors the three-class softmax demo. It generates the 3D toy sample, fits a two-dimensional Gaussian mixture model to the first two softmax scores by maximizing the geometric mean of the two per-signal significances, then saves and restores the model and retrieves the hard bin assignments of the learnt categories.
import numpy as np
import tensorflow as tf
from pathlib import Path
from gatohep.data_generation import generate_toy_data_3class_3D
from gatohep.models import gato_gmm_model
def convert_data_to_tensors(data):
    tensors = {}
    for proc, df in data.items():
        scores = np.stack(df["NN_output"].values)[:, :2]  # keep the first two dims
        weights = df["weight"].values
        tensors[proc] = {
            "NN_output": tf.convert_to_tensor(scores, tf.float32),
            "weight": tf.convert_to_tensor(weights, tf.float32),
        }
    return tensors
# setup class for the 2D discriminant optimization
class SoftmaxGMM(gato_gmm_model):
    def __init__(self, n_cats, temperature=0.3):
        super().__init__(
            n_cats=n_cats,
            dim=2,
            temperature=temperature,
            mean_norm="softmax",
        )

    def call(self, data_dict):
        # Differentiate through the Asimov significances provided by the helper
        significances = self.get_differentiable_significance(
            data_dict,
            signal_labels=["signal1", "signal2"],
        )
        z1 = significances["signal1"]
        z2 = significances["signal2"]
        return -tf.sqrt(z1 * z2)  # geometric-mean loss
# load your data as dictionary containing pandas DataFrames, or use the integrated toy data generation:
data = generate_toy_data_3class_3D()
tensors = convert_data_to_tensors(data)
# example: use 10 bins
model = SoftmaxGMM(n_cats=10, temperature=0.3)
optimizer = tf.keras.optimizers.RMSprop(learning_rate=0.05)
# actual training
for epoch in range(100):
    with tf.GradientTape() as tape:
        loss = model.call(tensors)
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
# Save the trained model for later use in the analysis to some path
checkpoint_path = Path("softmax_demo_ckpt")
model.save(checkpoint_path)
# Restore the model
restored = SoftmaxGMM(n_cats=10, temperature=0.3)
restored.restore(checkpoint_path)
# Obtain the hard (non-differentiable) bin assignments
assignments = restored.get_bin_indices(tensors)
See examples/three_class_softmax_example/run_example.py for the full training loop with schedulers, plotting helpers, and GIF generation.
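The schedulers mentioned above anneal quantities such as the temperature and the learning rate over the course of training. The sketch below shows one possible shape for such a schedule, a geometric interpolation between a start and an end value. The function name `exponential_schedule` and all numerical values are hypothetical and not taken from the gato-hep API; how the values are passed to the model and optimizer is shown in the example script itself.

```python
def exponential_schedule(start, end, epoch, n_epochs):
    """Interpolate geometrically from `start` (epoch 0) to `end` (last epoch)."""
    if n_epochs < 2:
        return end
    frac = epoch / (n_epochs - 1)
    return start * (end / start) ** frac


n_epochs = 100
# Hypothetical ranges: start smooth and fast, end sharp and slow
temperatures = [exponential_schedule(1.0, 0.1, e, n_epochs) for e in range(n_epochs)]
learning_rates = [exponential_schedule(0.05, 0.005, e, n_epochs) for e in range(n_epochs)]

for epoch in (0, n_epochs // 2, n_epochs - 1):
    print(epoch, temperatures[epoch], learning_rates[epoch])
```

A geometric rather than linear interpolation spends more epochs at small values, which can help the boundaries settle once the approximation is already close to hard bins.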
Examples
- examples/1D_example/run_sigmoid_example.py – sigmoid-based boundaries for a single discriminant.
- examples/1D_example/run_gmm_example.py – GMM-based categorization for the same data.
- examples/three_class_softmax_example/run_example.py – optimizes categories directly on a 3-class softmax output (shown in 2D projections).
- examples/bumphunt_example/run_example.py – $H\to\gamma\gamma$-style bump hunt with inference on the mass, but including the background over a wider range for increased statistical power.
Every script populates an examples/.../Plots*/ folder with plots and checkpoints.
Further Reading
- Full documentation, including the API reference: https://gato-hep.readthedocs.io/
- Issues & feature requests: https://github.com/FloMau/gato-hep/issues
Contributing
- Fork and branch: git checkout -b feature/xyz.
- Implement changes under src/gatohep/ and, where needed, add or adjust tests in tests/.
- Format and lint (flake8) and run pytest.
- Open a pull request summarizing the physics motivation and technical changes.
License
MIT License © Florian Mausolf