GATO: Gradient-based cATegorization Optimizer for High-Energy Physics

We present gato-hep, the Gradient-based cATegorization Optimizer for High-Energy Physics. gato-hep learns category boundaries in N-dimensional discriminants that maximize the signal significance in binned likelihood fits. It does so by optimizing a differentiable approximation of the signal significance with gradient descent in TensorFlow.
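To make "differentiable approximation of the signal significance" concrete, here is a minimal pure-Python sketch for a single 1D discriminant. It is an illustration under stated assumptions, not the package's implementation: each hard bin edge is replaced by a sigmoid of adjustable steepness, so every event contributes fractionally to neighbouring bins, and the per-bin Asimov significances are combined in quadrature (a common choice that we assume here). The helper names `soft_bin_weights`, `asimov_z`, and `differentiable_significance` are hypothetical. In gato-hep the equivalent arithmetic runs on TensorFlow tensors so that gradients with respect to the boundaries are available.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def soft_bin_weights(score, boundaries, steepness):
    """Soft membership of one event in len(boundaries) + 1 bins of a 1D score.

    boundaries must be sorted ascending. Each hard edge is replaced by a
    sigmoid with the given steepness; the memberships sum to 1.
    """
    # Probability-like weight that the score lies above each boundary
    above = [sigmoid(steepness * (score - c)) for c in boundaries]
    edges = [1.0] + above + [0.0]
    return [edges[k] - edges[k + 1] for k in range(len(boundaries) + 1)]


def asimov_z(s, b):
    """Asimov significance Z = sqrt(2((s + b) ln(1 + s/b) - s)) for one bin."""
    if b <= 0:
        return 0.0
    # max() guards against tiny negative values from floating-point rounding
    return math.sqrt(max(0.0, 2.0 * ((s + b) * math.log1p(s / b) - s)))


def differentiable_significance(signal, background, boundaries, steepness):
    """Soft per-bin yields -> per-bin Asimov Z -> combination in quadrature.

    signal, background: lists of (score, weight) pairs (hypothetical layout).
    """
    n_bins = len(boundaries) + 1
    s = [0.0] * n_bins
    b = [0.0] * n_bins
    for events, yields in ((signal, s), (background, b)):
        for score, weight in events:
            for k, m in enumerate(soft_bin_weights(score, boundaries, steepness)):
                yields[k] += weight * m
    return math.sqrt(sum(asimov_z(sk, bk) ** 2 for sk, bk in zip(s, b)))


if __name__ == "__main__":
    sig = [(0.9, 1.0), (0.8, 1.0), (0.7, 1.0)]
    bkg = [(0.1, 10.0), (0.3, 10.0), (0.6, 5.0)]
    for steepness in (5.0, 50.0, 500.0):
        print(steepness, differentiable_significance(sig, bkg, [0.5], steepness))
```

As the steepness grows, the soft yields approach the hard-binned ones, which is why gato-hep anneals the steepness (or temperature) during training rather than fixing it.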

This repository contains the code for the GATO approach presented in "Learning to bin: differentiable and Bayesian optimization for multi-dimensional discriminants in high-energy physics" (arXiv:2601.07756). If you use the package in your work, please cite:

@article{Erdmann:2026opi,
    author = "Erdmann, Johannes and Kasaraguppe, Nitish Kumar and Mausolf, Florian",
    title = "{Learning to bin: differentiable and Bayesian optimization for multi-dimensional discriminants in high-energy physics}",
    eprint = "2601.07756",
    archivePrefix = "arXiv",
    primaryClass = "physics.data-an",
    month = "1",
    year = "2026"
}

Key Features

  • Optimize categorizations in multi-dimensional spaces using Gaussian Mixture Models (GMM) or 1D sigmoid-based models
  • Set the range of the discriminant dimensions as needed for your analysis
  • Penalize low-yield or high-uncertainty categories to keep optimizations analysis-friendly
  • Built-in annealing schedules for the temperature/steepness (which sets how closely the differentiable approximation follows the hard category boundaries) and for the learning rate, to stabilize training
  • Ready-to-run toy workflows that mirror real HEP analysis patterns
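The GMM, annealing, and penalty features listed above can be sketched in a few lines of plain Python. This is a hedged illustration, not gato-hep's code: it assumes soft category memberships are a softmax over the components' log-densities divided by a temperature, an exponential (geometric) temperature schedule, and a quadratic penalty for categories whose background yield falls below a threshold. All function names are hypothetical, and we do not claim `gato_gmm_model` parameterizes any of these in exactly this way.

```python
import math


def log_gaussian_2d(x, mean, sigma):
    """Log-density of an isotropic 2D Gaussian with width sigma."""
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    return -0.5 * (dx * dx + dy * dy) / sigma**2 - math.log(2.0 * math.pi * sigma**2)


def soft_assign(x, means, sigmas, log_weights, temperature):
    """Soft category memberships: softmax of (log w_k + log N_k(x)) / T."""
    logits = [
        (lw + log_gaussian_2d(x, m, s)) / temperature
        for m, s, lw in zip(means, sigmas, log_weights)
    ]
    top = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def exponential_anneal(start, end, step, n_steps):
    """Geometric interpolation from start (step 0) to end (step n_steps - 1)."""
    frac = min(step / max(n_steps - 1, 1), 1.0)
    return start * (end / start) ** frac


def low_yield_penalty(bkg_yields, threshold, strength=1.0):
    """Quadratic penalty for categories whose background yield is below threshold."""
    return strength * sum(max(0.0, threshold - b) ** 2 for b in bkg_yields)


if __name__ == "__main__":
    means = [(0.2, 0.2), (0.8, 0.8)]
    sigmas = [0.2, 0.2]
    log_w = [math.log(0.5)] * 2
    for step in (0, 50, 99):
        t = exponential_anneal(1.0, 0.01, step, 100)
        print(step, round(t, 4), soft_assign((0.4, 0.4), means, sigmas, log_w, t))
```

Lowering the temperature sharpens the memberships toward a hard, one-hot category assignment; the penalty term would be added to the loss alongside the negative significance.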

Installation

Latest release (PyPI)

pip install gato-hep

The base install targets CPU execution and pulls the tested TensorFlow stack automatically. Optional extras:

pip install "gato-hep[gpu]"   # CUDA-enabled TensorFlow wheels

For the GPU extra you still need NVIDIA drivers and CUDA libraries that match the selected TensorFlow build.
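A quick way to check whether the installed TensorFlow build actually sees a GPU is TensorFlow's `tf.config.list_physical_devices("GPU")`. The sketch below wraps it in a hypothetical helper `visible_gpus` that also tolerates a missing TensorFlow installation; it is a generic TensorFlow check, not a gato-hep command.

```python
try:
    import tensorflow as tf
except ImportError:  # TensorFlow is not installed in this environment
    tf = None


def visible_gpus():
    """Return the GPUs TensorFlow can see, or None if TensorFlow is missing."""
    if tf is None:
        return None
    return tf.config.list_physical_devices("GPU")


if __name__ == "__main__":
    gpus = visible_gpus()
    if gpus is None:
        print("TensorFlow is not installed")
    elif gpus:
        print(f"TensorFlow sees {len(gpus)} GPU(s): {gpus}")
    else:
        print("No GPU visible; TensorFlow will run on the CPU")
```

An empty list usually points to a driver or CUDA version mismatch with the TensorFlow build.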

From source

git clone https://github.com/FloMau/gato-hep.git
cd gato-hep
python -m venv .venv  # or use micromamba/conda
source .venv/bin/activate
pip install -e ".[dev]"

Requirements: Python ≥ 3.10. See pyproject.toml for the authoritative dependency pins.

Quickstart

The snippet below mirrors the three-class softmax demo. It generates the 3D toy sample, optimizes a two-dimensional Gaussian-mixture categorization on the first two softmax scores by maximizing the geometric mean of the two signal significances, saves and restores the trained model, and assigns events to the learned categories.

import numpy as np
import tensorflow as tf
from pathlib import Path

from gatohep.data_generation import generate_toy_data_3class_3D
from gatohep.models import gato_gmm_model


def convert_data_to_tensors(data):
    tensors = {}
    for proc, df in data.items():
        scores = np.stack(df["NN_output"].values)[:, :2]  # keep the first two dims
        weights = df["weight"].values
        tensors[proc] = {
            "NN_output": tf.convert_to_tensor(scores, tf.float32),
            "weight": tf.convert_to_tensor(weights, tf.float32),
        }
    return tensors

# Model class for the 2D discriminant optimization
class SoftmaxGMM(gato_gmm_model):
    def __init__(self, n_cats, temperature=0.3):
        super().__init__(
            n_cats=n_cats,
            dim=2,
            temperature=temperature,
            mean_norm="softmax",
        )

    def call(self, data_dict):
        # Differentiate through the Asimov significances provided by the helper
        significances = self.get_differentiable_significance(
            data_dict,
            signal_labels=["signal1", "signal2"],
        )
        z1 = significances["signal1"]
        z2 = significances["signal2"]
        return -tf.sqrt(z1 * z2)  # geometric-mean loss

# Load your data as a dictionary of pandas DataFrames, or use the built-in toy data generator:
data = generate_toy_data_3class_3D()
tensors = convert_data_to_tensors(data)

# Example: optimize 10 categories (bins)
model = SoftmaxGMM(n_cats=10, temperature=0.3)
optimizer = tf.keras.optimizers.RMSprop(learning_rate=0.05)

# Training loop (minimal version without schedulers)
for epoch in range(100):
    with tf.GradientTape() as tape:
        loss = model.call(tensors)
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))

# Save the trained model to disk for later use in the analysis
checkpoint_path = Path("softmax_demo_ckpt")
model.save(checkpoint_path)

# Restore the model
restored = SoftmaxGMM(n_cats=10, temperature=0.3)
restored.restore(checkpoint_path)

# Obtain the hard (non-differentiable) bin assignments
assignments = restored.get_bin_indices(tensors)
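The hard assignments are what enter the binned likelihood fit. The sketch below turns them into per-category yields with `np.bincount` and evaluates the per-category Asimov significance. It assumes, without confirmation from the package documentation, that `get_bin_indices` returns a dict mapping each process name to an array of integer category indices (TensorFlow tensors can be converted with `np.asarray` or `.numpy()`). The helper names and the process keys `signal1`/`background` are hypothetical.

```python
import numpy as np


def category_yields(assignments, weights, n_cats):
    """Sum event weights per category for every process.

    assignments: dict process -> integer array of category indices (assumed layout)
    weights:     dict process -> float array of per-event weights
    """
    return {
        proc: np.bincount(
            np.asarray(idx, dtype=np.int64),
            weights=np.asarray(weights[proc], dtype=float),
            minlength=n_cats,
        )
        for proc, idx in assignments.items()
    }


def asimov_per_category(sig, bkg):
    """Asimov significance per category; categories with b <= 0 give 0."""
    sig = np.asarray(sig, dtype=float)
    bkg = np.asarray(bkg, dtype=float)
    z2 = np.zeros_like(sig)
    ok = bkg > 0
    s, b = sig[ok], bkg[ok]
    z2[ok] = 2.0 * ((s + b) * np.log1p(s / b) - s)
    return np.sqrt(np.clip(z2, 0.0, None))


if __name__ == "__main__":
    # Hypothetical toy assignments for one signal and one background process
    assignments = {"signal1": np.array([0, 1, 1]), "background": np.array([0, 0, 1])}
    weights = {"signal1": [1.0, 2.0, 0.5], "background": [4.0, 6.0, 1.0]}
    yields = category_yields(assignments, weights, n_cats=2)
    z = asimov_per_category(yields["signal1"], yields["background"])
    print(yields, z, np.sqrt(np.sum(z**2)))
```

With several background processes, their yield arrays would be summed before calling `asimov_per_category`.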

See examples/three_class_softmax_example/run_example.py for the full training loop with schedulers, plotting helpers, and GIF generation.

Examples

  • examples/1D_example/run_sigmoid_example.py – sigmoid-based boundaries for a single discriminant.
  • examples/1D_example/run_gmm_example.py – GMM-based categorization of the same data.
  • examples/three_class_softmax_example/run_example.py – optimize categories directly on a 3-class softmax output (shown in 2D projections).
  • examples/bumphunt_example/run_example.py – H→γγ-style bump hunt with inference on the mass, where the background is included over a wider mass range for increased statistical power.
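Why a wider mass range helps can be illustrated with a generic sideband extrapolation: events outside the signal window constrain the shape and normalization of a smoothly falling background, so the background under the peak is estimated from many more events than the window alone contains. The sketch below is not the code in run_example.py; the exponential shape, the 100 to 180 range, the 120 to 130 window, and the helper names are all assumptions chosen for illustration. It fits the slope of log(counts) in the sideband bins and extrapolates the sideband yield into the window; choose `n_bins` so that the window edges fall on bin edges.

```python
import numpy as np


def window_fraction(k, lo, hi, a, b):
    """Fraction of an exp(k*m) shape on [lo, hi] that falls inside [a, b] (k != 0)."""
    return (np.exp(k * b) - np.exp(k * a)) / (np.exp(k * hi) - np.exp(k * lo))


def sideband_estimate(masses, weights, lo, hi, a, b, n_bins=40):
    """Estimate the background yield in the window [a, b] from the sidebands."""
    masses = np.asarray(masses, dtype=float)
    weights = np.asarray(weights, dtype=float)
    counts, edges = np.histogram(masses, bins=n_bins, range=(lo, hi), weights=weights)
    centers = 0.5 * (edges[:-1] + edges[1:])
    use = ((centers < a) | (centers > b)) & (counts > 0)
    # Weighted straight-line fit of log(counts); sigma(log N) ~ 1/sqrt(N)
    k, _ = np.polyfit(centers[use], np.log(counts[use]), 1, w=np.sqrt(counts[use]))
    in_range = (masses >= lo) & (masses <= hi)
    in_window = (masses >= a) & (masses <= b)
    n_sideband = weights[in_range & ~in_window].sum()
    frac = window_fraction(k, lo, hi, a, b)
    # Sideband yield corresponds to (1 - frac) of the fitted shape
    return n_sideband * frac / (1.0 - frac)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    m = 100.0 + rng.exponential(scale=30.0, size=200_000)  # toy falling background
    m = m[m <= 180.0]
    est = sideband_estimate(m, np.ones_like(m), lo=100.0, hi=180.0, a=120.0, b=130.0)
    true = np.count_nonzero((m >= 120.0) & (m <= 130.0))
    print(f"sideband estimate {est:.0f} vs. true {true}")
```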

Every script populates an examples/.../Plots*/ folder with plots and checkpoints.

Further Reading

Contributing

  1. Fork and branch: git checkout -b feature/xyz.
  2. Implement your changes under src/gatohep/ and add or adjust tests in tests/ as needed.
  3. Lint with flake8 and run the test suite with pytest.
  4. Open a pull request summarizing the physics motivation and technical changes.

License

MIT License © Florian Mausolf

Download files

Source Distribution

gato_hep-0.2.1.tar.gz (34.6 kB)

Uploaded Source

Built Distribution

gato_hep-0.2.1-py3-none-any.whl (34.6 kB)

Uploaded Python 3

File details

Details for the file gato_hep-0.2.1.tar.gz.

File metadata

  • Download URL: gato_hep-0.2.1.tar.gz
  • Size: 34.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.14

File hashes

Hashes for gato_hep-0.2.1.tar.gz
Algorithm Hash digest
SHA256 2be62e218d7c2b31f10738624ba4fe217f3016dbb86ecbca6eb2a4859790ba8e
MD5 753706970f0ddfca1f77693bd936cb1f
BLAKE2b-256 39d20e5ec76078ebb6c144b108f4cbaa02c7e3ea458270780fbf0b79be69f489
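To check a downloaded file against the digests listed here, stream it through SHA-256 with Python's standard `hashlib`. The helpers below (`sha256_of`, `matches`) are a minimal sketch; the expected value is the SHA256 of gato_hep-0.2.1.tar.gz from the table above, and the local file name in the demo is an assumption.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# SHA256 of gato_hep-0.2.1.tar.gz as listed in the hash table
EXPECTED_SDIST_SHA256 = "2be62e218d7c2b31f10738624ba4fe217f3016dbb86ecbca6eb2a4859790ba8e"


def matches(path, expected=EXPECTED_SDIST_SHA256):
    """True if the file's SHA-256 equals the expected hex digest."""
    return sha256_of(path) == expected.strip().lower()


if __name__ == "__main__":
    sdist = Path("gato_hep-0.2.1.tar.gz")  # adjust to the downloaded location
    if sdist.exists():
        print("OK" if matches(sdist) else "MISMATCH")
```

pip's hash-checking mode (`--require-hashes`, with `--hash=sha256:...` entries in a requirements file) performs the same comparison at install time.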

File details

Details for the file gato_hep-0.2.1-py3-none-any.whl.

File metadata

  • Download URL: gato_hep-0.2.1-py3-none-any.whl
  • Size: 34.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.14

File hashes

Hashes for gato_hep-0.2.1-py3-none-any.whl
Algorithm Hash digest
SHA256 c4deefd7ecb810646d305c25ed55f0603b4617ac069888810db718f5ae5f0395
MD5 290db9ba34065653eb5dfb7798bceda2
BLAKE2b-256 717be293916f7269c7fcb3312de0251db34a602cee7d4780fd747d935981e059
