GATO: Gradient-based categorization optimization for HEP analyses

We present gato-hep: the Gradient-based cATegorization Optimizer for High Energy Physics analyses. gato-hep learns category boundaries in N-dimensional discriminants that maximize the signal significance of binned likelihood fits. To do so, it replaces the hard category boundaries with a differentiable approximation of the signal significance and optimizes it by gradient descent with TensorFlow.
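
To make the idea of a differentiable significance concrete, here is a minimal pure-Python sketch for a single 1D discriminant. It is an illustration under stated assumptions, not gato-hep's implementation: the function names are hypothetical, the per-category significance uses the standard Asimov formula for a counting experiment, and the categories are combined in quadrature.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def soft_memberships(score, boundaries, steepness):
    """Soft membership of one event in each of len(boundaries) + 1 categories.

    Each hard step at a boundary is replaced by a sigmoid, so the memberships
    vary smoothly with the boundary positions and always sum to one.
    Category 0 collects the highest scores in this (hypothetical) convention.
    """
    passed = [sigmoid(steepness * (score - b)) for b in sorted(boundaries)]
    # Fraction of the event above each boundary, padded with 1 and 0.
    edges = [1.0] + passed + [0.0]
    # Reverse so that edges run from "above nothing" to "above everything".
    edges = edges[::-1]
    return [edges[k + 1] - edges[k] for k in range(len(edges) - 1)][::-1]


def asimov_significance(s, b):
    """Asimov discovery significance of one counting bin (assumed formula)."""
    if s <= 0.0 or b <= 0.0:
        return 0.0
    return math.sqrt(2.0 * ((s + b) * math.log(1.0 + s / b) - s))


def soft_combined_significance(signal, background, boundaries, steepness):
    """Quadrature sum of per-category significances from soft yields.

    signal and background are lists of (score, weight) pairs.
    """
    n_cats = len(boundaries) + 1
    s_yield = [0.0] * n_cats
    b_yield = [0.0] * n_cats
    for events, yields in ((signal, s_yield), (background, b_yield)):
        for score, weight in events:
            members = soft_memberships(score, boundaries, steepness)
            for k, m in enumerate(members):
                yields[k] += weight * m
    z2 = sum(asimov_significance(s, b) ** 2 for s, b in zip(s_yield, b_yield))
    return math.sqrt(z2)
```

Because every step is smooth in the boundary positions, an automatic-differentiation framework can compute gradients of the combined significance with respect to the boundaries. A larger steepness brings the soft yields closer to the hard-cut yields at the cost of sharper gradients.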

Key Features

  • Optimize categorizations in multi-dimensional spaces using Gaussian Mixture Models (GMM) or 1D sigmoid-based models
  • Set the range of the discriminant dimensions as needed for your analysis
  • Penalize low-yield or high-uncertainty categories to keep optimizations analysis-friendly
  • Built-in annealing schedules for the temperature / steepness (which sets how closely the differentiable approximation follows hard category boundaries) and for the learning rate, to stabilize training
  • Ready-to-run toy workflows that mirror real HEP analysis patterns
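
The role of the temperature in a GMM-based categorization can be sketched as follows. This is a simplified illustration, not gato-hep's parametrization: it assumes isotropic 2D Gaussians, and the helper names are hypothetical. Each event is softly assigned to the mixture components through a softmax over the weighted log-densities, scaled by the temperature.

```python
import math


def log_gaussian_2d(point, mean, sigma):
    """Log-density of an isotropic 2D Gaussian with standard deviation sigma."""
    dx = point[0] - mean[0]
    dy = point[1] - mean[1]
    norm = math.log(2.0 * math.pi * sigma ** 2)
    return -(dx * dx + dy * dy) / (2.0 * sigma ** 2) - norm


def component_scores(point, components):
    """Weighted log-density of each component: log(weight) + log N(point)."""
    return [
        math.log(weight) + log_gaussian_2d(point, mean, sigma)
        for weight, mean, sigma in components
    ]


def soft_assignment(point, components, temperature):
    """Temperature-scaled soft assignment of one event to each component.

    components is a list of (mixture_weight, mean, sigma) tuples.
    """
    logits = [s / temperature for s in component_scores(point, components)]
    top = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(value - top) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


def hard_assignment(point, components):
    """Index of the component with the largest weighted density."""
    scores = component_scores(point, components)
    return max(range(len(scores)), key=scores.__getitem__)
```

At high temperature the assignment is spread over several categories, which gives smooth gradients; as the temperature is lowered, the soft assignment approaches the one-hot hard assignment that is eventually used in the analysis.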

Installation

Latest release (PyPI)

pip install gato-hep

The base install targets CPU execution and pulls the tested TensorFlow stack automatically. Optional extras:

pip install "gato-hep[gpu]"   # CUDA-enabled TensorFlow wheels

For the GPU extra you still need NVIDIA drivers and CUDA libraries that match the selected TensorFlow build.
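
After installing the GPU extra, it can be useful to confirm that TensorFlow actually sees a device. The helper below is a small sketch (the function name `visible_gpus` is made up for this example); it relies only on `tf.config.list_physical_devices`, which is part of the public TensorFlow 2 API.

```python
def visible_gpus():
    """Return the names of the GPUs TensorFlow can see.

    Returns an empty list if no GPU is visible or TensorFlow is not installed.
    """
    try:
        import tensorflow as tf
    except ImportError:
        return []
    return [device.name for device in tf.config.list_physical_devices("GPU")]
```

Calling `print(visible_gpus())` in the environment used for training shows the detected devices. An empty list on a machine with a GPU usually points to a mismatch between the driver, the CUDA libraries, and the TensorFlow build.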

From source

git clone https://github.com/FloMau/gato-hep.git
cd gato-hep
python -m venv .venv  # or use micromamba/conda
source .venv/bin/activate
pip install -e ".[dev]"

Requirements: Python ≥ 3.10. See pyproject.toml for the authoritative dependency pins.

Quickstart

The snippet below mirrors the three-class softmax demo. It generates the 3D toy sample, fits a two-dimensional Gaussian mixture model to the first two softmax scores by minimizing the negative geometric mean of the two signal significances, then saves and restores the model and extracts the hard category assignments from the learnt categories.

import numpy as np
import tensorflow as tf
from pathlib import Path

from gatohep.data_generation import generate_toy_data_3class_3D
from gatohep.models import gato_gmm_model


def convert_data_to_tensors(data):
    tensors = {}
    for proc, df in data.items():
        scores = np.stack(df["NN_output"].values)[:, :2]  # keep the first two dims
        weights = df["weight"].values
        tensors[proc] = {
            "NN_output": tf.convert_to_tensor(scores, tf.float32),
            "weight": tf.convert_to_tensor(weights, tf.float32),
        }
    return tensors

# setup class for the 2D discriminant optimization
class SoftmaxGMM(gato_gmm_model):
    def __init__(self, n_cats, temperature=0.3):
        super().__init__(
            n_cats=n_cats,
            dim=2,
            temperature=temperature,
            mean_norm="softmax",
        )
    def call(self, data_dict):
        # Differentiate through the Asimov significances provided by the helper
        significances = self.get_differentiable_significance(
            data_dict,
            signal_labels=["signal1", "signal2"],
        )
        z1 = significances["signal1"]
        z2 = significances["signal2"]
        return -tf.sqrt(z1 * z2)  # geometric-mean loss

# load your data as a dictionary of pandas DataFrames, or use the built-in toy data generation:
data = generate_toy_data_3class_3D()
tensors = convert_data_to_tensors(data)

# example: use 10 bins
model = SoftmaxGMM(n_cats=10, temperature=0.3)
optimizer = tf.keras.optimizers.RMSprop(learning_rate=0.05)

# actual training
for epoch in range(100):
    with tf.GradientTape() as tape:
        loss = model.call(tensors)
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))

# Save the trained model to a checkpoint path for later use in the analysis
checkpoint_path = Path("softmax_demo_ckpt")
model.save(checkpoint_path)

# Restore the model
restored = SoftmaxGMM(n_cats=10, temperature=0.3)
restored.restore(checkpoint_path)

# Obtain the hard (non-differentiable) bin assignments
assignments = restored.get_bin_indices(tensors)
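
Once hard assignments are available, per-category yields and significances can be computed without any soft approximation. The sketch below is independent of gato-hep and makes two assumptions: the assignments of each process can be converted to a flat sequence of integer category indices (the exact return type of `get_bin_indices` is not specified here), and the per-category significance follows the standard Asimov formula. All function names are hypothetical.

```python
import math


def yields_per_category(bin_indices, weights, n_cats):
    """Sum the event weights that fall into each category."""
    totals = [0.0] * n_cats
    for index, weight in zip(bin_indices, weights):
        totals[int(index)] += weight
    return totals


def asimov_significance(s, b):
    """Asimov discovery significance of one counting bin (assumed formula)."""
    if s <= 0.0 or b <= 0.0:
        return 0.0
    return math.sqrt(2.0 * ((s + b) * math.log(1.0 + s / b) - s))


def summarize(signal_yields, background_yields, min_background=1.0):
    """Per-category and combined significance, plus low-yield categories.

    min_background is an illustrative threshold for flagging categories
    whose background yield is too small to be modeled reliably.
    """
    per_category = [
        asimov_significance(s, b)
        for s, b in zip(signal_yields, background_yields)
    ]
    combined = math.sqrt(sum(z * z for z in per_category))
    low_yield = [
        k for k, b in enumerate(background_yields) if b < min_background
    ]
    return per_category, combined, low_yield
```

Comparing the combined value from hard assignments with the soft value seen during training is a quick check that the temperature was low enough at the end of the optimization.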

See examples/three_class_softmax_example/run_example.py for the full training loop with schedulers, plotting helpers, and GIF generation.
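
The Quickstart loop keeps the temperature and learning rate fixed. A schedule that anneals either quantity can be as simple as the geometric interpolation sketched below. This is a generic illustration and not gato-hep's built-in scheduler; the function name and the start and end values are hypothetical, and how the value is passed to the model or optimizer depends on the gato-hep API, which is not shown here.

```python
def exponential_anneal(start, end, step, total_steps):
    """Geometric interpolation from start (step 0) to end (last step).

    Both start and end must be positive. Steps outside the range are clamped.
    """
    if total_steps <= 1:
        return end
    fraction = min(max(step / (total_steps - 1), 0.0), 1.0)
    return start * (end / start) ** fraction


# Example: temperature and learning rate for each of 100 epochs.
n_epochs = 100
temperatures = [exponential_anneal(1.0, 0.1, e, n_epochs) for e in range(n_epochs)]
learning_rates = [exponential_anneal(0.05, 0.005, e, n_epochs) for e in range(n_epochs)]
```

Starting with a high temperature keeps the gradients informative while the categories are still far from their optimum; lowering it towards the end brings the soft yields close to the hard yields used in the final analysis.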

Examples & Tutorials

  • examples/1D_example/run_sigmoid_example.py – sigmoid-based boundaries for a single discriminant.
  • examples/1D_example/run_gmm_example.py – GMM-based categorization for the same data.
  • examples/three_class_softmax_example/run_example.py – optimize categories directly on a 3-class softmax output (shown in 2D projections).
  • examples/bumphunt_example/run_example.py – $H\to\gamma\gamma$-style bump hunt: the inference is performed on the mass, while the background is included over a wider range for increased statistical power.

Every script populates an examples/.../Plots*/ folder with plots and checkpoints.

Contributing

  1. Fork and branch: git checkout -b feature/xyz.
  2. Implement changes under src/gatohep/ and add or adjust tests in tests/ where needed.
  3. Format and lint the code (flake8), then run pytest.
  4. Open a pull request summarizing the physics motivation and technical changes.

License

MIT License © Florian Mausolf


