A modern machine learning library for high-energy physics data analysis

ColliderML

Python 3.10+ · License: MIT

Installation

pip install colliderml                 # core + Polars loader + unified load()
pip install 'colliderml[sim]'          # local simulation (needs Docker/Podman)
pip install 'colliderml[remote]'       # SaaS backend client
pip install 'colliderml[tasks]'        # benchmark task reference baselines
pip install 'colliderml[all]'          # everything above + dev tools

For development: pip install -e ".[dev]"

Getting the data

Option 1 — Python one-liner (downloads on first call, then caches):

import colliderml

frames = colliderml.load("ttbar_pu0", max_events=200)
print(frames["particles"])             # Polars DataFrame
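Because `frames` is indexed by object name, a quick sanity check after loading is to list each table with its shape. The sketch below assumes the return value behaves like a dict of Polars DataFrames (which expose `.shape`). `summarize_frames` is a hypothetical helper, not part of ColliderML, and it is exercised with a stand-in object so it runs without downloading any data.

```python
from typing import Any, Mapping


def summarize_frames(frames: Mapping[str, Any]) -> dict[str, tuple[int, int]]:
    """Return {table name: (rows, columns)} for every table in `frames`.

    Assumes each value exposes a `.shape` tuple, as a Polars DataFrame does.
    """
    return {name: tuple(table.shape) for name, table in frames.items()}


class _FakeTable:
    """Stand-in for a DataFrame so the sketch runs without ColliderML."""

    def __init__(self, rows: int, cols: int) -> None:
        self.shape = (rows, cols)


# Hypothetical table sizes, for illustration only.
demo = {"particles": _FakeTable(1200, 9), "tracks": _FakeTable(340, 12)}
for name, (rows, cols) in summarize_frames(demo).items():
    print(f"{name}: {rows} rows x {cols} columns")
```

With the real library installed, the same helper could presumably be applied to the result of `colliderml.load(...)` directly.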

Option 2 — CLI (explicit download, then load with the library):

colliderml download --channels ttbar --pileup pu0 --objects particles,tracker_hits,calo_hits,tracks --max-events 200

Cache location: default ~/.cache/colliderml, or set COLLIDERML_DATA_DIR. List downloaded configs: colliderml list-configs.
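To check where downloads will land before running anything, the lookup described above can be sketched in a few lines. This is an illustration of the documented behaviour, not the library's own code: `resolve_cache_dir` is a hypothetical helper, and treating an empty `COLLIDERML_DATA_DIR` as unset is an assumption.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_cache_dir(env: Optional[Mapping[str, str]] = None) -> Path:
    """Return the directory the cache is expected to live in.

    COLLIDERML_DATA_DIR wins when set; otherwise fall back to
    ~/.cache/colliderml. An empty value is treated as unset (assumption).
    """
    env = os.environ if env is None else env
    override = env.get("COLLIDERML_DATA_DIR")
    if override:
        return Path(override).expanduser()
    return Path.home() / ".cache" / "colliderml"


print(resolve_cache_dir())
```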

Option 3 — HuggingFace only:

from datasets import load_dataset
dataset = load_dataset("CERN/ColliderML-Release-1", "ttbar_pu0_particles", split="train")
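The config name in the call above looks like channel, pileup, and object joined by underscores. The sketch below builds names on that assumption. Only `ttbar_pu0_particles` is confirmed by this page; the other names are extrapolated from the object list used in the CLI example and may not exist on the Hub. `hf_config_name` is a hypothetical helper.

```python
def hf_config_name(channel: str, pileup: str, obj: str) -> str:
    """Join channel, pileup, and object into a config name.

    The "<channel>_<pileup>_<object>" pattern is inferred from a single
    documented example and is an assumption.
    """
    return f"{channel}_{pileup}_{obj}"


# Object names taken from the CLI example's --objects flag.
objects = ["particles", "tracker_hits", "calo_hits", "tracks"]
configs = [hf_config_name("ttbar", "pu0", obj) for obj in objects]
print(configs)
```

Each resulting string could then be tried as the second argument to `load_dataset`, checking the dataset card first to confirm the config exists.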

Running simulations

New in v0.4.0: generate events yourself with the full OpenDataDetector (ODD) pipeline, either locally in a container or via the SaaS backend.

import colliderml

# Local: runs inside the OpenDataDetector software container.
# Needs Docker or Podman; the `[sim]` extra provides the driver.
result = colliderml.simulate(preset="ttbar-quick")
print(result.run_dir)                  # parquet outputs land here

# Remote: submit to the SaaS backend (requires an HF token).
# The `[remote]` extra pulls in requests; no container runtime needed.
result = colliderml.simulate(preset="higgs-portal-quick", remote=True)
print(result.remote_request_id)

CLI equivalents:

colliderml list-presets
colliderml simulate --preset ttbar-quick --local
colliderml simulate --preset higgs-portal-quick --remote
colliderml status <request-id>
colliderml balance

See the Local Simulation and Remote Simulation guides for details.
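Since local runs need Docker or Podman and remote runs do not, a script can decide which path to take by checking for a container runtime on `PATH`. The sketch below uses only the standard library. `pick_backend` is a hypothetical helper, and finding a binary does not guarantee that its daemon is running or that the `[sim]` extra is installed.

```python
import shutil
from typing import Callable, Optional


def pick_backend(which: Callable[[str], Optional[str]] = shutil.which) -> str:
    """Return "local" if docker or podman is on PATH, else "remote"."""
    for runtime in ("docker", "podman"):
        if which(runtime) is not None:
            return "local"
    return "remote"


print(pick_backend())
```

When the result is "remote", the call would be made with `remote=True` as in the example above; otherwise the plain `colliderml.simulate(preset=...)` form applies.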

Benchmark tasks

New in v0.4.0: six built-in benchmark tasks — tracking, jets, anomaly, tracking_latency, tracking_small, and data_loading — with a unified registry and a leaderboard backed by the SaaS backend.

import colliderml.tasks

print(colliderml.tasks.list_tasks())
scores = colliderml.tasks.evaluate("tracking", "my_preds.parquet")
colliderml.tasks.submit("tracking", "my_preds.parquet")   # earn credits on new bests

Reference baselines, built on scikit-learn (a boosted decision tree and an Isolation Forest), ship with the [tasks] extra. See the Benchmark Tasks guide for details.

Using the library

The notebook notebooks/colliderml_loader_exploration.ipynb shows the data-loading and analysis helpers: load_tables, exploding event tables, pileup subsampling, calibration, and plotting.

Full docs: https://opendatadetector.github.io/ColliderML

Development

pytest -v -m "not integration"

Docs are built with VitePress: npm ci --prefix docs && npm run --prefix docs docs:build.

License

MIT License

Download files

Source Distribution

colliderml-0.4.0rc2.tar.gz (592.4 kB)

Uploaded Source

Built Distribution

colliderml-0.4.0rc2-py3-none-any.whl (83.4 kB)

Uploaded Python 3

File details

Details for the file colliderml-0.4.0rc2.tar.gz.

File metadata

  • Download URL: colliderml-0.4.0rc2.tar.gz
  • Size: 592.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for colliderml-0.4.0rc2.tar.gz
Algorithm Hash digest
SHA256 cbe5e7e869168f856fee29dabff1bdd5164ef1bd42e6c904ece35b1f7c7709e2
MD5 cd0c23ffe2b0a60fa94a7c8783757293
BLAKE2b-256 929c961ebbf9821336d34e0ad826e6aadb754f961269e019ee300006fae0be3d

Provenance

The following attestation bundles were made for colliderml-0.4.0rc2.tar.gz:

Publisher: publish-pypi.yml on OpenDataDetector/ColliderML

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file colliderml-0.4.0rc2-py3-none-any.whl.

File metadata

  • Download URL: colliderml-0.4.0rc2-py3-none-any.whl
  • Size: 83.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for colliderml-0.4.0rc2-py3-none-any.whl
Algorithm Hash digest
SHA256 c03050b15ce4d6f1cf168db40b110c5662f5757a329f8d4029558632ae41df3b
MD5 2a716cb3abb59a4f65da7ce6f666c64d
BLAKE2b-256 a3a36e40a3affca3ca684d7e45a2bc2a13b1beda507007eae09edc3baad07b11

Provenance

The following attestation bundles were made for colliderml-0.4.0rc2-py3-none-any.whl:

Publisher: publish-pypi.yml on OpenDataDetector/ColliderML

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
