ColliderML
A modern machine learning library for high-energy physics data analysis.
Installation
pip install colliderml # core + Polars loader + unified load()
pip install 'colliderml[sim]' # local simulation (needs Docker/Podman)
pip install 'colliderml[remote]' # SaaS backend client
pip install 'colliderml[tasks]' # benchmark task reference baselines
pip install 'colliderml[all]' # everything above + dev tools
For development: pip install -e ".[dev]"
Getting the data
Option 1 — Python one-liner (downloads on first call, then caches):
import colliderml
frames = colliderml.load("ttbar_pu0", max_events=200)
print(frames["particles"]) # Polars DataFrame
Option 2 — CLI (explicit download, then load with the library):
colliderml download --channels ttbar --pileup pu0 --objects particles,tracker_hits,calo_hits,tracks --max-events 200
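To script downloads from Python (for example, in a batch job), you can build the same argv and pass it to subprocess. The sketch below only assembles the flags shown in the CLI example. Joining several channels with commas, the same way --objects is joined, is an assumption, so check colliderml download --help before relying on it. The helper name build_download_cmd is hypothetical.

```python
from __future__ import annotations

import shlex
from typing import Optional, Sequence


def build_download_cmd(
    channels: Sequence[str],
    pileup: str,
    objects: Sequence[str],
    max_events: Optional[int] = None,
) -> list[str]:
    """Assemble argv for `colliderml download` (hypothetical helper).

    Assumption: multiple channels are comma-separated, like --objects.
    """
    cmd = [
        "colliderml", "download",
        "--channels", ",".join(channels),
        "--pileup", pileup,
        "--objects", ",".join(objects),
    ]
    if max_events is not None:
        cmd += ["--max-events", str(max_events)]
    return cmd


cmd = build_download_cmd(
    ["ttbar"], "pu0", ["particles", "tracker_hits", "calo_hits", "tracks"], 200
)
print(shlex.join(cmd))  # the same command as the CLI example
# To run it (requires colliderml on PATH):
# import subprocess; subprocess.run(cmd, check=True)
```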
Downloads are cached in ~/.cache/colliderml by default; set COLLIDERML_DATA_DIR to use another location. To list the configs you have downloaded, run colliderml list-configs.
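The cache-location rule can be written as a small stdlib helper, for example to find where files landed. This is a sketch of the documented behaviour, not the library's own code. resolve_cache_dir is a hypothetical name, and treating an empty COLLIDERML_DATA_DIR as unset is an assumption.

```python
import os
from pathlib import Path


def resolve_cache_dir() -> Path:
    """Return COLLIDERML_DATA_DIR if set (and non-empty), else ~/.cache/colliderml.

    Mirrors the documented rule; the library may apply extra logic.
    """
    override = os.environ.get("COLLIDERML_DATA_DIR")
    if override:
        return Path(override).expanduser()
    return Path.home() / ".cache" / "colliderml"


cache = resolve_cache_dir()
print(cache, "exists" if cache.is_dir() else "does not exist yet")
```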
Option 3 — HuggingFace only:
from datasets import load_dataset
dataset = load_dataset("CERN/ColliderML-Release-1", "ttbar_pu0_particles", split="train")
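The Hugging Face config names appear to follow a {channel}_{pileup}_{object} pattern: ttbar_pu0 plus particles gives ttbar_pu0_particles. The sketch below assumes that pattern holds and uses the object names from the CLI example in Option 2 to generate configs for load_dataset. Check the dataset card for the actual list of available configs. hf_config_name is a hypothetical helper.

```python
# Object names taken from the `colliderml download --objects` example;
# other objects may exist in the dataset.
OBJECTS = ("particles", "tracker_hits", "calo_hits", "tracks")


def hf_config_name(channel: str, pileup: str, obj: str) -> str:
    """Build a config name, assuming the {channel}_{pileup}_{object} pattern."""
    return f"{channel}_{pileup}_{obj}"


configs = [hf_config_name("ttbar", "pu0", obj) for obj in OBJECTS]
print(configs)
# Each name can then be passed as the config, as in Option 3:
# load_dataset("CERN/ColliderML-Release-1", name, split="train")
```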
Running simulations
New in v0.4.0: generate events yourself with the full ODD pipeline, either locally in a container or via the SaaS backend.
import colliderml
# Local: runs inside the OpenDataDetector software container.
# Needs Docker or Podman; the `[sim]` extra provides the driver.
result = colliderml.simulate(preset="ttbar-quick")
print(result.run_dir) # parquet outputs land here
# Remote: submit to the SaaS backend (requires an HF token).
# The `[remote]` extra pulls in requests; no container runtime needed.
result = colliderml.simulate(preset="higgs-portal-quick", remote=True)
print(result.remote_request_id)
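Local simulation needs a container runtime, so a quick preflight check can help you choose between the local and remote paths. This sketch only verifies that a docker or podman executable is on PATH. It does not check that the daemon is running or which runtime colliderml actually selects, and the docker-first order is an arbitrary assumption. find_container_runtime is a hypothetical name.

```python
import shutil
from typing import Optional, Sequence


def find_container_runtime(
    preferred: Sequence[str] = ("docker", "podman"),
) -> Optional[str]:
    """Return the path of the first container CLI found on PATH, or None."""
    for name in preferred:
        path = shutil.which(name)
        if path is not None:
            return path
    return None


runtime = find_container_runtime()
if runtime is None:
    print("No Docker/Podman on PATH; local simulate() will not work, consider remote=True.")
else:
    print(f"Container runtime found: {runtime}")
```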
CLI equivalents:
colliderml list-presets
colliderml simulate --preset ttbar-quick --local
colliderml simulate --preset higgs-portal-quick --remote
colliderml status <request-id>
colliderml balance
See the Local Simulation and Remote Simulation guides for details.
Benchmark tasks
New in v0.4.0: six built-in benchmark tasks — tracking, jets, anomaly, tracking_latency, tracking_small, and data_loading — with a unified registry and a leaderboard backed by the SaaS backend.
import colliderml.tasks
print(colliderml.tasks.list_tasks())
scores = colliderml.tasks.evaluate("tracking", "my_preds.parquet")
colliderml.tasks.submit("tracking", "my_preds.parquet") # earn credits on new bests
Reference baselines ship with the [tasks] extra, which pulls in scikit-learn for the boosted decision tree (BDT) and Isolation Forest baselines. See the Benchmark Tasks guide for details.
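A mistyped task name is an easy error when scripting evaluate() and submit() calls. The hypothetical helper below checks a name against the six tasks listed above and suggests close matches with difflib. The hard-coded tuple can go stale; colliderml.tasks.list_tasks() is the authoritative source, and you could pass its result as known, assuming it returns plain task names.

```python
import difflib
from typing import Sequence

# The six task names listed above (assumption: may change between releases).
KNOWN_TASKS = (
    "tracking", "jets", "anomaly",
    "tracking_latency", "tracking_small", "data_loading",
)


def check_task_name(name: str, known: Sequence[str] = KNOWN_TASKS) -> str:
    """Return name if it is a known task, else raise with close-match hints."""
    if name in known:
        return name
    hints = difflib.get_close_matches(name, list(known), n=3)
    hint = f" Did you mean: {', '.join(hints)}?" if hints else ""
    raise ValueError(f"Unknown task {name!r}.{hint}")


for candidate in ("jets", "traking"):
    try:
        print("ok:", check_task_name(candidate))
    except ValueError as exc:
        print(exc)
```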
Using the library
The notebook notebooks/colliderml_loader_exploration.ipynb shows the data-loading and analysis helpers: load_tables, exploding event tables, pileup subsampling, calibration, and plotting.
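"Exploding" an event table usually means turning one row per event, whose per-object columns hold lists, into one row per object with the event-level fields repeated. Assuming ColliderML's event tables use that layout, the stdlib sketch below shows the transformation; in Polars the equivalent is DataFrame.explode on the list columns. The function explode_events and the column names event_id, pdg_id, and px are hypothetical.

```python
from __future__ import annotations

from typing import Any, Iterable


def explode_events(
    events: Iterable[dict[str, Any]], list_cols: list[str]
) -> list[dict[str, Any]]:
    """Turn one-row-per-event records into one row per list element.

    Scalar fields (e.g. an event id) are repeated on every output row.
    All list columns must have the same length within an event.
    """
    rows: list[dict[str, Any]] = []
    for event in events:
        lengths = {len(event[col]) for col in list_cols}
        if len(lengths) > 1:
            raise ValueError(f"ragged list columns in event: {sorted(lengths)}")
        n = lengths.pop() if lengths else 0
        scalars = {k: v for k, v in event.items() if k not in list_cols}
        for i in range(n):
            rows.append({**scalars, **{col: event[col][i] for col in list_cols}})
    return rows


events = [
    {"event_id": 0, "pdg_id": [11, -11], "px": [1.5, -1.5]},
    {"event_id": 1, "pdg_id": [13], "px": [0.7]},
]
rows = explode_events(events, ["pdg_id", "px"])
for row in rows:
    print(row)
```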
Full docs: https://opendatadetector.github.io/ColliderML
Development
pytest -v -m "not integration"
Docs are built with VitePress: npm ci --prefix docs && npm run --prefix docs docs:build.
License
Download files
File details
Details for the file colliderml-0.4.0rc2.tar.gz.
File metadata
- Download URL: colliderml-0.4.0rc2.tar.gz
- Upload date:
- Size: 592.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | cbe5e7e869168f856fee29dabff1bdd5164ef1bd42e6c904ece35b1f7c7709e2 |
| MD5 | cd0c23ffe2b0a60fa94a7c8783757293 |
| BLAKE2b-256 | 929c961ebbf9821336d34e0ad826e6aadb754f961269e019ee300006fae0be3d |
Provenance
The following attestation bundles were made for colliderml-0.4.0rc2.tar.gz:
Publisher: publish-pypi.yml on OpenDataDetector/ColliderML
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: colliderml-0.4.0rc2.tar.gz
- Subject digest: cbe5e7e869168f856fee29dabff1bdd5164ef1bd42e6c904ece35b1f7c7709e2
- Sigstore transparency entry: 1294043848
- Sigstore integration time:
- Permalink: OpenDataDetector/ColliderML@4d0129a14e00216eda588cda541416569c78c6ef
- Branch / Tag: refs/tags/v0.4.0rc2
- Owner: https://github.com/OpenDataDetector
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish-pypi.yml@4d0129a14e00216eda588cda541416569c78c6ef
- Trigger Event: release
File details
Details for the file colliderml-0.4.0rc2-py3-none-any.whl.
File metadata
- Download URL: colliderml-0.4.0rc2-py3-none-any.whl
- Upload date:
- Size: 83.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c03050b15ce4d6f1cf168db40b110c5662f5757a329f8d4029558632ae41df3b |
| MD5 | 2a716cb3abb59a4f65da7ce6f666c64d |
| BLAKE2b-256 | a3a36e40a3affca3ca684d7e45a2bc2a13b1beda507007eae09edc3baad07b11 |
Provenance
The following attestation bundles were made for colliderml-0.4.0rc2-py3-none-any.whl:
Publisher: publish-pypi.yml on OpenDataDetector/ColliderML
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: colliderml-0.4.0rc2-py3-none-any.whl
- Subject digest: c03050b15ce4d6f1cf168db40b110c5662f5757a329f8d4029558632ae41df3b
- Sigstore transparency entry: 1294043955
- Sigstore integration time:
- Permalink: OpenDataDetector/ColliderML@4d0129a14e00216eda588cda541416569c78c6ef
- Branch / Tag: refs/tags/v0.4.0rc2
- Owner: https://github.com/OpenDataDetector
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish-pypi.yml@4d0129a14e00216eda588cda541416569c78c6ef
- Trigger Event: release