Machine Learning for Particle Flow Reconstruction

Summary

ML-based particle flow (MLPF) focuses on developing full event reconstruction for particle detectors using computationally scalable and flexible machine learning models. The project aims to improve particle flow reconstruction across various detector environments, including CMS, as well as future detectors via Key4HEP. We build on existing, open-source simulation software by the experimental collaborations.

High-level overview


TL;DR: I just want to run the code

You can use uv to set up the repo and test that everything works:

git clone --recurse-submodules https://github.com/jpata/particleflow.git
cd particleflow
uv sync
uv run ./scripts/local_test_cld.sh
uv run ./scripts/local_test_cms.sh

Alternatively, you can use a prepared container:

apptainer exec --nv https://jpata.web.cern.ch/jpata/pytorch-20260305-08d6950.sif ./scripts/local_test_cld.sh
apptainer exec --nv https://jpata.web.cern.ch/jpata/pytorch-20260305-08d6950.sif ./scripts/local_test_cms.sh

Datasets

If you wish to train on pre-made datasets, you can download them from the Hugging Face Hub. To download a specific dataset and split (e.g., CLD, PF setup, configuration split 1):

uv run hf download jpata/particleflow \
  --include "tensorflow_datasets/cld/cld_edm_*_pf/1/*" \
  --local-dir data/tfds \
  --repo-type dataset

This will download the requested files into data/tfds/tensorflow_datasets/cld/cld_edm_*_pf/1/.
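
To preview which files a pattern like the one above would select, the following minimal sketch applies shell-style wildcard matching with Python's standard `fnmatch` module. The file paths are hypothetical examples, not the actual repository contents, and it is an assumption that the `--include` filter behaves like `fnmatch` (in particular, that `*` also matches across `/`); check the Hugging Face Hub documentation before relying on it.

```python
from fnmatch import fnmatch

# Pattern copied from the download command above.
PATTERN = "tensorflow_datasets/cld/cld_edm_*_pf/1/*"

# Hypothetical repository paths, for illustration only.
candidate_paths = [
    "tensorflow_datasets/cld/cld_edm_ttbar_pf/1/dataset_info.json",
    "tensorflow_datasets/cld/cld_edm_qq_pf/1/features.json",
    "tensorflow_datasets/cld/cld_edm_ttbar_pf/2/dataset_info.json",
    "tensorflow_datasets/clic/clic_edm_ttbar_pf/1/dataset_info.json",
]


def select(paths, pattern):
    """Return the paths matched by a shell-style wildcard pattern."""
    return [p for p in paths if fnmatch(p, pattern)]


selected = select(candidate_paths, PATTERN)
for path in selected:
    print(path)
```

In this sketch only the two CLD paths under configuration split `1` are selected; the split `2` path and the CLIC path are filtered out.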

Dataset Upload

To upload a generated dataset to the Hugging Face Hub:

uv run python3 scripts/upload_hf.py --repo jpata/particleflow --spec particleflow_spec.yaml clic 1

Training

Run the training on the downloaded dataset (data configuration split 1):

uv run \
    python mlpf/pipeline.py \
    --spec-file particleflow_spec.yaml \
    --production cld \
    --model-name pyg-cld-v1 \
    --data-dir data/tfds/tensorflow_datasets/cld \
    train \
    --data_config 1 \
    --gpu_batch_multiplier 4 \
    --gpus 1

Model Upload

To upload a trained model to the Hugging Face Hub:

uv run python3 scripts/upload_model_hf.py experiments/pyg-clic-hits-v1_clic_20260328_144021_479374 --version v3.1.0

Model Download & Evaluation

To download a specific model (e.g., CLD, cluster-based, version v3.1.0) and run evaluation on a sample ROOT file:

  1. Download the model files from the Hugging Face Hub:
uv run hf download jpata/particleflow \
  --include "cld/clusters/v3.1.0/pyg-cld-v1_cld_20260328_101206_533260/*" \
  --local-dir models \
  --repo-type model
  2. Download a sample ROOT file and run the evaluation script:
mkdir -p local_test_data/cld/p8_ee_ttbar_ecm365/root
cd local_test_data/cld/p8_ee_ttbar_ecm365/root
wget -q --no-check-certificate -nc https://jpata.web.cern.ch/jpata/mlpf/cld/v1.2.3_key4hep_2025-05-29_CLD_f1e8f9/gen/root/reco_p8_ee_ttbar_ecm365_300000.root
cd ../../..

uv run python3 mlpf/standalone_eval/key4hep/evaluator.py \
  --input local_test_data/cld/p8_ee_ttbar_ecm365/root/reco_p8_ee_ttbar_ecm365_300000.root \
  --checkpoint models/cld/clusters/v3.1.0/pyg-cld-v1_cld_20260328_101206_533260/checkpoints/best_weights.pth \
  --detector cld \
  --outpath eval_results.parquet

The input ROOT file should be in the EDM4hep format.
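
Because the download keeps the repository layout below `--local-dir`, the path passed to `--checkpoint` can also be located programmatically. The sketch below is a minimal illustration using only the standard library; the helper name `find_checkpoints` is hypothetical, and it assumes that checkpoints are stored as `best_weights.pth`, as in the command above.

```python
from pathlib import Path


def find_checkpoints(models_dir, filename="best_weights.pth"):
    """Return all files named `filename` below `models_dir`, sorted by path."""
    root = Path(models_dir)
    if not root.is_dir():
        # Nothing downloaded yet: report no checkpoints instead of failing.
        return []
    return sorted(root.rglob(filename))


if __name__ == "__main__":
    # "models" is the --local-dir used in the download step above.
    for checkpoint in find_checkpoints("models"):
        print(checkpoint)
```

Each printed path can be passed unchanged as the `--checkpoint` argument.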

End-to-end workflow: dataset generation and model training

The full data generation, model training, and validation workflow is managed using Pixi for environment management and Snakemake for job orchestration. Apptainer images provide the software needed for each step for the different detectors.

# ensure all gen configs are downloaded
git submodule update --init --recursive

# install pixi, restart your shell or source your .bashrc after this. only do once.
curl -fsSL https://pixi.sh/install.sh | bash

# link the configuration for your site (replace the braces with one of local, tallinn, lxplus). only do once.
ln -s configs/{local,tallinn,lxplus}/pixi.toml pixi.toml

# initialize the orchestrator python environment. only do this once.
pixi run init

# generate the snakefile (will overwrite the defaults); set PROD to one of cms_run3, clic, cld
PROD={cms_run3,clic,cld} pixi run snakefile

# run the steps inside screen or tmux (this will take many days and thousands of jobs)
PROD={cms_run3,clic,cld} pixi run gen
PROD={cms_run3,clic,cld} pixi run post
PROD={cms_run3,clic,cld} pixi run tfds
PROD={cms_run3,clic,cld} pixi run train
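
The four long-running stages above must run in order for a single production. As a rough illustration, the sketch below builds the corresponding `pixi run` commands and, outside of dry-run mode, executes them one after another, stopping at the first failure. The function names and the dry-run switch are hypothetical conveniences and not part of the repository; the stage and production names are taken from the commands above.

```python
import os
import subprocess

# Stage names taken from the commands above, in execution order.
STAGES = ["gen", "post", "tfds", "train"]
PRODUCTIONS = {"cms_run3", "clic", "cld"}


def stage_commands(production):
    """Build one `pixi run <stage>` command per stage for a single production."""
    if production not in PRODUCTIONS:
        raise ValueError(f"unknown production: {production!r}")
    return [["pixi", "run", stage] for stage in STAGES]


def run_all(production, dry_run=True):
    """Run the stages in order; check=True aborts at the first failing stage."""
    env = dict(os.environ, PROD=production)
    for cmd in stage_commands(production):
        print(f"PROD={production}", " ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, env=env, check=True)


run_all("cld")  # dry run: only prints the commands
```

Calling `run_all("cld", dry_run=False)` would launch the actual jobs, so it is best done inside screen or tmux.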

Publications

The following publications trace the development of MLPF from early proofs of concept to full detector simulations and fine-tuning studies across detectors.


Citations and Reuse

You are welcome to reuse the code in accordance with the LICENSE.

How to Cite

  1. Academic Work: Please cite the specific papers listed in the Publications section above relevant to the method you are using (e.g., initial GNN idea, fine-tuning, or specific detector studies).
  2. Code Usage: If your research relies substantially on the code, please cite the specific tagged version from Zenodo.
  3. Dataset Usage: Cite the appropriate dataset via the Zenodo link and the corresponding paper.

Contact

For collaboration ideas that do not fit into the categories above, please get in touch via GitHub Discussions.

Download files

Source Distribution

particleflow-3.1.0.tar.gz (216.9 kB)

Uploaded Source

Built Distribution

particleflow-3.1.0-py3-none-any.whl (249.4 kB)

Uploaded Python 3

File details

Details for the file particleflow-3.1.0.tar.gz.

File metadata

  • Download URL: particleflow-3.1.0.tar.gz
  • Upload date:
  • Size: 216.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for particleflow-3.1.0.tar.gz
Algorithm Hash digest
SHA256 2dc182f07b645c8840ca141183bdab4a5432a9539b0945b0656160bb298e6d80
MD5 56e1fc89739b1299558ef81ad2a1445f
BLAKE2b-256 792890e7d409b2d13bb436b545b28efd272df405313626d59bfc9bf860d1a76e
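
A downloaded file can be checked against the published digest before installation. The following minimal sketch uses Python's standard `hashlib`; the helper names `sha256_of` and `verify` are hypothetical, and the expected value is the SHA256 digest listed above for particleflow-3.1.0.tar.gz.

```python
import hashlib

# SHA256 digest published above for particleflow-3.1.0.tar.gz.
EXPECTED_SHA256 = "2dc182f07b645c8840ca141183bdab4a5432a9539b0945b0656160bb298e6d80"


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED_SHA256):
    """Return True if the file's SHA256 digest matches the expected value."""
    return sha256_of(path) == expected.lower()
```

After downloading the source distribution, `verify("particleflow-3.1.0.tar.gz")` should return `True` if the file is intact.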

Provenance

The following attestation bundles were made for particleflow-3.1.0.tar.gz:

Publisher: pypi-publish.yml on jpata/particleflow

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file particleflow-3.1.0-py3-none-any.whl.

File metadata

  • Download URL: particleflow-3.1.0-py3-none-any.whl
  • Upload date:
  • Size: 249.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for particleflow-3.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 15390cccf8c0590cb8f719f458245fc3b17d1bce510e86bf6e31841c37a946aa
MD5 00467ce7dcba3718d4741e07827b4ea9
BLAKE2b-256 c1278cd2ba8ef18417294e9ac648133984cc415b84e3812542e942a9dd9484c0

Provenance

The following attestation bundles were made for particleflow-3.1.0-py3-none-any.whl:

Publisher: pypi-publish.yml on jpata/particleflow

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
