Machine Learning for Particle Flow Reconstruction
Summary
ML-based particle flow (MLPF) develops full event reconstruction for particle detectors using computationally scalable and flexible machine learning models. The project aims to improve particle flow reconstruction across detector environments, including CMS and future detectors via Key4HEP. We build on existing open-source simulation software developed by the experimental collaborations.
TL;DR: I just want to run the code
You can use uv to set up the repo and test that everything works:
git clone --recurse-submodules https://github.com/jpata/particleflow.git
cd particleflow
uv sync
uv run ./scripts/local_test_cld.sh
uv run ./scripts/local_test_cms.sh
Alternatively, you can use a prepared container:
apptainer exec --nv https://jpata.web.cern.ch/jpata/pytorch-20260305-08d6950.sif ./scripts/local_test_cld.sh
apptainer exec --nv https://jpata.web.cern.ch/jpata/pytorch-20260305-08d6950.sif ./scripts/local_test_cms.sh
Datasets
If you wish to train on pre-made datasets, you can download them from the Hugging Face Hub. To download a specific dataset and split (e.g., CLD, PF setup, configuration split 1):
uv run hf download jpata/particleflow \
--include "tensorflow_datasets/cld/cld_edm_*_pf/1/*" \
--local-dir data/tfds \
--repo-type dataset
This will download the requested files into data/tfds/tensorflow_datasets/cld/cld_edm_*_pf/1/.
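The `--include` argument is a glob pattern. As a rough illustration of which files such a pattern selects, the sketch below applies it to a few made-up repository paths. The dataset name `cld_edm_ttbar_pf` and the file names are hypothetical, and the sketch assumes fnmatch-style matching, where `*` can also span `/`.

```python
from fnmatch import fnmatchcase

# Pattern copied from the download command above.
PATTERN = "tensorflow_datasets/cld/cld_edm_*_pf/1/*"

# Hypothetical repository paths, for illustration only.
candidates = [
    "tensorflow_datasets/cld/cld_edm_ttbar_pf/1/dataset_info.json",
    "tensorflow_datasets/cld/cld_edm_ttbar_pf/2/dataset_info.json",
    "tensorflow_datasets/clic/clic_edm_ttbar_pf/1/dataset_info.json",
]

# Keep only the paths that the pattern would select.
selected = [path for path in candidates if fnmatchcase(path, PATTERN)]

for path in selected:
    print(path)
```

Only the first path is selected here: the second belongs to a different configuration split and the third to a different detector.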
Dataset Upload
To upload a generated dataset to the Hugging Face Hub:
uv run python3 scripts/upload_hf.py --repo jpata/particleflow --spec particleflow_spec.yaml clic 1
Training
Train on the downloaded dataset (data configuration 1):
uv run \
python mlpf/pipeline.py \
--spec-file particleflow_spec.yaml \
--production cld \
--model-name pyg-cld-v1 \
--data-dir data/tfds/tensorflow_datasets/cld \
train \
--data_config 1 \
--gpu_batch_multiplier 4 \
--gpus 1
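When launching several trainings from a script, it can help to assemble this command programmatically. The following is a minimal sketch, not part of the repository: `build_train_command` is a hypothetical helper, and the flags are copied verbatim from the command above. It keeps the options that precede the `train` subcommand separate from the ones that follow it.

```python
import shlex


def build_train_command(spec_file, production, model_name, data_dir,
                        data_config, gpu_batch_multiplier, gpus):
    """Return the argv list for the training invocation shown above."""
    return [
        "uv", "run", "python", "mlpf/pipeline.py",
        # Options given before the subcommand.
        "--spec-file", spec_file,
        "--production", production,
        "--model-name", model_name,
        "--data-dir", data_dir,
        # Subcommand, followed by its own options.
        "train",
        "--data_config", str(data_config),
        "--gpu_batch_multiplier", str(gpu_batch_multiplier),
        "--gpus", str(gpus),
    ]


cmd = build_train_command(
    spec_file="particleflow_spec.yaml",
    production="cld",
    model_name="pyg-cld-v1",
    data_dir="data/tfds/tensorflow_datasets/cld",
    data_config=1,
    gpu_batch_multiplier=4,
    gpus=1,
)

# Print the command as a single shell-quoted line for inspection.
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the repository root.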
Model Upload
To upload a trained model to the Hugging Face Hub:
uv run python3 scripts/upload_model_hf.py experiments/pyg-clic-hits-v1_clic_20260328_144021_479374 --version v3.1.0
Model Download & Evaluation
To download a specific model (e.g., CLD, cluster-based, version v3.1.0) and run evaluation on a sample ROOT file:
- Download the model files from the Hugging Face Hub:
uv run hf download jpata/particleflow \
--include "cld/clusters/v3.1.0/pyg-cld-v1_cld_20260328_101206_533260/*" \
--local-dir models \
--repo-type model
- Run the evaluation script:
mkdir -p local_test_data/cld/p8_ee_ttbar_ecm365/root
cd local_test_data/cld/p8_ee_ttbar_ecm365/root
wget -q --no-check-certificate -nc https://jpata.web.cern.ch/jpata/mlpf/cld/v1.2.3_key4hep_2025-05-29_CLD_f1e8f9/gen/root/reco_p8_ee_ttbar_ecm365_300000.root
cd ../../..
uv run python3 mlpf/standalone_eval/key4hep/evaluator.py \
--input local_test_data/cld/p8_ee_ttbar_ecm365/root/reco_p8_ee_ttbar_ecm365_300000.root \
--checkpoint models/cld/clusters/v3.1.0/pyg-cld-v1_cld_20260328_101206_533260/checkpoints/best_weights.pth \
--detector cld \
--outpath eval_results.parquet
The input ROOT file should be in the EDM4hep format.
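Before starting the evaluation, it may be worth checking that both the ROOT file and the checkpoint were actually downloaded. The sketch below is a hypothetical pre-flight helper (`missing_inputs` is not part of the repository) that uses the two paths from the commands above and only tests for their existence.

```python
from pathlib import Path


def missing_inputs(input_file, checkpoint):
    """Return the given paths that do not exist as regular files."""
    paths = (Path(input_file), Path(checkpoint))
    return [str(p) for p in paths if not p.is_file()]


# Paths copied from the evaluation command above.
missing = missing_inputs(
    "local_test_data/cld/p8_ee_ttbar_ecm365/root/reco_p8_ee_ttbar_ecm365_300000.root",
    "models/cld/clusters/v3.1.0/pyg-cld-v1_cld_20260328_101206_533260/checkpoints/best_weights.pth",
)

if missing:
    print("Missing files:", *missing, sep="\n  ")
else:
    print("All inputs found.")
```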
End-to-end workflow: dataset generation and model training
The full data generation, model training, and validation workflow is managed using Pixi for environment management and Snakemake for job orchestration. Apptainer images provide the software for the steps that are specific to each detector.
# ensure all gen configs are downloaded
git submodule update --init --recursive
# install pixi, restart your shell or source your .bashrc after this. only do once.
curl -fsSL https://pixi.sh/install.sh | bash
# link the configuration for your site (pick one of local, tallinn, lxplus). only do once.
ln -s configs/{local,tallinn,lxplus}/pixi.toml pixi.toml
# initialize the orchestrator python environment. only do this once.
pixi run init
# generate the snakefile (will overwrite the defaults). set PROD to one of cms_run3, clic, cld.
PROD={cms_run3,clic,cld} pixi run snakefile
# run the steps inside screen or tmux (this will take many days and thousands of jobs)
PROD={cms_run3,clic,cld} pixi run gen
PROD={cms_run3,clic,cld} pixi run post
PROD={cms_run3,clic,cld} pixi run tfds
PROD={cms_run3,clic,cld} pixi run train
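The four steps above have to run in order for a single production. As a rough sketch of that sequencing, the hypothetical wrapper below (not part of the repository) sets `PROD` to one value, calls `pixi run <step>` for each step, and stops at the first failure. The runner is injectable, so the example performs a dry run with a stand-in instead of invoking Pixi.

```python
import os
import subprocess
from types import SimpleNamespace

STEPS = ["gen", "post", "tfds", "train"]
PRODUCTIONS = {"cms_run3", "clic", "cld"}


def run_workflow(prod, run=subprocess.run):
    """Run the workflow steps in order; return the steps that succeeded."""
    if prod not in PRODUCTIONS:
        raise ValueError(f"unknown production: {prod!r}")
    # PROD must hold exactly one production name.
    env = dict(os.environ, PROD=prod)
    completed = []
    for step in STEPS:
        result = run(["pixi", "run", step], env=env)
        if result.returncode != 0:
            break  # later steps depend on earlier ones
        completed.append(step)
    return completed


# Dry run: a stand-in runner that records commands instead of executing them.
calls = []


def fake_run(cmd, env):
    calls.append((cmd, env["PROD"]))
    return SimpleNamespace(returncode=0)


done = run_workflow("cld", run=fake_run)
print(done)
```

Calling `run_workflow("cld")` without the `run` argument would execute the real `pixi` commands, which, as noted above, can take many days.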
Publications
The following publications trace the development of MLPF from early proofs of concept to full detector simulations and fine-tuning studies across detectors.
- [2021] First full-event GNN demonstration of MLPF: Paper Code Dataset
- [2021] First demonstration in CMS Run 3: Paper CMS-DP
- [2022] Improved performance in CMS Run 3: CMS-DP
- [2024] Improved performance with full simulation for future colliders: Paper Code Results
- [2025] Fine-tuning across detectors: Paper Code
- [2026] CMS Run 3 full results: Paper CMS-DP Code
Citations and Reuse
You are welcome to reuse the code in accordance with the LICENSE.
How to Cite
- Academic Work: Please cite the specific papers listed in the Publications section above relevant to the method you are using (e.g., initial GNN idea, fine-tuning, or specific detector studies).
- Code Usage: If you use the code significantly for research, please cite the specific tagged version from Zenodo.
- Dataset Usage: Cite the appropriate dataset via the Zenodo link and the corresponding paper.
Contact
For collaboration ideas that do not fit into the categories above, please get in touch via GitHub Discussions.
Download files
File details
Details for the file particleflow-3.1.0.tar.gz.
File metadata
- Download URL: particleflow-3.1.0.tar.gz
- Upload date:
- Size: 216.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 2dc182f07b645c8840ca141183bdab4a5432a9539b0945b0656160bb298e6d80 |
| MD5 | 56e1fc89739b1299558ef81ad2a1445f |
| BLAKE2b-256 | 792890e7d409b2d13bb436b545b28efd272df405313626d59bfc9bf860d1a76e |
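The published digests can be used to check a downloaded file. The following is a minimal sketch using Python's standard `hashlib`; the helper names `sha256_of` and `verify` are illustrative, and the expected value is the SHA256 digest listed for particleflow-3.1.0.tar.gz.

```python
import hashlib

# SHA256 digest listed for particleflow-3.1.0.tar.gz.
EXPECTED_SHA256 = "2dc182f07b645c8840ca141183bdab4a5432a9539b0945b0656160bb298e6d80"


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED_SHA256):
    """Return True if the file's SHA256 digest matches the expected one."""
    return sha256_of(path) == expected.lower()
```

For example, `verify("particleflow-3.1.0.tar.gz")` should return `True` for an intact download placed in the current directory.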
Provenance
The following attestation bundles were made for particleflow-3.1.0.tar.gz:
Publisher: pypi-publish.yml on jpata/particleflow
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: particleflow-3.1.0.tar.gz
- Subject digest: 2dc182f07b645c8840ca141183bdab4a5432a9539b0945b0656160bb298e6d80
- Sigstore transparency entry: 1247049098
- Sigstore integration time:
- Permalink: jpata/particleflow@a8525f1b6a2ee3de8c5a629ea0347c8fe8edaee0
- Branch / Tag: refs/tags/v3.1.0
- Owner: https://github.com/jpata
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: pypi-publish.yml@a8525f1b6a2ee3de8c5a629ea0347c8fe8edaee0
- Trigger Event: push
File details
Details for the file particleflow-3.1.0-py3-none-any.whl.
File metadata
- Download URL: particleflow-3.1.0-py3-none-any.whl
- Upload date:
- Size: 249.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 15390cccf8c0590cb8f719f458245fc3b17d1bce510e86bf6e31841c37a946aa |
| MD5 | 00467ce7dcba3718d4741e07827b4ea9 |
| BLAKE2b-256 | c1278cd2ba8ef18417294e9ac648133984cc415b84e3812542e942a9dd9484c0 |
Provenance
The following attestation bundles were made for particleflow-3.1.0-py3-none-any.whl:
Publisher: pypi-publish.yml on jpata/particleflow
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: particleflow-3.1.0-py3-none-any.whl
- Subject digest: 15390cccf8c0590cb8f719f458245fc3b17d1bce510e86bf6e31841c37a946aa
- Sigstore transparency entry: 1247049102
- Sigstore integration time:
- Permalink: jpata/particleflow@a8525f1b6a2ee3de8c5a629ea0347c8fe8edaee0
- Branch / Tag: refs/tags/v3.1.0
- Owner: https://github.com/jpata
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: pypi-publish.yml@a8525f1b6a2ee3de8c5a629ea0347c8fe8edaee0
- Trigger Event: push