Monte Carlo reweighting toolkit with multiple backends (hep_ml, XGBoost, neural networks, folding, and binning).

mcreweight

mcreweight is a Python package for Monte Carlo event reweighting to match data distributions in multiplicity and kinematic variables. It provides multiple reweighting backends, including hep_ml-based gradient boosting, XGBoost, neural-network approaches, folding variants, and bin-based reweighting, with optional Optuna hyperparameter tuning and integrated plotting/validation utilities.

Warning: Bins reweighting is a useful low-dimensional baseline, but it becomes unstable quickly as the number of training variables grows. Prefer it for one or two dimensions, use extra care around three or four, and avoid relying on it as the main method in higher-dimensional problems.
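To see why bin-based reweighting degrades with dimension, it helps to count bins. The sketch below is only an illustration: it assumes a regular grid with the same number of bins per variable (10, the documented n_bins default) and a hypothetical sample of 100,000 events. It does not reproduce how mcreweight builds or smooths its bins.

```python
def mean_events_per_bin(n_events: int, n_bins: int, n_vars: int) -> float:
    """Average occupancy of a regular grid with n_bins bins per variable."""
    total_bins = n_bins ** n_vars
    return n_events / total_bins


if __name__ == "__main__":
    n_events = 100_000  # hypothetical sample size, chosen for illustration
    for n_vars in range(1, 6):
        occupancy = mean_events_per_bin(n_events, n_bins=10, n_vars=n_vars)
        print(f"{n_vars} variable(s): {10 ** n_vars:>6} bins, "
              f"{occupancy:g} events per bin on average")
```

With these numbers, two variables leave 1000 events per bin on average, while five variables leave about one, so the per-bin data/MC ratio is dominated by statistical fluctuations.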

Documentation

Full documentation is available at mcreweight.readthedocs.io.

Setup

Run inside an lb-conda environment:

lb-conda mcreweight

Installation

If you are not running in an lb-conda environment, install the Python package from PyPI or clone it from GitLab.

From PyPI

pip install mcreweight

From GitLab

Requires pixi.

git clone https://gitlab.cern.ch/lhcb-dpa/tools/mcreweight.git
cd mcreweight

To run the CLI tools, prefix them with pixi run, for example:

pixi run run-reweight --help
pixi run apply-weights --help

To run the verification checks used in CI:

pixi run -e lint quality
pixi run test
pixi run -e docs build-docs

To auto-format the code before rerunning the checks:

pixi run -e lint black .
pixi run -e lint quality

This repository currently uses these Pixi environments:

  • default: package runtime and CLI usage
  • lint: black and ruff
  • test: pytest
  • docs: Sphinx documentation build

Useful Pixi commands:

pixi run -e lint ruff check .
pixi run -e lint black --check .
pixi run -e docs sphinx-build -b html -n -W --keep-going docs docs/_build/html
pixi run -e test pytest tests/test_cli.py::test_run_reweight -v
pixi run -e test pytest tests/test_cli.py::test_apply_weights -v

The pixi.lock file pins all dependencies for reproducibility. To update them, run pixi update and commit the updated lock file.

Usage

Both CLIs support two usage modes:

  • Config-driven mode (recommended): pass --config <file.yaml>
  • Direct CLI mode: pass all options on the command line

Use --dry-run to validate the merged configuration without running.

Both CLIs treat weightsdir and plotdir as root directories:

  • run-reweight writes to <weightsdir>/<sample>/ and <plotdir>/<sample>/
  • apply-weights reads trained artifacts from <weightsdir>/<training-sample>/
  • apply-weights writes normalized weights to <weightsdir>/<application-sample>/ and plots to <plotdir>/<application-sample>/
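The directory convention above can be written down as a small path helper. This is a sketch of the layout as described here, not code taken from the package. The function names are hypothetical, and bd_jpsikst_mm is an invented application sample tag.

```python
from pathlib import Path


def training_dirs(weightsdir, plotdir, sample):
    """Directories that run-reweight writes to for a given sample tag."""
    return Path(weightsdir) / sample, Path(plotdir) / sample


def application_dirs(weightsdir, plotdir, training_sample, application_sample):
    """Directories that apply-weights reads from and writes to."""
    root = Path(weightsdir)
    return {
        "read_model_from": root / training_sample,
        "write_weights_to": root / application_sample,
        "write_plots_to": Path(plotdir) / application_sample,
    }


if __name__ == "__main__":
    print(training_dirs("weights", "plots", "bd_jpsikst_ee"))
    for role, path in application_dirs(
        "weights", "plots", "bd_jpsikst_ee", "bd_jpsikst_mm"
    ).items():
        print(f"{role}: {path}")
```

Keeping the training and application tags separate lets one trained model be applied to several MC samples without overwriting earlier outputs.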

To run the reweighting with a configuration file:

run-reweight --config <path_to_config.yaml>

or by passing all options on the command line:

run-reweight --path-data <path_to_data.root> \
             --path-mc <path_to_mc.root> \
             --training-vars <variable_list> \
             --monitoring-vars <monitoring_variable_list> \
             --sample <sample> \
             --n_trials <optuna_tests> \
             --test_size <test_sample_size> \
             --weightsdir <weights_directory>

To apply saved weights to an MC sample with a configuration file:

apply-weights --config <path_to_config.yaml>

or by passing all options on the command line:

apply-weights --path-mc <path_to_mc.root> \
              --vars <variable_list> \
              --training-sample <training_sample> \
              --application-sample <application_sample> \
              --method <method_for_reweighter> \
              --monitoring-vars <monitoring_variable_list> \
              --output-path <output_file.root> \
              --weightsdir <weights_directory>

Options

Each option below shows the CLI flag and, in parentheses, the equivalent YAML key. CLI values always take precedence over YAML values. Where multiple YAML keys are listed for the same option, all are accepted (the first one found wins).
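The precedence rule can be illustrated with a few lines of Python. This is a minimal sketch and not the package's actual resolution code. The helper names lookup and resolve are hypothetical, and "first one found wins" is assumed to mean "first in the order the keys are listed".

```python
from collections.abc import Mapping

_MISSING = object()  # sentinel: distinguishes "key absent" from a null value


def lookup(config, dotted_key):
    """Follow a dotted key such as 'input.mc.path' through nested mappings."""
    node = config
    for part in dotted_key.split("."):
        if not isinstance(node, Mapping) or part not in node:
            return _MISSING
        node = node[part]
    return node


def resolve(cli_value, config, yaml_keys, default=None):
    """CLI value first, then the first YAML key present, then the default."""
    if cli_value is not None:
        return cli_value
    for key in yaml_keys:
        value = lookup(config, key)
        if value is not _MISSING:
            return value
    return default


if __name__ == "__main__":
    config = {"input": {"mc": {"weights_name": "w_old", "weights_branch": "w_other"}}}
    keys = ["input.mc.mcweights_name", "input.mc.weights_name", "input.mc.weights_branch"]
    print(resolve(None, config, keys))     # the alias listed first is used
    print(resolve("w_cli", config, keys))  # the CLI value overrides YAML
```

In practice, --dry-run is the reliable way to check which value the tools end up using.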

For the reweighting (run-reweight)

General:

  • --config <file>: YAML configuration file
  • --dry-run: Validate config and print effective settings without running
  • --verbosity <1-4>: Logging level (default: 1)

Inputs — MC:

  • --path-mc <paths> (input.mc.path): MC ROOT file path(s); accepts multiple values
  • --tree-mc <name> (input.mc.tree): MC TTree name (default: DecayTree)
  • --mcweights-name <branch> (input.mc.mcweights_name, aliases: input.mc.weights_name, input.mc.weights_branch): MC input weight branch; leave unset if no prior weights exist
  • --mcweights-tree <name> (input.mc.mcweights_tree): Separate MC tree to read the mcweights branch from
  • --mc-label <label> (input.mc.label): MC sample label used in plots (default: MC)

Inputs — Data:

  • --path-data <paths> (input.data.path): Data ROOT file path(s); accepts multiple values
  • --tree-data <name> (input.data.tree): Data TTree name (default: DecayTree)
  • --sweights-name <branch> (input.data.sweights_name, alias: input.data.sweights_branch): sWeights branch in the data tree (default: sweight_sig)
  • --sweights-tree <name> (input.data.sweights_tree): Separate data tree to read the sweights branch from
  • --data-label <label> (input.data.label): Data sample label used in plots (default: Data)
  • --path-xlabels <file> (input.path_xlabels, aliases: input.xlabel_path, input.path_xlabel): YAML file mapping branch names to plot axis labels

Variables:

  • --training-vars <vars> (variables.training_vars): Space-separated list of variables used for reweighting training
  • --monitoring-vars <vars> (variables.monitoring_vars): Space-separated list of variables plotted for monitoring only (not used for training)

Reweighting:

  • --sample <name> (reweighting.sample): Sample tag; controls the output subdirectory name (default: bd_jpsikst_ee)
  • --methods <methods> (reweighting.methods): One or more reweighting methods — GB, ONNXGB, Folding, ONNXFolding, XGB, XGBFolding, NN, NNFolding, Bins
  • --transform <t> (reweighting.transform): Feature transform applied before training — quantile, yeo-johnson, signed-log, or scaler
  • --n_trials <n> (reweighting.n_trials): Optuna trials for hyperparameter search (default: 10; set to 0 to disable)
  • --test_size <f> (reweighting.test_size): Test split fraction (default: 0.3)
  • --n_folds <n> (reweighting.n_folds): Number of folds for folding reweighters (default: 10)
  • --n_bins <n> (reweighting.n_bins): Bin count for the Bins method (default: 10)
  • --n_neighs <n> (reweighting.n_neighs): Neighbor count for Bins smoothing (default: 3)
  • --reweight-validation-fraction <f> (reweighting.reweight_validation_fraction): Validation split fraction for iterative ONNX reweighters (default: 0.2)
  • --reweight-early-stopping-rounds <n> (reweighting.reweight_early_stopping_rounds): Early-stopping patience for iterative ONNX reweighters — number of validation checks without improvement before stopping (default: 5)
  • --reweight-metric-every <n> (reweighting.reweight_metric_every): Evaluate the ONNX validation metric every N stages (default: 1)
  • --clip-weights (reweighting.clip_weights, alias: reweighting.clip_weight): Clip extreme predicted weights at the 99th percentile for GB, ONNXGB, Bins, GBFolding, and ONNXFolding
  • --folding-aggregation <strategy> (reweighting.folding_aggregation): Aggregation strategy for ONNX folding — weighted_geometric (default), geometric, or median
  • --shap (reweighting.shap): Compute SHAP feature-importance values after training
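Two of the options above, --clip-weights and --folding-aggregation, describe simple operations on predicted weights. The sketch below shows one plausible reading of each using only the standard library. It is not the package's implementation: the percentile interpolation, the assumption that all weights are positive, and the fold_scores input used for weighted_geometric are all assumptions made for illustration.

```python
import math
import statistics


def clip_weights(weights, percentile=99):
    """Cap every weight at the given percentile of the sample (needs >= 2 values)."""
    # statistics.quantiles with n=100 returns the 1st to 99th percentiles.
    cap = statistics.quantiles(weights, n=100, method="inclusive")[percentile - 1]
    return [min(w, cap) for w in weights]


def aggregate_folds(fold_weights, strategy="geometric", fold_scores=None):
    """Combine per-fold predictions (one list of positive weights per fold)."""
    per_event = list(zip(*fold_weights))  # regroup: one tuple of fold values per event
    if strategy == "median":
        return [statistics.median(ws) for ws in per_event]
    if strategy == "geometric":
        scores = [1.0] * len(fold_weights)  # equal weight for every fold
    elif strategy == "weighted_geometric":
        if fold_scores is None:
            raise ValueError("weighted_geometric needs one score per fold")
        scores = list(fold_scores)  # hypothetical per-fold quality scores
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    total = sum(scores)
    # Weighted geometric mean, computed as exp of the weighted mean of logs.
    return [
        math.exp(sum(s * math.log(w) for s, w in zip(scores, ws)) / total)
        for ws in per_event
    ]


if __name__ == "__main__":
    print(max(clip_weights([1.0] * 99 + [50.0])))
    print(aggregate_folds([[1.0, 4.0], [4.0, 1.0]], "geometric"))
    print(aggregate_folds([[1.0, 4.0], [4.0, 1.0]], "median"))
```

Clipping trades a small bias for lower variance, because a handful of very large weights can otherwise dominate the reweighted distributions.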

Output:

  • --weightsdir <dir> (output.weightsdir): Root directory for training artifacts; a <sample>/ subdirectory is created automatically. Falls back to the MCREWEIGHTS_DATA_ROOT environment variable
  • --plotdir <dir> (output.plotdir): Root directory for plots; a <sample>/ subdirectory is created automatically (default: plots)
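The dotted YAML keys listed above imply a nested configuration file. The sketch below builds such a file from a flat dictionary of documented keys and prints it as YAML, with leaf values written in JSON syntax (which YAML accepts). File names and branch names are hypothetical placeholders, and writing list-valued options as YAML sequences is an assumption. Running run-reweight with --dry-run can confirm how a given file is interpreted.

```python
import json


def nest(flat):
    """Turn {'input.mc.tree': 'DecayTree'} into {'input': {'mc': {'tree': ...}}}."""
    tree = {}
    for dotted, value in flat.items():
        *parents, leaf = dotted.split(".")
        node = tree
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return tree


def to_yaml(tree, indent=0):
    """Render nested dicts as block-style YAML lines with JSON-encoded leaves."""
    pad = "  " * indent
    lines = []
    for key, value in tree.items():
        if isinstance(value, dict):
            lines.append(f"{pad}{key}:")
            lines.extend(to_yaml(value, indent + 1))
        else:
            lines.append(f"{pad}{key}: {json.dumps(value)}")
    return lines


settings = {
    "input.mc.path": ["mc.root"],            # hypothetical file name
    "input.mc.tree": "DecayTree",
    "input.data.path": ["data.root"],        # hypothetical file name
    "input.data.tree": "DecayTree",
    "input.data.sweights_name": "sweight_sig",
    "variables.training_vars": ["B_PT", "nTracks"],  # hypothetical branches
    "variables.monitoring_vars": ["B_ETA"],          # hypothetical branch
    "reweighting.sample": "bd_jpsikst_ee",
    "reweighting.methods": ["XGB"],
    "reweighting.n_trials": 10,
    "reweighting.test_size": 0.3,
    "output.weightsdir": "weights",
    "output.plotdir": "plots",
}

if __name__ == "__main__":
    print("\n".join(to_yaml(nest(settings))))
```

Redirecting the printed text to a file gives a starting point that can be passed to --config.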

For the application of the weights (apply-weights)

General:

  • --config <file>: YAML configuration file
  • --dry-run: Validate config and print effective settings without running
  • --verbosity <1-4>: Logging level (default: 1)

Inputs — MC:

  • --path-mc <paths> (input.mc.path): MC ROOT file path(s); accepts multiple values
  • --tree-mc <name> (input.mc.tree): MC TTree name (default: DecayTree)
  • --mcweights-name <branch> (input.mc.mcweights_name, aliases: input.mc.weights_name, input.mc.weights_branch): Existing MC input weight branch
  • --mcweights-tree <name> (input.mc.mcweights_tree): Separate MC tree to read the mcweights branch from

Inputs — Data (optional, used for comparison plots only):

  • --path-data <paths> (input.data.path): Data ROOT file path(s); accepts multiple values
  • --tree-data <name> (input.data.tree): Data TTree name (default: DecayTree)
  • --sweights-name <branch> (input.data.sweights_name, alias: input.data.sweights_branch): sWeights branch in the data tree (default: sweight_sig)
  • --sweights-tree <name> (input.data.sweights_tree): Separate data tree to read the sweights branch from
  • --path-xlabels <file> (input.path_xlabels, aliases: input.xlabel_path, input.path_xlabel): YAML file mapping branch names to plot axis labels

Variables:

  • --vars <vars> (variables.application_vars, alias: variables.vars): Space-separated list of variables to apply the reweighter on
  • --training-vars <vars> (variables.training_vars): Variables the saved model was trained on (must match training)
  • --monitoring-vars <vars> (variables.monitoring_vars): Variables plotted for monitoring only

Reweighting:

  • --method <method> (reweighting.method): Method to apply — GB, Folding, ONNXGB, ONNXFolding, XGB, XGBFolding, NN, NNFolding, Bins (default: XGB)
  • --training-sample <name> (reweighting.training_sample): Tag of the sample used during training; determines the directory from which the model is loaded (default: bd_jpsikst_ee)
  • --application-sample <name> (reweighting.application_sample): Tag for this application run; controls the output subdirectory (default: bd_jpsikst_ee)
  • --weightsdir <dir> (reweighting.weightsdir): Root directory containing trained artifacts; <training-sample>/ is read and <application-sample>/ is written automatically. Falls back to MCREWEIGHTS_DATA_ROOT
  • --plotdir <dir> (reweighting.plotdir): Root directory for plots; <application-sample>/ is created automatically (default: plots)

Output:

  • --output-path <file> (output.output_path, alias: output.path): Output ROOT file path
  • --output-ntuple <type> (output.output_ntuple, alias: output.ntuple): Output ntuple type (default: TTree)
  • --output-tree <name> (output.output_tree, alias: output.tree): Output tree name (default: DecayTree)
  • --weights-name <branch> (output.weights_name, alias: output.weights_branch): Name of the weights branch written to the output file (default: weights)

Example

Reweighting:

pixi run run-reweight \
  --config tests_run/run_reweighting_config.yaml \
  --verbosity 2

Application of the weights:

pixi run apply-weights \
  --config tests_run/apply_weights_config.yaml \
  --method XGB \
  --output-path test_applied_weights.root
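When the two steps are driven from a script, it can help to assemble the command lines in one place. The helper below is a hypothetical convenience wrapper and not part of mcreweight. It only builds argument lists from flags documented above, and the caller decides whether to execute them.

```python
import shlex


def build_command(tool, config, extra=None, use_pixi=True, dry_run=False):
    """Assemble an argument list for run-reweight or apply-weights."""
    cmd = ["pixi", "run"] if use_pixi else []
    cmd += [tool, "--config", config]
    if dry_run:
        cmd.append("--dry-run")  # validate the merged configuration only
    for flag, value in (extra or {}).items():
        cmd += [flag, str(value)]
    return cmd


if __name__ == "__main__":
    train = build_command(
        "run-reweight", "tests_run/run_reweighting_config.yaml", {"--verbosity": 2}
    )
    apply = build_command(
        "apply-weights",
        "tests_run/apply_weights_config.yaml",
        {"--method": "XGB", "--output-path": "test_applied_weights.root"},
    )
    for cmd in (train, apply):
        print(shlex.join(cmd))
        # To execute: import subprocess; subprocess.run(cmd, check=True)
```

Passing dry_run=True first is a cheap way to catch configuration mistakes before a long training run.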

Documentation build:

pixi run -e docs build-docs

The generated HTML documentation is written to docs/_build/html/.

Contact

For questions, please contact the repository maintainer.
