
A package for reweighting MC samples to match data


mcreweight

mcreweight is a Python library for reweighting Monte Carlo events in multiplicity and kinematic variables. It uses GBReweighter, a gradient-boosting reweighter implemented in the hep_ml package, and supports automated hyperparameter tuning with Optuna. A k-folding approach is also applied on top of the reweighter, and the performance of both is compared with that of binned reweighting.
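Comparing reweighting methods requires some measure of how well the weighted MC distribution agrees with data. The metric used by the package is not described here; as one common choice, the sketch below computes a weighted Kolmogorov-Smirnov distance between two one-dimensional samples in plain Python. The function name weighted_ks_distance is hypothetical and not part of mcreweight.

```python
def weighted_ks_distance(x, wx, y, wy):
    """Maximum absolute difference between two weighted empirical CDFs."""
    sum_x, sum_y = float(sum(wx)), float(sum(wy))
    # Tag every value with its normalised weight in the x-sample and y-sample.
    events = sorted(
        [(v, w / sum_x, 0.0) for v, w in zip(x, wx)]
        + [(v, 0.0, w / sum_y) for v, w in zip(y, wy)]
    )
    cdf_x = cdf_y = distance = 0.0
    i = 0
    while i < len(events):
        value = events[i][0]
        # Consume all entries that share this value before comparing the CDFs.
        while i < len(events) and events[i][0] == value:
            cdf_x += events[i][1]
            cdf_y += events[i][2]
            i += 1
        distance = max(distance, abs(cdf_x - cdf_y))
    return distance
```

A distance close to 0 for a monitoring variable indicates that the weighted MC follows the data in that variable; a distance close to 1 indicates almost no overlap.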

Warning: Bins reweighting works well for one- or two-dimensional histograms, but it is unstable and inaccurate in higher dimensions.
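To make the idea behind bins reweighting concrete, here is a minimal one-dimensional sketch in plain Python: each MC event receives the ratio of the normalised data and MC histogram contents of its bin. It illustrates the general technique only and is not the implementation used by mcreweight; the names bin_index and bins_reweight_1d are hypothetical.

```python
from bisect import bisect_right


def bin_index(value, edges):
    """Index of the bin containing value, clipped to the first and last bin."""
    i = bisect_right(edges, value) - 1
    return min(max(i, 0), len(edges) - 2)


def bins_reweight_1d(mc, data, n_bins=20):
    """Per-event MC weights from the ratio of normalised data/MC histograms."""
    lo, hi = min(mc + data), max(mc + data)
    width = (hi - lo) / n_bins
    edges = [lo + i * width for i in range(n_bins + 1)]

    mc_counts = [0] * n_bins
    data_counts = [0] * n_bins
    for v in mc:
        mc_counts[bin_index(v, edges)] += 1
    for v in data:
        data_counts[bin_index(v, edges)] += 1

    # Bins without MC events get ratio 0; no MC event ever looks them up.
    ratio = [
        (d / len(data)) / (m / len(mc)) if m > 0 else 0.0
        for m, d in zip(mc_counts, data_counts)
    ]
    return [ratio[bin_index(v, edges)] for v in mc]
```

The instability in higher dimensions follows from the number of cells: if 20 bins were used per variable, four variables would give 20^4 = 160,000 cells, most of which would contain few or no events, so the per-cell ratios become dominated by statistical fluctuations.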

Requirements

  • Python 3.8+
  • Required packages listed in requirements.txt

Setup

If you work in an lb-conda environment, started for example with

lb-conda default

add the directory that holds the installed console scripts to your PATH, so that scripts/run_reweighting.py and scripts/apply_weights.py can be run as the run-reweight and apply-weights commands:

export PATH=$PATH:/path/to/.local/bin
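A quick way to check whether the two commands are visible after changing PATH is to look them up from Python. This is a small sketch using only the standard library; find_console_scripts is a hypothetical helper, not part of mcreweight.

```python
import shutil


def find_console_scripts(names=("run-reweight", "apply-weights")):
    """Map each command name to its resolved path, or None if not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_console_scripts().items():
        print(f"{name}: {path or 'not found on PATH'}")
```

If a command is reported as not found, the directory passed to export PATH is probably not the one where pip placed the scripts.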

Installation

From PyPI

pip install mcreweight

From source

git clone https://github.com/tfulghes/mcreweight.git
cd mcreweight
pip install -e .

If dependencies are missing, install them with:

pip install -r requirements.txt

Usage

To run the reweighting:

run-reweight --path_data <path_to_data.root> \
    --path_mc <path_to_mc.root> \
    --vars <variable_list> \
    --monitoring_vars <monitoring_variable_list> \
    --sample <sample> \
    --trials <optuna_tests> \
    --test_size <test_sample_size>

To apply the weights to the signal MC:

apply-weights --path_mc <path_to_mc.root> \
    --vars <variable_list> \
    --training_sample <training_sample> \
    --application_sample <application_sample> \
    --method <method_for_reweighter> \
    --monitoring_vars <monitoring_variable_list> \
    --output_path <output_file.root>

Options

For the reweighting (run-reweight):

Input files:

  • --path_data: Path to the data control sample (required)
  • --tree_data: Name of the tree in the data control sample (default: "DecayTree")
  • --path_mc: Path to the MC control sample (required)
  • --tree_mc: Name of the tree in the MC control sample (default: "DecayTree")
  • --mcweights_name: Name of the branch for weights in the MC sample (default: None)
  • --sweights_name: Name of the sweights column in the data (default: "sweight_sig")
  • --mc_label: Label for the MC sample (default: "MC")
  • --data_label: Label for the data sample (default: "Data")

Variables:

  • --vars: List of variables to use for reweighting (default: ["B_DTF_Jpsi_P", "B_DTF_Jpsi_PT", "nLongTracks", "nPVs"])
  • --monitoring_vars: List of variables to plot (default: ["B_ETA", "nFTClusters", "nVPClusters", "nEcalClusters"])

Reweighter configuration:

  • --sample: Sample name for the dataset (default: "bd_jpsikst_ee")
  • --trials: Number of Optuna trials used to tune the gradient boosting reweighter (default: 10)
  • --test_size: Proportion of the dataset to include in the test split (default: 0.3)
  • --n_folds: Number of folds for k-folding reweighting (default: 4)
  • --n_bins: Number of bins for binning reweighting (default: 20)
  • --n_neighs: Number of nearest neighbors for binning reweighting (default: 3)
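The n_folds, n_bins and n_neighs options have direct counterparts in the reweighter constructors of hep_ml. The sketch below shows how such options could be mapped onto the hep_ml.reweight classes. It is an assumption about the wiring, not mcreweight's actual code: build_reweighter is a hypothetical helper, and the method names are borrowed from the choices of apply-weights --method.

```python
def build_reweighter(method="gbreweighter", n_folds=4, n_bins=20, n_neighs=3,
                     **gb_params):
    """Return a hep_ml reweighter for one of the three method names.

    gb_params are forwarded to GBReweighter (for example n_estimators,
    learning_rate, max_depth, min_samples_leaf).
    """
    choices = ("gbreweighter", "kfolding", "binning")
    if method not in choices:
        raise ValueError(f"unknown method {method!r}, expected one of {choices}")

    # Imported lazily so that argument validation works without hep_ml.
    from hep_ml.reweight import BinsReweighter, FoldingReweighter, GBReweighter

    if method == "binning":
        return BinsReweighter(n_bins=n_bins, n_neighs=n_neighs)
    base = GBReweighter(**gb_params)
    if method == "kfolding":
        # Each fold is trained on the remaining folds and applied to its own.
        return FoldingReweighter(base, n_folds=n_folds)
    return base


# Typical hep_ml usage once MC and data arrays are loaded (not executed here):
#   reweighter = build_reweighter("kfolding", n_estimators=50)
#   reweighter.fit(original=mc, target=data, target_weight=sweights)
#   weights = reweighter.predict_weights(mc)
```

With Optuna tuning, the keyword arguments passed as gb_params would be the hyperparameters suggested in each trial.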

Output:

  • --weightsdir: Directory to save weights (default: "weights")
  • --plotdir: Directory to save plots (default: "plots")

Additional options can be found by running:

run-reweight --help

For the application of the weights (apply-weights):

Input files:

  • --path_mc: Path to the MC signal sample (required)
  • --tree_mc: Name of the tree in the MC signal sample (default: "DecayTree")
  • --mcweights_name: Name of the branch for weights in the output ROOT file (default: None)
  • --path_data: Path to the data sample for comparison (default: None)
  • --tree_data: Name of the tree in the data sample (default: "DecayTree")
  • --sweights_name: Name of the sweights column in the data (default: "sweight_sig")

Variables:

  • --vars: List of variables to use for reweighting (default: ["B_DTF_Jpsi_P", "B_DTF_Jpsi_PT", "nLongTracks", "nPVs"])
  • --training_vars: List of variables used for training (default: ["B_DTF_Jpsi_P", "B_DTF_Jpsi_PT", "nLongTracks", "nPVs"])
  • --monitoring_vars: List of variables to plot (default: ["B_ETA", "nFTClusters", "nVPClusters", "nEcalClusters"])

Configuration:

  • --training_sample: Name of the sample used to train the reweighter (default: "bd_jpsikst_ee")
  • --application_sample: Sample name for the application of weights (default: "bd_jpsikst_ee")
  • --method: Method to apply weights (choices: "gbreweighter", "kfolding", "binning", default: "gbreweighter")
  • --weightsdir: Directory to save weights (default: "weights")
  • --plotdir: Directory to save plots (default: "plots")

Output:

  • --output_path: Path to save the output ROOT file (required)
  • --output_tree: Name of the tree in the output ROOT file (default: "DecayTree")

Additional options can be found by running:

apply-weights --help

Example

Reweighting:

run-reweight --path_data data/control_sample_tuple.root \
    --path_mc mc/control_sample_tuple.root \
    --vars B_DTF_Jpsi_P B_DTF_Jpsi_PT nLongTracks nPVs \
    --monitoring_vars B_ETA nFTClusters nVPClusters nEcalClusters \
    --sample bd_jpsikst_ee \
    --trials 25 \
    --test_size 0.3

Application of the weights:

apply-weights --path_mc mc/signal_tuple.root \
    --vars B_P B_PT nLongTracks nPVs \
    --training_vars B_DTF_Jpsi_P B_DTF_Jpsi_PT nLongTracks nPVs \
    --training_sample bd_jpsikst_ee \
    --application_sample bd_jpsikst_ee \
    --method gbreweighter \
    --monitoring_vars B_ETA nFTClusters nVPClusters nEcalClusters \
    --output_path mc/signal_tuple_reweighted.root
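When the weights should be applied with more than one method, the command line can be generated from a small Python driver. The sketch below only builds and prints the commands, using the flags documented above. The per-method output file names are a hypothetical naming scheme, and whether all three methods are available after a single training run is an assumption.

```python
import shlex

METHODS = ("gbreweighter", "kfolding", "binning")  # choices listed for --method


def apply_weights_command(method, path_mc, output_path, variables, training_vars,
                          training_sample="bd_jpsikst_ee",
                          application_sample="bd_jpsikst_ee"):
    """Build the apply-weights argument list for one reweighting method."""
    if method not in METHODS:
        raise ValueError(f"unknown method {method!r}, expected one of {METHODS}")
    return [
        "apply-weights",
        "--path_mc", path_mc,
        "--vars", *variables,
        "--training_vars", *training_vars,
        "--training_sample", training_sample,
        "--application_sample", application_sample,
        "--method", method,
        "--output_path", output_path,
    ]


commands = [
    apply_weights_command(
        method,
        path_mc="mc/signal_tuple.root",
        # Hypothetical naming scheme: one output file per method.
        output_path=f"mc/signal_tuple_reweighted_{method}.root",
        variables=["B_P", "B_PT", "nLongTracks", "nPVs"],
        training_vars=["B_DTF_Jpsi_P", "B_DTF_Jpsi_PT", "nLongTracks", "nPVs"],
    )
    for method in METHODS
]

for cmd in commands:
    # To execute instead of printing: subprocess.run(cmd, check=True)
    print(shlex.join(cmd))
```

Passing the arguments as a list avoids shell quoting problems if a path contains spaces.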

Contact

For questions, please contact the repository maintainer.
