
Tools for SaS (Scale and Smearing) derivation from HiggsDNA files, from the reader to plotting functions


SaS utils

Package to convert HiggsDNA files to the IJazZ format, combine correctionlib files, and apply correctionlib corrections to parquet files. To use the full workflow, use law_ijazz.

Install the package

Create conda env

conda create -n ijazz python=3.9
conda activate ijazz

Install package in editable mode

Clone the repo

git clone https://gitlab.cern.ch/pgaigne/sas_utils
cd sas_utils

Install package in editable mode

pip install -e .

Install from pypi

pip install cms-sas-utils

HiggsDNA Reader

This script, reader_higgsdna.py, is designed to read and convert HiggsDNA parquet files into the ijazz_2p0 format. Below is a brief description of its parameters:

Parameters:

  • data_dict (dict): dictionary with the data information:
    • dir (str): directory of the data parquet files
    • luminosity (float): luminosity of the data taking (optional)
  • mc_dict (dict): dictionary with the MC information:
    • dir (str): directory of the MC parquet files
    • name (str): name of the MC sample (optional)
    • XS (float): cross section of the MC sample (optional)
  • dir_out (str, optional): Output directory. Defaults to None.
  • stem_out (str, optional): Stem of the output file (it will be completed with different options). Defaults to None.
  • is_ele (bool, optional): Use the GSF electron energy. Defaults to False.
  • corrlib_scale (dict, optional): correction lib to correct the energy scale in data. Defaults to None.
  • corrlib_smear (dict, optional): correction lib to smear the MC. Defaults to None.
  • remove_HDNA_SaS (bool, optional): Remove the HDNA SaS correction. Defaults to False.
  • add_vars (List, optional): List of additional variables to add. Defaults to None.
  • charge (int, optional): Charge selection: -1 for opposite charge, 1 for same charge and 0 for no selection. Defaults to -1.
  • selection (str, optional): Additional selection to apply. Defaults to None.
  • do_normalisation (bool, optional): Apply the normalisation. Defaults to False.
  • reweight_selection (str, optional): Selection to apply to compute the reweighting. Defaults to None.
  • pileup_systematics_reweighting (bool, optional): Use the pileup systematics from corrlib or from HDNA. Defaults to False.
  • reset_weight (bool, optional): Reset the weight to weight_central=1 and weight=genWeight. Defaults to False.
  • corrlib_pileup_reweighting (str, optional): Use a correction file for pileup reweighting, should be path_to_json(str):correction_name(str). Defaults to None.
  • nPV_pileup_reweighting (bool, optional): Use the nPV pileup reweighting. Defaults to False.
  • rho_pileup_reweighting (bool, optional): Use the fixedGridRhoAll pileup reweighting. Defaults to False.
  • do_reweight (bool, optional): Apply the Z-pt and ScEta reweighting. Defaults to True.
  • subyear (str, optional): Subyear to add to the data. Used for the luminosity (if not provided) and to add an is_{subyear} tag to the subyear samples. Defaults to None.
  • subyear_list (list, optional): List of subyears tags to add. Defaults to None.
  • backgrounds (list, optional): List of background dicts. Defaults to []. Each dict should contain:
    • name (str): name of the background
    • dir (str): directory of the background parquet files
    • XS (float): cross section of the background sample (optional)
  • year (str, optional): Year of the data taking. Defaults to ''.
  • save_dt (bool, optional): Save the data. Defaults to True.
  • save_mc (bool, optional): Save the MC. Defaults to True.
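The parameters above are typically collected in a YAML config passed to the reader. A hypothetical minimal sketch follows — the key names simply mirror the parameter list above, all values are purely illustrative, and the real layout should be taken from config/cms/reader_higgsdna_example.yaml:

```yaml
# Hypothetical minimal reader config (illustrative values only;
# see config/cms/reader_higgsdna_example.yaml for the actual layout)
data_dict:
  dir: /path/to/data/parquet
  luminosity: 8.0          # optional
mc_dict:
  dir: /path/to/mc/parquet
  name: DYto2L             # optional
  XS: 6000.0               # optional
dir_out: results/2022
stem_out: dy_2022preEE
is_ele: true
do_normalisation: true
year: '2022'
```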

Normalisation

Normalisation can be applied to the NLO weights using the cross section (XS), luminosity (Lumi), and sum_genw_presel values when do_normalisation is set, or by using backgrounds.
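A minimal sketch of this kind of normalisation factor, assuming the standard XS × Lumi / sum-of-generator-weights convention (function name and units are illustrative, not the package's API):

```python
# Normalisation factor applied to MC weights:
#   norm = XS * Lumi / sum_genw_presel
# With XS in pb and Lumi in pb^-1, norm is dimensionless and
# scales the MC so that sum(genWeight) * norm = XS * Lumi.

def norm_factor(xs_pb: float, lumi_pb: float, sum_genw_presel: float) -> float:
    """Return the factor normalising MC to the expected event yield."""
    return xs_pb * lumi_pb / sum_genw_presel

# Illustrative numbers only:
norm = norm_factor(xs_pb=6000.0, lumi_pb=8000.0, sum_genw_presel=1.2e10)
```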

MC weights

  • HiggsDNA: 3 weights are saved:

    • genWeight: NLO weights from the generator
    • weight = genWeight * weight_central: NLO weights + reweight from HDNA
    • weight_central: reweighting from HDNA (e.g. pileup)

    And 2 extra weights if Pileup systematics are enabled:

    • weight_PileupUp
    • weight_PileupDown
  • Output of the reader:

    • genWeight: NLO weights from the generator.
    • weight = genWeight * weight_central * norm * RW: NLO weights with HDNA and reader reweighting and normalization.
    • genWeight_normed = genWeight * norm: NLO weights from the generator normalized to the XS and luminosity.
    • weight_central = weight_central * RW: HDNA reweighting combined with the reader reweighting.

    And extra weights if Pileup systematics are enabled:

    • weight_PileupUp
    • weight_PileupDown
    • weight_central_PileupUp
    • weight_central_PileupDown
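The weight bookkeeping above can be sketched as follows (a plain-Python illustration of how the columns combine; the reader itself operates on parquet columns, and the norm and RW names are just shorthand for the normalisation and reweighting factors):

```python
# Illustrative per-event combination of the output weight columns.
def reader_weights(genWeight: float, weight_central: float,
                   norm: float, RW: float) -> dict:
    """Combine HiggsDNA weights with the reader's normalisation (norm)
    and reweighting (RW) factors into the documented output columns."""
    return {
        "genWeight": genWeight,                        # generator weight, untouched
        "genWeight_normed": genWeight * norm,          # normalised to XS * Lumi
        "weight_central": weight_central * RW,         # HDNA + reader reweighting
        "weight": genWeight * weight_central * norm * RW,  # fully corrected weight
    }

w = reader_weights(genWeight=1.0, weight_central=0.98, norm=2.5, RW=1.02)
```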

Usage Example

sas_reader_higgsdna config/cms/reader_higgsdna_example.yaml

This script converts and normalises HiggsDNA data for further analysis in the ijazz_2p0 framework; see the example configuration reader_higgsdna_example.yaml.

Time equalisation

RunTimeStep0

Function saving the run splitting to a CSV file (step 0 of time equalisation).

Parameters:

  • file_dt (str, optional): input file for data (can be inferred from reader if None). Defaults to None.
  • n_split (float, optional): number of events in each subsample. Defaults to 5e4.
  • d_fsplit (float, optional): tolerance w.r.t. n_split (in percent). Defaults to 0.2.
  • dir_results (str, optional): directory to save the results. Defaults to '.'.
  • name_run (str, optional): name of the run variable in file_dt. Defaults to 'run'.
  • cfg_sas (dict, optional): dictionary with the sas config. Defaults to None.
sas_time_equalisation_step0 config/cms/reader_higgsdna_example.yaml
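A sketch of the kind of run splitting step 0 performs — grouping consecutive runs into ranges of roughly n_split events each. This illustrates the idea only and is not the package's implementation (which additionally honours the d_fsplit tolerance):

```python
def split_runs(events_per_run: dict, n_split: float = 5e4) -> list:
    """Group consecutive runs into (first_run, last_run) ranges,
    each holding roughly n_split events. events_per_run maps
    run number -> number of events in that run."""
    ranges, current, count = [], [], 0
    for run, n in sorted(events_per_run.items()):
        current.append(run)
        count += n
        if count >= n_split:          # range is full: close it
            ranges.append((current[0], current[-1]))
            current, count = [], 0
    if current:                       # leftover runs form a final range
        ranges.append((current[0], current[-1]))
    return ranges

# Example: 5 runs of 30k events each, targeting ~50k events per range
ranges = split_runs({355000 + i: 30_000 for i in range(5)}, n_split=5e4)
```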

RunTimeStep1

Function fitting the scale in each run range (step 1 of time equalisation)

Parameters:

  • file_dt (str, optional): input file for data (can be inferred from reader if None). Defaults to None.
  • file_mc (str, optional): input file for MC (can be inferred from reader if None). Defaults to None.
  • dir_results (str, optional): directory to save the results. Defaults to '.'.
  • dset_id (str, optional): dataset id. Defaults to 'Unknown'.
  • name_run (str, optional): name of the run variable. Defaults to 'run'.
  • cfg_sas (dict, optional): dictionary of the sas config. Defaults to None.
  • irun (int, optional): first run to fit. Defaults to 0.
  • nrun (int, optional): number of runs to fit. Defaults to -1.
  • columns (list, optional): list of columns to read from the data file. Defaults to None (automatic).
  • name_mll (str, optional): name of the dilepton mass. Defaults to 'mass'.
sas_time_equalisation_step1 config/cms/reader_higgsdna_example.yaml --irun 0 --nrun -1

Parallelization can be done by specifying the starting run number irun and the number of runs nrun to process per task:

NRUN=5
TOTAL_RUN=100  # total number of run ranges produced by step 0 (illustrative value)
for (( i=0; i<TOTAL_RUN; i+=NRUN )); do
    sas_time_equalisation_step1 config/cms/reader_higgsdna_example.yaml --irun $i --nrun $NRUN
done

RunTimeStep2

Aggregate the results from the run-dependent scale fits into a single correctionlib file and produce plots (step 2 of time equalisation).

Parameters:

  • file_dt (str, optional): input file for data (can be inferred from reader if None). Defaults to None.
  • dir_results (str, optional): directory to save the results. Defaults to '.'.
  • dset_id (str, optional): dataset id. Defaults to 'Unknown'.
  • cset_version (int, optional): version of the set of corrections. Defaults to 1.
  • name_run (str, optional): name of the run variable in file_dt. Defaults to 'run'.
  • name_eta (str, optional): name of the eta variable in file_dt. Defaults to 'ScEta'.
  • correct_data (bool, optional): apply the scale to data. Defaults to True.
  • resp_range (tuple, optional): y-range for resp plotting. Defaults to (0.92, 1.05).
  • reso_range (tuple, optional): y-range for reso plotting. Defaults to (0, 0.09).
  • run_split (list, optional): list with the starting run number of each era. Defaults to None.
  • eras (list, optional): list of era names. Defaults to None.
sas_time_equalisation_step2 config/cms/reader_higgsdna_example.yaml

Combine corrlib

Combine different corrlib correction files. Some files can always use the nominal scale for their variations (when only one set of variations should be kept, to avoid double counting).

Parameters:

  • cset_files (List[Union[str, Path]]): list of corrlib files
  • icset_fix_scale (Union[List, Tuple]): list of corrlib indices for which only the nominal scale should be used. [-1] to keep all the variations.
  • dir_results (Union[str, Path]): directory to save the results
  • dset_name (str, optional): identifier of the dataset. Defaults to 'DSET'.
  • cset_version (int, optional): version of the set of corrections. Defaults to 1.
  • include_random (bool, optional): include the random generator. Defaults to True.

Usage Example

Combining the 6 steps:

file_corr0=results/2022/TimeDep/EGMScalesSmearing_2022preEE.v1.json.gz
file_corr1=results/2022/EtaR9/EGMScalesSmearing_2022preEE.v1.json.gz
file_corr2=results/2022/FineEtaR9/EGMScalesSmearing_2022.v1.json.gz
file_corr3=results/2022/PT/EGMScalesSmearing_2022.v1.json.gz
file_corr4=results/2022/Gain/EGMScalesSmearing_2022.v1.json.gz
file_corr5=results/2022/PTsplit/EGMScalesSmearing_2022preEE.v1.json.gz
sas_corrlib_combine $file_corr0 $file_corr1 $file_corr2 $file_corr3 $file_corr4 $file_corr5 -i 1 2 3 4  -v 1 -o results/2022 -d Pho_2022preEE

This creates the output file EGMScalesSmearing_Pho_2022preEE.v1.json.gz, including a compound correction for scales (EGMScale_Compound_Pho_2022preEE) and, for each step, the scale correction EGMScale_Pho{step_name}_2022preEE and the smearing correction EGMSmearAndSyst_Pho{step_name}_2022preEE.

We use -i 1 2 3 4 because the systematics are computed in the last step (step 5) and the time-dependent correction does not include escale, so we fix the escale for files 1, 2, 3 and 4. Finally, we have scale = scale0 * scale1 * scale2 * scale3 * scale4 * scale5 and escale = escale5.
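The multiplicative combination above can be sketched as follows (a plain-Python illustration; in practice the compound correction is evaluated through correctionlib, and the numbers below are made up):

```python
import math

def combined_scale(step_scales: list) -> float:
    """Total scale is the product of the per-step scales:
    scale = scale0 * scale1 * ... * scale5."""
    return math.prod(step_scales)

# With -i 1 2 3 4, only the last step carries the scale uncertainty
# (escale = escale5); steps 1-4 enter at their nominal scale.
scale = combined_scale([1.001, 0.999, 1.002, 1.000, 0.998, 1.003])
```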

Second example: to use time equalisation plus a single step, we don't want to fix any scale, so we pass -i -1:

sas_corrlib_combine $file_corr0 $file_corr1  -i -1  -v 1 -o results/2022 -d Pho_2022preEE

Correct file

Apply Scale and Smearing to parquet files using the example apply_et_dependent_SaS.yaml, where the compound scale is applied to data and the MC is smeared using the smearing computed at the last step.

For IJazZ ET-dependent corrections (computed at the step before):

sas_file_corrector config/cms/apply_et_dependent_SaS.yaml --syst

For EGM standard corrections using apply_standard_SaS.yaml:

sas_file_corrector config/cms/apply_standard_SaS.yaml --syst 

Validation plots

mll validation plots

Plot the Z-mass peak in different categories defined in a config YAML file. See the example validation_plots.yaml. The input parquet files can be defined in a separate YAML file (first example) or inline (second example).

sas_dyll_valid_plot config/cms/apply_standard_SaS.yaml --cfg config/cms/validation_plots.yaml --syst

This is equivalent to running these two commands:

sas_file_corrector config/cms/apply_standard_SaS.yaml --syst
sas_dyll_valid_plot config/cms/validation_plots.yaml

Kinematics plots

Plot kinematic variables defined in a config YAML file. See the example kin_plots.yaml.

sas_dyll_kin_plot config/cms/apply_standard_SaS.yaml --cfg config/cms/kin_plots.yaml --syst

This is equivalent to running these two commands:

sas_file_corrector config/cms/apply_standard_SaS.yaml --syst
sas_dyll_kin_plot config/cms/kin_plots.yaml
