Tools for SaS (Scale and Smearing) derivation, from an HDNA (HiggsDNA) file reader to plotting functions
Project description
SaS utils
Package to convert files to the IJazZ format, combine correctionlib files, and apply correctionlib corrections to parquet files. To use the full workflow, use law_ijazz.
Install the package
Create conda env
conda create -n ijazz python=3.9
conda activate ijazz
Install package in editable mode
Clone the repo
git clone https://gitlab.cern.ch/pgaigne/sas_utils
cd sas_utils
Install package in editable mode
pip install -e .
Install from pypi
pip install cms-sas-utils
HiggsDNA Reader
This script, reader_higgsdna.py, is designed to read and convert HiggsDNA parquet files into the ijazz_2p0 format. Below is a brief description of its main functions:
Parameters:
- data_dict (dict): dictionary with the data information:
  - dir (str): directory of the data parquet files
  - luminosity (float): luminosity of the data taking (optional)
- mc_dict (dict): dictionary with the MC information:
  - dir (str): directory of the MC parquet files
  - name (str): name of the MC sample (optional)
  - XS (float): cross section of the MC sample (optional)
- dir_out (str, optional): Output directory. Defaults to None.
- stem_out (str, optional): Stem of the output file (it will be completed with different options). Defaults to None.
- is_ele (bool, optional): Use the GSF electron energy. Defaults to False.
- corrlib_scale (dict, optional): correctionlib file to correct the energy scale in data. Defaults to None.
- corrlib_smear (dict, optional): correctionlib file to smear the MC. Defaults to None.
- remove_HDNA_SaS (bool, optional): Remove the HDNA SaS correction. Defaults to False.
- add_vars (List, optional): List of additional variables to add. Defaults to None.
- charge (int, optional): Charge selection: -1 for opposite charge, 1 for same charge and 0 for no selection. Defaults to -1.
- selection (str, optional): Additional selection to apply. Defaults to None.
- do_normalisation (bool, optional): Apply the normalisation. Defaults to False.
- reweight_selection (str, optional): Selection to apply to compute the reweighting. Defaults to None.
- pileup_systematics_reweighting (bool, optional): Use the pileup systematics from corrlib or from HDNA. Defaults to False.
- reset_weight (bool, optional): Reset the weights to weight_central=1 and weight=genWeight. Defaults to False.
- corrlib_pileup_reweighting (str, optional): Use a correction file for pileup reweighting; should be path_to_json(str):correction_name(str). Defaults to None.
- nPV_pileup_reweighting (bool, optional): Use the nPV pileup reweighting. Defaults to False.
- rho_pileup_reweighting (bool, optional): Use the fixedGridRhoAll pileup reweighting. Defaults to False.
- do_reweight (bool, optional): Apply the Z-pt and ScEta reweighting. Defaults to True.
- subyear (str, optional): Subyear to add to the data. Used for the luminosity (if not provided) and to add a tag is_{subyear} on the subyear samples. Defaults to None.
- subyear_list (list, optional): List of subyear tags to add. Defaults to None.
- backgrounds (list, optional): List of background dicts. Defaults to []. Each dict should contain:
  - name (str): name of the background
  - dir (str): directory of the background parquet files
  - XS (float): cross section of the background sample (optional)
- year (str, optional): Year of the data taking. Defaults to ''.
- save_dt (bool, optional): Save the data. Defaults to True.
- save_mc (bool, optional): Save the MC. Defaults to True.
Normalisation
Normalisation can be applied to NLO weights using the cross section (XS), luminosity (Lumi) and sum_genw_presel values if do_normalisation is set, or using backgrounds.
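As a minimal sketch (not the package API), the normalisation factor described above can be computed from the cross section, the luminosity and the sum of generator weights before selection; the function name and unit convention here are assumptions:

```python
# Sketch, not the sas_utils implementation: scale MC to data.
# Units of xs and lumi are assumed consistent (e.g. pb and pb^-1).
def norm_factor(xs: float, lumi: float, sum_genw_presel: float) -> float:
    # Each event's genWeight is multiplied by this factor so that the
    # MC sum of weights matches the expected yield XS * Lumi.
    return xs * lumi / sum_genw_presel
```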
MC weights
HiggsDNA: 3 weights are saved:
- genWeight: NLO weight from the generator
- weight = genWeight * weight_central: NLO weight + reweighting from HDNA
- weight_central: reweighting from HDNA (e.g. Pileup, ...)
And 2 extra if Pileup systematics:
- weight_PileupUp
- weight_PileupDown
Output of the reader:
- genWeight: NLO weight from the generator.
- weight = genWeight * weight_central * norm * RW: NLO weight with HDNA and reader reweighting and normalisation.
- genWeight_normed = genWeight * norm: NLO weight from the generator normalised to the XS and luminosity.
- weight_central = weight_central * RW: LO weight with HDNA and reader reweighting.
And extra weights if Pileup systematics:
- weight_PileupUp
- weight_PileupDown
- weight_central_PileupUp
- weight_central_PileupDown
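The weight definitions above can be sketched in a few lines of Python. This is an illustrative example, not the reader's actual code; the key names mirror the list above and `RW` stands for the per-event Z-pt / ScEta reweighting factor:

```python
# Sketch of how the reader-level output weights combine the HiggsDNA
# weights with a normalisation factor `norm` and a per-event
# reweighting factor `RW` (hypothetical key names).
def build_output_weights(ev: dict, norm: float) -> dict:
    out = dict(ev)
    # weight_central picks up the reader reweighting: weight_central * RW
    out["weight_central"] = ev["weight_central"] * ev["RW"]
    # genWeight normalised to XS and luminosity: genWeight * norm
    out["genWeight_normed"] = ev["genWeight"] * norm
    # full weight: genWeight * weight_central * norm * RW
    out["weight"] = ev["genWeight"] * out["weight_central"] * norm
    return out
```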
Usage Example
sas_reader_higgsdna config/cms/reader_higgsdna_example.yaml
This script converts and normalises HiggsDNA data for further analysis in the ijazz_2p0 framework; see the example configuration reader_higgsdna_example.yaml.
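For orientation, a configuration for the reader might look like the fragment below. This is a hypothetical sketch whose key names are inferred from the parameter list above, not copied from the shipped reader_higgsdna_example.yaml, so check the real example for the actual schema:

```yaml
# Hypothetical sketch — keys inferred from the parameter list above.
data_dict:
  dir: /path/to/data/parquet
  luminosity: 8.1          # optional
mc_dict:
  dir: /path/to/mc/parquet
  name: DY_NLO             # optional
  XS: 6345.99              # optional, illustrative value
dir_out: results/2022
is_ele: false
charge: -1                 # opposite-charge selection
do_normalisation: true
year: '2022'
```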
Time equalisation
RunTimeStep0
Function saving the run splitting to a csv file (this is step0 of time equalisation)
Parameters:
- file_dt (str, optional): input file for data (can be inferred from the reader if None). Defaults to None.
- n_split (float, optional): number of events in each subsample. Defaults to 5e4.
- d_fsplit (float, optional): tolerance with respect to n_split (in percent). Defaults to 0.2.
- dir_results (str, optional): directory to save the results. Defaults to '.'.
- name_run (str, optional): name of the run variable in file_dt. Defaults to 'run'.
- cfg_sas (dict, optional): dictionary with the sas config. Defaults to None.
sas_time_equalisation_step0 config/cms/reader_higgsdna_example.yaml
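The run-splitting idea behind step 0 can be sketched as follows. This is an illustrative example, not the RunTimeStep0 implementation (which also applies the d_fsplit tolerance): group consecutive run numbers into ranges holding roughly n_split events each.

```python
from collections import Counter

# Sketch: return the starting run number of each range such that every
# range holds roughly n_split events (assumption: runs fit in memory).
def split_runs(runs, n_split=5e4):
    counts = sorted(Counter(runs).items())   # (run, n_events), run-ordered
    edges, acc = [counts[0][0]], 0
    for run, n_events in counts:
        if acc >= n_split:                   # current range is full:
            edges.append(run)                # open a new range at this run
            acc = 0
        acc += n_events
    return edges
```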
RunTimeStep1
Function fitting the scale in each run range (step 1 of time equalisation)
Parameters:
- file_dt (str, optional): input file for data (can be inferred from the reader if None). Defaults to None.
- file_mc (str, optional): input file for MC (can be inferred from the reader if None). Defaults to None.
- dir_results (str, optional): directory to save the results. Defaults to '.'.
- dset_id (str, optional): dataset id. Defaults to 'Unknown'.
- name_run (str, optional): name of the run variable in file_dt. Defaults to 'run'.
- cfg_sas (dict, optional): dictionary of the sas config. Defaults to None.
- irun (int, optional): first run to fit. Defaults to 0.
- nrun (int, optional): number of runs to fit. Defaults to -1.
- columns (list, optional): list of columns to read from the data file. Defaults to None (automatic).
- name_mll (str, optional): name of the dilepton mass. Defaults to 'mass'.
sas_time_equalisation_step1 config/cms/reader_higgsdna_example.yaml --irun 0 --nrun -1
Parallelisation can be done by specifying the starting run number irun and the number of runs nrun to process per task:
NRUN=5
# total_run: total number of run ranges (as produced by step 0)
for (( i=0; i<total_run; i+=NRUN )); do
  sas_time_equalisation_step1 config/cms/reader_higgsdna_example.yaml --irun $i --nrun $NRUN
done
RunTimeStep2
Aggregate the results from the run-dependent scale fit into a single correctionlib file and produce plots (this is step 2 of time equalisation)
Parameters:
- file_dt (str, optional): input file for data (can be inferred from the reader if None). Defaults to None.
- dir_results (str, optional): directory to save the results. Defaults to '.'.
- dset_id (str, optional): dataset id. Defaults to 'Unknown'.
- cset_version (int, optional): version of the set of corrections. Defaults to 1.
- name_run (str, optional): name of the run variable in file_dt. Defaults to 'run'.
- name_eta (str, optional): name of the eta variable in file_dt. Defaults to 'ScEta'.
- correct_data (bool, optional): apply the scale to data. Defaults to True.
- resp_range (tuple, optional): y-range for the response plot. Defaults to (0.92, 1.05).
- reso_range (tuple, optional): y-range for the resolution plot. Defaults to (0, 0.09).
- run_split (list, optional): list with the starting run number of each era. Defaults to None.
- eras (list, optional): list of era names. Defaults to None.
sas_time_equalisation_step2 config/cms/reader_higgsdna_example.yaml
Combine corrlib
Combine different corrlib correction files. For some of the files, only the nominal scale can be used for the variations (when a single set of variations should be kept, to avoid double counting).
Parameters:
- cset_files (List[Union[str, Path]]): list of corrlib files
- icset_fix_scale (Union[List, Tuple]): list of corrlib indices for which only the nominal scale should be used. [-1] to keep all the variations.
- dir_results (Union[str, Path]): output directory
- dset_name (str, optional): identifier of the dataset. Defaults to 'DSET'.
- cset_version (int, optional): version of the set of corrections. Defaults to 1.
- include_random (bool, optional): include the random generator. Defaults to True.
Usage Example
Combining the 6 steps:
file_corr0=results/2022/TimeDep/EGMScalesSmearing_2022preEE.v1.json.gz
file_corr1=results/2022/EtaR9/EGMScalesSmearing_2022preEE.v1.json.gz
file_corr2=results/2022/FineEtaR9/EGMScalesSmearing_2022.v1.json.gz
file_corr3=results/2022/PT/EGMScalesSmearing_2022.v1.json.gz
file_corr4=results/2022/Gain/EGMScalesSmearing_2022.v1.json.gz
file_corr5=results/2022/PTsplit/EGMScalesSmearing_2022preEE.v1.json.gz
sas_corrlib_combine $file_corr0 $file_corr1 $file_corr2 $file_corr3 $file_corr4 $file_corr5 -i 1 2 3 4 -v 1 -o results/2022 -d Pho_2022preEE
This creates the EGMScalesSmearing_Pho_2022preEE.v1.json.gz output file, including a compound correction for scales (EGMScale_Compound_Pho_2022preEE) and, for each step, a scale correction EGMScale_Pho{step_name}_2022preEE and a smearing correction EGMSmearAndSyst_Pho{step_name}_2022preEE.
We use -i 1 2 3 4 because the systematics are computed in the last step (step 5) and the time-dependent correction does not include escale; we therefore fix the escale for files 1, 2, 3 and 4. In the end, scale = scale0 * scale1 * scale2 * scale3 * scale4 * scale5 and escale = escale5.
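The combination rule just stated can be written out explicitly. This is a sketch of the arithmetic only, not of the correctionlib internals:

```python
import math

# Sketch of the combination rule above: the nominal scale is the product
# of all per-step scales, while the scale uncertainty (escale) comes
# from the last step only, since the earlier files were fixed to their
# nominal scale with -i 1 2 3 4.
def combine_scales(step_scales, escale_last):
    return math.prod(step_scales), escale_last
```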
Second example: to use time equalisation plus a single step, we do not want to fix any scale, so we pass -i -1:
sas_corrlib_combine $file_corr0 $file_corr1 -i -1 -v 1 -o results/2022 -d Pho_2022preEE
Correct file
Apply Scale and Smearing to parquet files using the example apply_et_dependent_SaS.yaml, where the compound scale is applied to data and the MC is smeared using the smearing computed at the last step.
For IJazZ ET-dependent corrections (computed at the step before):
sas_file_corrector config/cms/apply_et_dependent_SaS.yaml --syst
For EGM standard corrections using apply_standard_SaS.yaml:
sas_file_corrector config/cms/apply_standard_SaS.yaml --syst
Validation plots
mll validation plots
Plot the Z-mass peak in different categories defined in a config yaml file; see the example validation_plots.yaml. The input parquet files can be defined in a separate yaml file (first example) or not (second example).
sas_dyll_valid_plot config/cms/apply_standard_SaS.yaml --cfg config/cms/validation_plots.yaml --syst
This is equivalent to running these two commands:
sas_file_corrector config/cms/apply_standard_SaS.yaml --syst
sas_dyll_valid_plot config/cms/validation_plots.yaml
Kinematics plots
Plot kinematic variables defined in a config yaml file. See the example kin_plots.yaml.
sas_dyll_kin_plot config/cms/apply_standard_SaS.yaml --cfg config/cms/kin_plots.yaml --syst
This is equivalent to running these two commands:
sas_file_corrector config/cms/apply_standard_SaS.yaml --syst
sas_dyll_kin_plot config/cms/kin_plots.yaml
File details
Details for the file cms_sas_utils-0.3.tar.gz.
File metadata
- Download URL: cms_sas_utils-0.3.tar.gz
- Upload date:
- Size: 38.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.9.22
File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | 84aac7c961034a742553d737a9251fa3193ef2f3da267706ece8164c4b68d687 |
| MD5 | 7eb5735be6ce8850177f55a462b5a0c6 |
| BLAKE2b-256 | 3a2c02bde7a8b5d08f1300187b3ab1012eebf7d596a2e166137d04b7cd66787d |
File details
Details for the file cms_sas_utils-0.3-py3-none-any.whl.
File metadata
- Download URL: cms_sas_utils-0.3-py3-none-any.whl
- Upload date:
- Size: 39.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.9.22
File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | a76be971cefdcb9853f97112e4c1712c58ceec91d835bee3a57db51e4e3b5815 |
| MD5 | e64d5708ddbe40bbca76d3bd471f9b64 |
| BLAKE2b-256 | 0e4e847babb85fe55f5550dd4938d59b8f328b442827c1cc9944abe4f9c11231 |