AlphaTwirl + uproot for the Z inv. width analysis
Project description
Z invisible analysis
This code processes CMS event-based data and simulation stored in a flat ROOT.TTree format (i.e. branches correspond to simple data types such as bool, int, float, ... or an std::vector of these data types). Typically, this is done on nanoAOD. The output is one or more dataframes of similar data types (excluding vectors), with columns either taken directly from the nanoAOD files or derived from those variables to create an analysis-level dataframe.
This is achieved by reading in nanoAOD files with uproot, applying a set of modules to generate derived variables, and storing these in a dataframe saved to disk. Yaml config files define the input data, modules and output.
Usage
Install with pip:
pip install zinv-analysis
or in editable mode to alter the code:
git clone git@github.com:shane-breeze/zinv-analysis.git
cd zinv-analysis
pip install -e .
Either run with the CLI
zinv_analysis.py --help
or the python API
import zinv
help(zinv.modules.analyse)
Layout
Interfaces
Interfaces to the underlying code are located in analyse.py and resume.py.
Scripts using these functions are found in zinv/scripts/.
Modules
A set of modules which create derived variables is found in zinv/modules/readers. These modules are applied to the data with the alphatwirl package (https://github.com/alphatwirl/alphatwirl). Each module contains a class that may define the begin, event and end methods.
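As a minimal sketch of this shape, a reader with all three methods might look like the following. The class names, the `size` attribute and the `run` driver are hypothetical stand-ins written for illustration; they are not taken from zinv or alphatwirl.

```python
class EventCounter:
    """Hypothetical reader: counts the events seen across all chunks."""

    def begin(self, event):
        # Called once before the loop over chunks: initialise state.
        self.total = 0

    def event(self, event):
        # Called once per chunk; `event.size` is assumed to be the
        # number of events held in that chunk.
        self.total += event.size

    def end(self):
        # Called once after the last chunk: keep the result, drop the rest.
        self.result = self.total
        del self.total


class Chunk:
    """Stand-in for the object handed to each reader (an assumption)."""

    def __init__(self, size):
        self.size = size


def run(readers, chunks):
    """Toy driver imitating the begin / event / end call order."""
    chunks = list(chunks)
    for reader in readers:
        if hasattr(reader, "begin"):
            reader.begin(chunks[0])
    for chunk in chunks:
        for reader in readers:
            if hasattr(reader, "event"):
                reader.event(chunk)
    for reader in readers:
        if hasattr(reader, "end"):
            reader.end()
    return readers


counter = EventCounter()
run([counter], [Chunk(1000), Chunk(1000), Chunk(250)])
print(counter.result)
```

The `hasattr` checks mirror the fact that a module does not have to define every method.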
The begin method is run at the start of processing the data to initialise any required parameters. The EventTools module adds a register_function method to the event, which allows functions to be cached for lazy evaluation (e.g. the JEC variations function is not run if the JEC variations are not saved in the output).
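The idea of registering a function and only evaluating it on first access can be sketched as below. This is an illustration of the caching pattern only: the `LazyEvent` class, the signature of its `register_function` and the `MET_doubled` variable are assumptions and may differ from what EventTools actually provides.

```python
class LazyEvent:
    """Hypothetical event wrapper: derived variables are computed on first access."""

    def __init__(self, **branches):
        self._branches = branches   # raw input branches (plain lists here)
        self._functions = {}        # name -> function(event)
        self._cache = {}            # name -> already computed result

    def register_function(self, name, function):
        # Store the function; nothing is evaluated at this point.
        self._functions[name] = function

    def __getattr__(self, name):
        # Only reached when normal attribute lookup fails.
        if name.startswith("_"):
            raise AttributeError(name)
        if name in self._branches:
            return self._branches[name]
        if name in self._functions:
            if name not in self._cache:
                self._cache[name] = self._functions[name](self)
            return self._cache[name]
        raise AttributeError(name)


calls = []  # records every real evaluation


def met_doubled(event):
    calls.append("met_doubled")
    return [2.0 * pt for pt in event.MET_pt]


event = LazyEvent(MET_pt=[10.0, 25.5])
event.register_function("MET_doubled", met_doubled)

first = event.MET_doubled    # evaluated here
second = event.MET_doubled   # served from the cache
print(first, len(calls))
```

If `MET_doubled` is never requested, for instance because the output does not include it, `met_doubled` is never called.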
The event method is applied on each iteration over the input data. Each iteration corresponds to a chunk of events which are loaded into numpy arrays with uproot. This is where derived variables would be evaluated; however, because of the lazy evaluation, this method is typically empty for most modules.
The end method is applied at the end of processing to clear up anything that needs to be cleared. If the code is run in multiprocessing or batch processing mode then modules are serialised. Lambda functions are not serialisable and hence must be created in the begin method and cleared in the end method.
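The serialisation constraint can be demonstrated with the standard pickle module. In this sketch the `ScaleReader` class is hypothetical, and pickle stands in for whatever serialiser the multiprocessing or batch backend uses, which is an assumption.

```python
import pickle


class ScaleReader:
    """Hypothetical reader holding a lambda only between begin() and end()."""

    def __init__(self, factor):
        self.factor = factor  # plain data: always picklable

    def begin(self, event):
        factor = self.factor
        # Created here so the object carries no lambda when it is serialised.
        self.scale = lambda values: [factor * v for v in values]

    def end(self):
        # Remove the lambda so the object can be serialised again.
        del self.scale


def is_picklable(obj):
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True


reader = ScaleReader(2.0)
before = is_picklable(reader)   # no lambda attached yet
reader.begin(event=None)
during = is_picklable(reader)   # lambda attached
reader.end()
after = is_picklable(reader)    # lambda removed again
print(before, during, after)
```

Only the state between `begin` and `end` fails to pickle, which is why the lambda must not exist outside that window.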
Output
A special module defines the output. Currently this is HDF5.py. Instead of creating derived variables, this module evaluates the previously defined functions and stores the results in a .h5 file using pandas. The actual output is defined by the yaml config.
Config
The yaml config is defined externally by the user and controls where the datasets are found, which modules are applied and what is written to the dataframes. With this flexibility, extra care must be taken that modules which depend on each other are all included and in the correct order. For example, if the JEC variations are saved by the HDF5 module, then the JECVariation module must be included in the sequence before the output module.
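One way to guard against ordering mistakes is to check a sequence of module names before running. The sketch below is not part of zinv: the `REQUIRES` map and the `check_sequence` helper are hypothetical, and the dependencies listed are assumptions chosen to match the JEC example.

```python
# Hypothetical dependency map: module name -> modules that must come earlier.
REQUIRES = {
    "JECVariation": ["EventTools"],
    "HDF5": ["EventTools", "JECVariation"],
}


def check_sequence(sequence, requires=REQUIRES):
    """Return (module, dependency) pairs where the dependency is missing or too late."""
    problems = []
    seen = set()
    for name in sequence:
        for dependency in requires.get(name, []):
            if dependency not in seen:
                problems.append((name, dependency))
        seen.add(name)
    return problems


good = ["EventTools", "JECVariation", "HDF5"]
bad = ["EventTools", "HDF5", "JECVariation"]

print(check_sequence(good))
print(check_sequence(bad))
```

An empty list means every listed dependency appears before the module that needs it.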
Project details
Download files
Source Distribution
Built Distribution
File details
Details for the file zinv-analysis-0.3.2.tar.gz.
File metadata
- Download URL: zinv-analysis-0.3.2.tar.gz
- Upload date:
- Size: 32.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/45.1.0 requests-toolbelt/0.9.1 tqdm/4.42.0 CPython/3.8.1
File hashes
Algorithm | Hash digest
---|---
SHA256 | 3baf17b0266864469a61dab97e77d9788d29128c79bbe8b07b8795350b4a11d1
MD5 | 12951c5269d72ee5316bd7d34e7678b4
BLAKE2b-256 | 81bc11657790ca53274390ba59b43d6def437745e6ad74d9efb3a7e3fcdef4f9
File details
Details for the file zinv_analysis-0.3.2-py3-none-any.whl.
File metadata
- Download URL: zinv_analysis-0.3.2-py3-none-any.whl
- Upload date:
- Size: 45.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/45.1.0 requests-toolbelt/0.9.1 tqdm/4.42.0 CPython/3.8.1
File hashes
Algorithm | Hash digest
---|---
SHA256 | 23620887a24b91c4d2a27decd196b917a1773f7fe9e1fa2ad2240853a700764c
MD5 | 467f63f9d13d11b321fa78a684c08013
BLAKE2b-256 | a65e72c16f48f644eff458e035e0ba80d92e1d25ab17a9e9a0a3ca8b49da9b2f