fast-datacard

F.A.S.T. datacard creation package

Development Status
- 2 - Pre-Alpha
Intended Audience
- Developers
License
- OSI Approved :: Apache Software License
Natural Language
- English

Project description

fast-datacard

Free software: Apache Software License 2.0
Documentation: https://fast-datacard.readthedocs.io.

Overview

fast-datacard is a Python package developed within the Faster Analysis Software Taskforce (FAST) collaboration. Its main purpose is to create datacards compatible with the HiggsCombine tool from data frames. The package takes categorical data frames, e.g. as created by the alphatwirl package, and creates the necessary ROOT and datacard outputs.

Features

Convert categorical data frames (see examples/data/*.csv) into valid inputs for the HiggsCombine tool.

Usage

The usage is the following::

fast_datacard <yaml_config_file>

An example yaml config file is available: examples/datacards_config.yaml. The config file lists all the input event categories, regions, physics processes, dataframes, etc. A few things should be noted:

The existence of the general, regions, signals, backgrounds and systematics blocks is mandatory.
analysis_name, version, and dataset are just used for versioning.
The value of luminosity (float, in fb^-1) is used to scale the content and error of the signals and backgrounds to the expected luminosity.
For each signal and background process named X, there should be a file in the path_to_dfs directory named X.csv (a whitespace-separated Pandas dataframe).
data_names_df should be equal to the process name used for data in the dataframe (Data in the example config file) and should also be the name of the .csv dataframe in path_to_dfs. data_names_dc will be the name of the output data histogram and should be equal to data_obs, as imposed by the HiggsCombine tool.
There has to be at least one signal and one background.
Backgrounds (but not signals, see below) can live only in specific region(s) (see example config file).
The systematics listed in the systematics block can have one of three types: lnN, lnU, and shape. The first two are normalization uncertainties and require a value equal to 1 + X, where X is the one-sigma relative uncertainty expressed as a fraction (for example, 1.05 for a 5% uncertainty, following the HiggsCombine convention; see example config file). For the shape type, no value is required because the shape itself encodes the size of the uncertainty. There is no need to specify Up/Down in the name of the uncertainty, as this is derived from the input dataframe (see below).
The systematics can apply only to a given set of signals and/or backgrounds, in which case the name of the process (identical to the one in the dataframe) should be specified. If the systematic applies to all backgrounds, backgrounds can be used instead of listing all the background processes (and the same is true for signals).
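
The rules above can be illustrated with a small pre-flight check. This is only a sketch under stated assumptions: the config is taken to be already parsed into a Python dict, and the nested layout (a systematics mapping with type and value keys, luminosity living under general) is hypothetical. Only the five block names and the three systematic types come from the notes above; the authoritative layout is the one in examples/datacards_config.yaml.

```python
# Hypothetical sketch: checks a parsed config against the rules listed above.
# The nested layout of each block is an assumption; only the block names and
# the systematic types come from the documentation.

MANDATORY_BLOCKS = ("general", "regions", "signals", "backgrounds", "systematics")
SYSTEMATIC_TYPES = ("lnN", "lnU", "shape")


def check_config(config):
    """Return a list of human-readable problems found in ``config``."""
    problems = []
    for block in MANDATORY_BLOCKS:
        if block not in config:
            problems.append("missing mandatory block: " + block)
    # At least one signal and one background are required.
    for block in ("signals", "backgrounds"):
        if block in config and not config[block]:
            problems.append("block '%s' must list at least one process" % block)
    # Assumed layout: systematics maps a name to a dict with 'type' and 'value'.
    for name, syst in (config.get("systematics") or {}).items():
        syst_type = syst.get("type")
        if syst_type not in SYSTEMATIC_TYPES:
            problems.append("systematic '%s' has unknown type %r" % (name, syst_type))
        elif syst_type != "shape" and "value" not in syst:
            problems.append("systematic '%s' of type %s needs a value" % (name, syst_type))
    return problems


# Illustrative values only; none of these names come from the example config.
example = {
    "general": {"analysis_name": "demo", "luminosity": 35.9},
    "regions": ["Signal"],
    "signals": ["VBF"],
    "backgrounds": [],
    "systematics": {
        "lumi": {"type": "lnN", "value": 1.025},
        "jes": {"type": "shape"},
        "bad": {"type": "gauss"},
    },
}
problems = check_config(example)
for line in problems:
    print(line)
```

Here the empty backgrounds block and the unknown systematic type are both reported, while the shape systematic passes without a value.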

The configuration for running is also partly derived from the input dataframes, whose format should therefore follow a few rules:

The columns should be named:

process region category systematic variable variable_low variable_high content error

Where:

process is the name of the physics process, e.g. VBF, Ewk, etc.
region is the name of the region, e.g. Signal, ControlRegion1, etc.
category is the name of the event category, e.g. 2jet, highMass, etc. Each unique name will be considered as a different category.
systematic is the name of the systematic shape variation that is applied to obtain the content of this row. E.g. if a process is characterized by two shape systematic uncertainties named syst1 and syst2, then the dataframe should contain 5 variations (nominal, syst1_Up, syst1_Down, syst2_Up, syst2_Down) for each bin where this process exists.
variable is the name of the variable that defines the x-values in the output histograms. It is not used by the code; it is there mainly to keep track of the fit variables in different categories.
variable_low and variable_high define the binning along x in the output histograms used for the fit. Each unique set of (variable_low, variable_high) will be considered as a unique bin.
content is the yield for this specific (process, region, category, systematic, variable, variable_low, variable_high) bin.
error is the uncertainty assigned to the yield. Note that it is the error itself, not its square: for a Poisson count N it should be sqrt(N).
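
As an illustration of the column layout, the following sketch builds a small whitespace-separated table with the nine columns above, using only the Python standard library. The process, region, category and variable values (VBF, Signal, 2jet, mjj) and the yields are made up for the example, and the sqrt(N) error assumes unweighted Poisson counts; a real analysis would normally produce these files with pandas or alphatwirl.

```python
import math

COLUMNS = ["process", "region", "category", "systematic",
           "variable", "variable_low", "variable_high", "content", "error"]


def make_row(process, region, category, systematic, variable, low, high, content):
    """Build one row; the error is sqrt(N), as for an unweighted Poisson count."""
    return [process, region, category, systematic, variable,
            low, high, content, math.sqrt(content)]


def to_text(rows):
    """Serialise rows as a whitespace-separated table with a header line."""
    lines = [" ".join(COLUMNS)]
    for row in rows:
        lines.append(" ".join(str(value) for value in row))
    return "\n".join(lines) + "\n"


# One bin of a hypothetical VBF process with a single shape systematic, syst1:
# the nominal row plus its Up and Down variations.
rows = [
    make_row("VBF", "Signal", "2jet", variation, "mjj", 0, 500, content)
    for variation, content in [("nominal", 100.0),
                               ("syst1_Up", 121.0),
                               ("syst1_Down", 81.0)]
]
table = to_text(rows)
print(table)
```

The resulting text could then be written to a file named after the process (VBF.csv in this example) inside the path_to_dfs directory.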

The use of region or category is optional in the sense that an analysis might contain only one region and one category; in this case, the corresponding column must hold the same value in every row.

The signal process(es) should be defined in all categories and regions, even if the content is 0. In other words, if you're looking for an exotics signal named bananas, the code expects to find a row with the content of bananas for each bin of the analysis (i.e. the code never assumes that the signal is absent from the control regions).
The data should be defined in all categories and regions, even if the content is 0. If data is not defined somewhere, the category/region shouldn’t even exist in the analysis.
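
The two completeness rules above lend themselves to a quick check before running the tool. The sketch below is not part of fast-datacard: missing_bins is a hypothetical helper, the rows are assumed to be dicts keyed by the column names described earlier, and the region and process values are invented for the example.

```python
def bin_key(row):
    """Identify a bin by its region, category and x-range."""
    return (row["region"], row["category"],
            row["variable_low"], row["variable_high"])


def missing_bins(rows, process):
    """Return the bins of the analysis where ``process`` has no nominal row."""
    rows = list(rows)
    # Every bin that appears anywhere in the analysis, for any process.
    all_bins = {bin_key(r) for r in rows}
    covered = {bin_key(r) for r in rows
               if r["process"] == process and r["systematic"] == "nominal"}
    return sorted(all_bins - covered)


def row(process, region):
    """Small factory for the illustrative rows below (single category and bin)."""
    return {"process": process, "region": region, "category": "2jet",
            "systematic": "nominal", "variable_low": 0, "variable_high": 500}


rows = [
    row("Data", "Signal"),
    row("Data", "Control"),
    row("bananas", "Signal"),  # the signal is missing from the Control region
]
print(missing_bins(rows, "bananas"))
print(missing_bins(rows, "Data"))
```

An empty list means the process is defined everywhere; any bin listed for the signal needs a row with content 0 added to the dataframe.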

The package will produce two sets of outputs:

Text datacards that summarize the physics processes, the yields, and meta-information about the analysis.
ROOT datacards that contain histograms describing the shapes that will be used in the fit.

Both serve as inputs to the HiggsCombine tool.

Credits

This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.

History

0.1.5 (2019-04-29)

Fix typo

0.1.4 (2019-04-29)

Added error message to explain crash

0.1.3 (2019-04-05)

Easier handling of dataframe files

0.1.2 (2019-04-04)

Updated executable name and documentation

0.1.1 (2018-10-01)

Added initial documentation

0.1.0 (2018-08-21)

First release on PyPI.

Release history

0.1.5

Apr 30, 2019

0.1.3

Apr 5, 2019

0.1.2

Apr 4, 2019

0.1.1

Oct 1, 2018

0.1.0

Sep 26, 2018

Download files

Source Distribution

fast-datacard-0.1.5.tar.gz (17.3 kB)

Uploaded Apr 30, 2019 Source

Built Distribution

fast_datacard-0.1.5-py2.py3-none-any.whl (12.2 kB)

Uploaded Apr 30, 2019 Python 2, Python 3

File details

Details for the file fast-datacard-0.1.5.tar.gz.

File metadata

Download URL: fast-datacard-0.1.5.tar.gz
Upload date: Apr 30, 2019
Size: 17.3 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/41.0.1 requests-toolbelt/0.9.1 tqdm/4.31.1 CPython/2.7.15

File hashes

Hashes for fast-datacard-0.1.5.tar.gz
Algorithm	Hash digest
SHA256	`1de72663a4643fc00d44b368be5ee3870bd019aef9476540972a855be8105781`
MD5	`858f73125a0691898bc9408db9526311`
BLAKE2b-256	`68eb5e95b4d1c22cfe4799641a6339d82fa20e5401f4d4707c8d331d21b11b0d`
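
A downloaded file can be compared against the SHA256 digest listed above with the standard hashlib module. This is a generic sketch rather than anything provided by fast-datacard; sha256_of and matches are hypothetical helper names, and the file is assumed to be in the current directory.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of the file at ``path``, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected_hex):
    """True if the file's SHA256 digest equals ``expected_hex`` (case-insensitive)."""
    return sha256_of(path) == expected_hex.strip().lower()


# Digest copied from the table above for fast-datacard-0.1.5.tar.gz.
EXPECTED_SHA256 = "1de72663a4643fc00d44b368be5ee3870bd019aef9476540972a855be8105781"
```

Calling matches("fast-datacard-0.1.5.tar.gz", EXPECTED_SHA256) after downloading the source distribution should return True for an intact file.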

File details

Details for the file fast_datacard-0.1.5-py2.py3-none-any.whl.

File metadata

Download URL: fast_datacard-0.1.5-py2.py3-none-any.whl
Upload date: Apr 30, 2019
Size: 12.2 kB
Tags: Python 2, Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/41.0.1 requests-toolbelt/0.9.1 tqdm/4.31.1 CPython/2.7.15

File hashes

Hashes for fast_datacard-0.1.5-py2.py3-none-any.whl
Algorithm	Hash digest
SHA256	`1396229c32cbfbff76a1dc7b1fb61caee18f661b7689b41cbcffffa65ebed4d7`
MD5	`da3061ca9a614af7d15b2fe04a4686ee`
BLAKE2b-256	`121cee14cbbee9299563beb9ce714980b2eb7f9ee81493cc1f2fea2ad8987847`
