ServiceX data management using a configuration file
ServiceX DataBinder
Release v0.1.7
ServiceX DataBinder is a Python package for making multiple ServiceX requests and managing ServiceX delivered data from a configuration file.
Installation
pip install servicex-databinder
Configuration file
The configuration file is a YAML file containing everything DataBinder needs: a block of general settings and a list of samples. An example configuration file is shown below:
    General:
      ServiceXBackendName: uproot
      OutputDirectory: /path/to/output
      OutputFormat: parquet

    Sample:
      - Name: ttH
        RucioDID: user.kchoi:user.kchoi.sampleA,
                  user.kchoi:user.kchoi.sampleB
        Tree: nominal
        FuncADL: "Select(lambda event: {'jet_e': event.jet_e, 'jet_pt': event.jet_pt})"
      - Name: ttW
        RucioDID: user.kchoi:user.kchoi.sampleC
        Tree: nominal
        Filter: n_jet > 5
        Columns: jet_e, jet_pt
The input dataset can be defined by either RucioDID or XRootDFiles. Make sure that the ServiceX backend specified in ServiceXBackendName supports Rucio and/or XRootD, whichever you use.
A ServiceX query can be constructed with either TCut syntax or func-adl.
- Options for TCut syntax: Filter [1] and Columns
- Option for func-adl expression: FuncADL

[1] Filter works only for scalar-type TBranches.
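The two choices above (one input source and one query style per sample) can be checked before any request is sent. The following is a minimal sketch, not part of the DataBinder API: `check_sample` is a hypothetical helper that takes one entry of the Sample list as an already-parsed dictionary. Treating RucioDID and XRootDFiles, and FuncADL and Filter/Columns, as mutually exclusive is an assumption drawn from the "either ... or" wording.

```python
def check_sample(sample):
    """Return a list of problems found in one Sample entry (hypothetical helper)."""
    problems = []
    name = sample.get("Name", "<unnamed>")

    # Input dataset: RucioDID or XRootDFiles.
    # Assumption: exactly one of the two is expected per sample.
    has_rucio = "RucioDID" in sample
    has_xrootd = "XRootDFiles" in sample
    if has_rucio == has_xrootd:
        problems.append(f"{name}: define exactly one of RucioDID or XRootDFiles")

    # Query: a func-adl expression, or TCut-style Filter and/or Columns.
    has_funcadl = "FuncADL" in sample
    has_tcut = "Filter" in sample or "Columns" in sample
    if has_funcadl and has_tcut:
        problems.append(f"{name}: FuncADL cannot be combined with Filter/Columns")
    if not has_funcadl and not has_tcut:
        problems.append(f"{name}: no query given (FuncADL, or Filter/Columns)")

    return problems


# The two samples from the example configuration, written as dictionaries.
ttH = {
    "Name": "ttH",
    "RucioDID": "user.kchoi:user.kchoi.sampleA, user.kchoi:user.kchoi.sampleB",
    "Tree": "nominal",
    "FuncADL": "Select(lambda event: {'jet_e': event.jet_e, 'jet_pt': event.jet_pt})",
}
ttW = {
    "Name": "ttW",
    "RucioDID": "user.kchoi:user.kchoi.sampleC",
    "Tree": "nominal",
    "Filter": "n_jet > 5",
    "Columns": "jet_e, jet_pt",
}

for sample in (ttH, ttW):
    print(sample["Name"], check_sample(sample))
```

Both example samples pass this check: ttH uses the func-adl style and ttW uses the TCut style, each with a single RucioDID entry.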
Please find other example configurations for ATLAS opendata, xAOD, and Uproot ServiceX endpoints.
The following options are available:
Option for General | Description | DataType |
---|---|---|
ServiceXBackendName | ServiceX backend name in your servicex.yaml file (the name should contain either uproot or xAOD to distinguish the type of transformer) | String |
OutputDirectory | Path to the directory for ServiceX-delivered files | String |
OutputFormat | Output file format of ServiceX-delivered data (parquet for uproot or root for xAOD) | String |
WriteOutputDict | Name of an output YAML file containing the nested Python dictionary of output file paths (located in the OutputDirectory) | String |
IgnoreServiceXCache | Ignore the existing ServiceX cache and force new ServiceX requests | Boolean |
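The relationship between ServiceXBackendName and OutputFormat described in the General options can be sketched in a few lines. This is an illustration only, not DataBinder's own logic: `transformer_type` and `check_general` are hypothetical helpers, the case-insensitive match on the backend name is an assumption, and the table is read here as pairing parquet with uproot and root with xAOD.

```python
# Output format paired with each transformer type, as read from the General options.
EXPECTED_FORMAT = {"uproot": "parquet", "xaod": "root"}


def transformer_type(backend_name):
    """Guess the transformer type from the backend name (hypothetical helper)."""
    lowered = backend_name.lower()  # assumption: matching ignores case
    for kind in EXPECTED_FORMAT:
        if kind in lowered:
            return kind
    raise ValueError(
        f"backend name {backend_name!r} contains neither 'uproot' nor 'xAOD'"
    )


def check_general(general):
    """Return a warning string if OutputFormat does not match the backend, else None."""
    kind = transformer_type(general["ServiceXBackendName"])
    expected = EXPECTED_FORMAT[kind]
    actual = general["OutputFormat"]
    if actual != expected:
        return f"OutputFormat is {actual!r}, but {expected!r} is expected for {kind}"
    return None


# The General block from the example configuration.
general = {
    "ServiceXBackendName": "uproot",
    "OutputDirectory": "/path/to/output",
    "OutputFormat": "parquet",
}
print(check_general(general))
```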
Option for Sample | Description | DataType |
---|---|---|
Name | Sample name defined by the user | String |
RucioDID | Rucio Dataset ID (DID) for a given sample; can be multiple DIDs separated by commas | String |
XRootDFiles | XRootD files (e.g. root://) for a given sample; can be multiple files separated by commas | String |
Tree | Name of the input ROOT TTree (uproot ONLY) | String |
Filter | Selection in the TCut syntax, e.g. jet_pt > 10e3 && jet_eta < 2.0 (TCut ONLY) | String |
Columns | List of columns (or branches) to be delivered; multiple columns separated by commas (TCut ONLY) | String |
FuncADL | func-adl expression for a given sample (func-adl ONLY) | String |
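Several Sample options (RucioDID, XRootDFiles, Columns) hold multiple values in one comma-separated string. A minimal sketch of turning such a string into a list follows; `split_list` is a hypothetical helper, not DataBinder's own parsing, and it assumes that whitespace and line breaks around each item carry no meaning.

```python
def split_list(value):
    """Split a comma-separated option string into a clean list (hypothetical helper)."""
    # strip() removes spaces and line breaks around each item;
    # empty items (e.g. from a trailing comma) are dropped.
    return [item.strip() for item in value.split(",") if item.strip()]


# Values taken from the example configuration.
dids = split_list("user.kchoi:user.kchoi.sampleA,\n user.kchoi:user.kchoi.sampleB")
columns = split_list("jet_e, jet_pt")
print(dids)
print(columns)
```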
Deliver data
from servicex_databinder import DataBinder
sx_db = DataBinder('<CONFIG>.yml')
out = sx_db.deliver()
The function deliver() returns a nested Python dictionary:
- for the uproot backend: out['<SAMPLE>']['<TREE>'] = [ List of output files ]
- for the xAOD backend: out['<SAMPLE>'] = [ List of output files ]
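Because the two backends return differently nested dictionaries, code that consumes the result has to handle both shapes. The sketch below walks a dictionary of either shape; `flatten_outputs` is a hypothetical helper, and the dictionaries and file paths are made-up stand-ins for a real return value of deliver().

```python
def flatten_outputs(out):
    """Return (sample, tree, path) tuples; tree is None for the xAOD-style shape."""
    rows = []
    for sample, value in out.items():
        if isinstance(value, dict):
            # uproot shape: out[sample][tree] is a list of files
            for tree, files in value.items():
                for path in files:
                    rows.append((sample, tree, path))
        else:
            # xAOD shape: out[sample] is a list of files
            for path in value:
                rows.append((sample, None, path))
    return rows


# Stand-in dictionaries with hypothetical file paths.
uproot_out = {"ttH": {"nominal": ["/path/to/output/ttH/a.parquet",
                                  "/path/to/output/ttH/b.parquet"]}}
xaod_out = {"ttH": ["/path/to/output/ttH/a.root"]}

for row in flatten_outputs(uproot_out) + flatten_outputs(xaod_out):
    print(row)
```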
Acknowledgements
Support for this work was provided by the U.S. Department of Energy, Office of High Energy Physics under Grant No. DE-SC0007890.
Hashes for servicex_databinder-0.1.7.tar.gz (source distribution)
Algorithm | Hash digest |
---|---|
SHA256 | f7042b0d502ec52e9382bb2d1f993362176ca3a6b1c8775d6eb6ab66e3776766 |
MD5 | 2e1b6b88b94bda4d2a83fc6f29667ae5 |
BLAKE2b-256 | 1f617ab98e1942cd365afe5991c87fed38c7842b2dadc3f722f13f39fdc443da |

Hashes for servicex_databinder-0.1.7-py3-none-any.whl (built distribution)
Algorithm | Hash digest |
---|---|
SHA256 | bfd23f1454baf3d0cddd73c82c8f43f52169dab1ada5e5af4f03618d62193738 |
MD5 | a7b04aefd20bacb5f62c1010ab98dd24 |
BLAKE2b-256 | 885186da698518b5618905fc621c251a63e75aee61d3892fb5bc4bde4d7df220 |