Project with lists of LFNs and utilities needed to download filtered ntuples
$R_X$ data
This repository contains:
- Versioned lists of LFNs
- Utilities to download them and link them into a tree structure
for all the $R_X$-like analyses. For instructions on how to:
- Produce new ntuples with friend trees
- Download filtered ntuples from the grid
- Merge data ntuples
- Copy ntuples from the cluster to a laptop
- Outdated instructions that haven't been removed yet
Check this.
Below are the instructions on how to access data from EOS.
Installation
To install this project run:
pip install git+ssh://git@gitlab.cern.ch:7999/rx_run3/rx_data.git
The code below assumes that all the data is in ANADIR. If you want to use the data
in EOS do:
export ANADIR=/eos/lhcb/wg/RD/RX_run3
preferably in ~/.bashrc.
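Because the code assumes that ANADIR is set, a script can check this early and stop with a clear message. The lines below are a minimal sketch of such a check; `get_anadir` is a hypothetical helper written for illustration and is not part of rx_data.

```python
import os
from pathlib import Path


def get_anadir(environ=None) -> Path:
    """Return the directory stored in ANADIR, or raise if it is unset.

    Hypothetical helper for illustration, not part of rx_data.
    """
    # Allow a plain dictionary to be passed in, which makes the check easy to test
    environ = os.environ if environ is None else environ
    value = environ.get('ANADIR')
    if not value:
        raise RuntimeError(
            'ANADIR is not set, e.g. run: export ANADIR=/eos/lhcb/wg/RD/RX_run3')

    return Path(value)
```

Calling `get_anadir()` with no argument reads the real environment, so it can be placed at the top of an analysis script.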
Accessing ntuples
Once the environment is set up, the ntuples can be loaded with:
from rx_data.rdf_getter import RDFGetter
# This picks one sample for a given trigger
# The sample accepts wildcards, e.g. `DATA_24_MagUp_24c*` for all the periods
gtr = RDFGetter(
    sample  ='DATA_24_Mag*_24c*',
    analysis='rx',         # This is the default, could be nopid
    tree    ='DecayTree',  # This is the default, could be MCDecayTree
    trigger ='Hlt2RD_BuToKpMuMu_MVA')
# If False (default), returns a single dataframe for the sample
rdf = gtr.get_rdf(per_file=False)
# If True, returns a dictionary with one entry per file. The key is the full path of the ROOT file
d_rdf = gtr.get_rdf(per_file=True)
This class finds the paths to the ntuples through the DATADIR environment
variable, which must point to a directory such that $DATADIR/samples/ holds
the YAML files.
In the case of the MVA friend trees the branches added would be mva.mva_cmb and mva.mva_prc.
Thus, one can easily extend the ntuples with extra branches without remaking them.
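To get a feeling for which samples a wildcard such as `DATA_24_Mag*_24c*` would pick up, the pattern can be tried on a list of sample names. The sketch below uses shell-style matching from the Python standard library; that RDFGetter resolves wildcards in exactly this way is an assumption, and `select_samples` is a hypothetical helper.

```python
from fnmatch import fnmatchcase

# A few sample names that appear elsewhere in this document
samples = [
    'DATA_24_MagDown_24c1',
    'DATA_24_MagUp_24c1',
    'DATA_24_MagUp_24c2',
    'Bu_JpsiK_ee_eq_DPC',
]


def select_samples(pattern, names):
    """Return the names matching a shell-style wildcard pattern (case sensitive).

    Hypothetical helper; the matching done inside rx_data may differ.
    """
    return [name for name in names if fnmatchcase(name, pattern)]


# Both polarities, all periods
all_data = select_samples('DATA_24_Mag*_24c*', samples)
# Only MagUp, all periods
mag_up = select_samples('DATA_24_MagUp_24c*', samples)
```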
Checking what samples exist
For this run:
check_sample_stats -p rx
which will print something like:
| | mva | main | swp_cascade | brem_track_2 | swp_jpsi_misid | hop |
|---|---|---|---|---|---|---|
| Bd_JpsiX_ee_eq_JpsiInAcc | 54 | 108 | 108 | 108 | 108 | 108 |
| Bd_Kstee_eq_btosllball05_DPC | 6 | 6 | 6 | 6 | 6 | 6 |
| Bd_Kstmumu_eq_btosllball05_DPC | 8 | 8 | 8 | nan | 8 | 8 |
| Bs_JpsiX_ee_eq_JpsiInAcc | 54 | 108 | 108 | 108 | 108 | 108 |
| Bs_phiee_eq_Ball_DPC | 5 | 5 | 5 | 5 | 5 | 5 |
| Bu_JpsiK_ee_eq_DPC | 14 | 28 | 28 | 28 | 28 | 28 |
| Bu_JpsiK_mm_eq_DPC | 37 | 37 | 37 | nan | 37 | 37 |
| Bu_JpsiPi_ee_eq_DPC | 6 | 6 | 6 | 6 | 6 | 6 |
| Bu_JpsiPi_mm_eq_DPC | 10 | 10 | 10 | nan | 10 | 10 |
| Bu_JpsiX_ee_eq_JpsiInAcc | 77 | 154 | 154 | 154 | 154 | 154 |
| Bu_K1ee_eq_DPC | 10 | 10 | 10 | 10 | 10 | 10 |
| Bu_K2stee_Kpipi_eq_mK1430_DPC | 11 | 11 | 11 | 11 | 11 | 11 |
| Bu_Kee_eq_btosllball05_DPC | 6 | 6 | 6 | 6 | 6 | 6 |
| Bu_Kmumu_eq_btosllball05_DPC | 5 | 5 | 5 | nan | 5 | 5 |
| Bu_KplKplKmn_eq_sqDalitz_DPC | nan | 9 | nan | nan | nan | nan |
| Bu_KplpiplKmn_eq_sqDalitz_DPC | nan | 9 | nan | nan | nan | nan |
| Bu_Kstee_Kpi0_eq_btosllball05_DPC | 10 | 10 | 10 | 10 | 10 | 10 |
| Bu_piplpimnKpl_eq_sqDalitz_DPC | nan | 9 | nan | nan | nan | nan |
| Bu_psi2SK_ee_eq_DPC | 6 | 6 | 6 | 6 | 6 | 6 |
| DATA_24_MagDown_24c1 | 5 | 6 | 6 | 4 | 6 | 6 |
| DATA_24_MagDown_24c2 | 5 | 6 | 6 | 4 | 6 | 6 |
| DATA_24_MagDown_24c3 | 5 | 6 | 6 | 4 | 6 | 6 |
| DATA_24_MagDown_24c4 | 5 | 6 | 6 | 4 | 6 | 6 |
| DATA_24_MagUp_24c1 | 5 | 6 | 6 | 4 | 6 | 6 |
| DATA_24_MagUp_24c2 | 5 | 6 | 6 | 4 | 6 | 6 |
| DATA_24_MagUp_24c3 | 5 | 6 | 6 | 4 | 6 | 6 |
| DATA_24_MagUp_24c4 | 5 | 6 | 6 | 4 | 6 | 6 |
The rows are samples, the columns are friend trees, and each entry is the number of ntuples.
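A table like this one can be scanned for samples that lack some friend tree. The sketch below parses a few rows copied from the output, with the first header cell left empty for the sample name, and assumes that `nan` marks a friend tree with no ntuples for that sample. `parse_table` and `missing_friends` are hypothetical helpers, not part of rx_data.

```python
# Three rows copied from the printed table, plus the header and separator
table = """
| | mva | main | swp_cascade | brem_track_2 | swp_jpsi_misid | hop |
|---|---|---|---|---|---|---|
| Bu_Kee_eq_btosllball05_DPC | 6 | 6 | 6 | 6 | 6 | 6 |
| Bu_Kmumu_eq_btosllball05_DPC | 5 | 5 | 5 | nan | 5 | 5 |
| Bu_KplKplKmn_eq_sqDalitz_DPC | nan | 9 | nan | nan | nan | nan |
"""


def parse_table(text):
    """Map each sample to {friend tree: number of ntuples, or None for nan}."""
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    header = [cell.strip() for cell in lines[0].strip('|').split('|')]
    trees = header[1:]  # the first header cell is the empty sample column

    counts = {}
    for line in lines[2:]:  # skip the header and the |---| separator
        cells = [cell.strip() for cell in line.strip('|').split('|')]
        sample, values = cells[0], cells[1:]
        counts[sample] = {
            tree: None if value == 'nan' else int(value)
            for tree, value in zip(trees, values)
        }

    return counts


def missing_friends(counts):
    """Return, for each incomplete sample, the friend trees without ntuples."""
    return {
        sample: [tree for tree, number in per_tree.items() if number is None]
        for sample, per_tree in counts.items()
        if any(number is None for number in per_tree.values())
    }


counts = parse_table(table)
missing = missing_friends(counts)
```

With these three rows, `missing` lists `brem_track_2` for the muon sample and every friend tree except `main` for `Bu_KplKplKmn_eq_sqDalitz_DPC`.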
Multithreading
Multithreading with ROOT dataframes at the moment is dangerous and should be done only in a few places. To turn this on run:
from rx_data.rdf_getter import RDFGetter
nthreads = 3 # Or any reasonable number
# `sample` and `process_rdf` are defined by the user
with RDFGetter.multithreading(nthreads=nthreads):
    gtr = RDFGetter(sample=sample, trigger='Hlt2RD_BuToKpEE_MVA')
    rdf = gtr.get_rdf()
    process_rdf(rdf)
- Once outside the manager, multithreading will be off.
- One can use nthreads=1 to turn off multithreading.
- Negative or zero threads will raise an exception.
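The three rules above describe a class-level context manager. The following toy class is a sketch of that pattern only, to make the behaviour concrete; it is not the RDFGetter implementation, the class `ThreadSwitch` is hypothetical, and raising `ValueError` is an assumption since the text does not say which exception is used.

```python
from contextlib import contextmanager


class ThreadSwitch:
    """Toy stand-in used only to illustrate the context-manager pattern."""
    _nthreads = 1  # 1 thread means multithreading is off

    @classmethod
    @contextmanager
    def multithreading(cls, nthreads):
        """Use `nthreads` threads inside the with-block, restore the old value after."""
        if nthreads <= 0:
            # Assumption: the exception type in the real code may differ
            raise ValueError(f'Invalid number of threads: {nthreads}')

        old_value = cls._nthreads
        cls._nthreads = nthreads
        try:
            yield
        finally:
            # Runs on normal exit and on errors, so the setting never leaks out
            cls._nthreads = old_value

    @classmethod
    def is_multithreaded(cls):
        return cls._nthreads > 1
```

Restoring the old value in `finally` is what guarantees that code outside the manager always runs single threaded, even if the body of the `with` block fails.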
Unique identifiers
In order to get a string that fully identifies the underlying sample, i.e. a hash, do:
gtr = RDFGetter(sample='DATA_24_Mag*_24c*', trigger='Hlt2RD_BuToKpMuMu_MVA')
uid = gtr.get_uid()
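How `get_uid` builds the identifier is not described here. As an illustration of the general idea, the sketch below hashes the sample, the trigger and the sorted list of file paths, so that the same inputs always give the same string. The function `make_uid`, its inputs and the file names are all hypothetical.

```python
import hashlib
import json


def make_uid(sample, trigger, paths):
    """Return a hash that identifies a sample, a trigger and a set of files.

    Hypothetical sketch; rx_data may build its identifier from other inputs.
    """
    # Sorting the paths makes the result independent of the order of the files
    payload = json.dumps(
        {'sample': sample, 'trigger': trigger, 'paths': sorted(paths)},
        sort_keys=True)

    return hashlib.sha256(payload.encode('utf-8')).hexdigest()


# Made-up file names, for illustration only
uid_a = make_uid('DATA_24_Mag*_24c*', 'Hlt2RD_BuToKpMuMu_MVA', ['b.root', 'a.root'])
uid_b = make_uid('DATA_24_Mag*_24c*', 'Hlt2RD_BuToKpMuMu_MVA', ['a.root', 'b.root'])
uid_c = make_uid('DATA_24_Mag*_24c*', 'Hlt2RD_BuToKpEE_MVA', ['a.root', 'b.root'])
```

An identifier of this kind is typically used as a cache key: if any input changes, the hash changes and cached results are not reused by mistake.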
Excluding datasets
One can also exclude a certain type of friend trees with:
from rx_data.rdf_getter import RDFGetter
with RDFGetter.exclude_friends(names=['mva']):
    gtr = RDFGetter(sample='DATA_24_Mag*_24c*', trigger='Hlt2RD_BuToKpMuMu_MVA')
    rdf = gtr.get_rdf(per_file=False)
that should leave the MVA branches out of the dataframe.
Defining custom columns
Given that RDFGetter can be used across multiple modules, the safest way to
add extra columns is to specify their definitions once at the beginning of the
process (i.e. in the initializer function called within the main function).
This is done with:
from rx_data.rdf_getter import RDFGetter
RDFGetter.custom_columns(columns = d_def)
If custom columns are defined in more than one place in the code, the function will raise an exception, thus ensuring a unique definition for all dataframes.
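The define-once rule can be pictured as a class-level registry that refuses a second definition. The sketch below is a toy version of that idea, not the RDFGetter code: the class `ColumnRegistry` is hypothetical, the exception type is an assumption, and so is the format of `d_def` (column name mapped to an expression string). The column `q2_gev2` is a made-up example.

```python
class ColumnRegistry:
    """Toy registry that accepts custom column definitions only once."""
    _columns = None  # None means that nothing has been defined yet

    @classmethod
    def custom_columns(cls, columns):
        """Store the definitions, or raise if they were already stored."""
        if cls._columns is not None:
            # Assumption: the exception type in the real code may differ
            raise RuntimeError('Custom columns were already defined')

        # Copy, so that later changes to the caller's dictionary have no effect
        cls._columns = dict(columns)

    @classmethod
    def get_columns(cls):
        return {} if cls._columns is None else dict(cls._columns)


# Assumed format: column name -> expression; the column itself is made up
d_def = {'q2_gev2': 'q2_track / 1e6'}
ColumnRegistry.custom_columns(columns=d_def)
```

A second call to `ColumnRegistry.custom_columns` raises, which mirrors the guarantee that every dataframe in the process sees the same definitions.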
Accessing metadata
Information on the ntuples can be accessed through the metadata instance of the TObjString class, which is
stored in the ROOT files. This information can be dumped into a YAML file for easy access with:
dump_metadata -f root://x509up_u12477@eoslhcb.cern.ch//eos/lhcb/grid/user/lhcb/user/a/acampove/2025_02/1044184/1044184991/data_24_magdown_turbo_24c2_Hlt2RD_BuToKpEE_MVA_4df98a7f32.root
which will produce metadata.yaml.
Run1/2 samples
For now these samples are only in the UCAS cluster and only the rare electron signal has been made available through:
from rx_data.rdf_getter12 import RDFGetter12
gtr = RDFGetter12(
    sample ='Bu_Kee_eq_btosllball05_DPC', # BuKee
    trigger='Hlt2RD_BuToKpEE_MVA',        # This will be the eTOS trigger
    dset   ='2018')                       # Can be any year in Run1/2 or all for the full sample
rdf = gtr.get_rdf()
This dataframe has had the full selection applied, except for the
MVA, q2 and mass cuts.
Cuts can be added with:
from rx_data.rdf_getter12 import RDFGetter12
d_sel = {
    'bdt' : 'mva_cmb > 0.5 & mva_prc > 0.5',
    'q2'  : 'q2_track > 14300000'}
with RDFGetter12.add_selection(d_sel = d_sel):
    gtr = RDFGetter12(
        sample =sample,
        trigger=trigger,
        dset   =dset)
    rdf = gtr.get_rdf()
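How RDFGetter12 applies the entries of `d_sel` internally is not stated here. To see what the dictionary amounts to as a single requirement, the cuts can be joined into one expression. The sketch below does this with plain string handling; `combine_cuts` is a hypothetical helper and not part of rx_data.

```python
def combine_cuts(d_sel):
    """Join named cuts into one expression in which all of them must pass.

    Hypothetical helper. Each cut is wrapped in parentheses so that operators
    inside one cut cannot interact with the operators of another.
    """
    return ' && '.join(f'({expression})' for expression in d_sel.values())


d_sel = {
    'bdt': 'mva_cmb > 0.5 & mva_prc > 0.5',
    'q2': 'q2_track > 14300000',
}

full_cut = combine_cuts(d_sel)
```

Keeping the cuts in a dictionary rather than in one long string keeps a name attached to each requirement, which helps when individual cuts need to be switched off or reported separately.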