Project with lists of LFNs and utilities needed to download filtered ntuples

Project description

$R_X$ data

This repository contains:

  • Versioned lists of LFNs
  • Utilities to download them and link them into a tree structure

for all the $R_X$-like analyses. For instructions on how to:

  • Produce new ntuples with friend trees
  • Download filtered ntuples from the grid
  • Merge data ntuples
  • Copy ntuples from the cluster to a laptop
  • Outdated instructions that have not been removed yet

check the linked page.

Below are the instructions on how to access data from EOS.

Installation

To install this project run:

pip install git+ssh://git@gitlab.cern.ch:7999/rx_run3/rx_data.git

The code below assumes that all the data is in the directory pointed to by the ANADIR environment variable. To use the data on EOS, run:

export ANADIR=/eos/lhcb/wg/RD/RX_run3

preferably by adding this line to ~/.bashrc.

Accessing ntuples

Once the environment is set up, the ntuples can be accessed with:

from rx_data.rdf_getter     import RDFGetter

# This picks one sample for a given trigger
# The sample accepts wildcards, e.g. `DATA_24_MagUp_24c*` for all the periods
gtr = RDFGetter(
    sample   ='DATA_24_Mag*_24c*',
    analysis = 'rx',                    # This is the default, could be nopid
    tree     = 'DecayTree',             # This is the default, could be MCDecayTree
    trigger  ='Hlt2RD_BuToKpMuMu_MVA')

# If False (default), returns a single dataframe for the sample
rdf = gtr.get_rdf(per_file=False)

# If True, returns a dictionary with one entry per file. The key is the full path of the ROOT file
d_rdf = gtr.get_rdf(per_file=True)

This class finds the paths to the ntuples through the DATADIR environment variable, which should point to a directory such that $DATADIR/samples/ contains the YAML files mentioned above.
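To make the lookup concrete, here is a minimal sketch of how a wildcard such as `DATA_24_Mag*_24c*` could be matched against the YAML files under `$DATADIR/samples/`. It assumes, hypothetically, one YAML file per sample named `<sample>.yaml`; `find_samples` is an illustrative helper and not part of rx_data.

```python
import fnmatch
import os
from pathlib import Path
from typing import List, Optional


def find_samples(pattern: str, datadir: Optional[str] = None) -> List[str]:
    """Return the sorted sample names under <datadir>/samples/ that match `pattern`.

    Hypothetical layout: one YAML file per sample, named <sample>.yaml.
    """
    # Fall back to the DATADIR environment variable
    datadir = datadir if datadir is not None else os.environ['DATADIR']
    samples_dir = Path(datadir) / 'samples'
    if not samples_dir.is_dir():
        raise FileNotFoundError(f'Missing directory: {samples_dir}')

    # The file name without the .yaml extension is taken as the sample name
    names = [path.stem for path in samples_dir.glob('*.yaml')]
    # Case-sensitive shell-style wildcards, e.g. DATA_24_Mag*_24c*
    return sorted(name for name in names if fnmatch.fnmatchcase(name, pattern))
```

With DATADIR exported, `find_samples('DATA_24_Mag*_24c*')` would return every 2024 data period present in that directory. `fnmatchcase` is used instead of `fnmatch` so that matching is case-sensitive on every platform.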

The branches of the friend trees are also available in the dataframe. In the case of the MVA friend trees, the branches added are mva.mva_cmb and mva.mva_prc.

This makes it easy to extend the ntuples with extra branches without remaking them.
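The `<friend>.<branch>` naming used above can be illustrated with a small sketch that lists the qualified branch names from a mapping of friend trees to their branches. The helper `qualified_branches` is hypothetical; only the `mva` branch names come from the text.

```python
from typing import Dict, List


def qualified_branches(friends: Dict[str, List[str]]) -> List[str]:
    """Return the friend-tree branches as '<friend>.<branch>' names."""
    return [f'{friend}.{branch}' for friend, branches in friends.items() for branch in branches]


# Only the MVA friend tree is taken from the text; others would be added the same way
friends = {'mva': ['mva_cmb', 'mva_prc']}
print(qualified_branches(friends))  # ['mva.mva_cmb', 'mva.mva_prc']
```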

Checking what samples exist

To see which samples exist, run:

check_sample_stats -p rx

which will print something like:

                                   mva  main  swp_cascade  brem_track_2  swp_jpsi_misid  hop
Bd_JpsiX_ee_eq_JpsiInAcc            54   108          108           108             108  108
Bd_Kstee_eq_btosllball05_DPC         6     6            6             6               6    6
Bd_Kstmumu_eq_btosllball05_DPC       8     8            8           nan               8    8
Bs_JpsiX_ee_eq_JpsiInAcc            54   108          108           108             108  108
Bs_phiee_eq_Ball_DPC                 5     5            5             5               5    5
Bu_JpsiK_ee_eq_DPC                  14    28           28            28              28   28
Bu_JpsiK_mm_eq_DPC                  37    37           37           nan              37   37
Bu_JpsiPi_ee_eq_DPC                  6     6            6             6               6    6
Bu_JpsiPi_mm_eq_DPC                 10    10           10           nan              10   10
Bu_JpsiX_ee_eq_JpsiInAcc            77   154          154           154             154  154
Bu_K1ee_eq_DPC                      10    10           10            10              10   10
Bu_K2stee_Kpipi_eq_mK1430_DPC       11    11           11            11              11   11
Bu_Kee_eq_btosllball05_DPC           6     6            6             6               6    6
Bu_Kmumu_eq_btosllball05_DPC         5     5            5           nan               5    5
Bu_KplKplKmn_eq_sqDalitz_DPC       nan     9          nan           nan             nan  nan
Bu_KplpiplKmn_eq_sqDalitz_DPC      nan     9          nan           nan             nan  nan
Bu_Kstee_Kpi0_eq_btosllball05_DPC   10    10           10            10              10   10
Bu_piplpimnKpl_eq_sqDalitz_DPC     nan     9          nan           nan             nan  nan
Bu_psi2SK_ee_eq_DPC                  6     6            6             6               6    6
DATA_24_MagDown_24c1                 5     6            6             4               6    6
DATA_24_MagDown_24c2                 5     6            6             4               6    6
DATA_24_MagDown_24c3                 5     6            6             4               6    6
DATA_24_MagDown_24c4                 5     6            6             4               6    6
DATA_24_MagUp_24c1                   5     6            6             4               6    6
DATA_24_MagUp_24c2                   5     6            6             4               6    6
DATA_24_MagUp_24c3                   5     6            6             4               6    6
DATA_24_MagUp_24c4                   5     6            6             4               6    6

Here the rows represent the samples and the columns the friend trees. Each entry is the number of ntuples, with nan marking combinations for which no ntuples are available.
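Because the output is a whitespace-separated table, it can be scanned for missing friend trees. The sketch below assumes the layout shown above (a header with the friend-tree names, then one row per sample) and treats `nan` as a missing entry; `missing_friends` is a hypothetical helper, not part of the package.

```python
from typing import Dict, List


def missing_friends(table: str) -> Dict[str, List[str]]:
    """Map each sample to the friend trees printed as nan (no ntuples)."""
    # split() drops the alignment padding, so aligned or collapsed output both work
    lines = [line.split() for line in table.strip().splitlines() if line.strip()]
    header, rows = lines[0], lines[1:]

    missing = {}
    for sample, *counts in rows:
        absent = [name for name, count in zip(header, counts) if count == 'nan']
        if absent:
            missing[sample] = absent
    return missing


# A few rows copied from the output above
table = '''
mva main swp_cascade brem_track_2 swp_jpsi_misid hop
Bu_JpsiK_mm_eq_DPC 37 37 37 nan 37 37
Bu_KplKplKmn_eq_sqDalitz_DPC nan 9 nan nan nan nan
DATA_24_MagUp_24c1 5 6 6 4 6 6
'''
print(missing_friends(table))
```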

Multithreading

Multithreading with ROOT dataframes is currently risky and should be used only in a few places. To turn it on, run:

from rx_data.rdf_getter     import RDFGetter

nthreads = 3 # Or any reasonable number
with RDFGetter.multithreading(nthreads=nthreads):
    # `sample` and `process_rdf` stand for the user's sample name and processing function
    gtr = RDFGetter(sample=sample, trigger='Hlt2RD_BuToKpEE_MVA')
    rdf = gtr.get_rdf()

    process_rdf(rdf)

  • Once outside the context manager, multithreading is turned off.
  • Use nthreads=1 to turn off multithreading.
  • A negative or zero number of threads raises an exception.

Unique identifiers

To get a string that uniquely identifies the underlying sample (i.e. a hash), do:

gtr = RDFGetter(sample='DATA_24_Mag*_24c*', trigger='Hlt2RD_BuToKpMuMu_MVA')
uid = gtr.get_uid()
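One way such an identifier could be built is to hash everything that defines the sample. The sketch below is an assumption about the approach, not the package's actual `get_uid`: it serializes a configuration dictionary and the sorted list of input files with `json` and hashes the result with SHA-256, so the same inputs always give the same string regardless of file order. `sample_uid` and the file paths are placeholders.

```python
import hashlib
import json
from typing import Dict, List


def sample_uid(config: Dict[str, str], paths: List[str]) -> str:
    """Return a hex digest that changes whenever the config or the input files change."""
    payload = {'config': config, 'paths': sorted(paths)}
    # sort_keys makes the serialization, and hence the hash, independent of key order
    text = json.dumps(payload, sort_keys=True)
    return hashlib.sha256(text.encode('utf-8')).hexdigest()


config = {'sample': 'DATA_24_Mag*_24c*', 'trigger': 'Hlt2RD_BuToKpMuMu_MVA'}
uid_1 = sample_uid(config, ['/path/to/file_2.root', '/path/to/file_1.root'])
uid_2 = sample_uid(config, ['/path/to/file_1.root', '/path/to/file_2.root'])
print(uid_1 == uid_2)  # True: the order of the files does not matter
```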

Excluding datasets

Specific friend trees can also be excluded with:

from rx_data.rdf_getter     import RDFGetter

with RDFGetter.exclude_friends(names=['mva']):
    gtr = RDFGetter(sample='DATA_24_Mag*_24c*', trigger='Hlt2RD_BuToKpMuMu_MVA')
    rdf = gtr.get_rdf(per_file=False)

This should leave the MVA branches out of the dataframe.
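A minimal sketch of how such an exclusion context manager could work: the names passed in are stored on the class for the duration of the `with` block and dropped from the friend trees to attach. `FriendSelector` is hypothetical, and its friend list is simply a subset of the column names printed by `check_sample_stats`.

```python
from contextlib import contextmanager
from typing import FrozenSet, Iterator, List


class FriendSelector:
    """Hypothetical stand-in for the friend-tree bookkeeping of a getter class."""

    friends: List[str] = ['mva', 'swp_cascade', 'hop']
    _excluded: FrozenSet[str] = frozenset()

    @classmethod
    @contextmanager
    def exclude_friends(cls, names: List[str]) -> Iterator[None]:
        unknown = set(names) - set(cls.friends)
        if unknown:
            raise ValueError(f'Unknown friend trees: {sorted(unknown)}')

        previous = cls._excluded
        cls._excluded = previous | frozenset(names)
        try:
            yield
        finally:
            cls._excluded = previous  # Restore the previous state on exit

    @classmethod
    def active_friends(cls) -> List[str]:
        # Keep the original order, dropping the excluded names
        return [name for name in cls.friends if name not in cls._excluded]
```

Storing the previous state instead of resetting to an empty set lets nested `with` blocks combine their exclusions.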

Defining custom columns

Given that RDFGetter can be used across multiple modules, the safest way to add extra columns is to specify their definitions once, at the beginning of the process (i.e. in the initializer function called within the main function). This is done with:

from rx_data.rdf_getter     import RDFGetter

# d_def is the dictionary with the definitions of the extra columns
RDFGetter.custom_columns(columns = d_def)

If custom columns are defined in more than one place in the code, the function raises an exception, which ensures that all dataframes share a single definition.
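The define-once behavior can be sketched with a class attribute that is set on the first call and makes any later call fail. `ColumnRegistry` is hypothetical, and the column in the usage example (`B_M_GeV`, built from an assumed `B_M` branch) is invented for illustration.

```python
from typing import Dict, Optional


class ColumnRegistry:
    """Hypothetical registry that accepts the custom columns only once per process."""

    _columns: Optional[Dict[str, str]] = None

    @classmethod
    def custom_columns(cls, columns: Dict[str, str]) -> None:
        if cls._columns is not None:
            raise RuntimeError('Custom columns were already defined')
        # Copy, so later changes to the caller's dictionary have no effect
        cls._columns = dict(columns)

    @classmethod
    def columns(cls) -> Dict[str, str]:
        return dict(cls._columns or {})


# Hypothetical definition: new column name -> expression
ColumnRegistry.custom_columns(columns={'B_M_GeV': 'B_M / 1000.'})
try:
    ColumnRegistry.custom_columns(columns={'B_M_GeV': 'B_M * 1e-3'})
except RuntimeError as exc:
    print(exc)  # Custom columns were already defined
```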

Accessing metadata

Information on the ntuples can be accessed through the metadata object, an instance of ROOT's TObjString class stored in the ROOT files. This information can be dumped to a YAML file for easy access with:

dump_metadata -f root://x509up_u12477@eoslhcb.cern.ch//eos/lhcb/grid/user/lhcb/user/a/acampove/2025_02/1044184/1044184991/data_24_magdown_turbo_24c2_Hlt2RD_BuToKpEE_MVA_4df98a7f32.root

which will produce metadata.yaml.

Run1/2 samples

For now, these samples are only available on the UCAS cluster, and only the rare electron signal can be accessed. This is done with:

from rx_data.rdf_getter12 import RDFGetter12

gtr = RDFGetter12(
    sample ='Bu_Kee_eq_btosllball05_DPC', # BuKee
    trigger='Hlt2RD_BuToKpEE_MVA',        # This will be the eTOS trigger
    dset   ='2018')                       # Can be any year in Run1/2 or all for the full sample

rdf = gtr.get_rdf()

This dataframe has the full selection applied, except for the MVA, q2 and mass cuts.

Cuts can be added with:

from rx_data.rdf_getter12 import RDFGetter12

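d_sel   = {
    'bdt' : 'mva_cmb > 0.5 & mva_prc > 0.5',
    'q2'  : 'q2_track > 14300000'}          # q2 in MeV^2, i.e. 14.3 GeV^2

# sample, trigger and dset take the same values as in the example above
with RDFGetter12.add_selection(d_sel = d_sel):
    gtr = RDFGetter12(
        sample =sample,
        trigger=trigger,
        dset   =dset)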

    rdf = gtr.get_rdf()
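Conceptually, a dictionary of named cuts like `d_sel` reduces to a single filter expression. The sketch below shows one possible way to do this, not necessarily what RDFGetter12 does: each cut is wrapped in parentheses and the cuts are joined with `&&`, the logical AND of the C++ expressions used by ROOT dataframes. `combine_cuts` is a hypothetical helper.

```python
from typing import Dict


def combine_cuts(d_sel: Dict[str, str]) -> str:
    """Join named cuts into one expression, e.g. for a single Filter call."""
    if not d_sel:
        return 'true'  # C++ literal: no selection
    # Parentheses keep each cut intact regardless of operator precedence
    return ' && '.join(f'({cut})' for cut in d_sel.values())


d_sel = {
    'bdt' : 'mva_cmb > 0.5 & mva_prc > 0.5',
    'q2'  : 'q2_track > 14300000'}
print(combine_cuts(d_sel))
# (mva_cmb > 0.5 & mva_prc > 0.5) && (q2_track > 14300000)
```

Alternatively, each cut can be applied as a separate named `Filter(expression, name)` call on the dataframe, which keeps the dictionary keys as cut names and lets ROOT's `Report()` produce a cut-flow table.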

Download files

Source Distribution

rx_data-0.2.1.dev49.tar.gz (13.3 MB)

Uploaded Source

Built Distribution

rx_data-0.2.1.dev49-py3-none-any.whl (14.4 MB)

Uploaded Python 3

File details

Details for the file rx_data-0.2.1.dev49.tar.gz.

File metadata

  • Download URL: rx_data-0.2.1.dev49.tar.gz
  • Upload date:
  • Size: 13.3 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for rx_data-0.2.1.dev49.tar.gz
Algorithm Hash digest
SHA256 81f04de8e79ee082fbc0984f4ef084eed5d3a5f8e9d36480c3b76b2f445a668a
MD5 b495cb5ce8d089c5eaf1bdbcf704b4e6
BLAKE2b-256 90ea9238ada152f6fbc2898112cba6960c52dc501fabeac1f4978228efaf5894

Provenance

The following attestation bundles were made for rx_data-0.2.1.dev49.tar.gz:

Publisher: publish.yaml on RX-Run3/rx_data

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file rx_data-0.2.1.dev49-py3-none-any.whl.

File metadata

  • Download URL: rx_data-0.2.1.dev49-py3-none-any.whl
  • Upload date:
  • Size: 14.4 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for rx_data-0.2.1.dev49-py3-none-any.whl
Algorithm Hash digest
SHA256 e76e3c86a0b8563c21207e83d28d627e13b65acf89d8976fbf196ba30c6f3249
MD5 d75563ebaaaca5091a993adbfeee9e92
BLAKE2b-256 c15f03d08bf2f27fdb18014438c32e54dde44f736834becfb5c52272a23c88a5

Provenance

The following attestation bundles were made for rx_data-0.2.1.dev49-py3-none-any.whl:

Publisher: publish.yaml on RX-Run3/rx_data

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
