Project with lists of LFNs and utilities needed to download filtered ntuples

$R_X$ data

This repository contains:

Versioned lists of LFNs
Utilities to download them and link them into a tree structure

for all the $R_X$-like analyses. For instructions on how to:

Produce new ntuples with friend trees
Download filtered ntuples from the grid
Merge data ntuples
Copy ntuples from the cluster to a laptop
Find outdated instructions that have not been removed yet

Check this.

Below are the instructions on how to access data from EOS.

Installation

To install this project run:

pip install git+ssh://git@gitlab.cern.ch:7999/rx_run3/rx_data.git

The code below assumes that all the data is in ANADIR. If you want to use the data in EOS do:

export ANADIR=/eos/lhcb/wg/RD/RX_run3

preferably in ~/.bashrc.
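
Since the code relies on ANADIR, a script can fail early with a clear message when the variable is missing. The helper below is a minimal sketch; `get_anadir` is a hypothetical name and is not part of this project.

```python
import os
from pathlib import Path


def get_anadir() -> Path:
    """Return the directory stored in ANADIR, failing early if it is unset."""
    value = os.environ.get('ANADIR')
    if not value:
        raise RuntimeError('ANADIR is not set, e.g. export ANADIR=/eos/lhcb/wg/RD/RX_run3')

    return Path(value)
```

Calling this once at the start of a job avoids harder-to-read errors later, when the paths to the ntuples are built.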

How the code makes the ROOT dataframes

When creating dataframes, the code will:

Check the directories where the ROOT files are
Make lists of paths
Create dictionaries with these paths, split by sample, and save them in YAML files. Each YAML file is associated with a different friend tree or with the main tree.
For a given sample, pick up the lists of paths from the YAML files and create a JSON file
Use the JSON file to make the ROOT dataframe with RDataFrame's from_spec method
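
The steps above can be sketched with the standard library only. This is an illustration, not the project's code: the directory layout `<kind>/<sample>/<file>.root`, the function names and the use of the same tree name for friends are all assumptions. The dictionary follows the `samples`/`friends` layout that, to my understanding, from_spec expects; treat that too as an assumption to check against the ROOT documentation.

```python
import json
from collections import defaultdict
from pathlib import Path


def paths_by_sample(paths: list[str]) -> dict[str, list[str]]:
    """Group ROOT file paths by sample, taken here to be the parent directory name."""
    grouped = defaultdict(list)
    for path in sorted(paths):
        grouped[Path(path).parent.name].append(path)

    return dict(grouped)


def make_spec(sample: str, trees: dict[str, dict[str, list[str]]], tree_name: str = 'DecayTree') -> dict:
    """Build a spec dictionary: the main tree is the sample, the other kinds are friends."""
    spec = {
        'samples': {sample: {'trees': [tree_name], 'files': trees['main'][sample]}},
        'friends': {}}

    for kind, by_sample in trees.items():
        if kind == 'main' or sample not in by_sample:
            continue

        # The friend is registered under its kind, e.g. mva
        spec['friends'][kind] = {'trees': [tree_name], 'files': by_sample[sample]}

    return spec


# Hypothetical layout: <kind>/<sample>/<file>.root
main = paths_by_sample(['main/Bu_Kee/b.root', 'main/Bu_Kee/a.root'])
mva = paths_by_sample(['mva/Bu_Kee/a.root', 'mva/Bu_Kee/b.root'])

spec = make_spec('Bu_Kee', {'main': main, 'mva': mva})
text = json.dumps(spec, indent=2)  # This is what would be written to the JSON file
```

In the real code the per-kind dictionaries are stored in YAML files; here they are kept in memory to avoid a dependency outside the standard library.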

Accessing ntuples

Once the project is installed and ANADIR is set, the ntuples can be accessed with:

from rx_data.rdf_getter import RDFGetter

# This picks one sample for a given trigger
# The sample accepts wildcards, e.g. `DATA_24_MagUp_24c*` for all the periods
gtr = RDFGetter(
    sample   = 'DATA_24_Mag*_24c*',
    analysis = 'rx',                    # This is the default, could be nopid
    tree     = 'DecayTree',             # This is the default, could be MCDecayTree
    trigger  = 'Hlt2RD_BuToKpMuMu_MVA')

# If False (default), will return a single dataframe for the sample
rdf = gtr.get_rdf(per_file=False)

# If True, will return a dictionary with an entry per file. The key is the full path of the ROOT file
d_rdf = gtr.get_rdf(per_file=True)

The way this class will find the paths to the ntuples is by using the DATADIR environment variable. This variable will point to a path $DATADIR/samples/ with the YAML files mentioned above.

In the case of the MVA friend trees the branches added would be mva.mva_cmb and mva.mva_prc.

Thus, one can easily extend the ntuples with extra branches without remaking them.
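
Regarding the wildcards accepted in the sample name, their effect can be pictured with shell-style matching from the standard library. This is only a sketch of the idea; whether RDFGetter uses `fnmatch` internally is an assumption, and `match_samples` is a hypothetical helper.

```python
from fnmatch import fnmatchcase

# A few sample names, as they appear in the YAML files
available = [
    'DATA_24_MagUp_24c1',
    'DATA_24_MagUp_24c2',
    'DATA_24_MagDown_24c1',
    'Bu_JpsiK_ee_eq_DPC']


def match_samples(pattern: str, names: list[str]) -> list[str]:
    """Return the sorted names matching a shell-style pattern."""
    return sorted(name for name in names if fnmatchcase(name, pattern))


both = match_samples('DATA_24_Mag*_24c*', available)   # Both polarities, all periods
up = match_samples('DATA_24_MagUp_24c*', available)    # Only MagUp, all periods
```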

Checking what samples exist as filtered ntuples in the grid

This is useful to avoid filtering the same samples multiple times, which would:

Slow down the analysis due to the large amount of data that needs to be downloaded
Occupy more space in the user's grid

For this run:

from rx_data.filtered_stats import FilteredStats

fst = FilteredStats(analysis='rx', versions=[7, 10])
fst.exists(event_type='12153001', block='w31_34', polarity='magup')

This will check if a specific sample exists in versions 7 or 10 of the filtering, where these versions are the versions of the directories in rx_data_lfns/rx.

This will require access to the user's ganga sandbox through the GANGADIR variable. This should be improved eventually, ideally by integrating the filtering with the analysis productions pipeline.

Checking what samples exist as ntuples in ANADIR (locally)

For this run:

check_local_stats -p rx

which will print something like:

	mva	main	swp_cascade	brem_track_2	swp_jpsi_misid	hop
Bd_JpsiX_ee_eq_JpsiInAcc	54	108	108	108	108	108
Bd_Kstee_eq_btosllball05_DPC	6	6	6	6	6	6
Bd_Kstmumu_eq_btosllball05_DPC	8	8	8	nan	8	8
Bs_JpsiX_ee_eq_JpsiInAcc	54	108	108	108	108	108
Bs_phiee_eq_Ball_DPC	5	5	5	5	5	5
Bu_JpsiK_ee_eq_DPC	14	28	28	28	28	28
Bu_JpsiK_mm_eq_DPC	37	37	37	nan	37	37
Bu_JpsiPi_ee_eq_DPC	6	6	6	6	6	6
Bu_JpsiPi_mm_eq_DPC	10	10	10	nan	10	10
Bu_JpsiX_ee_eq_JpsiInAcc	77	154	154	154	154	154
Bu_K1ee_eq_DPC	10	10	10	10	10	10
Bu_K2stee_Kpipi_eq_mK1430_DPC	11	11	11	11	11	11
Bu_Kee_eq_btosllball05_DPC	6	6	6	6	6	6
Bu_Kmumu_eq_btosllball05_DPC	5	5	5	nan	5	5
Bu_KplKplKmn_eq_sqDalitz_DPC	nan	9	nan	nan	nan	nan
Bu_KplpiplKmn_eq_sqDalitz_DPC	nan	9	nan	nan	nan	nan
Bu_Kstee_Kpi0_eq_btosllball05_DPC	10	10	10	10	10	10
Bu_piplpimnKpl_eq_sqDalitz_DPC	nan	9	nan	nan	nan	nan
Bu_psi2SK_ee_eq_DPC	6	6	6	6	6	6
DATA_24_MagDown_24c1	5	6	6	4	6	6
DATA_24_MagDown_24c2	5	6	6	4	6	6
DATA_24_MagDown_24c3	5	6	6	4	6	6
DATA_24_MagDown_24c4	5	6	6	4	6	6
DATA_24_MagUp_24c1	5	6	6	4	6	6
DATA_24_MagUp_24c2	5	6	6	4	6	6
DATA_24_MagUp_24c3	5	6	6	4	6	6
DATA_24_MagUp_24c4	5	6	6	4	6	6

where the rows represent samples and the columns represent the friend trees. The entries are the numbers of ntuples.
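
To spot gaps quickly, the printed table can be scanned for nan entries. The sketch below assumes the output is tab-separated, as shown above, and that nan marks a friend tree with no ntuples for that sample; `missing_friends` is a hypothetical helper, not a utility of this project.

```python
import csv
import io


def missing_friends(table: str) -> dict[str, list[str]]:
    """Map each sample to the columns whose entry is nan."""
    rows = csv.reader(io.StringIO(table), delimiter='\t')
    header = next(rows)[1:]  # The first header cell is empty
    missing = {}
    for row in rows:
        if not row:
            continue

        sample, counts = row[0], row[1:]
        gaps = [name for name, count in zip(header, counts) if count == 'nan']
        if gaps:
            missing[sample] = gaps

    return missing


# Two rows copied from the output above
table = (
    '\tmva\tmain\tswp_cascade\tbrem_track_2\tswp_jpsi_misid\thop\n'
    'Bu_Kee_eq_btosllball05_DPC\t6\t6\t6\t6\t6\t6\n'
    'Bu_Kmumu_eq_btosllball05_DPC\t5\t5\t5\tnan\t5\t5\n')

gaps = missing_friends(table)
```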

Multithreading

Multithreading with ROOT dataframes at the moment is dangerous and should be done only in a few places. To turn this on run:

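from rx_data.rdf_getter import RDFGetter

sample   = 'Bu_JpsiK_ee_eq_DPC'
nthreads = 3 # Or any reasonable number
with RDFGetter.multithreading(nthreads=nthreads):
    gtr = RDFGetter(sample=sample, trigger='Hlt2RD_BuToKpEE_MVA')
    rdf = gtr.get_rdf()

    process_rdf(rdf) # Placeholder for the user's own processing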

Once outside the manager, multithreading will be off.
One can use nthreads=1 to turn off multithreading.
A zero or negative number of threads will raise an exception.
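
The behaviour described above (setting on entry, restoring on exit, rejecting non-positive values) is what a context manager provides. The class below is a hypothetical stand-in written to illustrate that pattern; it is not the implementation of RDFGetter.multithreading, which presumably also toggles ROOT's implicit multithreading.

```python
from contextlib import contextmanager


class ThreadSettings:
    """Hypothetical stand-in for the state toggled by the multithreading manager."""
    nthreads = 1  # One thread means multithreading is off

    @classmethod
    @contextmanager
    def multithreading(cls, nthreads: int):
        if nthreads <= 0:
            raise ValueError(f'Invalid number of threads: {nthreads}')

        old = cls.nthreads
        cls.nthreads = nthreads
        try:
            yield
        finally:
            # Restored even if the body raises
            cls.nthreads = old
```

The `try`/`finally` is the important part: the previous setting comes back even when the code inside the `with` block fails.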

Unique identifiers

In order to get a string that fully identifies the underlying sample, i.e. a hash, do:

gtr = RDFGetter(sample='DATA_24_Mag*_24c*', trigger='Hlt2RD_BuToKpMuMu_MVA')
uid = gtr.get_uid()
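
As an illustration of what such an identifier can look like, a hash can be built from the inputs that define the sample. How get_uid computes its value is not documented here, so the inputs chosen below (file paths and trigger) and the helper `make_uid` are assumptions.

```python
import hashlib
import json


def make_uid(paths: list[str], trigger: str) -> str:
    """Return a short hash that changes whenever the files or the trigger change."""
    payload = json.dumps({'paths': sorted(paths), 'trigger': trigger}, sort_keys=True)

    return hashlib.sha256(payload.encode('utf-8')).hexdigest()[:10]


uid_1 = make_uid(['a.root', 'b.root'], 'Hlt2RD_BuToKpMuMu_MVA')
uid_2 = make_uid(['b.root', 'a.root'], 'Hlt2RD_BuToKpMuMu_MVA')  # Same files, other order
uid_3 = make_uid(['a.root', 'b.root'], 'Hlt2RD_BuToKpEE_MVA')    # Other trigger
```

Sorting the paths makes the identifier independent of the order in which the files were found, which is what makes it usable as a cache key.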

Identifiers for cluster jobs

When sending jobs to a computing cluster, each job will try to read the data and will therefore create the JSON and YAML files mentioned above. If two jobs run on the same machine, they could overwrite each other's files, which leads to clashes and failed jobs. To avoid this do:

from rx_data.rdf_getter    import RDFGetter

sample = 'Bu_JpsiK_ee_eq_DPC'
with RDFGetter.identifier(value='job_001'):
    gtr = RDFGetter(sample=sample, trigger='Hlt2RD_BuToKpEE_MVA')
    rdf = gtr.get_rdf(per_file=False)

i.e. wrap the code in the identifier manager, which will name the files based on the job.
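
The idea behind the identifier can be sketched as follows: the job name becomes part of the file name, so two jobs on the same machine never write to the same path. The naming scheme and the helper `spec_path` are hypothetical; the actual file names used by the project may differ.

```python
from pathlib import Path
from typing import Optional


def spec_path(cache_dir: str, sample: str, identifier: Optional[str] = None) -> Path:
    """Return the JSON path for a sample, made unique per job when an identifier is given."""
    suffix = '' if identifier is None else f'_{identifier}'

    return Path(cache_dir) / f'{sample}{suffix}.json'


job_1 = spec_path('/tmp/rx_cache', 'Bu_JpsiK_ee_eq_DPC', identifier='job_001')
job_2 = spec_path('/tmp/rx_cache', 'Bu_JpsiK_ee_eq_DPC', identifier='job_002')
```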

Excluding datasets

One can also exclude certain friend trees with:

from rx_data.rdf_getter import RDFGetter

with RDFGetter.exclude_friends(names=['mva']):
    gtr = RDFGetter(sample='DATA_24_Mag*_24c*', trigger='Hlt2RD_BuToKpMuMu_MVA')
    rdf = gtr.get_rdf(per_file=False)

This should leave the MVA branches out of the dataframe.

Defining custom columns

Given that RDFGetter can be used across multiple modules, the safest way to add extra columns is to specify their definitions once at the beginning of the process (i.e. in the initializer function called within the main function). This is done with:

from rx_data.rdf_getter     import RDFGetter

RDFGetter.custom_columns(columns = d_def)

If custom columns are defined in more than one place in the code, the function will raise an exception, thus ensuring a unique definition for all dataframes.
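
The define-once rule can be illustrated with a small registry that refuses a second definition. This is a sketch of the behaviour, not the project's code: `ColumnRegistry` is hypothetical, and the column name and expression in `d_def` are made up for the example.

```python
class ColumnRegistry:
    """Hypothetical registry that accepts column definitions only once."""
    _columns = None

    @classmethod
    def custom_columns(cls, columns: dict[str, str]) -> None:
        if cls._columns is not None:
            raise RuntimeError('Custom columns were already defined')

        cls._columns = dict(columns)  # Copy, so later changes to the argument have no effect


# Hypothetical definition: column name mapped to the expression that builds it
d_def = {'q2_gev': 'q2 / 1e6'}

ColumnRegistry.custom_columns(columns=d_def)
```

A second call to `custom_columns`, from any module, raises instead of silently replacing the definitions.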

Accessing metadata

Information on the ntuples can be accessed through the metadata object, an instance of the TObjString class, which is stored in the ROOT files. This information can be dumped to a YAML file for easy access with:

dump_metadata -f root://x509up_u12477@eoslhcb.cern.ch//eos/lhcb/grid/user/lhcb/user/a/acampove/2025_02/1044184/1044184991/data_24_magdown_turbo_24c2_Hlt2RD_BuToKpEE_MVA_4df98a7f32.root

which will produce metadata.yaml.

Run1/2 samples

For now these samples are only in the UCAS cluster and only the rare electron signal has been made available through:

from rx_data.rdf_getter12 import RDFGetter12

gtr = RDFGetter12(
    sample ='Bu_Kee_eq_btosllball05_DPC', # BuKee
    trigger='Hlt2RD_BuToKpEE_MVA',        # This will be the eTOS trigger
    dset   ='2018')                       # Can be any year in Run1/2 or all for the full sample

rdf = gtr.get_rdf()

This dataframe has had the full selection applied, except for the MVA, q2 and mass cuts.

Cuts can be added with:

from rx_data.rdf_getter12 import RDFGetter12

d_sel   = {
    'bdt' : 'mva_cmb > 0.5 & mva_prc > 0.5',
    'q2'  : 'q2_track > 14300000'}

with RDFGetter12.add_selection(d_sel = d_sel):
    gtr = RDFGetter12(
        sample =sample,
        trigger=trigger,
        dset   =dset)

    rdf = gtr.get_rdf()
