Project with lists of LFNs and utilities needed to download filtered ntuples
Project description
$R_X$ data
This repository contains:
- Versioned lists of LFNs
- Utilities to download them and link them into a tree structure for all the $R_X$-like analyses.
Installation
To install this project run:
pip install rx_data
# The line below will upgrade it in case new samples are available; the list of LFNs is part of
# the project itself
pip install --upgrade rx_data
The download requires a grid proxy, which can be created with:
. /cvmfs/lhcb.cern.ch/lib/LbEnv
# This will create a proxy valid for 100 hours
lhcb-proxy-init -v 100:00
Listing available triggers
To see which triggers are present in the current version of the ntuples, run:
list_triggers -v v1
# This will save them to a YAML file
list_triggers -v v1 -o triggers.yaml
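Once triggers.yaml exists, unwanted triggers can be pruned before it is fed to later steps. A minimal stdlib-only sketch, assuming the file is a flat YAML list with one `- <name>` entry per line (an assumption about the format, not the tool's guaranteed output):

```python
# Prune a flat YAML list of trigger names without external dependencies.
# ASSUMPTION: triggers.yaml is a plain list, one "- <name>" entry per line.

def prune_triggers(lines, keep_substring):
    """Keep only entries whose trigger name contains keep_substring."""
    kept = []
    for line in lines:
        stripped = line.strip()
        if not stripped.startswith('- '):
            kept.append(line)          # preserve non-entry lines untouched
            continue
        name = stripped[2:].strip()
        if keep_substring in name:
            kept.append(line)
    return kept

text = [
    '- Hlt2RD_BuToKpMuMu_MVA',
    '- Hlt2RD_BuToKpEE_MVA',
    '- Hlt2RD_BuToKpEE_MVA_cal',
]
electron_only = prune_triggers(text, 'EE')
```

The pruned list can then be written back out and passed through -t.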
Downloading the ntuples
For this, run:
download_rx_data -m 5 -p /path/to/downloaded/.data -v v1 -d -t triggers.yaml
which will use 5 threads to download the ntuples associated with the triggers in triggers.yaml,
for version v1, to the specified path.
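The multi-threaded download can be pictured as a thread pool mapping a transfer function over the LFN list. A sketch with Python's concurrent.futures, where download_one is a hypothetical stand-in for the real XRootD transfer:

```python
from concurrent.futures import ThreadPoolExecutor

def download_one(lfn):
    # Hypothetical stand-in for the real XRootD copy; here it only
    # reports what it would transfer.
    return f'downloaded {lfn}'

def download_all(lfns, nthreads=5):
    # Mirrors the -m 5 option: up to nthreads concurrent transfers.
    # pool.map preserves the input order of the results.
    with ThreadPoolExecutor(max_workers=nthreads) as pool:
        return list(pool.map(download_one, lfns))

results = download_all(['lfn_001.root', 'lfn_002.root'], nthreads=5)
```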
IMPORTANT:
- In order to prevent deleting the data, save it in a hidden folder, i.e. one starting with a period; above it is .data.
- This path is optional; one can export DOWNLOAD_NTUPPATH and the path will be picked up.
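The precedence between the -p flag and the environment variable might be implemented along these lines (a sketch only; the actual resolution logic inside rx_data may differ):

```python
import os

def resolve_download_path(cli_path=None):
    # The -p flag on the command line wins; otherwise fall back to the
    # DOWNLOAD_NTUPPATH environment variable.
    if cli_path:
        return cli_path
    env_path = os.environ.get('DOWNLOAD_NTUPPATH')
    if env_path:
        return env_path
    raise ValueError('No output path: pass -p or export DOWNLOAD_NTUPPATH')

os.environ['DOWNLOAD_NTUPPATH'] = '/tmp/.data'
from_env = resolve_download_path()            # falls back to the env var
from_cli = resolve_download_path('/home/.d')  # CLI flag takes precedence
```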
Potential problems: the download happens through XRootD, which will try to pick up a Kerberos token. If authentication problems appear, run:
which kinit
and make sure that your kinit does not come from a virtual environment but is the one in the LHCb stack or the native one.
Organizing paths
Building directory structure
All the ntuples will be downloaded into a single directory. To group them by sample and trigger, run:
make_tree_structure -i /path/to/downloaded/.data/v1 -o /path/to/directory/structure
This will not copy the ntuples; it will only create symbolic links to them.
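The linking step can be sketched as follows, assuming one already has a mapping from (sample, trigger) to file paths; the real tool derives this grouping from the file names:

```python
import tempfile
from pathlib import Path

def make_links(grouping, out_dir):
    """Create out_dir/<sample>/<trigger>/<name> symlinks; files are not copied."""
    made = []
    for (sample, trigger), paths in grouping.items():
        target_dir = Path(out_dir) / sample / trigger
        target_dir.mkdir(parents=True, exist_ok=True)
        for path in paths:
            link = target_dir / Path(path).name
            link.symlink_to(path)   # symlink only, no data duplication
            made.append(link)
    return made

# Small self-contained demo on a throwaway directory
tmp = Path(tempfile.mkdtemp())
src = tmp / 'data_24_file.root'
src.touch()
links = make_links({('DATA_24_MagUp', 'Hlt2RD_BuToKpMuMu_MVA'): [src]}, tmp / 'tree')
```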
Making YAML with files list
If instead one does:
make_tree_structure -i /path/to/downloaded/.data/v1 -f samples.yaml
the links won't be made; instead, a YAML file will be created with the list of files for each sample and trigger.
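The resulting samples.yaml maps each sample and trigger to its file list; its shape is roughly as below (illustrative names and paths, not actual output):

```yaml
# Illustrative sketch of the samples.yaml layout (names are examples)
DATA_24_MagUp_24c2:
  Hlt2RD_BuToKpMuMu_MVA:
    - /path/to/downloaded/.data/v1/file_0001.root
    - /path/to/downloaded/.data/v1/file_0002.root
Bu_JpsiK_mm_eq_DPC:
  Hlt2RD_BuToKpMuMu_MVA:
    - /path/to/downloaded/.data/v1/file_0103.root
```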
Lists from files in the grid
If, instead of the downloaded files, one wants the ones on the grid, one can run:
make_tree_structure -v v4 -f samples.yaml
where v4 is the version of the JSON files holding the LFNs. In case one needs the old naming, used in Run 1 and Run 2,
one would run:
make_tree_structure -v v4 -f samples.yaml -n old
This will likely drop samples that have no old naming because they were not used in the past.
Dropping triggers
The YAML output of the commands above will be very large, and not all of it will be needed. One can drop triggers as follows:
# This will dump a list of triggers to triggers.yaml
# You can optionally remove unneeded triggers from that file
list_triggers -v v4 -o triggers.yaml
# This will use those triggers only to make samples.yaml
make_tree_structure -v v4 -f samples.yaml -t triggers.yaml
Sending files to the user's CERNBOX
In order to share files one can:
- Use the CERNBOX website to upload the files; these files will end up in EOS. One can upload entire directories.
- Use make_tree_structure to dump the list of PFNs to YAML with:
make_tree_structure -i /publicfs/ucas/user/campoverde/Data/RX_run3/v5/mva/v1 -f rx_mva.yaml -p /eos/user/a/acampove/Data/mva/v1
where -p is the directory in EOS to which the files will go.
Samples naming
The samples were named after the DecFiles names of the samples, with two modifications:
- Certain special characters are replaced, as shown here
- A _SS suffix is added for split-sim samples, i.e. samples where the photon converts into an electron pair.
A useful guide showing the correspondence between event type and name is here
Accessing ntuples
Assuming that all the ntuples for data and simulation are in a given directory, the line below:
make_tree_structure -i /directory/with/ntuples -f samples.yaml
will create a samples.yaml file with the list of paths to ROOT files, per trigger and sample.
If a second set of branches can be obtained, e.g. with MVA scores, one can run the same command:
make_tree_structure -i /directory/with/mva/ntuples -f mva.yaml
and in order to attach the main ntuples to the MVA ntuples:
from rx_data.rdf_getter import RDFGetter
# This is how the YAML files with the sample information are passed
RDFGetter.samples = {
'main' : '/home/acampove/Packages/rx_data/samples.yaml', # for main trees
'mva' : '/home/acampove/Packages/rx_data/mva.yaml', # for trees containing the MVA scores
}
# This picks one sample for a given trigger
# The sample accepts wildcards, e.g. `DATA_24_MagUp_24c*` for all the periods
gtr = RDFGetter(sample='DATA_24_Mag*_24c*', trigger='Hlt2RD_BuToKpMuMu_MVA')
rdf = gtr.get_rdf()
In the case of the MVA friend trees, the branches added would be mva.mva_cmb and mva.mva_prc.
Thus, one can easily extend the ntuples with extra branches without remaking them.
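The sample argument of RDFGetter accepts shell-style wildcards; Python's fnmatch module reproduces this kind of matching, as a quick illustration:

```python
from fnmatch import fnmatch

# Shell-style globbing, as used for the sample argument above
pattern = 'DATA_24_Mag*_24c*'

ok_up   = fnmatch('DATA_24_MagUp_24c2',   pattern)  # any polarity and period
ok_down = fnmatch('DATA_24_MagDown_24c4', pattern)
bad     = fnmatch('DATA_23_MagUp_24c2',   pattern)  # wrong year, rejected
```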
Accessing metadata
Information on the ntuples can be accessed through the metadata object of the TObjString class, which is
stored in the ROOT files. This information can be dumped to a YAML file for easy access with:
dump_metadata -f root://x509up_u12477@eoslhcb.cern.ch//eos/lhcb/grid/user/lhcb/user/a/acampove/2025_02/1044184/1044184991/data_24_magdown_turbo_24c2_Hlt2RD_BuToKpEE_MVA_4df98a7f32.root
which will produce metadata.yaml.
Printing information on samples
Use:
check_sample_stats -p /path/to/rx_samples.yaml
to print a markdown table with the size of each sample in megabytes, e.g.:
| Sample | Trigger | Size |
|:--------------------------------------------|:-------------------------------|-------:|
| Bu_JpsiK_mm_eq_DPC                          | Hlt2RD_BuToKpMuMu_MVA          |  15829 |
| Bs_Jpsiphi_mm_eq_CPV_update2016_DPC         | Hlt2RD_BuToKpMuMu_MVA          |  11164 |
| Bd_JpsiKst_mm_eq_DPC                        | Hlt2RD_BuToKpMuMu_MVA          |   9945 |
| Bu_JpsiK_ee_eq_DPC                          | Hlt2RD_BuToKpEE_MVA_cal        |   8873 |
| Bu_JpsiK_ee_eq_DPC | Hlt2RD_BuToKpEE_MVA | 8488 |
...
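The per-sample sizes can be reproduced from a file list with a few lines of Python; a sketch of the arithmetic (check_sample_stats itself may compute this differently):

```python
def size_in_mb(byte_counts):
    # Sum the file sizes of one sample/trigger pair and convert to megabytes
    return sum(byte_counts) // (1024 * 1024)

files_bytes = [7_340_032, 3_145_728]   # two files: 7 MiB and 3 MiB
total_mb = size_in_mb(files_bytes)
```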
Merging files
After the preselection the data files are very small and there are many of them. The following line can be used to merge them:
merge_samples -p /path/to/samples/rx_samples.yaml -s DATA_24_MagUp_24c2 -t Hlt2RD_BuToKpMuMu_MVA
where the command will merge all the files associated with a given sample and trigger, finding the paths
in the file passed through -p.
Copying files
If the original files are downloaded to a cluster and the user needs them on, e.g., a laptop, one could:
- Use SSHFS to mount the cluster's file system in the laptop.
- Copy the files through:
copy_samples -k main -f to_copy.yaml -v v5 -d
where to_copy.yaml specifies what samples will be copied and where, e.g.:
inp_dir : /path/to/directory/with/sample/directories # Sample directories: main, hop, mva, swp_cascade...
out_dir : /path/to/directory/in/laptop
samples :
signal:
- 12123003 # Kee
- 12113002 # Kmm
...
Checking for corrupted files
For this run:
check_corrupted -p /path/to/directory/with/files -x "data_*_MVA_*.root"
which will check for corrupted files and remove them.
-x can be used to pass wildcards; in the case above, it would target only data.
After removal, the download can be tried again, which would run only on the missing samples.
This might allow for these files to be fixed, assuming that they were broken due to network issues.
Calculating extra branches
Given the files produced by post_ap, new branches can be attached. These branches can be calculated with
branch_calculator and placed in small files, which are then made into friends of the main files.
To do this, we assume that all the ntuples live in $DATADIR/main/vx, where DATADIR needs to be exported
so that the code picks it up. vx represents a version of the ntuples (e.g. v1, v2); the code will
pick up the latest one. Then run:
branch_calculator -k swp_jpsi_misid -p 0 40 -b -v v1
which will:
- Create a new set of files in $DATADIR/swp_jpsi_misid/v1, with each input file corresponding to an output file.
- Split the input files into 40 groups of roughly the same total file size.
- Process the zeroth group.
Thus, this can be parallelized by running the line above 40 times in 40 jobs.
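Splitting the inputs into groups of roughly equal total size is a classic greedy partition; a minimal sketch of the idea (the actual grouping inside branch_calculator may differ):

```python
import heapq

def split_by_size(files, ngroups):
    """Greedy partition: always give the next-largest file to the lightest group."""
    groups = [[] for _ in range(ngroups)]
    heap = [(0, i) for i in range(ngroups)]   # (total size so far, group index)
    heapq.heapify(heap)
    for name, size in sorted(files, key=lambda f: -f[1]):
        total, idx = heapq.heappop(heap)      # lightest group so far
        groups[idx].append(name)
        heapq.heappush(heap, (total + size, idx))
    return groups

files = [('a.root', 60), ('b.root', 40), ('c.root', 30), ('d.root', 30)]
groups = split_by_size(files, 2)
```

With -p 0 40 each job would then process one of the 40 resulting groups.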
Currently the command can add:
swp_jpsi_misid: Branches corresponding to lepton-kaon swaps that make the resonant mode leak into the rare modes, where the swap is inverted and the $J/\psi$ mass provided.
swp_cascade: Branches corresponding to $D\to K\pi$ decays with $\pi\to\ell$ swaps, where the swap is inverted and the $D$ mass provided.
hop: With the $\alpha$ and mass branches calculated