Tools for registering LSST metadata information into Rucio

rucio-register

Command and API for adding Butler-specific information to Rucio metadata.

This is a guide to using the rucio-register command for registering Butler files with Rucio.

Butler files are expected to be located in a Rucio directory structure, below a directory named for a Rucio scope. For example, if the root of the Rucio directory is "/rucio/disks/xrd1/rucio" and the Rucio scope is "test", the files should be located below "/rucio/disks/xrd1/rucio/test".
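
The layout rule above can be checked before registering anything. The following is a minimal sketch, not part of rucio-register: the helper names `scope_directory` and `is_below_scope` and the file names are hypothetical, and only the root and scope values come from the example above.

```python
from pathlib import PurePosixPath


def scope_directory(rse_root: str, scope: str) -> PurePosixPath:
    """Return the directory below which files for a scope are expected."""
    return PurePosixPath(rse_root) / scope


def is_below_scope(file_path: str, rse_root: str, scope: str) -> bool:
    """Check whether file_path sits somewhere below <rse_root>/<scope>."""
    base = scope_directory(rse_root, scope)
    # parents lists every ancestor directory of the path.
    return base in PurePosixPath(file_path).parents


root = "/rucio/disks/xrd1/rucio"
print(scope_directory(root, "test"))  # /rucio/disks/xrd1/rucio/test
# Hypothetical file names, used only to exercise the check.
print(is_below_scope("/rucio/disks/xrd1/rucio/test/raw/file.fits", root, "test"))  # True
print(is_below_scope("/rucio/disks/xrd1/rucio/other/file.fits", root, "test"))  # False
```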

Example

The command "rucio-register" registers files with Rucio. It requires a YAML configuration file that specifies the Rucio RSE and scope, the root of the directory where files are deposited, and the external URL prefix of the Rucio RSE. The configuration file can be given on the command line or through the environment variable RUCIO_REGISTER_CONFIG.
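
The two ways of supplying the configuration file can be pictured as a simple lookup. This is a sketch only: the function `resolve_config_path` is hypothetical, and the precedence shown (command line first, then environment) is an assumption, since the text does not say which one wins when both are set.

```python
import os
from typing import Mapping, Optional


def resolve_config_path(
    cli_value: Optional[str],
    environ: Mapping[str, str] = os.environ,
) -> str:
    """Pick the configuration file from the command line or the environment."""
    # Assumption: an explicit command-line value takes precedence.
    if cli_value:
        return cli_value
    env_value = environ.get("RUCIO_REGISTER_CONFIG")
    if env_value:
        return env_value
    raise ValueError(
        "no configuration file: pass one on the command line "
        "or set RUCIO_REGISTER_CONFIG"
    )


print(resolve_config_path("register_config.yaml", {}))
print(resolve_config_path(None, {"RUCIO_REGISTER_CONFIG": "/tmp/register_config.yaml"}))
```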

The command can register data products, raws, zip files, and dimension record YAML files:

for data products:

rucio-register data-products --log-level INFO -r /rucio/disks/xrd1/rucio/test -c HSC/runs/RC2/w_2023_32/DM-40356/20230814T170253Z -t visitSummary -d rubin_dataset -C register_config.yaml

for raws:

rucio-register raws --log-level INFO -r /rucio/disks/xrd1/rucio/test -d rubin_dataset --collections LATISS/raw/all -C register_config.yaml \*

Note that for raws, the invocation is similar to how one uses the butler command.

This command looks for files registered in the Butler repo "/repo/main", using the "dataset-type" and "collections" arguments to query the Butler. Note that the suffix of the repo name is the Rucio scope. In this example, that scope is "main".

The resulting datasets' files are registered with Rucio, as specified in the "config.yaml" file. Additionally, those files are registered with the Rucio dataset specified by the "rucio-dataset" argument.

for zip files:

rucio-register zips -d rubin_dataset --log-level INFO -C /home/lsst/rucio_register/examples/register_config.yaml --zip-file file:///rucio/disks/xrd1/rucio/test/something/2c8f9e54-9757-54c0-9119-4c3ac812a2da.zip

Note for zip files, register a single zip file at a time.
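
Because each invocation takes a single zip file, registering several of them means running the command once per file. The sketch below builds one command line per zip file using only the options shown in the example above; the function `zip_commands` and the two zip URLs are hypothetical, and the actual execution is left commented out.

```python
import shlex
from typing import List, Sequence


def zip_commands(
    zip_urls: Sequence[str],
    rucio_dataset: str,
    config_file: str,
    log_level: str = "INFO",
) -> List[List[str]]:
    """Build one rucio-register invocation per zip file."""
    return [
        [
            "rucio-register", "zips",
            "-d", rucio_dataset,
            "--log-level", log_level,
            "-C", config_file,
            "--zip-file", url,
        ]
        for url in zip_urls
    ]


# Hypothetical zip file locations, for illustration only.
urls = [
    "file:///rucio/disks/xrd1/rucio/test/something/first.zip",
    "file:///rucio/disks/xrd1/rucio/test/something/second.zip",
]
for cmd in zip_commands(urls, "rubin_dataset", "register_config.yaml"):
    print(shlex.join(cmd))
    # To actually run it: subprocess.run(cmd, check=True)
```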

for dimension record YAML files:

rucio-register dimensions -d rubin_dataset --log-level INFO -C /home/lsst/rucio_register/examples/register_config.yaml --dimension-file file:///rucio/disks/xrd1/rucio/test/something/dimensions.yaml

config.yaml

The config.yaml file includes information which specifies the Rucio RSE to use, the Rucio scope, the local root of the RSE, and the URL prefix of the location where Rucio stores the files.

rucio_rse: "XRD1"
scope: "main"
rse_root: "/rucio/disks/xrd1/rucio"
dtn_url: "root://xrd1:1094//rucio"
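
To make the relationship between the four settings concrete, the sketch below checks that all of them are present and maps a local file path under "rse_root" to a URL under "dtn_url". The prefix substitution is an assumption about how the two values relate, not a description of what rucio-register does internally; the helper names and the file path are hypothetical.

```python
from typing import Dict, List

REQUIRED_KEYS = ("rucio_rse", "scope", "rse_root", "dtn_url")

# The values from the example configuration, written as a plain dict.
config = {
    "rucio_rse": "XRD1",
    "scope": "main",
    "rse_root": "/rucio/disks/xrd1/rucio",
    "dtn_url": "root://xrd1:1094//rucio",
}


def missing_keys(cfg: Dict[str, str]) -> List[str]:
    """List the required settings that are absent from cfg."""
    return [key for key in REQUIRED_KEYS if key not in cfg]


def local_path_to_url(local_path: str, cfg: Dict[str, str]) -> str:
    """Swap the rse_root prefix of a local path for the dtn_url prefix."""
    # Assumption: dtn_url points at the same directory as rse_root.
    root = cfg["rse_root"].rstrip("/")
    if not local_path.startswith(root + "/"):
        raise ValueError(f"{local_path} is not below {root}")
    relative = local_path[len(root) + 1:]
    return cfg["dtn_url"].rstrip("/") + "/" + relative


print(missing_keys(config))  # []
# Hypothetical file path below the RSE root.
print(local_path_to_url("/rucio/disks/xrd1/rucio/main/raw/file.fits", config))
# root://xrd1:1094//rucio/main/raw/file.fits
```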

export-datasets

Command to dump Butler dataset, dimension, and calibration validity range data to a YAML file.

This command works alongside "rucio-register". It can be used to record all the files registered into Rucio so that their transfer and ingestion at the destination can be confirmed. In addition, it preserves dimension data and calibration validity range data that is not otherwise transferred via Rucio. This additional data can be useful for repeated ingests of raw and calibration data into Butler repositories.

Examples

To record the dimension values (notably not including the visit dimension, which would have to be regenerated) for a set of raw images:

export-datasets \
    --root /sdf/group/rubin/lsstdata/offline/instrument/ \
    --filename Dataset-LSSTCam-NoTract-20250101-0000.yaml \
    --collections LSSTCam/raw/all \
    --where "instrument='LSSTCam' and day_obs=20250101 and exposure.seq_num IN (1..99)" \
    --limit 30000 \
    /repo/main raw

--root is needed here because the original files were ingested as full URLs with the "direct" transfer mode.

To record the datasets created by a multi-site processing workflow:

export-datasets \
    --filename Dataset-LSSTCam-Tract2024-Step3-Group5-metadata.yaml \
    --collections step3/group5 \
    --where "tract=2024" \
    $LOCAL_REPO '*_metadata'

Note the use of a glob pattern to select dataset types of interest.
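
As a rough illustration of what a pattern such as '*_metadata' selects, the sketch below applies shell-style matching with Python's fnmatch module. The dataset type names are hypothetical, and the Butler's own pattern matching may differ in detail from fnmatch, so this only conveys the idea.

```python
from fnmatch import fnmatchcase

# Hypothetical dataset type names, for illustration only.
dataset_types = ["isr_metadata", "calibrate_metadata", "isr_log", "calexp"]

pattern = "*_metadata"
selected = [name for name in dataset_types if fnmatchcase(name, pattern)]
print(selected)  # ['isr_metadata', 'calibrate_metadata']
```

Quoting the pattern on the command line, as in the example above, keeps the shell from expanding it against local file names before export-datasets sees it.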
