Client for CTA Data Download Service at Swiss CTA DC

Disclaimer

This is a prototype developed only for the Swiss CTA DC. Please report bugs here, and for any details please contact the CTAO CH DC management.

Purpose

CTA data are (or will be) stored at the CTAO-CH data center at CSCS. These data should be accessible to selected external users. In addition, the files should be available within the interactive analysis platform at CSCS. This client provides API access to these services and data.

Accessing dCache with tokens

The latest version of ctadata has a direct_api module, which implements a direct download/upload mode that does not use the download service as a proxy.

Installation

Currently the direct API uses the oidc-agent tool for token maintenance and the davix tools for downloading and uploading files. These tools can either be compiled locally and added to the PATH environment variable, or installed inside a conda (https://anaconda.org/anaconda/conda) or mamba (https://github.com/mamba-org/mamba) environment. Below is an example of the installation using the conda package manager.

$ conda create --name ctadata # create a conda environment if you don't have one
$ conda activate ctadata # activate the environment
$ conda install oidc-agent davix -c conda-forge # install the packages required
$ pip install 'ctadata>=0.6.0' # install ctadata library in the same environment
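
Since the tools may come either from PATH or from a conda/mamba environment, it can help to confirm that they are visible from the Python process that will use ctadata. The following is a minimal sketch, not part of ctadata. The list of executables is an assumption: this README does not say which davix commands are called, so the names below are simply commands shipped by the oidc-agent and davix packages.

```python
import shutil

# Assumption: the exact executables used by ctadata are not listed in this README.
# These are commands shipped by the oidc-agent and davix packages.
REQUIRED_TOOLS = ("oidc-agent", "davix-get", "davix-put", "davix-ls")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


missing = missing_tools()
if missing:
    print("not found on PATH:", ", ".join(missing))
else:
    print("all required tools found")
```

If anything is reported as missing, check that the ctadata environment is activated in the shell that started Python.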

Token maintenance

To get access to CTA-CSCS storage you need to generate an OpenID Connect token. The token is temporary and needs to be renewed on a regular basis. Renewal is handled by the token agent service, which can be started with the command

$ cta-data-direct start-agent

When the command is run for the first time, a client secret must be provided with the -s option:

$ cta-data-direct start-agent -s YOUR_SECRET

The token is stored in the user's home directory, so the above command needs to be run on a single machine only. During agent initialization an account is created for token maintenance. The user is asked to authorize the account creation by visiting the authentication page in a browser on any device.
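
The two invocations above differ only in the -s option, so a script that starts the agent can build the command from an optional secret. This is a minimal sketch under stated assumptions: the helper name agent_command and the environment variable CTADATA_CLIENT_SECRET are hypothetical and not defined by ctadata. Only the command name and the -s option come from this README.

```python
def agent_command(secret=None):
    """Build the argument list for starting the token agent."""
    command = ["cta-data-direct", "start-agent"]
    if secret:
        # the secret is only needed the first time the agent is started
        command += ["-s", secret]
    return command


print(agent_command())

# Possible use from a script (not executed here, requires ctadata to be installed):
#   import os, subprocess
#   secret = os.environ.get("CTADATA_CLIENT_SECRET")  # hypothetical variable name
#   subprocess.run(agent_command(secret), check=True)
```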

Basic usage

As soon as the account creation process completes, the token is created and the other direct API functions can be used.

Listing directory contents

from ctadata import direct as ctadata

for path in ctadata.list_dir("cta"):
    print(path)

Downloading files and directories

To download contents of some file or dir:

from ctadata import direct as ctadata

# downloading single file
ctadata.fetch_and_save_file_or_dir("lst/some-data-dir/some-data-file") 

# recursively downloading a directory
ctadata.fetch_and_save_file_or_dir("lst/some-data-dir", recursive=True)

or, in bash:

cta-data-direct get lst/some-data-dir/some-data-file
cta-data-direct get --recursive lst/some-data-dir

Uploading files and directories

To upload a file:

from ctadata import direct as ctadata

# upload under a new name
ctadata.upload_file("latest.txt", "your-folder/new-file-name.md")
# a trailing slash keeps the local name: autocompletes to `your-folder/latest.txt`
ctadata.upload_file("latest.txt", "your-folder/")
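
The autocompletion mentioned in the comment above can be illustrated with a small stand-alone function. This is only a sketch of the documented behaviour (a destination ending in "/" receives the local file name), not the library's actual implementation. The name resolve_remote_path is hypothetical.

```python
import os


def resolve_remote_path(local_path, remote_path):
    """Append the local file name when the remote path ends with '/'."""
    if remote_path.endswith("/"):
        return remote_path + os.path.basename(local_path)
    return remote_path


print(resolve_remote_path("latest.txt", "your-folder/new-file-name.md"))
print(resolve_remote_path("latest.txt", "your-folder/"))
```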

You can also use the command-line interface to do this:

$ cta-data-direct put latest-file-list latest-file-list-bla-bla

Usage from within CTA CSCS JupyterHub platform

From within CTA CSCS JupyterHub platform, selected authorized users are able to access the "data download service", like so:

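import h5py
import ctadata

for url in ctadata.list_dir("cta/DL1/20241114/v0.1/"):
    # select files by substrings of their path
    if 'datacheck' not in url and '.0100' in url and '11111' in url:
        # the returned value is divided by 1024**2 to report the size in MiB
        print("stored", ctadata.fetch_and_save_file(url)/1024/1024, "MiB")
        # the file is saved under its base name in the current directory
        print("found keys", h5py.File(url.split("/")[-1]).keys())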
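
The substring filter in the loop above can be factored into a small reusable helper. This is a minimal sketch: select_paths is a hypothetical name, and the file names in the listing are invented for illustration. In practice the listing would come from ctadata.list_dir.

```python
def select_paths(paths, include=(), exclude=()):
    """Yield paths that contain every `include` substring and no `exclude` substring."""
    for path in paths:
        if all(s in path for s in include) and not any(s in path for s in exclude):
            yield path


# Hypothetical file names, for illustration only
listing = [
    "cta/DL1/20241114/v0.1/dl1_run11111.0100.h5",
    "cta/DL1/20241114/v0.1/datacheck_run11111.0100.h5",
    "cta/DL1/20241114/v0.1/dl1_run11111.0101.h5",
]

selected = list(select_paths(listing, include=(".0100", "11111"), exclude=("datacheck",)))
print(selected)
```

Passing the result of ctadata.list_dir("cta/DL1/20241114/v0.1/") instead of listing reproduces the condition used in the loop.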

To download a directory recursively:

ctadata.fetch_and_save_file_or_dir("lst/some-data-dir", recursive=True)

or, in bash and recursively:

ctadata get --recursive lst/some-data-dir

Uploading files

To upload a file:

ctadata.upload_file("latest.txt", "your-folder/new-file-name.md")
ctadata.upload_file("latest.txt", "your-folder/") # will autocomplete to `your-folder/latest.txt`

The result is:

{
  "path": "lst/users/volodymyr_savchenko_epfl_ch/filelists/new-file-name",
  "status": "uploaded",
  "total_written": 60098730
}

Note that each user's files are uploaded to their own directory, whose name is constructed from the user name. The path you specify is relative to this directory. If you need to move files to common directories, please ask for support. In most cases, though, it is enough to share the returned path, which others can use like this:

ctadata.fetch_and_save_file("lst/users/volodymyr_savchenko_epfl_ch/filelists/new-file-name")

You can also use the command-line interface to do this:

$ cta-data put latest-file-list latest-file-list-bla-bla

Beware that all the files written are accessible to all CTAO members and all platform users.
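
The upload result shown earlier in this section can be checked before its path is shared. This is a minimal sketch, assuming that upload_file returns a dictionary with the keys shown there and that total_written is the number of bytes written. Neither is confirmed beyond the example output. The name check_upload is hypothetical.

```python
def check_upload(result, expected_size):
    """Return the remote path if the result reports a complete upload."""
    if result.get("status") != "uploaded":
        raise RuntimeError(f"unexpected upload status: {result.get('status')!r}")
    # Assumption: total_written is the number of bytes stored remotely
    if result.get("total_written") != expected_size:
        raise RuntimeError(
            f"size mismatch: wrote {result.get('total_written')}, expected {expected_size}"
        )
    return result["path"]


# The example result from this section
example_result = {
    "path": "lst/users/volodymyr_savchenko_epfl_ch/filelists/new-file-name",
    "status": "uploaded",
    "total_written": 60098730,
}
print(check_upload(example_result, expected_size=60098730))
```

In real use, expected_size would typically be os.path.getsize() of the local file that was uploaded.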

From outside (possibly another jupyterhub)

You need to obtain a JupyterHub token, which is used to authenticate to the download service.

If you are in the session, navigate to the hub control panel this way:

image

Request a token:

image

The rest is similar to the previous case:

import os
import h5py

os.environ["CTADS_URL"] = "DATA-DISTRIBUTING-JUPYTERHUB/services/downloadservice/"
os.environ["CTACS_URL"] = "DATA-DISTRIBUTING-JUPYTERHUB/services/certificateservice/"
os.environ["JUPYTERHUB_API_TOKEN"] = "INSERT-YOUR-TOKEN-HERE"

import ctadata  # imported after the environment is configured

for url in ctadata.list_dir("cta/DL1/20241114/v0.1"):
    if 'datacheck' not in url and '.0100' in url and '11111' in url:
        print("stored", ctadata.fetch_and_save_file(url)/1024/1024, "MiB")
        print("found keys", h5py.File(url.split("/")[-1]).keys())
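
Both service URLs above share the same hub prefix, so they can be derived from a single base URL. This is a minimal sketch: configure_download_service is a hypothetical helper, and https://hub.example.org is a placeholder, not a real deployment. The variable names and URL suffixes are the ones used in the example above.

```python
import os


def configure_download_service(hub_url, token, env=os.environ):
    """Set the variables read by ctadata, deriving both URLs from one hub URL."""
    base = hub_url.rstrip("/")
    env["CTADS_URL"] = f"{base}/services/downloadservice/"
    env["CTACS_URL"] = f"{base}/services/certificateservice/"
    env["JUPYTERHUB_API_TOKEN"] = token
    return env


# Demonstration on a plain dict, so the real environment is left untouched
settings = configure_download_service(
    "https://hub.example.org/", "INSERT-YOUR-TOKEN-HERE", env={}
)
for name, value in settings.items():
    print(name, "=", value)
```

Calling the helper without the env argument writes to os.environ, which matches the manual assignments shown above.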

WebDAV client

To use the bare WebDAV interface of the storage, ctadata also provides a preconfigured webdav4 client (see webdav4 for documentation).

client = ctadata.webdav4_client()
client.ls("/")
client.upload_file("example.txt", "remote/example.txt")

Please see the webdav4 documentation for details on its wide range of features.

Delegating a proxy grid certificate to the Platform

To use your own grid certificate to access CTA-CSCS storage from within the CTA interactive platform, you need to upload your short-term proxy certificate to the platform. cta-data provides an easy way to upload this time-limited certificate, which is then used to access the underlying WebDAV server:

import ctadata
ctadata.upload_certificate('yourcertificate.crt')

Note that if you do not upload your own certificate, you can ask to make use of a shared robot certificate used for data syncing.

