Python SDK and CLI Client for ServiceX
servicex_client
Configuration
The client relies on a YAML file to obtain the URLs of different ServiceX
deployments, as well as tokens to authenticate with the service. The file
should be named `.servicex` and the format of this file is as follows:
    api_endpoints:
      - endpoint: http://localhost:5000
        name: localhost
      - endpoint: https://servicex-release-testing-4.servicex.ssl-hep.org
        name: testing4
        token: ...
    default_endpoint: testing4
    cache_path: /tmp/ServiceX_Client/cache-dir
    shortened_downloaded_filename: true
The `default_endpoint` is used when no endpoint is otherwise specified. The
cache database and downloaded files are stored in the directory specified by
`cache_path`.
The `shortened_downloaded_filename` property controls whether downloaded files
will have their names shortened for convenience. Setting it to false preserves
the full filename from the dataset.
The library will search for this file in the current working directory and then start looking in parent directories until a file is found.
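The search order described above can be sketched in a few lines. This is a minimal illustration, not the library's actual implementation; the function name `find_config` is hypothetical.

```python
from pathlib import Path
from typing import Optional


def find_config(start: Path, name: str = ".servicex") -> Optional[Path]:
    """Return the nearest file called `name` in `start` or any of its parents.

    Hypothetical helper: it illustrates the documented search order only.
    """
    start = start.resolve()
    # Check the starting directory first, then each ancestor up to the root.
    for directory in (start, *start.parents):
        candidate = directory / name
        if candidate.is_file():
            return candidate
    return None  # No config file anywhere between `start` and the root.
```

Calling `find_config(Path.cwd())` would then return the closest `.servicex` file, so a file in a project directory takes precedence over one in your home directory.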
Command Line Interface
When installed, the client provides a new command in your shell, `servicex`.
This command uses a series of subcommands to work with various functions of
ServiceX.
Common command line arguments:
Flag | Long Flag | What it does |
---|---|---|
-u | --url | The URL of the ServiceX ingress |
-b | --backend | Named backend from the .servicex file endpoints list |
If neither `--url` nor `--backend` is specified, the client will attempt to use
the `default_endpoint` value to determine which server to talk to.
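The selection rule can be sketched as a small function over the parsed configuration. This is an illustration under stated assumptions, not the client's code: `resolve_endpoint` is a hypothetical name, and giving a URL priority over a backend name when both are supplied is an assumption the text above does not confirm.

```python
from typing import Optional


def resolve_endpoint(config: dict, url: Optional[str] = None,
                     backend: Optional[str] = None) -> dict:
    """Pick the endpoint entry to use (hypothetical helper)."""
    if url:
        # Assumption: a direct URL wins and carries no token from the file.
        return {"endpoint": url}
    # Fall back to default_endpoint when no backend name is given.
    name = backend or config.get("default_endpoint")
    if name is None:
        raise ValueError("no url, backend, or default_endpoint specified")
    for entry in config.get("api_endpoints", []):
        if entry.get("name") == name:
            return entry
    raise ValueError(f"backend {name!r} not found in api_endpoints")


# The example configuration from this README, as already-parsed YAML.
config = {
    "api_endpoints": [
        {"endpoint": "http://localhost:5000", "name": "localhost"},
        {"endpoint": "https://servicex-release-testing-4.servicex.ssl-hep.org",
         "name": "testing4", "token": "..."},
    ],
    "default_endpoint": "testing4",
}
```

With this configuration, `resolve_endpoint(config)` selects the `testing4` entry, while `resolve_endpoint(config, backend="localhost")` selects the local server.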
codegens
This command will list the code generators deployed.
transforms
These commands interact with transforms that have been run.
list
List transforms associated with the current user. Add the `--complete` flag to
only show transforms that have completed.
files
List the files generated by a transform, along with their sizes. Specify the
transform request ID with the `-t` or `--transform-id` flag.
download
Download the files from a transform to a local directory. Specify the transform
request ID with `-t` and the directory to download to with `-d`. By default,
files are downloaded to the current working directory.
cache
These commands allow you to work with the query cache maintained by the ServiceX client.
list
Show all of the cached transforms along with the run time, code generator, and number of resulting files.
delete
Delete a specific transform from the cache. Provide the transform request ID
with the `-t` or `--transform-id` argument.
clear
Clear all of the transforms from the cache. Add `-y` to force the operation
without prompting for confirmation.
Python SDK
Entry to the SDK starts with constructing an instance of `ServiceXClient`. The
constructor accepts a `backend` argument to specify a named backend from the
`.servicex` file, or a `url` argument for the direct URL of a ServiceX server.
With the URL option you can't provide a token from `.servicex`, so it must
either be an unsecured endpoint, or the token must be provided via the WLCG
standard of a file pointed to by the `BEARER_TOKEN_FILE` environment variable.
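As a rough illustration of the `BEARER_TOKEN_FILE` convention, the sketch below reads a token from the file named by that variable. The helper `read_bearer_token` is hypothetical and is not part of the SDK. It covers only this one step, and the full WLCG token discovery procedure includes other locations that are not shown here.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def read_bearer_token(environ: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the token stored in the file named by BEARER_TOKEN_FILE, if any.

    Hypothetical helper: it handles the BEARER_TOKEN_FILE step only.
    """
    token_file = environ.get("BEARER_TOKEN_FILE")
    if not token_file:
        return None  # Variable unset: no token is sent with requests.
    # Strip the trailing newline that token files usually end with.
    return Path(token_file).read_text().strip()
```

Passing the environment as a parameter keeps the function easy to exercise without modifying the real process environment.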
With an instance of `ServiceXClient` you can:
- List the code generators deployed with the ServiceX instance
- List the transforms that have been run
- Get the current status of a specific transform
Create a Dataset Instance to Run Transforms
The ServiceX client can also create a `Dataset` instance that allows you to
specify a query, provide a dataset identifier, and retrieve the results of the
resulting transform request.
There are two types of datasets:
- func_adl_dataset
- Python Function dataset
Dataset Identifiers
Before we get too deeply into the dataset classes, we should look at how to specify a dataset. Two identifier types are available:
- `RucioDatasetIdentifier` - for retrieving data files registered with Rucio
- `FileListDataset` - a list of URIs for accessing files using XRootD
FuncADL Dataset
This dataset is controlled by the func_adl language. The dataset supports the
`Select`, `SelectMany`, `Where`, `MetaData`, and `QMetaData` operators from
func_adl.
Datasets
This is the abstract class for requesting data from ServiceX. You have to
specify the dataset identifier you want data from and provide some sort of
selection query. You can set the result format with the `set_result_format`
operator (it can also be passed as an argument to the dataset's factory method).
Operators that cause the client to interact with the server: These terminal operators call out to the ServiceX server and process the results. They are all implemented as asynchronous coroutines, but each also comes with a synchronous version so that simple tasks stay simple.
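The pairing of a coroutine with a synchronous counterpart can be sketched as follows. The class and method names (`SketchDataset`, `fetch_files_async`, `fetch_files`) are hypothetical stand-ins chosen for illustration; they are not the SDK's operator names, and the SDK may implement its synchronous wrappers differently.

```python
import asyncio


class SketchDataset:
    """Hypothetical stand-in for a dataset; not the ServiceX class."""

    def __init__(self, files):
        self._files = list(files)

    async def fetch_files_async(self):
        """Coroutine form: awaits a simulated server round trip."""
        await asyncio.sleep(0)  # Placeholder for the real network call.
        return list(self._files)

    def fetch_files(self):
        """Synchronous form: runs the coroutine to completion."""
        return asyncio.run(self.fetch_files_async())
```

A script can call `fetch_files()` directly, while code already running inside an event loop would `await fetch_files_async()` instead, since `asyncio.run` cannot be called from a running loop.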