A Snakemake Storage Plugin for Pelican Federations
Snakemake Pelican Storage Plugin
A Snakemake storage plugin for accessing data via the Pelican Platform, enabling integration with data federations like the Open Science Data Federation (OSDF).
Installation
pip install snakemake-storage-plugin-pelican
Or install from source:
git clone https://github.com/PelicanPlatform/snakemake-storage-plugin-pelican.git
cd snakemake-storage-plugin-pelican
pip install -e .
Usage
There are three main ways to use and configure the plugin:
- Wrapping pelican:// and osdf:// URLs with the storage() function
- Defining a default storage provider
- Multiple, tagged Pelican plugin instances
Using the storage() function
In your Snakefile, any files that should be fetched from or written to a Pelican federation can be wrapped with the storage() function.
This tells Snakemake to automatically determine which storage plugin can handle the required file operations.
The Pelican storage plugin is used whenever the wrapped string is a pelican:// or osdf:// URL.
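As a rough illustration of this scheme-based rule, the Python sketch below decides whether a query string looks like something the Pelican plugin would handle. It is not the plugin's code: is_pelican_query and PELICAN_SCHEMES are hypothetical names, and the real check is implemented by the plugin through the snakemake-interface-storage-plugins API.

```python
from urllib.parse import urlparse

# Hypothetical constant: the URL schemes this README says the Pelican plugin handles.
PELICAN_SCHEMES = {"pelican", "osdf"}


def is_pelican_query(query: str) -> bool:
    """Return True if a storage() query uses a scheme the Pelican plugin handles.

    urlparse() lowercases the scheme, and a plain local path has an empty scheme.
    """
    return urlparse(query).scheme in PELICAN_SCHEMES


print(is_pelican_query("pelican://osg-htc.org/pelicanplatform/test/hello-world.txt"))  # True
print(is_pelican_query("osdf:///pelicanplatform/test/hello-world.txt"))  # True
print(is_pelican_query("s3://some-bucket/data.txt"))  # False
```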
For example:
rule download_data:
    input:
        storage("pelican://osg-htc.org/pelicanplatform/test/hello-world.txt")
    output:
        # Anything not wrapped with `storage()` is assumed to be a local file
        "local_output.txt"
    shell:
        "cp {input} {output}"
Defining a Default Storage Provider
If your workflow heavily relies on objects accessed with Pelican, you may not want to wrap everything explicitly with storage().
In this case, you can define the Pelican plugin as the default storage provider.
This tells Snakemake to assume all files that aren't explicitly wrapped with a different storage provider are Pelican objects.
Enable this feature by passing --default-storage-provider pelican to your Snakemake invocation.
Passing this argument also requires defining a default storage prefix to tell Snakemake what common prefix it should prepend to all relative file paths.
This is done with the --default-storage-prefix flag.
If all of your files/objects come from the same federation, this prefix will often be the federation URL with a namespace, e.g. pelican://osg-htc.org/chtc/.
For example, with this Snakefile:
rule download_data:
    input:
        remote_input="/staging/jhiemstra/input.txt",  # Treated as Pelican (default provider)
        local_input=local("some-local-file.txt")  # Explicitly local with local() wrapper
    output:
        "/staging/jhiemstra/output.txt"  # Also treated as Pelican (default provider)
    shell:
        "cat {input.remote_input} {input.local_input} > {output}"
Run with default storage provider:
snakemake --cores 1 \
--default-storage-provider pelican \
--default-storage-prefix "pelican://osg-htc.org/chtc/" \
--storage-pelican-token-file /path/to/token.txt
Snakemake will automatically treat "/staging/jhiemstra/input.txt" as "pelican://osg-htc.org/chtc/staging/jhiemstra/input.txt".
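The prefix handling can be pictured as a simple string join. The helper below is a hypothetical sketch (apply_default_prefix is not a Snakemake or plugin function) that assumes exactly one slash should separate the prefix from the path; Snakemake's own joining logic may differ in edge cases.

```python
def apply_default_prefix(prefix: str, path: str) -> str:
    """Join a default storage prefix and a workflow path with exactly one slash.

    Hypothetical helper for illustration only.
    """
    return prefix.rstrip("/") + "/" + path.lstrip("/")


print(apply_default_prefix("pelican://osg-htc.org/chtc/", "/staging/jhiemstra/input.txt"))
# pelican://osg-htc.org/chtc/staging/jhiemstra/input.txt
```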
Tagged Plugin Instances
Advanced users can define multiple instances of the Pelican plugin with different configurations using tags. This is useful when working with multiple federations that require completely separate authentication or settings.
See the Snakemake storage plugin documentation for details on tagged plugin instances.
URL Formats
The plugin supports two URL schemes:
- Pelican URLs: pelican://federation-hostname/namespace/path/to/object
  Example: pelican://osg-htc.org/ospool/uchicago/public/data.txt
- OSDF URLs: osdf:///namespace/path/to/file (automatically converted to pelican://osg-htc.org/...)
  Example: osdf:///pelicanplatform/test/hello-world.txt
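The OSDF-to-Pelican conversion described above amounts to filling in the osg-htc.org federation host. The following is a minimal sketch, assuming the three-slash osdf:/// form shown here; normalize_url and split_pelican_url are hypothetical helpers, and query strings are ignored for brevity.

```python
from urllib.parse import urlparse

# Federation host that osdf:/// URLs map to, per the conversion rule above.
OSDF_FEDERATION = "osg-htc.org"


def normalize_url(url: str) -> str:
    """Rewrite osdf:///namespace/... to pelican://osg-htc.org/namespace/...

    pelican:// URLs are returned unchanged. Hypothetical helper.
    """
    parsed = urlparse(url)
    if parsed.scheme == "pelican":
        return url
    if parsed.scheme == "osdf":
        if parsed.netloc:
            # This sketch only accepts the documented three-slash form (empty host).
            raise ValueError(f"expected osdf:///namespace/..., got {url!r}")
        return f"pelican://{OSDF_FEDERATION}{parsed.path}"
    raise ValueError(f"unsupported scheme in {url!r}")


def split_pelican_url(url: str) -> tuple[str, str]:
    """Return (federation hostname, namespace/object path) for a Pelican or OSDF URL."""
    parsed = urlparse(normalize_url(url))
    return parsed.netloc, parsed.path


print(normalize_url("osdf:///pelicanplatform/test/hello-world.txt"))
# pelican://osg-htc.org/pelicanplatform/test/hello-world.txt
print(split_pelican_url("pelican://osg-htc.org/ospool/uchicago/public/data.txt"))
# ('osg-htc.org', '/ospool/uchicago/public/data.txt')
```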
Authentication with Tokens
The Pelican storage plugin for Snakemake does not yet support the automatic token management offered by other Pelican clients; this is planned for a future release. Until then, whenever an operation on an object requires authorization, you must provide your own tokens.
For information about how to create namespace tokens, see the upstream Pelican documentation.
Single Token for All Requests
If all the Pelican objects in your Snakemake workflow that require authorization can be handled with a single token, that token can be provided to the plugin using the --storage-pelican-token-file flag, e.g.:
snakemake --cores 1 \
--storage-pelican-token-file /path/to/token.txt
Multiple Tokens with URL Prefix Mapping
For workflows accessing multiple federations or namespaces with different credentials, tokens can be mapped to specific federation prefixes.
This is done by passing the --storage-pelican-token-file flag a space-delimited list of <URL prefix>:<token file> entries.
For example, if your objects come from two different federations, you can specify which tokens should be used with which federations/prefixes:
snakemake --cores 1 \
--storage-pelican-token-file "pelican://osg-htc.org/ospool:/path/to/ospool-token.txt pelican://itb-osdf-director.osdf-dev.chtc.io/chtc/itb:/path/to/itb-token.txt"
Note: The entire list must be wrapped in single or double quotes so that your shell passes it to Snakemake as a single argument.
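Because a shell splits unquoted arguments on whitespace, the quotes are what keep the list together. The sketch below uses Python's shlex.split to mimic that shell splitting, then shows one plausible way to break the list into prefix/token pairs. parse_token_spec is hypothetical: it splits each entry at its last colon (the prefix itself contains "://"), treats a bare path or the "default" label as the fallback, and would mis-handle token paths that contain a colon. The plugin's actual parser may behave differently.

```python
import shlex


def parse_token_spec(spec: str) -> dict[str, str]:
    """Parse a token-file flag value into {URL prefix: token file path}.

    Hypothetical parser. Each whitespace-separated entry is <URL prefix>:<token file>.
    An entry with no colon (a single bare path) or with the prefix "default"
    is stored under the "default" key.
    """
    mapping: dict[str, str] = {}
    for entry in spec.split():
        prefix, sep, path = entry.rpartition(":")  # split at the LAST colon
        if not sep or prefix == "default":
            mapping["default"] = path
        else:
            mapping[prefix] = path
    return mapping


# Quoted: shell-style splitting keeps the whole list as one argument.
quoted = ("snakemake --storage-pelican-token-file "
          "'pelican://osg-htc.org/ospool:/path/to/ospool-token.txt default:/path/to/default-token.txt'")
# Unquoted: the same list is broken into separate arguments.
unquoted = quoted.replace("'", "")

args = shlex.split(quoted)
print(len(args), len(shlex.split(unquoted)))  # 3 4
print(parse_token_spec(args[-1]))
```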
The plugin uses longest-prefix matching to select the appropriate token for each URL. You can use this feature to map tokens to specific federations or to specific namespaces within a federation.
With Default Fallback
snakemake --cores 1 \
--storage-pelican-token-file 'pelican://osg-htc.org/chtc/itb:/path/to/itb-token.txt default:/path/to/default-token.txt'
URLs that don't match any prefix will use the default token.
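Longest-prefix matching with a default fallback can be sketched as follows. select_token and the token paths are hypothetical. One deliberate assumption is that prefixes match on whole path segments, so pelican://osg-htc.org/chtc does not match a URL under pelican://osg-htc.org/chtcx/; the plugin itself may use plain string-prefix matching.

```python
from typing import Optional


def select_token(url: str, mapping: dict[str, str]) -> Optional[str]:
    """Return the token file for the longest URL prefix that matches url.

    Hypothetical sketch. A prefix matches when url equals it or continues with
    "/" right after it. If nothing matches, the "default" entry is returned,
    or None when no default is configured.
    """
    best_len = -1
    best_token = None
    for prefix, token_file in mapping.items():
        if prefix == "default":
            continue  # the fallback is applied after the loop
        base = prefix.rstrip("/")
        if (url == base or url.startswith(base + "/")) and len(base) > best_len:
            best_len = len(base)
            best_token = token_file
    return best_token if best_token is not None else mapping.get("default")


tokens = {
    "pelican://osg-htc.org/chtc": "/path/to/chtc-token.txt",
    "pelican://osg-htc.org/chtc/itb": "/path/to/itb-token.txt",
    "default": "/path/to/default-token.txt",
}
print(select_token("pelican://osg-htc.org/chtc/itb/data.txt", tokens))    # /path/to/itb-token.txt
print(select_token("pelican://osg-htc.org/chtc/other/data.txt", tokens))  # /path/to/chtc-token.txt
print(select_token("pelican://osg-htc.org/ospool/data.txt", tokens))      # /path/to/default-token.txt
```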
Debug Logging
Enable detailed logging of the Pelican storage plugin and the underlying PelicanFS library for troubleshooting:
snakemake --cores 1 \
--storage-pelican-debug true
Examples
Example 1: Reading from OSDF
rule process_osdf_data:
    input:
        storage("osdf:///pelicanplatform/test/hello-world.txt")
    output:
        "processed_output.txt"
    shell:
        "cat {input} | tr '[:lower:]' '[:upper:]' > {output}"
Run without authentication (for public data):
snakemake --cores 1
Example 2: Reading from Multiple Federations
Note: This example is for demonstration purposes only. The referenced objects do not exist, so this Snakefile will not work if copied and pasted as-is.
rule combine_data:
    input:
        osdf_data=storage("pelican://osg-htc.org/ospool/data.txt"),
        itb_data=storage("pelican://itb-osdf-director.osdf-dev.chtc.io/chtc/itb/data.txt")
    output:
        "combined.txt"
    shell:
        "cat {input.osdf_data} {input.itb_data} > {output}"
Run with multiple tokens:
snakemake --cores 1 \
--storage-pelican-token-file 'pelican://osg-htc.org/ospool:/path/to/ospool-token.txt pelican://itb-osdf-director.osdf-dev.chtc.io/chtc/itb:/path/to/itb-token.txt'
Example 3: Writing to Pelican
In this "write" example, the output is written back to the storage exposed by an Origin.
Snakemake first creates an intermediate copy of the output on local disk and then uses PelicanFS to copy that data to the remote storage.
rule upload_results:
    input:
        "local_results.txt"
    output:
        storage("pelican://my-federation.org/myspace/results.txt")
    shell:
        "cp {input} {output}"
Run with write-enabled token:
snakemake --cores 1 \
--storage-pelican-token-file /path/to/write-token.txt
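The stage-locally-then-upload flow described in this example can be modeled with two callables. This is a hypothetical sketch, not the plugin's or Snakemake's code: produce stands in for the rule's shell command, upload stands in for the PelicanFS transfer to the Origin, and in the usage example a temporary directory plays the role of the remote storage.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable


def write_then_upload(produce: Callable[[Path], object],
                      upload: Callable[[Path], object],
                      filename: str = "results.txt") -> None:
    """Let a job write to a local staging file, then pass that file to an uploader."""
    with tempfile.TemporaryDirectory() as staging:
        local_copy = Path(staging) / filename
        produce(local_copy)  # the job only ever writes to a local path
        if not local_copy.exists():
            raise FileNotFoundError(f"job did not create {local_copy}")
        upload(local_copy)   # push the finished file to (stand-in) remote storage
    # the temporary staging directory is removed when the with-block exits


# Usage: a temporary directory stands in for Origin-backed remote storage.
remote_dir = Path(tempfile.mkdtemp())
write_then_upload(
    produce=lambda p: p.write_text("hello\n"),
    upload=lambda p: shutil.copy(p, remote_dir / p.name),
)
print((remote_dir / "results.txt").read_text(), end="")
```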
Token File Format
Token files should contain a single line with the bearer token:
eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9...
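Reading such a file amounts to taking its single line with surrounding whitespace removed. The sketch below assumes the token is ultimately presented as an HTTP bearer token (the "Authorization: Bearer" scheme from RFC 6750); read_bearer_token and authorization_header are hypothetical helpers, not part of the plugin or PelicanFS.

```python
import tempfile
from pathlib import Path


def read_bearer_token(token_file: str) -> str:
    """Read a single-line token file and return the token without surrounding whitespace."""
    token = Path(token_file).read_text().strip()
    if not token:
        raise ValueError(f"token file {token_file} is empty")
    if len(token.splitlines()) != 1:
        raise ValueError(f"token file {token_file} should contain a single line")
    return token


def authorization_header(token: str) -> dict[str, str]:
    """Build the conventional HTTP header for presenting a bearer token (assumption)."""
    return {"Authorization": f"Bearer {token}"}


# Usage with a throwaway file standing in for /path/to/token.txt
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as fh:
    fh.write("example-token-value\n")
header = authorization_header(read_bearer_token(fh.name))
Path(fh.name).unlink()  # clean up the throwaway file
print(header)  # {'Authorization': 'Bearer example-token-value'}
```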
Configuration Options
| Option | Description | Example |
|---|---|---|
| --storage-pelican-token-file | Path(s) to token file(s), optionally with URL prefix mappings | --storage-pelican-token-file 'pelican://host/path:token.txt' |
| --storage-pelican-debug | Enable debug logging | --storage-pelican-debug true |
Dependencies
- snakemake-interface-storage-plugins
- pelicanfs - Python filesystem interface for Pelican
Development
To contribute or modify the plugin:
git clone https://github.com/PelicanPlatform/snakemake-storage-plugin-pelican.git
cd snakemake-storage-plugin-pelican
pip install -e ".[dev]"
Run tests:
pytest
File details
Details for the file snakemake_storage_plugin_pelican-0.1.1.tar.gz.
File metadata
- Download URL: snakemake_storage_plugin_pelican-0.1.1.tar.gz
- Upload date:
- Size: 17.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a2b436dedaed0cae1edfb0d2d6c32085239eb0c2ec15bed86419d465b7bb5cb5 |
| MD5 | 9f5353bb31a28f5face77ee0613c211b |
| BLAKE2b-256 | f0852f3c1af8d3b2bade7168826d7fa95ad8945df043aec0918b9ed4a5028010 |
File details
Details for the file snakemake_storage_plugin_pelican-0.1.1-py3-none-any.whl.
File metadata
- Download URL: snakemake_storage_plugin_pelican-0.1.1-py3-none-any.whl
- Upload date:
- Size: 16.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d1a8b34f2c55aabbfcc909c600bc7b1b2ff0b9efc966a43d84e1295f0eeeeb79 |
| MD5 | f0b4b9140c8808c3f1a507445ddd7b8c |
| BLAKE2b-256 | 290c9f8ec709f802cd18f48a62d231810d7c34b8f20cb4acdfd432f4aa76f4c0 |