A plugin for pipen to handle files in Google Cloud Storage
pipen-gcs
A plugin for pipen to handle files in Google Cloud Storage.
[!NOTE] pipen has supported cloud storage natively since v0.16.0. See the pipen documentation for more information. However, when the pipeline working directory is a local path but the input/output files are in the cloud, we would otherwise have to handle the cloud files ourselves in the job script. To avoid that, this plugin downloads the input files and uploads the output files automatically.
[!NOTE] Also note that this plugin does not synchronize the meta files to the cloud storage; they are already handled by pipen when needed. This plugin only handles the input/output files when the working directory is a local path. When the pipeline output directory is a cloud path, the output files will be uploaded to the cloud storage automatically.
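The two notes above boil down to one condition: the plugin matters when the working directory is local while some input/output files live in the cloud. The sketch below only illustrates that condition. `is_cloud_path` and `plugin_handles_files` are hypothetical helpers written for this example and are not part of pipen or pipen-gcs.

```python
from typing import Iterable


def is_cloud_path(path: str) -> bool:
    """Return True for Google Cloud Storage URIs such as gs://bucket/key."""
    return str(path).startswith("gs://")


def plugin_handles_files(workdir: str, io_paths: Iterable[str]) -> bool:
    """Hypothetical check mirroring the notes above.

    True when the working directory is local and at least one
    input/output path is a gs:// URI.
    """
    return not is_cloud_path(workdir) and any(is_cloud_path(p) for p in io_paths)


# Local workdir with a cloud input file: the case this plugin targets
print(plugin_handles_files("./.pipen", ["gs://bucket/path/to/file"]))
# Cloud workdir: pipen's native cloud support applies instead
print(plugin_handles_files("gs://bucket/workdir", ["gs://bucket/path/to/file"]))
```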
Installation
pip install -U pipen-gcs
Usage
from pipen import Proc, Pipen
import pipen_gcs  # Import and enable the plugin


class MyProc(Proc):
    input = "infile:file"
    input_data = ["gs://bucket/path/to/file"]
    output = "outfile:file:{{in.infile.name}}.out"
    # We can deal with the files as if they are local
    script = "cat {{in.infile}} > {{out.outfile}}"


class MyPipen(Pipen):
    starts = MyProc
    # input files/directories will be downloaded to /tmp
    # output files/directories will be generated in /tmp and then uploaded
    # to the cloud storage
    plugin_opts = {"gcs_cache": "/tmp"}


if __name__ == "__main__":
    # The working directory is a local path
    # The output directory can be a local path, but if it is a cloud path,
    # the output files will be uploaded to the cloud storage automatically
    MyPipen(workdir="./.pipen", outdir="./myoutput").run()
[!NOTE] When checking the meta information of the jobs, for example, whether a job is cached, the plugin makes pipen use the cloud files.
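To picture what "downloaded to /tmp" could mean for a given input, here is a minimal sketch that maps a `gs://` URI to a path under the cache directory. The layout (`<cache>/<bucket>/<key>`) and the helper `local_cache_path` are assumptions made for illustration; the plugin's actual cache layout may differ.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def local_cache_path(gs_uri: str, cache_dir: str = "/tmp") -> PurePosixPath:
    """Hypothetical mapping of a gs:// URI to a file under cache_dir.

    Assumed layout: <cache_dir>/<bucket>/<object key>.
    """
    parsed = urlparse(gs_uri)
    if parsed.scheme != "gs":
        raise ValueError(f"not a gs:// URI: {gs_uri!r}")
    # netloc holds the bucket name, path holds the object key
    return PurePosixPath(cache_dir) / parsed.netloc / parsed.path.lstrip("/")


print(local_cache_path("gs://bucket/path/to/file"))
```

Under this assumed layout, the job script would read the local copy while the job's cache status is still decided from the cloud file, as the note above describes.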
Configuration
- gcs_cache: The directory to save the cloud storage files.
- gcs_loglevel: The log level for the plugin. Default is INFO.
- gcs_logmax: The maximum number of files to log while syncing. Default is 5.
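As a sketch, the three options can be collected in one dictionary and passed as `plugin_opts`, the same attribute used in the Usage example. The values shown for `gcs_loglevel` and `gcs_logmax` are simply the documented defaults written out; the name `GCS_PLUGIN_OPTS` and the exact value types are assumptions for illustration.

```python
# Hypothetical options dictionary; the keys come from the list above
GCS_PLUGIN_OPTS = {
    "gcs_cache": "/tmp",      # local directory for downloaded/generated files
    "gcs_loglevel": "INFO",   # documented default, written out explicitly
    "gcs_logmax": 5,          # documented default: log at most 5 files per sync
}

# In a pipeline it would be attached like this (requires pipen and pipen-gcs):
#
# class MyPipen(Pipen):
#     starts = MyProc
#     plugin_opts = GCS_PLUGIN_OPTS
print(sorted(GCS_PLUGIN_OPTS))
```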
File details
Details for the file pipen_gcs-1.1.2.tar.gz.
File metadata
- Download URL: pipen_gcs-1.1.2.tar.gz
- Upload date:
- Size: 8.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/2.2.1 CPython/3.12.3 Linux/6.11.0-1018-azure
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e6305fe7595ab89c088638c7f3e5536df0df87faf7149bb42c02ee9304fefccb |
| MD5 | 5d969e6d21afc4b9be4bbc12b3584e3e |
| BLAKE2b-256 | 1b28bf7ca0567c938d4b08f8e3281b51a9c92b8185028eb86b52b826f2287460 |
File details
Details for the file pipen_gcs-1.1.2-py3-none-any.whl.
File metadata
- Download URL: pipen_gcs-1.1.2-py3-none-any.whl
- Upload date:
- Size: 8.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/2.2.1 CPython/3.12.3 Linux/6.11.0-1018-azure
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c1cff6498d8bce049c628b28a5ba5c43948a103ce5c9adb1da51a5dde1fe40ad |
| MD5 | a7d9d07cb979f2ce960f0296db8569ff |
| BLAKE2b-256 | 8ab1ff5de1081254307c5f6e024f340e7b6a536b6040a0277efe11a312912739 |