A plugin for pipen to handle file metadata in Google Cloud Storage

Project description

pipen-gcs

A plugin for pipen to handle files in Google Cloud Storage.

[!NOTE] pipen v0.16.0 introduced native cloud support. See here for more information. However, when the pipeline working directory is a local path but the input/output files are in the cloud, the job script would otherwise have to handle the cloud files itself. This plugin avoids that by downloading the input files and uploading the output files automatically.

[!NOTE] Also note that this plugin does not synchronize the meta files to cloud storage; pipen already handles those when needed. The plugin only handles the input/output files when the working directory is a local path. When the pipeline output directory is a cloud path, the output files are uploaded to cloud storage automatically.

Installation

pip install -U pipen-gcs

Usage

from pipen import Proc, Pipen
import pipen_gcs  # Import and enable the plugin

class MyProc(Proc):
    input = "infile:file"
    input_data = ["gs://bucket/path/to/file"]
    output = "outfile:file:{{in.infile.name}}.out"
    # We can treat the files as if they were local
    script = "cat {{in.infile}} > {{out.outfile}}"

class MyPipen(Pipen):
    starts = MyProc
    # input files/directories will be downloaded to /tmp
    # output files/directories will be generated in /tmp and then uploaded
    #   to the cloud storage
    plugin_opts = {"gcs_cache": "/tmp"}

if __name__ == "__main__":
    # The working directory is a local path
    # The output directory can be a local path, but if it is a cloud path,
    #   the output files will be uploaded to the cloud storage automatically
    MyPipen(workdir="./.pipen", outdir="./myoutput").run()

[!NOTE] When checking the meta information of jobs (for example, whether a job is cached), the plugin makes pipen use the cloud files.

Configuration

  • gcs_cache: The local directory where cloud storage files are cached.
  • gcs_loglevel: The log level for the plugin. Default is INFO.
  • gcs_logmax: The maximum number of files to log while syncing. Default is 5.
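
The options above are passed through `plugin_opts`, just like `gcs_cache` in the Usage example. A minimal sketch with all three options spelled out (the cache path and pipeline names here are hypothetical, and the values shown for `gcs_loglevel` and `gcs_logmax` are the documented defaults):

```python
# Hypothetical plugin_opts sketch for pipen-gcs; attach it to a Pipen
# subclass as shown in the Usage example above.
plugin_opts = {
    "gcs_cache": "/tmp/pipen-gcs-cache",  # where cloud files are cached locally
    "gcs_loglevel": "INFO",               # plugin log level (default: INFO)
    "gcs_logmax": 5,                      # max files to log while syncing (default: 5)
}
```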

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pipen_gcs-1.1.2.tar.gz (8.1 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

pipen_gcs-1.1.2-py3-none-any.whl (8.4 kB view details)

Uploaded Python 3

File details

Details for the file pipen_gcs-1.1.2.tar.gz.

File metadata

  • Download URL: pipen_gcs-1.1.2.tar.gz
  • Upload date:
  • Size: 8.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.2.1 CPython/3.12.3 Linux/6.11.0-1018-azure

File hashes

Hashes for pipen_gcs-1.1.2.tar.gz
Algorithm Hash digest
SHA256 e6305fe7595ab89c088638c7f3e5536df0df87faf7149bb42c02ee9304fefccb
MD5 5d969e6d21afc4b9be4bbc12b3584e3e
BLAKE2b-256 1b28bf7ca0567c938d4b08f8e3281b51a9c92b8185028eb86b52b826f2287460

See more details on using hashes here.

File details

Details for the file pipen_gcs-1.1.2-py3-none-any.whl.

File metadata

  • Download URL: pipen_gcs-1.1.2-py3-none-any.whl
  • Upload date:
  • Size: 8.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.2.1 CPython/3.12.3 Linux/6.11.0-1018-azure

File hashes

Hashes for pipen_gcs-1.1.2-py3-none-any.whl
Algorithm Hash digest
SHA256 c1cff6498d8bce049c628b28a5ba5c43948a103ce5c9adb1da51a5dde1fe40ad
MD5 a7d9d07cb979f2ce960f0296db8569ff
BLAKE2b-256 8ab1ff5de1081254307c5f6e024f340e7b6a536b6040a0277efe11a312912739

See more details on using hashes here.
