
A plugin for pipen to handle file metadata in Google Cloud Storage

Project description

pipen-gcs

A plugin for pipen to handle files in Google Cloud Storage.

[!NOTE] pipen v0.16.0 introduced native cloud support; see here for more information. However, when the pipeline working directory is a local path but the input/output files live in the cloud, the cloud files must be handled manually in the job script. This plugin avoids that by downloading the input files and uploading the output files automatically.

[!NOTE] Also note that this plugin does not synchronize the meta files to cloud storage; pipen already handles those when needed. The plugin only handles the input/output files when the working directory is a local path. When the pipeline output directory is a cloud path, the output files are uploaded to the cloud storage automatically.
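For illustration only (this is a sketch, not code from the plugin's source), the bookkeeping the plugin automates amounts to mapping each `gs://` URI to a path under the local cache directory, downloading inputs there before the job runs, and uploading outputs afterwards. The function name and mapping scheme below are assumptions:

```python
from pathlib import PurePosixPath


def gcs_to_cache_path(uri: str, cache_dir: str) -> str:
    """Map a gs:// URI to a path under the local cache directory.

    gs://bucket/path/to/file -> <cache_dir>/bucket/path/to/file
    """
    if not uri.startswith("gs://"):
        raise ValueError(f"not a GCS URI: {uri}")
    return str(PurePosixPath(cache_dir) / uri[len("gs://"):])
```

With a mapping like this, the job script can treat every input and output as an ordinary local path, which is exactly what the usage example below relies on.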


Installation

pip install -U pipen-gcs

Usage

from pipen import Proc, Pipen
import pipen_gcs  # Import and enable the plugin

class MyProc(Proc):
    input = "infile:file"
    input_data = ["gs://bucket/path/to/file"]
    output = "outfile:file:{{in.infile.name}}.out"
    # We can deal with the files as if they are local
    script = "cat {{in.infile}} > {{out.outfile}}"

class MyPipen(Pipen):
    starts = MyProc
    # input files/directories will be downloaded to /tmp
    # output files/directories will be generated in /tmp and then uploaded
    #   to the cloud storage
    plugin_opts = {"gcs_cache": "/tmp"}

if __name__ == "__main__":
    # The working directory is a local path
    # The output directory can be a local path, but if it is a cloud path,
    #   the output files will be uploaded to the cloud storage automatically
    MyPipen(workdir="./.pipen", outdir="./myoutput").run()

[!NOTE] When checking the meta information of jobs (for example, whether a job is cached), the plugin makes pipen use the cloud files.

Configuration

  • gcs_cache: The local directory where cloud storage files are cached.
  • gcs_loglevel: The log level for the plugin. Default is INFO.
  • gcs_logmax: The maximum number of files to log while syncing. Default is 5.
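All three options are passed through plugin_opts, as in the usage example above. A minimal sketch (the cache path and the non-default values here are arbitrary examples, not recommendations):

```python
# All three pipen-gcs options set explicitly; values are arbitrary examples.
plugin_opts = {
    "gcs_cache": "/tmp/pipen-gcs-cache",  # where cloud files are cached locally
    "gcs_loglevel": "DEBUG",              # plugin log level (default: INFO)
    "gcs_logmax": 10,                     # max files to log while syncing (default: 5)
}
```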

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pipen_gcs-1.1.1.tar.gz (8.1 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

pipen_gcs-1.1.1-py3-none-any.whl (8.4 kB view details)

Uploaded Python 3

File details

Details for the file pipen_gcs-1.1.1.tar.gz.

File metadata

  • Download URL: pipen_gcs-1.1.1.tar.gz
  • Upload date:
  • Size: 8.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.2.1 CPython/3.12.3 Linux/6.11.0-1018-azure

File hashes

Hashes for pipen_gcs-1.1.1.tar.gz

  • SHA256: fe10a08faa1f1699e62cb282475bb028f735d05e6e7ecd0cc79e453c9e6b69df
  • MD5: fd5b90c9355486c3aa4c899f043f2f9e
  • BLAKE2b-256: 57cdfd2a3832ba1c5913599cb41ba3b386e0892131a972f58881322405a5a17d

See more details on using hashes here.

File details

Details for the file pipen_gcs-1.1.1-py3-none-any.whl.

File metadata

  • Download URL: pipen_gcs-1.1.1-py3-none-any.whl
  • Upload date:
  • Size: 8.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.2.1 CPython/3.12.3 Linux/6.11.0-1018-azure

File hashes

Hashes for pipen_gcs-1.1.1-py3-none-any.whl

  • SHA256: fb90542f730c73e6b266e5fd2eaf2d6f1a1f6aa7e995ca536b587fd7cadde29a
  • MD5: 9c8a76bb4d19bac3f8423c08852c671a
  • BLAKE2b-256: 6aecaece96e89b9479dcefa371903c3bda823abfc967efbd5493db39567a0417

See more details on using hashes here.
