A plugin for pipen to handle file metadata in Google Cloud Storage

Project description

pipen-gcs

A plugin for pipen to handle files in Google Cloud Storage.

[!NOTE] pipen v0.16.0 introduced native cloud support; see the pipen documentation for details. However, when the pipeline working directory is a local path but the input/output files live in the cloud, the cloud files would otherwise have to be handled manually inside each job script. This plugin avoids that by downloading input files and uploading output files automatically.

[!NOTE] Also note that this plugin does not synchronize meta files to cloud storage; pipen already handles those when needed. The plugin only handles input/output files when the working directory is a local path. When the pipeline output directory is a cloud path, the output files are uploaded to cloud storage automatically.

Installation

pip install -U pipen-gcs

Usage

from pipen import Proc, Pipen
import pipen_gcs  # Import and enable the plugin

class MyProc(Proc):
    input = "infile:file"
    input_data = ["gs://bucket/path/to/file"]
    output = "outfile:file:{{in.infile.name}}.out"
    # We can deal with the files as if they are local
    script = "cat {{in.infile}} > {{out.outfile}}"

class MyPipen(Pipen):
    starts = MyProc
    # input files/directories will be downloaded to /tmp
    # output files/directories will be generated in /tmp and then uploaded
    #   to the cloud storage
    plugin_opts = {"gcs_cache": "/tmp"}

if __name__ == "__main__":
    # The working directory is a local path
    # The output directory can be a local path, but if it is a cloud path,
    #   the output files will be uploaded to the cloud storage automatically
    MyPipen(workdir="./.pipen", outdir="./myoutput").run()
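
As a sketch of the behavior described above, the plugin only needs to intervene when a path is a cloud URI. A minimal illustration of that distinction (this is not pipen-gcs's actual API; `is_gcs_path` is a hypothetical helper shown only to clarify the local-vs-cloud output rule):

```python
# Illustrative sketch only -- not pipen-gcs's actual API.
# Output files are uploaded after a job finishes only when the output
# directory is a cloud URI; a check like this separates the two cases.
def is_gcs_path(path: str) -> bool:
    """Return True for Google Cloud Storage URIs such as gs://bucket/key."""
    return path.startswith("gs://")

print(is_gcs_path("gs://bucket/myoutput"))  # cloud path: outputs uploaded
print(is_gcs_path("./myoutput"))            # local path: outputs stay local
```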

[!NOTE] When checking a job's meta information, for example whether the job is cached, the plugin makes pipen use the cloud files.

Configuration

  • gcs_cache: The local directory where cloud storage files are cached.
  • gcs_loglevel: The log level for the plugin. Default is INFO.
  • gcs_logmax: The maximum number of files to log while syncing. Default is 5.
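
Putting the options together, a hedged example of passing them via `plugin_opts` (the values shown are illustrative choices, not defaults, except where noted in the comments):

```python
# Example plugin_opts for pipen-gcs; the values are illustrative.
plugin_opts = {
    "gcs_cache": "/tmp/gcs-cache",  # local cache directory for cloud files
    "gcs_loglevel": "DEBUG",        # default is INFO
    "gcs_logmax": 10,               # default is 5
}
```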

Download files

Download the file for your platform.

Source Distribution

pipen_gcs-1.1.0.tar.gz (8.1 kB)

Uploaded Source

Built Distribution


pipen_gcs-1.1.0-py3-none-any.whl (8.4 kB)

Uploaded Python 3

File details

Details for the file pipen_gcs-1.1.0.tar.gz.

File metadata

  • Download URL: pipen_gcs-1.1.0.tar.gz
  • Upload date:
  • Size: 8.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.2.1 CPython/3.12.3 Linux/6.11.0-1018-azure

File hashes

Hashes for pipen_gcs-1.1.0.tar.gz

  • SHA256: ad35497363f231d69f2add67ddd08786b4cb92042d9d109829db446672ca67f3
  • MD5: 9e483411a297aafbbe50150c9ae0b9f7
  • BLAKE2b-256: 2d213c04835a6af719e3e435aff67bc94f2913b3189406c4a80aa0ade280b6e0


File details

Details for the file pipen_gcs-1.1.0-py3-none-any.whl.

File metadata

  • Download URL: pipen_gcs-1.1.0-py3-none-any.whl
  • Upload date:
  • Size: 8.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.2.1 CPython/3.12.3 Linux/6.11.0-1018-azure

File hashes

Hashes for pipen_gcs-1.1.0-py3-none-any.whl

  • SHA256: ee053ab9d079abd8c1f7f3600c7f954d45a5c98959086255e5ddc89da054a6d4
  • MD5: 521af6f16b47284eb97137502d2a5071
  • BLAKE2b-256: e2f1ada8b28db606c17c23e7e8b44b92cec1846935e5970c1408ccae93863d33

