wraps gsutil, a command-line interface to Google Cloud Storage.

gsutilwrap

gsutilwrap wraps the gsutil command-line interface to Google Cloud Storage in order to simplify deployment and backup tasks. It provides a set of data manipulation commands, including copying, reading, writing and hashing stored data.

We primarily needed something simple that can still leverage multi-threading, has decent progress output and implements robust pattern matching. Since the gsutil CLI already provides all this functionality, we decided to wrap it. The wrapper adds type-annotated arguments, which enables code inspection and autocompletion in an IDE such as PyCharm.

Additionally, since gsutil lacked copying of multiple patterns to multiple targets, we created this extra feature in gsutilwrap.

If you need to transfer data from/to Google Cloud Storage in the core of your application, we recommend the google-cloud-storage library provided by Google itself. That library is much more sophisticated in terms of features and does not incur the overhead of authorizing and spawning a process for each operation. However, it lacks pattern matching (except for matching prefixes), and you have to manage multi-threading and progress output yourself.
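
To illustrate the trade-off: with a prefix-only listing, pattern matching has to happen on the client side. The following is a minimal sketch, assuming the object names were already obtained from a prefix listing (for example with the prefix argument of google-cloud-storage's Client.list_blobs). The helper filter_names is hypothetical and not part of gsutilwrap or google-cloud-storage. It relies on fnmatch, whose * also matches /, so it only approximates gsutil's wildcard semantics.

```python
import fnmatch
from typing import Iterable, List


def filter_names(names: Iterable[str], pattern: str) -> List[str]:
    """Keep only the object names that match a shell-style pattern."""
    return [name for name in names if fnmatch.fnmatchcase(name, pattern)]


# Hypothetical object names, as a listing with the prefix
# "some-path/" might return them.
names = [
    "some-path/a.txt",
    "some-path/sub/b.txt",
    "some-path/c.xml",
]

# fnmatch's "*" crosses "/" boundaries, so nested objects match as well.
txt_names = filter_names(names, "some-path/*.txt")
```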

Usage

import pathlib

import gsutilwrap

# list
lst = gsutilwrap.ls(
    'gs://some-bucket/some-path/**/*.txt')

lst = gsutilwrap.ls_many(
    ['gs://some-bucket/some-path/**/*.txt',
     'gs://another-bucket/another-path/**/*.xml'],
    multithreaded=True)

# if you need a listing with size and update time, use long_ls
entries = gsutilwrap.long_ls(
    'gs://some-bucket/some-path/**/*.txt')

for entry in entries:
    print("File size and update time of {}: {} {}".format(
        entry.url, entry.size, entry.update_time))

# write/read text
gsutilwrap.write_text(
    url='gs://some-bucket/some-path/some-file.txt',
    text='some text')

text = gsutilwrap.read_text(
    url='gs://some-bucket/some-path/some-file.txt')

# write/read bytes
gsutilwrap.write_bytes(
    url='gs://some-bucket/some-path/some-file.bin',
    data=b'\xDE\xAD\xBE\xEF')

data = gsutilwrap.read_bytes(
    url='gs://some-bucket/some-path/some-file.bin')

# copy
gsutilwrap.copy(
    pattern="gs://some-bucket/some-path/*.txt",
    target="/some/dir")

gsutilwrap.copy_many_to_one(
    patterns=[
        "gs://some-bucket/some-path/*.txt",
        "gs://some-bucket/some-path/*.xml"
    ],
    target="/some/dir")

gsutilwrap.copy_many_to_many(
    patterns_targets=[
        ("gs://some-bucket/some-path/*.txt", "/some/dir"),
        ("gs://some-bucket/some-path/*.xml", "/some/other/dir")
    ])

# stat an object
stat = gsutilwrap.stat(
    url='gs://some-bucket/some-path/some-file.txt')
print("Modification time: {}".format(stat.file_mtime))
print("Size: {}".format(stat.content_length))
print("MD5: {}".format(stat.md5.hex()))
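
The MD5 reported by stat can be used to check a downloaded copy. The following is a minimal sketch, assuming that stat.md5 holds the raw digest bytes (as the .hex() call above suggests) and that an MD5 is available for the object. The helpers local_md5 and matches_remote are hypothetical and not part of gsutilwrap.

```python
import hashlib
import pathlib


def local_md5(path: pathlib.Path, chunk_size: int = 1 << 20) -> bytes:
    """Compute the MD5 digest of a local file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as fid:
        # Read until an empty chunk signals the end of the file.
        for chunk in iter(lambda: fid.read(chunk_size), b""):
            digest.update(chunk)
    return digest.digest()


def matches_remote(path: pathlib.Path, remote_md5: bytes) -> bool:
    """Compare a local file against an MD5 digest given as raw bytes."""
    return local_md5(path) == remote_md5
```

With the stat object from above, a check could then look like matches_remote(pathlib.Path('/some/dir/some-file.txt'), stat.md5).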

Installation

  • Create a virtual environment:
python3 -m venv venv3
  • Activate it:
source venv3/bin/activate
  • Install gsutilwrap with pip:
pip3 install gsutilwrap

Development

  • Check out the repository.
  • In the repository root, create the virtual environment:
python3 -m venv venv3
  • Activate the virtual environment:
source venv3/bin/activate
  • Install the development dependencies:
pip3 install -e .[dev]
  • We provide a set of live tests. The live tests need an existing bucket in Google Cloud Storage. Set the URL prefix used for all the live tests via the environment variable TEST_GSUTILWRAP_URL_PREFIX.

    Mind that the live tests use Google Cloud resources for which you will be billed. Always check that no resources remain in use after the tests have finished so that you don’t incur unnecessary costs!

  • We use tox for testing and packaging the distribution. Assuming that the virtual environment has been activated and the development dependencies have been installed, run:

tox
  • We also provide a set of pre-commit checks that lint and check code for formatting. Run them locally from an activated virtual environment with development dependencies:
./precommit.py
  • The pre-commit script can also automatically format the code:
./precommit.py --overwrite
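
As an illustration of the live-test setup described above, a test module could read TEST_GSUTILWRAP_URL_PREFIX and skip itself when the variable is not set. This is only a sketch under that assumption and not necessarily how the project's own tests are written. The names URL_PREFIX, url_for and TestLive are hypothetical.

```python
import os
import unittest

# Read the prefix once; an empty string means "not configured".
URL_PREFIX = os.environ.get("TEST_GSUTILWRAP_URL_PREFIX", "")


def url_for(prefix: str, relative: str) -> str:
    """Join the configured prefix and a relative object path."""
    return prefix.rstrip("/") + "/" + relative.lstrip("/")


@unittest.skipUnless(URL_PREFIX, "TEST_GSUTILWRAP_URL_PREFIX is not set")
class TestLive(unittest.TestCase):
    def test_url_is_under_prefix(self) -> None:
        # A real live test would write to and read from this URL.
        url = url_for(URL_PREFIX, "some-file.txt")
        self.assertTrue(url.startswith(URL_PREFIX.rstrip("/")))
```

Skipping instead of failing keeps an ordinary test run free of billed operations when no bucket has been configured.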

Versioning

We follow Semantic Versioning. The version X.Y.Z indicates:

  • X is the major version (backward-incompatible),
  • Y is the minor version (backward-compatible), and
  • Z is the patch version (backward-compatible bug fix).
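
The X.Y.Z scheme above can be turned into a simple compatibility check. The following is a minimal sketch. The helpers parse_version and is_compatible are hypothetical, and the sketch ignores pre-release and build-metadata suffixes as well as the special treatment of 0.y.z versions in Semantic Versioning.

```python
from typing import Tuple


def parse_version(text: str) -> Tuple[int, int, int]:
    """Parse 'X.Y.Z' into (major, minor, patch)."""
    major, minor, patch = (int(part) for part in text.split("."))
    return major, minor, patch


def is_compatible(installed: str, required: str) -> bool:
    """Check for the same major version and an equal or newer release."""
    inst, req = parse_version(installed), parse_version(required)
    # Tuples compare element by element, i.e. major, then minor, then patch.
    return inst[0] == req[0] and inst >= req
```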

Download files

Files for gsutilwrap, version 1.1.2:
gsutilwrap-1.1.2.tar.gz (11.5 kB), source distribution
