
gcs-uri

Simple API to copy files to and from Google Cloud Storage

Installation

pip install gcs-uri

Usage

gcs-uri exposes the following functions as its main public API:

  • copy_file
  • copy_dir
  • copy_files

These functions do exactly what they sound like they do.

copy_file will copy a source file (either a local file or a remote blob in GCS) to a destination file (either a local file or a remote blob in GCS).

copy_dir will recursively copy the contents of a directory (either a local directory or a remote "directory" in GCS) to a destination directory (either a local directory or a remote "directory" in GCS).

copy_files will copy a list of source files (either local files or remote blobs in GCS, or a mix of the two) to a corresponding list of destination files (either local files or remote blobs in GCS, or a mix of the two).

If the second argument to copy_files is of type str | Path | Blob (as opposed to a Sequence), then this argument is treated like a directory and each of the source files is "flattened" (i.e. folder delimiters are removed) and copied under the destination directory.

The idea is that you can pass just about any object to these functions and they will figure out how to do the copying.
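
All three functions can be imported directly from the package:

from gcs_uri import copy_file, copy_dir, copy_files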

Examples

Local file -> local file

In this case copy_file behaves just like shutil.copy2 or cp, copying the source file to the destination file locally.

src = '/my/src/file.txt'
dst = '/my/dst/file.txt'

copy_file(src, dst)

src and dst can also be pathlib.Path objects:

from pathlib import Path

src = Path('/my/src/file.txt')
dst = Path('/my/dst/file.txt')

copy_file(src, dst)

Local dir -> local dir

In this case copy_dir behaves just like shutil.copytree (or somewhat like rsync, although copy_dir will "re-copy" all files to the destination whether or not they already exist there).

src = '/my/src'
dst = '/my/dst'

copy_dir(src, dst)

# if there was a file `/my/src/a/b.txt` after `copy_dir`
# there would then be a file `/my/dst/a/b.txt`

The source and destination can include or omit a trailing slash and the results are the same as above.
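
For example, each of these calls is equivalent to the one above:

copy_dir('/my/src/', '/my/dst')
copy_dir('/my/src', '/my/dst/')
copy_dir('/my/src/', '/my/dst/')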

Local file -> remote file (upload)

To copy a file to a Google Cloud bucket, barely anything has to change; the destination should simply be a Google Storage URI:

src = '/my/src/file.txt'
dst = 'gs://my-bkt/dst/file.txt'

copy_file(src, dst)

If you would like gcs-uri to use a particular Google Storage Client, this can be provided as a keyword(-only) argument (the same applies to copy_dir):

from google.cloud import storage

client = storage.Client()

src = '/my/src/file.txt'
dst = 'gs://my-bkt/dst/file.txt'

copy_file(src, dst, client=client)

If no client is provided and either the source or the destination (or both) is determined to represent a remote location, then gcs-uri will try to instantiate a client by calling storage.Client().
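
Conceptually, that fallback behaves something like this sketch (a hypothetical helper for illustration, not the library's actual code):

from google.cloud import storage

def _ensure_client(client, *args):
    # hypothetical sketch: create a client only if none was given
    # and at least one argument points at a remote location
    if client is None and any(str(a).startswith('gs://') for a in args):
        client = storage.Client()  # uses application default credentials
    return client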

Note that we can provide gcs-uri with "richer" objects (instead of just strings):

from pathlib import Path
from google.cloud import storage

client = storage.Client()

src = Path('/my/src/file.txt')
dst = storage.Blob.from_string('gs://my-bkt/dst/file.txt', client=client)

copy_file(src, dst)

Local dir -> remote dir (upload)

The concepts from the previous sections apply here:

src = '/my/src'
dst = 'gs://my-bkt/dst'

copy_dir(src, dst)

# if there was a file `/my/src/a/b.txt` after `copy_dir`
# there would then be a blob `gs://my-bkt/dst/a/b.txt`

Remote file -> local file (download)

src = 'gs://my-bkt/src/file.txt'
dst = '/my/dst/file.txt'

copy_file(src, dst)

Remote dir -> local dir (download)

src = 'gs://my-bkt/src'
dst = '/my/dst'

copy_dir(src, dst)

Remote file -> remote file (transfer)

src = 'gs://my-bkt/src/file.txt'
dst = 'gs://my-other-bkt/dst/file.txt'

copy_file(src, dst)

Remote dir -> remote dir (transfer)

src = 'gs://my-bkt/src'
dst = 'gs://my-other-bkt/dst'

copy_dir(src, dst)

List of local files -> list of remote files

srcs = ['/my/src/file1.txt', '/my/src/file2.txt']
dsts = ['gs://my-bkt/dst/file1.txt', 'gs://my-bkt/dst/file2.txt']

copy_files(srcs, dsts)
# copies: /my/src/file1.txt -> gs://my-bkt/dst/file1.txt
# copies: /my/src/file2.txt -> gs://my-bkt/dst/file2.txt

List of local files -> remote dir

srcs = ['/my/src/file1.txt', '/my/src/file2.txt']
dst = 'gs://my-bkt/dst'

copy_files(srcs, dst)
# copies: /my/src/file1.txt -> gs://my-bkt/dst/my-src-file1.txt
# copies: /my/src/file2.txt -> gs://my-bkt/dst/my-src-file2.txt
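
The flattened names suggest that path separators are replaced with hyphens. A rough, hypothetical sketch of that idea (the real rule is internal to gcs-uri and may differ):

from pathlib import PurePosixPath

def flatten_name(path: str) -> str:
    # hypothetical illustration: '/my/src/file1.txt' -> 'my-src-file1.txt'
    parts = PurePosixPath(path).parts
    if parts and parts[0] == '/':
        parts = parts[1:]  # drop the filesystem root
    return '-'.join(parts)

assert flatten_name('/my/src/file1.txt') == 'my-src-file1.txt'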

API

# src/gcs_uri.py

def copy_file(
    src: str | Path | Blob,
    dst: str | Path | Blob,
    *,
    client: Client | None = None,
    quiet: bool = False,
) -> None:
    """Copy a single file.

    If `src` and `dst` are both determined to be local files then `client` is ignored.
    """

def copy_dir(
    src: str | Path | Blob,
    dst: str | Path | Blob,
    *,
    client: Client | None = None,
    quiet: bool = False,
) -> None:
    """Copy a directory (recursively).

    If `src` and `dst` are both determined to be local directories
    then `client` is ignored.
    """

def copy_files(
    srcs: Sequence[str | Path | Blob],
    dsts: str | Path | Blob | Sequence[str | Path | Blob],
    *,
    client: Client | None = None,
    quiet: bool = False,
) -> None:
    """Copy a list of files.

    If `dsts` is a `str | Path | Blob` it is treated as a directory
    and each of the files in `srcs` will have its name "flattened" and will be
    copied under `dsts`.

    If `dsts` is a `Sequence[str | Path | Blob]` it is zipped with `srcs`, i.e.
    each file in `srcs` is copied to its corresponding entry in `dsts`.
    """

Tests

This package comes with some basic end-to-end (e2e) tests. They require an active Google Cloud project with the Google Storage API enabled.

To help with running them there is a utility script in the root of this repo: run_e2e_tests.py.

usage: run_e2e_tests.py [-h] [-v] [-c GOOGLE_APPLICATION_CREDENTIALS]
                        [-u TEST_STORAGE_URI]

optional arguments:
  -h, --help            show this help message and exit
  -v, --version         show program's version number and exit
  -c GOOGLE_APPLICATION_CREDENTIALS, --google-application-credentials GOOGLE_APPLICATION_CREDENTIALS
                        Google cloud service account to use.
  -u TEST_STORAGE_URI, --test-storage-uri TEST_STORAGE_URI
                        Google storage uri to use when running e2e tests.

This script requires you to provide a service account JSON file as well as a URI to a location in Google Cloud which the tests will use to copy blobs to/from. (IMPORTANT: all blobs at and beneath the location you specify will be removed; the bucket itself will not be removed.)

So, run the e2e tests with something like:

python -m run_e2e_tests -c "path/to/service-account.json" -u "gs://my-bkt/gcs-uri-tests"

Contributing

  1. Have or install a recent version of poetry (version >= 1.1)
  2. Fork the repo
  3. Setup a virtual environment (however you prefer)
  4. Run poetry install
  5. Run pre-commit install
  6. Add your changes (adding/updating tests is always nice too)
  7. Commit your changes + push to your fork
  8. Open a PR
