
Python Client for Google Cloud Storage

Project description

This is a shared codebase for gcloud-aio-storage and gcloud-rest-storage.


Installation

$ pip install --upgrade gcloud-{aio,rest}-storage

Usage

To upload a file, you might do something like the following:

import aiofiles
import aiohttp
from gcloud.aio.storage import Storage


async with aiohttp.ClientSession() as session:
    client = Storage(session=session)

    async with aiofiles.open('/path/to/my/file', mode="r") as f:
        contents = await f.read()
        status = await client.upload(
            'my-bucket-name',
            'path/to/gcs/folder',
            contents,
        )
        print(status)

Note that there are multiple ways to accomplish the above, e.g. by making use of the Bucket and Blob convenience classes if that better fits your use-case.
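For instance, a download via those helpers might look roughly like the following. This is a minimal sketch: the bucket and object names are placeholders, and the exact get_bucket / get_blob / download signatures may vary between versions, so check the API of the version you have installed.

from gcloud.aio.storage import Storage

async with Storage() as client:
    bucket = client.get_bucket('my-bucket-name')
    blob = await bucket.get_blob('path/to/gcs/object')
    contents = await blob.download()
    print(len(contents))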

Of course, the major benefit of using an async library is being able to parallelize operations like this. Since gcloud-aio-storage is fully asyncio-compatible, you can use any of the built-in asyncio methods to perform more complicated operations:

import asyncio

import aiofiles
from gcloud.aio.storage import Storage

my_files = {
    '/local/path/to/file.1': 'path/in/gcs.1',
    '/local/path/to/file.2': 'path/in/gcs.2',
    '/local/path/to/file.3': 'different/gcs/path/filename.3',
}

async with Storage() as client:
    # Prepare all our upload data
    uploads = []
    for local_name, gcs_name in my_files.items():
        async with aiofiles.open(local_name, mode="r") as f:
            contents = await f.read()
            uploads.append((gcs_name, contents))

    # Simultaneously upload all files
    await asyncio.gather(
        *[
            client.upload('my-bucket-name', path, file_) for path, file_ in uploads
        ]
    )

You can also refer to the smoke tests for more info and examples.

Note that you can also let gcloud-aio-storage do its own session management, so long as you give us a hint when to close that session:

async with Storage() as client:
    ...  # closes the client.session on leaving the context manager

# OR

client = Storage()
# do stuff
await client.close()  # close the session explicitly

File Encodings

In some cases, aiohttp needs to transform the objects returned from GCS into strings, e.g. for debug logging and similar purposes. The built-in await response.text() operation relies on chardet to guess the character encoding whenever it cannot be determined from the file metadata.

Unfortunately, this operation can be extremely slow, especially when working with particularly large files. If you notice odd latency issues when reading your results, you may want to set the character encoding more explicitly within GCS, e.g. by ensuring the contentType of the relevant objects is suffixed with ; charset=utf-8. For example, if contentType='application/x-netcdf' files exhibit latency, you could instead set contentType='application/x-netcdf; charset=utf-8'. See #172 for more info!
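As a rough sketch of setting that at upload time, assuming your installed version of upload accepts a content_type keyword argument; the bucket/object names are placeholders and file_contents stands in for data you have already read:

from gcloud.aio.storage import Storage

async with Storage() as client:
    await client.upload(
        'my-bucket-name',
        'path/to/gcs/data.nc',
        file_contents,  # bytes or str you have already read
        content_type='application/x-netcdf; charset=utf-8',
    )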

Emulators

For testing purposes, you may want to use gcloud-aio-storage along with a local GCS emulator. Setting the $STORAGE_EMULATOR_HOST environment variable to the address of your emulator should be enough to do the trick.

For example, using fsouza/fake-gcs-server, you can do:

docker run -d -p 4443:4443 -v $PWD/my-sample-data:/data fsouza/fake-gcs-server
export STORAGE_EMULATOR_HOST='0.0.0.0:4443'

Any gcloud-aio-storage requests made with that environment variable set will query fake-gcs-server instead of the official GCS API.

Note that some emulation systems require disabling SSL; if you're using a custom HTTP session, you may need to disable SSL verification.
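A minimal sketch of that with aiohttp, using a connector with certificate verification turned off (the bucket name is a placeholder, and list_objects is just an illustrative call):

import aiohttp
from gcloud.aio.storage import Storage

# assumes STORAGE_EMULATOR_HOST is already set, as above
connector = aiohttp.TCPConnector(ssl=False)
async with aiohttp.ClientSession(connector=connector) as session:
    client = Storage(session=session)
    print(await client.list_objects('my-sample-data'))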

Customization

This library mostly tries to stay agnostic of potential use-cases; as such, we do not implement any sort of retrying or other policies under the assumption that we wouldn’t get things right for every user’s situation.

Instead, we recommend configuring your own policies on an as-needed basis. The backoff library can make this quite straightforward! For example, you may find it useful to configure something like:

from typing import Any

import aiohttp
import backoff
import gcloud.aio.storage


class StorageWithBackoff(gcloud.aio.storage.Storage):
    @backoff.on_exception(backoff.expo, aiohttp.ClientResponseError,
                          max_tries=5, jitter=backoff.full_jitter)
    async def copy(self, *args: Any, **kwargs: Any):
        return await super().copy(*args, **kwargs)

    @backoff.on_exception(backoff.expo, aiohttp.ClientResponseError,
                          max_tries=10, jitter=backoff.full_jitter)
    async def download(self, *args: Any, **kwargs: Any):
        return await super().download(*args, **kwargs)
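Such a subclass drops in wherever you would use the stock client, for example (bucket and object names are placeholders again):

async with aiohttp.ClientSession() as session:
    client = StorageWithBackoff(session=session)
    data = await client.download('my-bucket-name', 'path/in/gcs.1')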

Contributing

Please see our contributing guide.


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

gcloud-aio-storage-7.0.1.tar.gz (15.4 kB)

Uploaded Source

Built Distribution

gcloud_aio_storage-7.0.1-py3-none-any.whl (14.9 kB)

Uploaded Python 3

File details

Details for the file gcloud-aio-storage-7.0.1.tar.gz.

File metadata

  • Download URL: gcloud-aio-storage-7.0.1.tar.gz
  • Upload date:
  • Size: 15.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.1.13 CPython/3.9.12 Linux/5.13.0-1017-aws

File hashes

Hashes for gcloud-aio-storage-7.0.1.tar.gz

  • SHA256: 3224b3db462352498e989ffb810e95cddff924376fab530e6edb2e609eadc814
  • MD5: ac0312d5cbefc637f0380715effe466c
  • BLAKE2b-256: 808b27abf99c40a2aaeedb04f45f927bf87760aef749cede9aefa01a5cd2bb85


File details

Details for the file gcloud_aio_storage-7.0.1-py3-none-any.whl.

File metadata

File hashes

Hashes for gcloud_aio_storage-7.0.1-py3-none-any.whl

  • SHA256: 0442f709a718cbc8f7200f877c95682c70416e84b2b42d3af190f2aa99e62ab7
  • MD5: 1f5adcb7b8617bc9a586b35d8c632f04
  • BLAKE2b-256: 6d61fb90218344d1a8cd8201c7d26b5d9f34020cfbf5dbfd7a4c93eadc25227f

