Async S3 client using httpx & anyio

handtruck

A simple module for putting and getting objects from Amazon S3-compatible endpoints.

Installation

pip install handtruck

Usage

from http import HTTPStatus

from httpx import AsyncClient
from handtruck import S3Client


client = S3Client(
    url="http://s3-url",
    client=AsyncClient(),
    access_key_id="key-id",
    secret_access_key="hackme",
    region="us-east-1"
)

# Upload str object to bucket "bucket" and key "str"
resp = await client.put("bucket/str", "hello, world")
assert resp.status_code == HTTPStatus.OK

# Upload bytes object to bucket "bucket" and key "bytes"
resp = await client.put("bucket/bytes", b"hello, world")
assert resp.status_code == HTTPStatus.OK

# Upload AsyncIterable to bucket "bucket" and key "iterable"
async def gen():
    yield b'some bytes'

resp = await client.put("bucket/iterable", gen())
assert resp.status_code == HTTPStatus.OK

# Upload file to bucket "bucket" and key "file"
resp = await client.put_file("bucket/file", "/path_to_file")
assert resp.status_code == HTTPStatus.OK

# Check object exists using bucket+key
resp = await client.head("bucket/key")
assert resp.status_code == HTTPStatus.OK

# Get object by bucket+key
resp = await client.get("bucket/key")
data = resp.content

# Delete object using bucket+key
resp = await client.delete("bucket/key")
assert resp.status_code == HTTPStatus.NO_CONTENT

# List objects by prefix
async for result in client.list_objects_v2("bucket/", prefix="prefix"):
    # Each result is a page: a list of metadata objects, one per object
    # stored in the bucket under the prefix.
    for obj in result:
        print(obj)
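The AsyncIterable upload above accepts any async generator, which is useful for streaming data that should not be held in memory all at once. A minimal sketch, assuming you want to stream a local file in fixed-size chunks: iter_file is a hypothetical helper, not part of handtruck, and it uses asyncio.to_thread (Python 3.9+) for the blocking reads only to keep the sketch stdlib-only, even though handtruck itself is built on anyio.

```python
import asyncio
from typing import AsyncIterator


async def iter_file(path: str, chunk_size: int = 64 * 1024) -> AsyncIterator[bytes]:
    """Yield the contents of path in chunk_size pieces (hypothetical helper).

    Each blocking read runs in a worker thread so the event loop stays free.
    """
    with open(path, "rb") as f:
        while True:
            chunk = await asyncio.to_thread(f.read, chunk_size)
            if not chunk:  # empty bytes means end of file
                break
            yield chunk
```

It would be passed the same way as gen(), e.g. `await client.put("bucket/iterable", iter_file("/path_to_file"))`. For a plain file, put_file is the simpler choice; a generator like this is mainly worth it when you want to transform or throttle the data on the way out.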

The bucket may be given as a subdomain of the host, as part of the object name, or in the URL path:

import httpx
from handtruck import S3Client


# Bucket as a subdomain of the host
client = S3Client(url="http://bucket.your-s3-host",
                  client=httpx.AsyncClient())
resp = await client.put("key", gen())
...

# Bucket as the first component of the object name
client = S3Client(url="http://your-s3-host",
                  client=httpx.AsyncClient())
resp = await client.put("bucket/key", gen())
...

# Bucket in the URL path
client = S3Client(url="http://your-s3-host/bucket",
                  client=httpx.AsyncClient())
resp = await client.put("key", gen())
...
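All three forms address the same object; they differ only in whether the bucket travels in the host name (virtual-hosted style) or in the request path (path style). A minimal sketch of that mapping, assuming the object name is simply appended to whatever path the base URL already has; resolve_target is a hypothetical helper for illustration, not handtruck's internal routing code.

```python
from typing import Tuple
from urllib.parse import urlsplit


def resolve_target(base_url: str, object_name: str) -> Tuple[str, str]:
    """Return the (host, path) pair a request for object_name would use."""
    parts = urlsplit(base_url)
    # Keep any path already in the base URL (e.g. "/bucket") as a prefix.
    prefix = parts.path.rstrip("/")
    return parts.netloc, f"{prefix}/{object_name.lstrip('/')}"


print(resolve_target("http://bucket.your-s3-host", "key"))   # ('bucket.your-s3-host', '/key')
print(resolve_target("http://your-s3-host", "bucket/key"))   # ('your-s3-host', '/bucket/key')
print(resolve_target("http://your-s3-host/bucket", "key"))   # ('your-s3-host', '/bucket/key')
```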

Credentials may be passed as keyword arguments or embedded in the URL:

import httpx
from handtruck import S3Client

client_credentials_as_kw = S3Client(
    url="http://your-s3-host",
    access_key_id="key_id",
    secret_access_key="access_key",
    client=httpx.AsyncClient(),
)

client_credentials_in_url = S3Client(
    url="http://key_id:access_key@your-s3-host",
    client=httpx.AsyncClient(),
)
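Secret keys often contain characters such as / and + that are reserved in URLs, so an unencoded secret can break URL parsing. A minimal sketch with urllib.parse, assuming you assemble the URL yourself: percent-encode each part before embedding it and decode it when reading it back. url_with_credentials and credentials_from_url are hypothetical helpers; whether handtruck decodes percent-escapes in the URL is an assumption to verify, and the keyword form avoids the question entirely.

```python
from typing import Tuple
from urllib.parse import quote, unquote, urlsplit


def url_with_credentials(host: str, key_id: str, secret: str, scheme: str = "http") -> str:
    """Embed credentials in a URL, percent-encoding every reserved character."""
    return f"{scheme}://{quote(key_id, safe='')}:{quote(secret, safe='')}@{host}"


def credentials_from_url(url: str) -> Tuple[str, str]:
    """Read (key_id, secret) back out of the URL's userinfo."""
    parts = urlsplit(url)
    # urlsplit does not percent-decode username/password, so unquote them here.
    return unquote(parts.username or ""), unquote(parts.password or "")
```

For example, a secret of `abc/def+ghi` is embedded as `abc%2Fdef%2Bghi`, so the host is still parsed as `your-s3-host`.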

Credentials

By default, S3Client tries to collect credentials from every available source, in this order: keyword arguments such as access_key_id= and secret_access_key=, then the username and password in the url argument, then environment variables, and finally the config file.
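The lookup order described above can be sketched as a per-field fallback chain. This is a minimal illustration, assuming that each field is resolved independently (the first source with a non-blank value wins) and that the environment source uses the standard AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY variables; resolve_credentials, from_url, and from_env are hypothetical names, and the config file is represented by an already-parsed mapping.

```python
import os
from typing import Dict, Mapping, Optional
from urllib.parse import unquote, urlsplit

FIELDS = ("access_key_id", "secret_access_key")

# Assumption: the standard AWS variable names; check what handtruck actually reads.
ENV_NAMES = {
    "access_key_id": "AWS_ACCESS_KEY_ID",
    "secret_access_key": "AWS_SECRET_ACCESS_KEY",
}


def from_url(url: str) -> Dict[str, str]:
    """Credentials embedded in the URL's userinfo, if any."""
    parts = urlsplit(url)
    return {
        "access_key_id": unquote(parts.username or ""),
        "secret_access_key": unquote(parts.password or ""),
    }


def from_env(env: Mapping[str, str]) -> Dict[str, str]:
    """Credentials taken from environment variables, if set."""
    return {field: env.get(name, "") for field, name in ENV_NAMES.items()}


def resolve_credentials(
    kwargs: Mapping[str, str],
    url: str,
    config: Mapping[str, str],
    env: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """For each field, take the first non-blank value in priority order:
    keyword arguments, URL, environment, config file."""
    env = os.environ if env is None else env
    sources = [kwargs, from_url(url), from_env(env), config]
    return {
        field: next((src[field] for src in sources if src.get(field)), "")
        for field in FIELDS
    }
```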

You can also pass credentials explicitly using the handtruck.credentials module.

handtruck.credentials.StaticCredentials

import httpx
from handtruck import S3Client
from handtruck.credentials import StaticCredentials

credentials = StaticCredentials(
    access_key_id='aaaa',
    secret_access_key='bbbb',
    region='us-east-1',
)
client = S3Client(
    url="http://your-s3-host",
    client=httpx.AsyncClient(),
    credentials=credentials,
)

handtruck.credentials.URLCredentials

import httpx
from handtruck import S3Client
from handtruck.credentials import URLCredentials

url = "http://key:hack-me@your-s3-host"
credentials = URLCredentials(url, region="us-east-1")
client = S3Client(
    url="http://your-s3-host",
    client=httpx.AsyncClient(),
    credentials=credentials,
)

handtruck.credentials.EnvironmentCredentials

import httpx
from handtruck import S3Client
from handtruck.credentials import EnvironmentCredentials

credentials = EnvironmentCredentials(region="us-east-1")
client = S3Client(
    url="http://your-s3-host",
    client=httpx.AsyncClient(),
    credentials=credentials,
)

handtruck.credentials.ConfigCredentials

Using the default user config file:

import httpx
from handtruck import S3Client
from handtruck.credentials import ConfigCredentials


credentials = ConfigCredentials()   # Uses the ~/.aws/credentials file
client = S3Client(
    url="http://your-s3-host",
    client=httpx.AsyncClient(),
    credentials=credentials,
)

Using a custom config location:

import httpx
from handtruck import S3Client
from handtruck.credentials import ConfigCredentials


credentials = ConfigCredentials("~/.my-custom-aws-credentials")
client = S3Client(
    url="http://your-s3-host",
    client=httpx.AsyncClient(),
    credentials=credentials,
)
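The default location follows the AWS shared credentials file format: an INI file with one section per profile. A minimal sketch of reading it with configparser, assuming the standard aws_access_key_id / aws_secret_access_key keys and a profile named default; read_aws_credentials is hypothetical, and ConfigCredentials may read more than these two keys.

```python
import configparser
import os
from typing import Dict


def read_aws_credentials(path: str = "~/.aws/credentials", profile: str = "default") -> Dict[str, str]:
    """Return the access key fields for profile from an AWS-style INI file."""
    parser = configparser.ConfigParser()
    # read() silently skips missing files, so an absent config yields {}.
    parser.read(os.path.expanduser(path))
    if not parser.has_section(profile):
        return {}
    section = parser[profile]
    return {
        "access_key_id": section.get("aws_access_key_id", ""),
        "secret_access_key": section.get("aws_secret_access_key", ""),
    }
```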

handtruck.credentials.merge_credentials

This function collects the passed credentials instances and returns a new one that contains all non-blank fields from them. When a field is set in more than one instance, the value from the earlier argument takes priority.

import httpx
from handtruck import S3Client
from handtruck.credentials import (
    ConfigCredentials, EnvironmentCredentials, merge_credentials
)

credentials = merge_credentials(
    EnvironmentCredentials(),
    ConfigCredentials(),
)
client = S3Client(
    url="http://your-s3-host",
    client=httpx.AsyncClient(),
    credentials=credentials,
)
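The merge rule described above (the first non-blank value wins, field by field) can be sketched with a dataclass. Creds and merge are hypothetical stand-ins, assuming credentials carry access_key_id, secret_access_key, and region fields and that an empty string counts as blank; the real merge_credentials may handle other fields.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Creds:
    # Hypothetical stand-in for a handtruck credentials instance.
    access_key_id: str = ""
    secret_access_key: str = ""
    region: str = ""


def merge(*items: Creds) -> Creds:
    """Build a new Creds, taking each field from the first argument that sets it."""
    values = {}
    for f in fields(Creds):
        values[f.name] = next(
            (getattr(c, f.name) for c in items if getattr(c, f.name)), ""
        )
    return Creds(**values)
```

With `merge(Creds(access_key_id="env-key"), Creds("cfg-key", "cfg-secret", "us-east-1"))`, the key ID comes from the first instance and the secret and region from the second.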

handtruck.credentials.MetadataCredentials

Fetching credentials from the metadata service and keeping them refreshed:

import httpx
from handtruck import S3Client
from handtruck.credentials import MetadataCredentials

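credentials = MetadataCredentials()

# Start refreshing credentials from the metadata server
await credentials.start()
client = S3Client(
    url="http://your-s3-host",
    client=httpx.AsyncClient(),
    credentials=credentials,
)
# ... use the client ...

# Stop the background refresh when the client is no longer needed
await credentials.stop()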
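start() and stop() bracket a background task that keeps the credentials fresh while the client is in use. A minimal sketch of that lifecycle with asyncio, assuming a fixed refresh interval; RefreshingCredentials and its fetch callback are hypothetical, and the real class may schedule refreshes differently (for example, based on token expiry) and handle fetch errors, which this sketch does not.

```python
import asyncio
from typing import Awaitable, Callable, Optional


class RefreshingCredentials:
    """Re-fetch a credentials dict on a fixed interval (hypothetical sketch).

    fetch stands in for a call to the metadata service.
    """

    def __init__(self, fetch: Callable[[], Awaitable[dict]], interval: float) -> None:
        self._fetch = fetch
        self._interval = interval
        self._task: Optional[asyncio.Task] = None
        self.current: dict = {}

    async def start(self) -> None:
        # Fetch once up front so credentials are usable immediately.
        self.current = await self._fetch()
        self._task = asyncio.create_task(self._loop())

    async def _loop(self) -> None:
        while True:
            await asyncio.sleep(self._interval)
            # An exception here ends the loop; real code would log and retry.
            self.current = await self._fetch()

    async def stop(self) -> None:
        if self._task is not None:
            self._task.cancel()
            try:
                await self._task
            except asyncio.CancelledError:
                pass
            self._task = None
```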

Multipart upload

Multipart upload can be used for large files. It uploads several parts of a file to S3 concurrently. S3Client retries failed part uploads and calculates a hash of each part for integrity checks.

import httpx
from handtruck import S3Client


client = S3Client(url="http://your-s3-host", client=httpx.AsyncClient())
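await client.put_file_multipart(
    "test/bigfile.csv",         # object name: bucket "test", key "bigfile.csv"
    "/home/user/bigfile.csv",   # local file to upload
    headers={
        "Content-Type": "text/csv",
    },
    workers_count=8,            # number of concurrent upload workers
)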
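The part handling described above comes down to three steps: cut the file into parts, hash each part, and upload the parts concurrently with per-part retries. A minimal sketch of those steps with asyncio, assuming SHA-256 as the part hash and an injected upload coroutine; split_parts, upload_parts, and the upload callback are hypothetical, and a real upload also needs S3's CreateMultipartUpload and CompleteMultipartUpload calls around them. S3 requires every part except the last to be at least 5 MiB.

```python
import asyncio
import hashlib
from typing import Awaitable, Callable, List

# Hypothetical uploader signature: (part_number, body, sha256_hex) -> None.
UploadFn = Callable[[int, bytes, str], Awaitable[None]]


def split_parts(data: bytes, part_size: int) -> List[bytes]:
    """Cut data into consecutive parts; only the last one may be shorter."""
    return [data[i:i + part_size] for i in range(0, len(data), part_size)]


async def upload_parts(
    parts: List[bytes],
    upload: UploadFn,
    workers_count: int = 8,
    max_retries: int = 3,
) -> List[str]:
    """Upload parts with at most workers_count in flight, making up to
    max_retries attempts per part. Returns the digests in part order."""
    limit = asyncio.Semaphore(workers_count)

    async def send(number: int, body: bytes) -> str:
        # Hash each part so the server can verify its integrity.
        digest = hashlib.sha256(body).hexdigest()
        async with limit:
            attempt = 0
            while True:
                attempt += 1
                try:
                    await upload(number, body, digest)
                    return digest
                except OSError:
                    # OSError stands in for transient transport errors.
                    if attempt >= max_retries:
                        raise

    # S3 part numbers start at 1.
    return list(await asyncio.gather(*(send(n, p) for n, p in enumerate(parts, start=1))))
```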

Parallel download to file

S3 supports GET requests with a Range header, so an object can be downloaded over several connections in parallel to speed things up. S3Client retries failed partial requests and uses the ETag header to make sure the object does not change during the download. If your system supports the pwrite syscall (Linux, macOS, etc.), it is used to write to a single file concurrently. Otherwise, each worker writes to its own file, and the files are concatenated after the download finishes.

import httpx
from handtruck import S3Client


client = S3Client(url="http://your-s3-host", client=httpx.AsyncClient())

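await client.get_file_parallel(
    "dump/bigfile.csv",         # object name: bucket "dump", key "bigfile.csv"
    "/home/user/bigfile.csv",   # local destination path
    workers_count=8,            # number of concurrent download workers
)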
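The parallel download described above splits the object into one byte range per worker, requests each range with a Range header, and writes each response body at its own offset. A minimal sketch of the range arithmetic and the offset write, assuming the object size is already known (for example, from the Content-Length of a HEAD request); byte_ranges, range_header, and write_at are hypothetical helpers. The ETag check is not shown; sending If-Match with the ETag from the first response is a common approach, but that handtruck does exactly this is an assumption.

```python
import os
from typing import List, Tuple


def byte_ranges(size: int, workers_count: int) -> List[Tuple[int, int]]:
    """Split [0, size) into at most workers_count inclusive (start, end) ranges."""
    if size == 0:
        return []
    chunk = -(-size // workers_count)  # ceiling division
    return [(start, min(start + chunk, size) - 1) for start in range(0, size, chunk)]


def range_header(start: int, end: int) -> str:
    # HTTP byte ranges are inclusive on both ends.
    return f"bytes={start}-{end}"


def write_at(fd: int, offset: int, data: bytes) -> None:
    """Write all of data at offset, looping in case of short writes."""
    view = memoryview(data)
    while view:
        if hasattr(os, "pwrite"):
            # pwrite does not touch the shared file position, so workers can
            # write to one descriptor concurrently.
            written = os.pwrite(fd, view, offset)
        else:
            # Fallback without pwrite: seek + write is NOT safe to share
            # between workers, hence the one-file-per-worker strategy.
            os.lseek(fd, offset, os.SEEK_SET)
            written = os.write(fd, view)
        view = view[written:]
        offset += written
```

For a 10-byte object and 3 workers, byte_ranges yields (0, 3), (4, 7), and (8, 9), i.e. the headers `bytes=0-3`, `bytes=4-7`, and `bytes=8-9`.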

Download files

Source Distribution

handtruck-0.0.1.tar.gz (20.2 kB)

Built Distribution

handtruck-0.0.1-py3-none-any.whl (23.2 kB)

File details

Details for the file handtruck-0.0.1.tar.gz.

File metadata

  • Download URL: handtruck-0.0.1.tar.gz
  • Size: 20.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.2.1 CPython/3.14.0 Linux/6.17.8-300.fc43.x86_64

File hashes

Hashes for handtruck-0.0.1.tar.gz
Algorithm    Hash digest
SHA256       753701d1ef4e5113cc9955ca19b3d359a83fce772a0911212e85fe574cdb67ba
MD5          a9a0a2d2c874a044fcfb2430e6214434
BLAKE2b-256  dc83fbb973d8de4ba5e08ede405820bfa576d3339e44bdc448ae26f36af96809

File details

Details for the file handtruck-0.0.1-py3-none-any.whl.

File metadata

  • Download URL: handtruck-0.0.1-py3-none-any.whl
  • Size: 23.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.2.1 CPython/3.14.0 Linux/6.17.8-300.fc43.x86_64

File hashes

Hashes for handtruck-0.0.1-py3-none-any.whl
Algorithm    Hash digest
SHA256       c27d20a0fdfa9dc3fd9de890ed2dcb6f37235330b583f668884cb591c3a909b0
MD5          f78cbe175e1157e625c5314012264261
BLAKE2b-256  547ec57fad5855b6c5a9150b867eea1c5d48c9ea9677a5ee110b34e98e7add16