S3Lib

Python library and collection of command-line programs for interfacing with AWS S3. Where possible, operations use buffering and bounded memory, so working with large buckets and objects is safe and easy.

Features

  • Memory-efficient streaming for large objects
  • Batch operations for large buckets
  • Support for custom S3-compatible endpoints
  • Simple credential management
  • Both library and CLI interfaces

Installation

pip install s3lib

Configuration

S3Lib supports multiple authentication methods (in order of precedence):

  1. Command-line argument: Use --creds <path> to specify a credentials file
  2. Environment variables: Set AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY
  3. Credentials file: Create ~/.s3 with your credentials (default)

Credentials File Format

Create a file at ~/.s3 (or any path you specify) with:

<AWS_ACCESS_KEY_ID>
<AWS_SECRET_ACCESS_KEY>

Example:

AKIAIOSFODNN7EXAMPLE
wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
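
For library use, the same credentials can be resolved in code and passed to Connection (documented under "Python Library API" below). As a minimal sketch, assuming the two-line file format shown above, a hypothetical helper such as load_credentials (not part of s3lib) might look like this:

# Hypothetical helper (not part of s3lib): resolve credentials from ~/.s3
# or from the AWS_* environment variables and hand them to Connection.
import os
from pathlib import Path

from s3lib import Connection

def load_credentials(path="~/.s3"):
    creds_file = Path(path).expanduser()
    if creds_file.exists():
        # First line: access key ID; second line: secret access key
        access_id, secret = creds_file.read_text().split()[:2]
    else:
        access_id = os.environ["AWS_ACCESS_KEY_ID"]
        secret = os.environ["AWS_SECRET_ACCESS_KEY"]
    return access_id, secret.encode()  # the library examples pass the secret as bytes

access_id, secret = load_credentials()
with Connection(access_id, secret) as s3:
    print(list(s3.list_buckets()))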

Command Line Utilities

s3ls - List buckets or objects

List all buckets:

s3ls

List objects in a bucket:

s3ls mybucket

List with prefix filter:

s3ls mybucket --prefix logs/2024/

List with custom fields:

s3ls mybucket --fields Key Size LastModified

Available fields: Key, LastModified, ETag, Size, StorageClass

Options:

  • --host HOST - Custom S3 endpoint hostname
  • --port PORT - Custom port
  • --output FILE - Write output to file
  • --creds FILE - Path to credentials file
  • --mark MARKER - Start listing from this key
  • --prefix PREFIX - Filter by prefix
  • --batch SIZE - Batch size for API calls (default: 1000)

s3get - Download objects

Download an object:

s3get mybucket myfile.txt --output local-file.txt

Download to stdout:

s3get mybucket logs/app.log | grep ERROR

Download multiple objects:

s3get mybucket file1.txt file2.txt --output combined.txt

Options:

  • --host HOST - Custom S3 endpoint hostname
  • --port PORT - Custom port
  • --output FILE - Write output to file (default: stdout)
  • --creds FILE - Path to credentials file
  • --range START-END - Fetch only a byte range (e.g. 0-499, 500-, -999)

s3put - Upload objects

Upload a file:

s3put mybucket remote-file.txt local-file.txt

Upload from stdin:

echo "Hello World" | s3put mybucket hello.txt

Upload with custom headers:

s3put mybucket file.txt local.txt --header "Content-Type:text/plain" --header "Cache-Control:max-age=3600"

Options:

  • --host HOST - Custom S3 endpoint hostname
  • --port PORT - Custom port
  • --creds FILE - Path to credentials file
  • --header KEY:VALUE - Add custom HTTP headers (repeatable)

s3head - Get object metadata

Get metadata for objects:

s3head mybucket file1.txt file2.txt

Get metadata in JSON format:

s3head mybucket file.txt --json

Options:

  • --host HOST - Custom S3 endpoint hostname
  • --port PORT - Custom port
  • --creds FILE - Path to credentials file
  • --json - Output in JSON format

s3cp - Copy objects

Copy object within or between buckets:

s3cp source-bucket source-key dest-bucket dest-key

Copy with custom metadata:

s3cp mybucket old.txt mybucket new.txt --header "Content-Type:application/json"

Options:

  • --host HOST - Custom S3 endpoint hostname
  • --port PORT - Custom port
  • --creds FILE - Path to credentials file
  • --header KEY:VALUE - Set metadata headers (repeatable)

s3rm - Delete objects

Delete objects:

s3rm mybucket file1.txt file2.txt

Delete with verbose output:

s3rm mybucket file.txt --verbose

Batch delete with custom batch size:

s3rm mybucket file*.txt --batch 100

Options:

  • --host HOST - Custom S3 endpoint hostname
  • --port PORT - Custom port
  • --creds FILE - Path to credentials file
  • -v, --verbose - Show files as they are deleted
  • --batch SIZE - Batch size for delete operations (default: 500)

s3sign - Sign S3 forms

Sign a policy document for browser-based uploads:

s3sign policy.json

This outputs the base64-encoded policy and signature.

Options:

  • --creds FILE - Path to credentials file
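
The signing scheme s3sign uses internally is not documented on this page. As an illustration only, classic S3 POST policy signing (signature version 2) base64-encodes the policy JSON and signs that string with HMAC-SHA1 using the secret key; a self-contained sketch, assuming that scheme:

# Illustration of classic (signature v2) S3 POST policy signing; whether
# s3sign uses exactly this scheme is an assumption, not documented here.
import base64
import hashlib
import hmac

def sign_policy(policy_json: bytes, secret_key: bytes) -> tuple[str, str]:
    policy_b64 = base64.b64encode(policy_json).decode()
    signature = base64.b64encode(
        hmac.new(secret_key, policy_b64.encode(), hashlib.sha1).digest()
    ).decode()
    return policy_b64, signature

with open("policy.json", "rb") as f:
    policy_b64, signature = sign_policy(f.read(), b"wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY")
print(policy_b64)
print(signature)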

Python Library API

Connection Lifecycle

Connection must be used as a context manager. Calling methods outside of a with block raises ConnectionLifecycleError. The connection is established lazily on first use and closed when the with block exits.

from s3lib import Connection

access_id = "AKIAIOSFODNN7EXAMPLE"
secret = b"wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"

with Connection(access_id, secret) as s3:
    for bucket in s3.list_buckets():
        print(bucket)
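
By contrast, using a Connection outside a with block raises ConnectionLifecycleError. A short illustration, assuming the exception is importable from s3lib in the same way as PreconditionFailed below:

# Calling methods outside a `with` block raises ConnectionLifecycleError
# (import path assumed to mirror PreconditionFailed).
from s3lib import Connection, ConnectionLifecycleError

s3 = Connection(access_id, secret)
try:
    for bucket in s3.list_buckets():   # not inside a `with` block
        print(bucket)
except ConnectionLifecycleError:
    print("Connection must be used as a context manager")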

Unconsumed responses: If get_object2 returns a stream and the Connection context exits before that stream is consumed, ConnectionLifecycleError is raised. Always close the stream before letting the connection exit:

with Connection(access_id, secret) as s3:
    stream, headers = s3.get_object2("mybucket", "file.txt")
    with stream:               # stream must be closed inside the connection block
        data = stream.read()

Downloading Objects

get_object2 (recommended)

get_object2 returns (S3ByteStream, headers) on success, or (None, headers) when a conditional request produces no body (304 Not Modified or 412 Precondition Failed).

with Connection(access_id, secret) as s3:
    # Simple download
    stream, headers = s3.get_object2("mybucket", "file.txt")
    with stream:
        data = stream.read()

    # Conditional download — skip if unchanged (caching)
    stream, headers = s3.get_object2("mybucket", "file.txt", if_none_match=cached_etag)
    if stream is None:
        pass  # 304 Not Modified — use cached copy
    else:
        with stream:
            data = stream.read()

    # Conditional download — only if ETag still matches
    stream, headers = s3.get_object2("mybucket", "file.txt", if_match=expected_etag)
    if stream is None:
        pass  # 412 Precondition Failed — object has changed
    else:
        with stream:
            data = stream.read()

get_object (low-level)

get_object returns the raw HTTPResponse. Conditional responses (304, 412) are returned as status codes — no exception is raised.

with Connection(access_id, secret) as s3:
    # Conditional download — check status to detect unchanged object
    response = s3.get_object("mybucket", "file.txt", if_none_match=cached_etag)
    if response.status == 304:
        pass  # Not Modified — use cached copy
    else:
        data = response.read()

    # Conditional download — check status to detect changed object
    response = s3.get_object("mybucket", "file.txt", if_match=expected_etag)
    if response.status == 412:
        pass  # Precondition Failed — object has changed
    else:
        data = response.read()

S3ByteStream

get_object2 returns an S3ByteStream context manager. It must always be used with with:

  • Full consumption: when .read() returns b"" (EOF), the underlying HTTP connection is kept alive and returned to a healthy state for reuse.
  • Early exit: when the with block exits before the stream is exhausted, the underlying socket is closed.

# Incremental read — stream a large object to disk
with Connection(access_id, secret) as s3:
    stream, headers = s3.get_object2("mybucket", "largefile.bin")
    with stream, open("local-large.bin", "wb") as f:
        while chunk := stream.read(65536):
            f.write(chunk)
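
Conversely, a sketch of the early-exit case: read only the start of a large object and let the with block close the stream (and its socket) with data still unconsumed:

with Connection(access_id, secret) as s3:
    stream, headers = s3.get_object2("mybucket", "largefile.bin")
    with stream:
        head = stream.read(1024)   # stop after the first 1 KiB
    # exiting the inner `with` before EOF closes the underlying socket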

Byte Range Fetching

Request only a portion of an object using byte_range=(start, end). Both positions are inclusive, 0-based byte offsets. Either can be None:

with Connection(access_id, secret) as s3:
    # First 500 bytes
    stream, headers = s3.get_object2("mybucket", "file.bin", byte_range=(0, 499))
    with stream:
        data = stream.read()

    # From byte 4096 to end of object
    stream, headers = s3.get_object2("mybucket", "file.bin", byte_range=(4096, None))
    with stream:
        tail = stream.read()

Uploading Objects

put_object2 (recommended)

put_object2 returns a PutResult TypedDict on success, or None when a conditional check fails — no exception to catch.

Field       Type        Description
etag        str         ETag of the stored object; use with if_match for future consistency checks
version_id  str | None  Version ID if bucket versioning is enabled
checksum    str | None  Server-confirmed checksum if one was requested

with Connection(access_id, secret) as s3:
    # Upload bytes
    result = s3.put_object2("mybucket", "file.txt", b"Hello World")
    print(result['etag'])

    # Upload from an open file or BytesIO
    with open("local.bin", "rb") as f:
        result = s3.put_object2("mybucket", "remote.bin", f)

    # Create-only — None means the object already existed, upload was skipped
    result = s3.put_object2("mybucket", "file.txt", b"data", if_none_match=True)
    if result is None:
        pass  # object already exists

    # Optimistic locking — None means a concurrent write changed the object
    result = s3.put_object2("mybucket", "file.txt", b"updated", if_match=old_etag)
    if result is None:
        pass  # ETag changed, retry with a fresh read
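
The two conditional forms combine naturally into a read-modify-write loop. A sketch, assuming the response headers expose the object's ETag under the "ETag" key (the exact header mapping is an assumption):

with Connection(access_id, secret) as s3:
    while True:
        stream, headers = s3.get_object2("mybucket", "counter.txt")
        with stream:
            current = int(stream.read() or b"0")
        etag = headers["ETag"]          # assumed header key

        result = s3.put_object2("mybucket", "counter.txt",
                                str(current + 1).encode(), if_match=etag)
        if result is not None:
            break   # write succeeded; otherwise another writer won, so re-read and retry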

put_object (low-level)

put_object returns a raw (status, headers) tuple. Conditional failures raise PreconditionFailed.

from s3lib import Connection, PreconditionFailed

with Connection(access_id, secret) as s3:
    # Create-only upload
    try:
        s3.put_object("mybucket", "file.txt", b"data", if_none_match=True)
    except PreconditionFailed:
        pass  # object already exists

    # Optimistic locking
    try:
        s3.put_object("mybucket", "file.txt", b"updated", if_match=old_etag)
    except PreconditionFailed:
        pass  # ETag changed, retry with a fresh read

Other Operations

with Connection(access_id, secret) as s3:
    # List buckets
    for bucket in s3.list_buckets():
        print(bucket)

    # List objects (keys only)
    for key in s3.list_bucket("mybucket"):
        print(key)

    # List objects with metadata
    for obj in s3.list_bucket2("mybucket"):
        print(obj['Key'], obj['Size'], obj['LastModified'])

    # Object metadata
    headers = s3.head_object("mybucket", "file.txt")

    # Copy object within or between buckets
    s3.copy_object("bucket1", "src.txt", "bucket2", "dst.txt")

    # Delete one object
    s3.delete_object("mybucket", "file.txt")

    # Bulk delete
    for key, ok in s3.delete_objects("mybucket", ["a.txt", "b.txt"]):
        print(key, ok)

Connection Options

# Custom endpoint (e.g. MinIO or a specific AWS region)
with Connection(access_id, secret, host="s3.us-west-2.amazonaws.com") as s3:
    pass

# Custom port
with Connection(access_id, secret, port=9000) as s3:
    pass

# Connection timeout (seconds)
with Connection(access_id, secret, conn_timeout=60) as s3:
    pass

Development

See MAINTAINING.md for development and maintenance instructions.

Running Tests

# Install development dependencies
make dev

# Run tests, type checking, and linting
make check

# Run tests with coverage report
make test

# Type check only
make typecheck

# Lint only
make lint

License

MIT License - See setup.py for details.

Author

Andrew Thomson (athomsonguy@gmail.com)
