tigris-boto3-ext

Extend boto3 with Tigris-specific features like snapshots and bucket forking, while maintaining full boto3 compatibility.

Features

  • Bundle API: Fetch thousands of objects in a single request as a streaming tar archive — designed for ML training workloads
  • Snapshot Support: Create, list, and read from bucket snapshots
  • Bucket Forking: Create forked buckets from existing buckets or snapshots
  • Object Rename: Rename objects in place without rewriting their data
  • Multiple Usage Patterns: Context managers, decorators, helper functions, or wrapper client
  • Zero Configuration: Works with existing boto3 code
  • Type Safe: Full type hints for IDE support
  • Pythonic API: Uses familiar Python patterns

Installation

pip install tigris-boto3-ext
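
The snippets below assume an existing boto3 S3 client named s3_client that points at Tigris. As a minimal sketch, the helper below builds the keyword arguments for boto3.client, reading the endpoint from the AWS_ENDPOINT_URL_S3 variable used in the integration-test setup and falling back to https://t3.storage.dev. The name tigris_client_kwargs is hypothetical and not part of this package; passing endpoint_url explicitly is simply a cautious choice in case the installed boto3 does not read that variable on its own.

```python
import os

# Endpoint taken from the integration-test setup in this README.
DEFAULT_TIGRIS_ENDPOINT = "https://t3.storage.dev"


def tigris_client_kwargs(default_endpoint=DEFAULT_TIGRIS_ENDPOINT):
    """Build keyword arguments for boto3.client (hypothetical helper)."""
    return {
        "service_name": "s3",
        # Prefer the environment variable, fall back to the default endpoint.
        "endpoint_url": os.environ.get("AWS_ENDPOINT_URL_S3", default_endpoint),
    }


# Usage, with credentials coming from the usual AWS_* environment variables:
#   import boto3
#   s3_client = boto3.client(**tigris_client_kwargs())
```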

Usage Patterns

1. Context Managers (Recommended)

Enable Snapshots for Bucket Creation

from tigris_boto3_ext import TigrisSnapshotEnabled

with TigrisSnapshotEnabled(s3_client):
    s3_client.create_bucket(Bucket='my-snapshot-bucket')

Work with Snapshots

from tigris_boto3_ext import TigrisSnapshot

# List snapshots for a bucket
with TigrisSnapshot(s3_client, 'my-bucket'):
    snapshots = s3_client.list_buckets()

# Read objects from a specific snapshot
with TigrisSnapshot(s3_client, 'my-bucket', snapshot_version='12345'):
    obj = s3_client.get_object(Bucket='my-bucket', Key='file.txt')
    objects = s3_client.list_objects_v2(Bucket='my-bucket')

Create Forked Buckets

from tigris_boto3_ext import TigrisFork

# Fork from current state
with TigrisFork(s3_client, 'source-bucket'):
    s3_client.create_bucket(Bucket='forked-bucket')

# Fork from specific snapshot
with TigrisFork(s3_client, 'source-bucket', snapshot_version='12345'):
    s3_client.create_bucket(Bucket='forked-from-snapshot')

Rename Objects

Tigris implements rename as a copy_object request plus the X-Tigris-Rename: true header — no data is rewritten, only the key changes. Keep the context tight so unrelated copy_object calls are not turned into renames.

from tigris_boto3_ext import TigrisRename

with TigrisRename(s3_client):
    s3_client.copy_object(
        Bucket='my-bucket',
        CopySource='my-bucket/old-name.txt',
        Key='new-name.txt',
    )

2. Decorators

from tigris_boto3_ext import snapshot_enabled, with_snapshot, forked_from, with_rename

@snapshot_enabled
def create_snapshot_enabled_bucket(s3_client, bucket_name):
    return s3_client.create_bucket(Bucket=bucket_name)

# List available snapshots
@with_snapshot('my-bucket')
def list_snapshots(s3_client):
    return s3_client.list_buckets()

# Read from specific snapshot
@with_snapshot('my-bucket', snapshot_version='12345')
def read_from_snapshot(s3_client, key):
    return s3_client.get_object(Bucket='my-bucket', Key=key)

@forked_from('source-bucket', snapshot_version='12345')
def create_my_fork(s3_client, new_bucket):
    return s3_client.create_bucket(Bucket=new_bucket)

@with_rename
def rename_file(s3_client, bucket, old_key, new_key):
    return s3_client.copy_object(
        Bucket=bucket,
        CopySource=f'{bucket}/{old_key}',
        Key=new_key,
    )

# Use the decorated functions
create_snapshot_enabled_bucket(s3_client, 'my-bucket')
snapshots = list_snapshots(s3_client)
obj = read_from_snapshot(s3_client, 'file.txt')
create_my_fork(s3_client, 'my-fork')
rename_file(s3_client, 'my-bucket', 'old.txt', 'new.txt')

3. Helper Functions

from tigris_boto3_ext import (
    create_snapshot_bucket,
    create_snapshot,
    list_snapshots,
    create_fork,
    get_object_from_snapshot,
    get_snapshot_version,
    list_objects_from_snapshot,
    head_object_from_snapshot,
    has_snapshot_enabled,
    get_bucket_info,
    rename_object,
)

# Create snapshot-enabled bucket
create_snapshot_bucket(s3_client, 'my-bucket')

# Check if bucket has snapshots enabled
if has_snapshot_enabled(s3_client, 'my-bucket'):
    print("Snapshots are enabled!")

# Get comprehensive bucket information
info = get_bucket_info(s3_client, 'my-bucket')
print(f"Snapshot enabled: {info['snapshot_enabled']}")

# Create snapshots
result = create_snapshot(s3_client, 'my-bucket', snapshot_name='backup-1')
version = get_snapshot_version(result)

# List snapshots
snapshots = list_snapshots(s3_client, 'my-bucket')

# Create forks
create_fork(s3_client, 'new-bucket', 'source-bucket', snapshot_version=version)

# Access snapshot data
obj = get_object_from_snapshot(s3_client, 'my-bucket', 'file.txt', version)
objects = list_objects_from_snapshot(s3_client, 'my-bucket', '12345', Prefix='data/')
metadata = head_object_from_snapshot(s3_client, 'my-bucket', 'file.txt', '12345')

# Rename an object in place (no data rewrite)
rename_object(s3_client, 'my-bucket', 'old-name.txt', 'new-name.txt')

Complete Examples

Example 1: Backup and Restore Workflow

import boto3
from tigris_boto3_ext import (
    create_snapshot_bucket,
    create_snapshot,
    list_snapshots,
    create_fork,
    get_snapshot_version,
)

s3 = boto3.client('s3')

# Create a snapshot-enabled bucket
create_snapshot_bucket(s3, 'production-data')

# Add some data
s3.put_object(Bucket='production-data', Key='important.txt', Body=b'critical data')

# Create a snapshot
snapshot_result = create_snapshot(s3, 'production-data', snapshot_name='daily-backup')
snapshot_version = get_snapshot_version(snapshot_result)

# List all snapshots
snapshots = list_snapshots(s3, 'production-data')
for bucket in snapshots.get('Buckets', []):
    print(f"Snapshot: {bucket['Name']}")

# Restore from snapshot by creating a fork
create_fork(s3, 'restored-data', 'production-data', snapshot_version=snapshot_version)

Example 2: Testing with Snapshot Isolation

import boto3
from tigris_boto3_ext import create_fork, create_snapshot, get_snapshot_version

s3 = boto3.client('s3')

# Create a snapshot of production data
snapshot_result = create_snapshot(s3, 'production-data', snapshot_name='test-snapshot')
snapshot_version = get_snapshot_version(snapshot_result)

# Fork for testing (isolated copy)
create_fork(s3, 'test-data', 'production-data', snapshot_version=snapshot_version)

# Run tests against test-data without affecting production
s3.put_object(Bucket='test-data', Key='test-data.txt', Body=b'test data')

# Clean up test bucket when done
s3.delete_bucket(Bucket='test-data')

Example 3: Time-Travel Queries

import boto3
from tigris_boto3_ext import get_object_from_snapshot, list_objects_from_snapshot

s3 = boto3.client('s3')

# Get object as it was at a specific snapshot
historical_obj = get_object_from_snapshot(
    s3,
    'my-bucket',
    'config.json',
    snapshot_version='12345'
)
old_config = historical_obj['Body'].read()

# List all objects in historical snapshot
historical_objects = list_objects_from_snapshot(
    s3,
    'my-bucket',
    snapshot_version='12345',
    Prefix='logs/2024/'
)

for obj in historical_objects.get('Contents', []):
    print(f"Historical object: {obj['Key']}")

Example 4: Retrieving Bucket Snapshot and Fork Information

import boto3
from tigris_boto3_ext import (
    create_snapshot_bucket,
    create_snapshot,
    create_fork,
    get_snapshot_version,
    has_snapshot_enabled,
    get_bucket_info,
)

s3 = boto3.client('s3')

# Check if a bucket has snapshots enabled
bucket_name = 'my-bucket'

create_snapshot_bucket(s3, bucket_name)

if has_snapshot_enabled(s3, bucket_name):
    print(f"✓ Snapshots are enabled for {bucket_name}")
else:
    print(f"✗ Snapshots are not enabled for {bucket_name}")

# Get comprehensive bucket information
info = get_bucket_info(s3, bucket_name)
print(f"Snapshot enabled: {info['snapshot_enabled']}")

# Example: Check fork lineage
source_bucket = 'production-data'
create_snapshot_bucket(s3, source_bucket)

# Create a snapshot
snapshot_result = create_snapshot(s3, source_bucket, snapshot_name='v1')
snapshot_version = get_snapshot_version(snapshot_result)

# Create a fork
forked_bucket = 'test-data'
create_fork(s3, forked_bucket, source_bucket, snapshot_version=snapshot_version)

# Inspect the fork
fork_info = get_bucket_info(s3, forked_bucket)
print(f"Forked from: {fork_info['fork_source_bucket']}")
print(f"Snapshot version: {fork_info['fork_source_snapshot']}")

Example 5: Bundle API — Fetch Multiple Objects in One Request

import tarfile
import boto3
from tigris_boto3_ext import bundle_objects, BundleError, BUNDLE_ON_ERROR_FAIL

s3 = boto3.client('s3')

# Fetch a batch of training images as a streaming tar archive
keys = [f"dataset/train/img_{i:05d}.jpg" for i in range(1000)]
response = bundle_objects(s3, 'my-dataset-bucket', keys)

with tarfile.open(fileobj=response, mode="r|") as tar:
    for member in tar:
        if member.name == "__bundle_errors.json":
            continue  # skip the error manifest
        f = tar.extractfile(member)
        if f is not None:
            image_bytes = f.read()
            # feed to training pipeline

# Use fail mode for inference where every object must be present
try:
    response = bundle_objects(
        s3, 'my-bucket', keys, on_error=BUNDLE_ON_ERROR_FAIL
    )
except BundleError as e:
    print(f"Bundle failed (HTTP {e.status_code}): {e.body}")

See examples/bundle_usage.py for more patterns including error handling, response metadata, and ML training batches.

How It Works

This library uses boto3's event system to inject Tigris-specific headers into S3 API requests:

Request Headers (Sent to Tigris)

  • X-Tigris-Enable-Snapshot: true - Enables snapshot support for bucket creation
  • X-Tigris-Snapshot: true; name=<name> - Creates a snapshot
  • X-Tigris-Snapshot: <bucket_name> - Lists snapshots for a bucket
  • X-Tigris-Snapshot-Version: <version> - Reads from specific snapshot version
  • X-Tigris-Fork-Source-Bucket: <bucket> - Specifies fork source
  • X-Tigris-Fork-Source-Bucket-Snapshot: <version> - Forks from specific snapshot
  • X-Tigris-Rename: true - Turns a CopyObject request into an in-place rename

Response Headers (Returned by Tigris)

The following custom headers are returned in HeadBucket responses and can be accessed via get_bucket_info() and has_snapshot_enabled():

  • X-Tigris-Enable-Snapshot: true - Present when snapshots are enabled for the bucket
  • X-Tigris-Fork-Source-Bucket: <bucket_name> - Present on forked buckets, indicates the parent bucket
  • X-Tigris-Fork-Source-Bucket-Snapshot: <version> - Present on forked buckets, indicates the snapshot version
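
As an illustration of how these response headers could be read by hand, the sketch below maps a HeadBucket response dict to the keys used in Example 4 (snapshot_enabled, fork_source_bucket, fork_source_snapshot). The function name parse_bucket_headers is hypothetical, and this is not necessarily how get_bucket_info() is implemented. It relies on botocore placing the raw response headers under ResponseMetadata.HTTPHeaders, and lowercases the names defensively.

```python
def parse_bucket_headers(head_bucket_response):
    """Map Tigris response headers to a small info dict (illustrative only)."""
    raw = head_bucket_response.get("ResponseMetadata", {}).get("HTTPHeaders", {})
    # Header names are case-insensitive, so normalise before looking them up.
    headers = {name.lower(): value for name, value in raw.items()}
    return {
        "snapshot_enabled": headers.get("x-tigris-enable-snapshot", "").lower() == "true",
        "fork_source_bucket": headers.get("x-tigris-fork-source-bucket"),
        "fork_source_snapshot": headers.get("x-tigris-fork-source-bucket-snapshot"),
    }


# A hand-written response dict standing in for s3_client.head_bucket(...).
fake_response = {
    "ResponseMetadata": {
        "HTTPHeaders": {
            "x-tigris-enable-snapshot": "true",
            "x-tigris-fork-source-bucket": "production-data",
            "x-tigris-fork-source-bucket-snapshot": "12345",
        }
    }
}
print(parse_bucket_headers(fake_response))

# With a real client (not executed here):
#   info = parse_bucket_headers(s3_client.head_bucket(Bucket="my-bucket"))
```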

The library registers event handlers on before-sign.s3.* events to add request headers transparently.
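
To make the mechanism concrete, here is a minimal sketch of header injection through boto3's event system. It is not the library's actual code: make_header_injector and inject_headers are hypothetical names. The sketch assumes only that botocore passes the outgoing request as the request keyword to before-sign handlers, and that handlers are added and removed with client.meta.events.register and unregister.

```python
from contextlib import contextmanager


def make_header_injector(headers):
    """Return a handler that copies `headers` onto the outgoing request."""
    def inject(request, **kwargs):
        # botocore passes the prepared request as the `request` keyword for
        # before-sign events; the other keyword arguments are ignored here.
        for name, value in headers.items():
            request.headers[name] = value
    return inject


@contextmanager
def inject_headers(s3_client, operation, headers):
    """Add `headers` to one S3 operation for the duration of a with-block."""
    event = f"before-sign.s3.{operation}"
    handler = make_header_injector(headers)
    s3_client.meta.events.register(event, handler)
    try:
        yield s3_client
    finally:
        # Unregister so later calls on the same client are unaffected.
        s3_client.meta.events.unregister(event, handler)


# Usage with a real boto3 client (not executed here):
#   with inject_headers(s3_client, "CreateBucket",
#                       {"X-Tigris-Enable-Snapshot": "true"}):
#       s3_client.create_bucket(Bucket="my-snapshot-bucket")
```

Scoping the handler to a with-block mirrors the advice given for TigrisRename: the header applies only to calls made inside the block.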

Requirements

  • Python 3.9+
  • boto3 >= 1.26.0

Development

Setup

# Clone the repository
git clone https://github.com/tigrisdata/tigris-boto3-ext.git
cd tigris-boto3-ext

# Install with dev dependencies using uv
uv sync --all-extras

# Or with pip
pip install -e ".[dev]"

Running Tests

Integration Tests

Integration tests run against a real Tigris S3 service. See tests/integration/README.md for detailed setup instructions.

# Set up environment variables
export AWS_ENDPOINT_URL_S3="https://t3.storage.dev"
export AWS_ACCESS_KEY_ID="your-access-key"
export AWS_SECRET_ACCESS_KEY="your-secret-key"

# Run integration tests
uv run pytest tests/integration/ -v

Code Quality

# Type checking
uv run mypy tigris_boto3_ext

# Linting
uv run ruff check tigris_boto3_ext

# Auto-fix linting issues
uv run ruff check --fix tigris_boto3_ext

# Code formatting
uv run ruff format tigris_boto3_ext

# Check formatting without making changes
uv run ruff format --check tigris_boto3_ext

License

Apache-2.0

Contributing

Contributions welcome! Please open an issue or PR on GitHub.

Support

For issues and questions, open an issue on the GitHub repository.
