The Pythonic Bridge Between S3 and the Local Filesystem. Use S3 objects like local files with automatic sync.

Use S3 objects like local files. A Pythonic, automatic local sync layer for S3.


What is s3lync?

s3lync is a Python package that lets you work with S3 objects as if they were local files.

It automatically handles:

  • 📥 Download on read
  • 📤 Upload on write
  • 🔍 Change detection via hashes
  • 💾 Local caching
  • 🔁 Optional force synchronization

All behind a clean, Pythonic API.


Why s3lync?

Most S3 libraries focus on object operations. s3lync focuses on developer experience.

  • You open a file → it syncs
  • You write to a file → it uploads
  • You don't think about S3 until you need to

Features

  • 🚀 Pythonic API — Work with S3 like local files
  • 🔄 Automatic Sync — Download & upload with change detection
  • Hash Verification — MD5-based integrity checks
  • 💾 Smart Caching — Local cache with intelligent invalidation
  • 🔒 Force Sync Mode — Make local and remote identical
  • Parallel Transfers — Up to 8x faster directory sync
  • 🔁 Auto Retry — Exponential backoff for transient failures
  • 📝 Structured Logging — Configurable logging system

Installation

pip install s3lync

Async Support (Optional)

For async I/O operations, install aioboto3:

pip install s3lync[async]
# or
pip install aioboto3

Quick Start

Basic Usage (Sync)

from s3lync import S3Object

# Create S3 object reference
obj = S3Object("s3://my-bucket/path/to/file.txt")

# Download from S3
obj.download()

# Upload to S3
obj.upload()

Async Usage

from s3lync import AsyncS3Object
import asyncio

async def main():
    # Create S3 object reference
    obj = AsyncS3Object("s3://my-bucket/path/to/file.txt")
    
    # Download from S3 asynchronously
    await obj.download()
    
    # Upload to S3 asynchronously
    await obj.upload()

asyncio.run(main())

With boto3 Client (Recommended)

Sync version:

from s3lync import S3Object
import boto3

# Create boto3 session and client
session = boto3.Session(profile_name="dev")
s3_client = session.client("s3")

# Create S3Object with client
obj = S3Object(
    "s3://bucket/key",
    local_path="./local",
    boto3_client=s3_client,
)

obj.upload()

Async version:

from s3lync import AsyncS3Object
import aioboto3
import asyncio

async def main():
    # Create aioboto3 session
    session = aioboto3.Session()
    
    # Create AsyncS3Object with session
    obj = AsyncS3Object(
        "s3://bucket/key",
        local_path="./local",
        aioboto3_session=session,
    )
    
    await obj.upload()

asyncio.run(main())

S3 URI Formats

s3lync supports multiple URI styles:

s3://bucket/key
s3://endpoint@bucket/key
s3://secret_key:access_key@endpoint/bucket/key
s3://secret_key:access_key@https://endpoint/bucket/key

Examples:

# Basic URI (credentials from environment variables)
S3Object("s3://my-bucket/data.json")

# Custom S3-compatible endpoint
S3Object("s3://minio.example.com@my-bucket/data.json")

# With credentials and HTTPS endpoint
S3Object("s3://mysecret:mykey@https://minio.example.com/my-bucket/data.json")
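
As a rough illustration of the simplest form only (`s3://bucket/key`), the split into bucket and key can be sketched with plain string handling. The helper name `split_basic_s3_uri` is hypothetical and is not part of s3lync; the endpoint and credential forms above are deliberately not handled here.

```python
def split_basic_s3_uri(uri):
    """Split 's3://bucket/key' into (bucket, key).

    Hypothetical helper for illustration; handles the basic form only.
    """
    prefix = "s3://"
    if not uri.startswith(prefix):
        raise ValueError(f"not an s3:// URI: {uri!r}")
    # Everything up to the first '/' is the bucket, the rest is the key.
    bucket, _, key = uri[len(prefix):].partition("/")
    if not bucket:
        raise ValueError(f"missing bucket name: {uri!r}")
    return bucket, key


print(split_basic_s3_uri("s3://my-bucket/data.json"))
```

String partitioning is used instead of a URL parser so that keys containing `?` or `#` are kept intact.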

How It Works

Smart Synchronization

  • Local file hash ↔ S3 ETag comparison
  • Multipart uploads automatically skip hash checks (a multipart ETag is not a plain MD5 of the object)
  • mirror=True makes remote/local identical (also deletes extra files)
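
The hash comparison described above can be sketched with the standard library. This is a minimal illustration, not s3lync's actual code: `local_md5` and `needs_transfer` are hypothetical names. It relies on the fact that, for single-part uploads without SSE-KMS or SSE-C encryption, the S3 ETag is the hex MD5 of the object, while multipart ETags have the form `<hex>-<part count>`.

```python
import hashlib
from pathlib import Path


def local_md5(path, chunk_size=1024 * 1024):
    """Hex MD5 digest of a local file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def needs_transfer(path, etag):
    """Hypothetical helper: True/False when comparable, None when the check is skipped."""
    etag = etag.strip('"')      # S3 returns the ETag wrapped in double quotes
    if not Path(path).exists():
        return True             # nothing cached locally yet
    if "-" in etag:
        return None             # multipart ETag '<hex>-<parts>' is not a plain MD5
    return local_md5(path) != etag
```

Returning `None` for multipart ETags keeps "unknown" distinct from "unchanged", so a caller can fall back to another signal such as size or modification time.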

Local Cache

  • Default: ~/.cache/s3lync
  • Configurable via XDG_CACHE_HOME
  • Or explicitly via local_path
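
The lookup order above follows the usual XDG convention. A minimal sketch of that resolution, assuming the cache root is `$XDG_CACHE_HOME/s3lync` when the variable is set (the function name `default_cache_dir` is hypothetical and the exact layout inside the cache is not shown):

```python
import os
from pathlib import Path


def default_cache_dir(env=None):
    """Resolve the cache root following the XDG convention.

    Assumption: $XDG_CACHE_HOME/s3lync when the variable is set,
    otherwise ~/.cache/s3lync.
    """
    env = os.environ if env is None else env
    base = env.get("XDG_CACHE_HOME") or str(Path.home() / ".cache")
    return Path(base) / "s3lync"


print(default_cache_dir())
```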

Common Operations

Working with S3 Objects Like Files

Method 1: Context manager with automatic sync (Recommended!)

Sync:

import json
from s3lync import S3Object

# Auto-downloads on read, auto-uploads on write
obj = S3Object("s3://bucket/token.json")
with obj.open("w") as f:
    json.dump({"access_token": "abc123"}, f)

with obj.open("r") as f:
    token = json.load(f)

Async:

import asyncio
from s3lync import AsyncS3Object

async def main():
    obj = AsyncS3Object("s3://bucket/token.json")
    
    # Auto-uploads on write
    async with obj.open("w") as f:
        f.write('{"access_token": "abc123"}')
    
    # Auto-downloads on read
    async with obj.open("r") as f:
        data = f.read()

asyncio.run(main())

Method 2: Standard Python open() (pathlib-compatible)

# S3Object implements __fspath__(), so it satisfies the os.PathLike protocol
obj.download()  # Manual sync
with open(obj, "r") as f:  # Works like a path!
    data = json.load(f)
obj.upload()  # Manual sync
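
For readers unfamiliar with the protocol, here is a small standalone sketch of how `__fspath__()` lets the built-in `open()` accept an arbitrary object. `PathLikeRef` is a hypothetical stand-in written for this example, not s3lync's class.

```python
import os
import tempfile


class PathLikeRef:
    """Hypothetical stand-in for an object that maps to a local file."""

    def __init__(self, local_path):
        self.local_path = local_path

    def __fspath__(self):
        # open(), os.fspath() and pathlib.Path() call this to get the real path.
        return self.local_path


tmp_dir = tempfile.mkdtemp()
ref = PathLikeRef(os.path.join(tmp_dir, "token.json"))

with open(ref, "w") as f:  # open() accepts any os.PathLike object
    f.write("{}")

print(os.fspath(ref) == ref.local_path)  # True
```

Note that the protocol only supplies a path. It does not trigger any transfer, which is why this method needs explicit `download()` and `upload()` calls.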

Method 3: Direct local_path access

# Direct file path manipulation
obj.download()
with open(obj.local_path, "r") as f:
    data = f.read()
obj.upload()

Basic Download / Upload

# Basic download
obj.download()

# Force sync: make remote identical to local (delete extra remote files if needed)
obj.upload(mirror=True)

Directory Synchronization

s3lync supports recursive directory download and upload with smart change detection.

Sync version:

# Download entire directory
obj = S3Object("s3://bucket/path/to/dir")
obj.download()

# Upload entire directory (excludes hidden files by default)
obj.upload()

# Mirror mode: delete files not present in source
obj.download(mirror=True)  # Deletes local files not in S3
obj.upload(mirror=True)    # Deletes remote files not in local
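
Conceptually, mirror mode deletes whatever exists on the destination side but not on the source side. A minimal sketch of that set difference, with a hypothetical helper `extra_files` and made-up file names (how s3lync lists and deletes files internally is not shown):

```python
def extra_files(source_paths, destination_paths):
    """Paths present at the destination but missing from the source."""
    return sorted(set(destination_paths) - set(source_paths))


remote = ["a.txt", "b.txt"]
local = ["a.txt", "old.log"]

# download(mirror=True): S3 is the source, so local-only files are removed.
print(extra_files(remote, local))  # ['old.log']

# upload(mirror=True): local is the source, so remote-only files are removed.
print(extra_files(local, remote))  # ['b.txt']
```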

Async version (faster with parallel processing):

import asyncio
from s3lync import AsyncS3Object

async def main():
    obj = AsyncS3Object("s3://bucket/path/to/dir")
    
    # Download entire directory asynchronously
    await obj.download()
    
    # Upload entire directory asynchronously
    await obj.upload()
    
    # Mirror mode
    await obj.download(mirror=True)
    await obj.upload(mirror=True)

asyncio.run(main())

Sync multiple directories in parallel:

import asyncio
from s3lync import AsyncS3Object

async def sync_multiple():
    # Download multiple directories concurrently
    tasks = [
        AsyncS3Object("s3://bucket/dir1").download(),
        AsyncS3Object("s3://bucket/dir2").download(),
        AsyncS3Object("s3://bucket/dir3").download(),
    ]
    await asyncio.gather(*tasks)

asyncio.run(sync_multiple())

Exclude Patterns

Control which files to include/exclude during sync operations using regex patterns.

Default Exclusions

  • /.*/ — Hidden files and directories (.git, .venv, etc)
  • __pycache__ — Python cache directories
  • .egg-info — Python package metadata

How Excludes Work

Object creation — replaces all defaults:

obj = S3Object(
    "s3://bucket/path",
    excludes=[r".*\.tmp$", r"\.git/.*"]
)
obj.upload()  # Uses ONLY: [.*\.tmp$, \.git/.*]

Method call — adds to defaults:

obj = S3Object("s3://bucket/path")
obj.upload(excludes=[r".*\.tmp$"])
# Uses: [/.*/,  __pycache__, .egg-info, .*\.tmp$]

obj.download(excludes=[r"node_modules/.*"])
# Uses: [/.*/,  __pycache__, .egg-info, node_modules/.*]
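
The replace-versus-append rule can be sketched as follows. This is an illustration under assumptions: `DEFAULTS` is an illustrative subset rather than s3lync's real list, the helper names are hypothetical, and patterns are assumed to be applied with `re.search` against the relative path.

```python
import re

DEFAULTS = ["__pycache__", r"\.egg-info"]  # illustrative subset, not s3lync's actual list


def effective_excludes(constructor=None, call=None, defaults=DEFAULTS):
    """Constructor excludes replace the defaults; per-call excludes are appended."""
    base = list(defaults) if constructor is None else list(constructor)
    return base + list(call or [])


def is_excluded(rel_path, patterns):
    """Assumes patterns are applied with re.search against the relative path."""
    return any(re.search(p, rel_path) for p in patterns)


patterns = effective_excludes(call=[r".*\.tmp$"])
print(is_excluded("build/out.tmp", patterns))  # True
print(is_excluded("src/main.py", patterns))    # False
```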

AWS Credentials

s3lync uses boto3's standard credential provider chain.

Profile Selection

boto3 supports three ways to choose an AWS profile. In production, explicit selection or environment variables are the most common:

✅ 1. Session with profile (Recommended)

import boto3

session = boto3.Session(profile_name="dev")
s3_client = session.client("s3")

obj = S3Object("s3://bucket/key", boto3_client=s3_client)

Advantages:

  • Explicit in code
  • Works for multi-account scenarios
  • Most flexible

✅ 2. Environment Variable

export AWS_PROFILE=dev

import boto3

session = boto3.Session()  # Auto-uses AWS_PROFILE
s3_client = session.client("s3")

Advantages:

  • Environment-specific configuration
  • CI/CD friendly
  • No code changes

⚠️ 3. Default Profile (Implicit)

import boto3

session = boto3.Session()  # Uses [default] profile
s3_client = session.client("s3")

Credentials Search Order

  1. Environment variables: AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY
  2. AWS credentials file: ~/.aws/credentials (respects AWS_PROFILE)
  3. AWS config file: ~/.aws/config
  4. IAM Role (EC2, EKS, ECS environments)

Quick Examples

# Using environment variables
export AWS_ACCESS_KEY_ID=AKIA...
export AWS_SECRET_ACCESS_KEY=...
export AWS_DEFAULT_REGION=ap-northeast-2

# Or using a profile
export AWS_PROFILE=my-profile

Additional Features

Logging Configuration

Configure structured logging for debugging and monitoring:

from s3lync import configure_logging, get_logger
import logging

# Enable debug logging
configure_logging(level=logging.DEBUG)

# Or get a logger for custom use
logger = get_logger("my_app")
logger.info("Starting sync operation")

# Disable logging output
configure_logging(level=logging.CRITICAL)

Automatic Retry

s3lync automatically retries on transient AWS errors with exponential backoff:

  • ThrottlingException
  • ServiceUnavailable
  • SlowDown
  • RequestTimeout
  • Connection errors

Default: 3 attempts with 0.5s base delay (max 30s).
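
To make those defaults concrete, here is a sketch of the delay schedule they imply. It assumes the delay doubles on each retry and that no jitter is added; neither detail is stated above, and `backoff_delays` is a hypothetical helper.

```python
def backoff_delays(max_attempts=3, base_delay=0.5, max_delay=30.0):
    """Delays (seconds) slept between attempts: base * 2**n, capped at max_delay."""
    return [min(base_delay * 2 ** n, max_delay) for n in range(max_attempts - 1)]


print(backoff_delays())                # [0.5, 1.0]
print(backoff_delays(max_attempts=8))  # [0.5, 1.0, 2.0, 4.0, 8.0, 16.0, 30.0]
```

With three attempts there are only two waits, so under these assumptions the 30s cap matters only for much larger `max_attempts` values.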

You can also use retry decorators in your own code:

from s3lync import retry, async_retry, RetryConfig

# Sync function with retry
@retry(max_attempts=5, base_delay=1.0)
def my_operation():
    # Your code here
    pass

# Async function with retry
@async_retry(max_attempts=3)
async def my_async_operation():
    # Your async code here
    pass

Custom Callbacks

Chain custom callbacks with progress tracking:

from s3lync import S3Object, chain_callbacks

def my_callback(bytes_transferred: int):
    print(f"Transferred: {bytes_transferred} bytes")

obj = S3Object("s3://bucket/large-file.bin", local_path="/tmp/file.bin")

# Use custom callback during download (note: obj._client is a private attribute)
metadata = obj._client.download_file(
    bucket="bucket",
    key="large-file.bin",
    local_path="/tmp/file.bin",
    callback=my_callback,
    show_progress=True
)

Progress Display Control

Control progress bar display mode:

from s3lync import S3Object
import boto3

# Option 1: Set default progress mode when creating object
obj = S3Object(
    "s3://bucket/key",
    local_path="./local",
    progress_mode="compact"  # "progress" (default), "compact", or "disabled"
)
obj.upload()

# Option 2: Override for specific operation
obj.download(progress_mode="disabled")

# Option 3: With boto3 client
session = boto3.Session(profile_name="dev")
s3_client = session.client("s3")
obj = S3Object(
    "s3://bucket/key",
    boto3_client=s3_client,
    progress_mode="compact"
)

Progress Mode Options:

  • "progress" (default): Live tqdm progress bar with real-time updates
  • "compact": Summary output only on completion (non-interactive, great for CI/CD)
  • "disabled": No progress display

Note: In non-TTY environments (e.g., PyCharm console), progress bar rendering is auto-adjusted for compatibility.
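
If you prefer to pick the mode yourself instead of relying on automatic adjustment, one option is to check whether the output stream is a terminal in your own code. The helper below is a hypothetical caller-side sketch, not part of s3lync, and it assumes the progress bar is drawn on stderr.

```python
import io
import sys


def effective_progress_mode(requested="progress", stream=None):
    """Fall back to 'compact' when a live bar is requested but the stream is not a TTY."""
    stream = sys.stderr if stream is None else stream
    is_tty = hasattr(stream, "isatty") and stream.isatty()
    if requested == "progress" and not is_tty:
        return "compact"
    return requested


# A StringIO buffer is never a terminal, so the live bar is downgraded.
print(effective_progress_mode("progress", io.StringIO()))  # compact
```

The result can then be passed through the documented parameter, for example `S3Object("s3://bucket/key", progress_mode=effective_progress_mode())`.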


License

MIT License — see LICENSE


Author

JunSeok Kim

Built with ❤️ to make S3 feel local
