better-aws

Minimal AWS boto3 wrapper

A minimal, production-minded wrapper around boto3 focused on S3 and tabular data (CSV/Parquet/Excel).

  • S3-first: the handful of operations you use 90% of the time
  • Batch-native and glob-ready: same methods for single keys, lists, or glob patterns (*, **)
  • Ergonomic I/O: load() → Python objects, download() → local files, transfer() → move trees between local and S3
  • Logging-friendly: standalone "print-like" logs or plug into your app logger
  • Auth-ready: designed to support multiple auth modes (profile, custom files, static creds, .env)

Install

pip install better-aws

For object serialization support (pickle/joblib/skops):

pip install better-aws[objects]

Development (uv)

git clone https://github.com/thibault-charbonnier/better-aws.git
cd better-aws
uv sync

Quickstart

from better_aws import AWS

# 1) Create a session (boto3 will use the default credential chain unless you add other auth modes)
aws = AWS(profile="s3admin", region="eu-west-3", verbose=True)

# Optional sanity check
aws.identity(print_info=True)

# 2) Configure S3 defaults
aws.s3.config(
    bucket="my-bucket",
    key_prefix="my-project",   # optional: all keys are relative to this prefix
    output_type="pandas",      # tabular loads -> pandas (or "polars")
    file_type="parquet",       # default tabular format for dataframe uploads without extension
    overwrite=True,
)

# 3) List / load / upload
keys = aws.s3.list(prefix="raw/", limit=10)

df = aws.s3.load("raw/prices.parquet")     # -> pandas DataFrame (by config)
df["ret"] = df["close"].pct_change()

aws.s3.upload(df, "processed/prices_with_returns")  # -> parquet by default (by config)

# 4) Verify existence
print(aws.s3.exists("processed/prices_with_returns.parquet"))

Core features

1) Authentication

better-aws is built to keep auth clean and modular:

  • AWS profile / default chain (AWS CLI-style)
  • static credentials (Python args)
  • custom credentials_file / optional config_file
  • .env (dotenv)

# Static credentials
aws = AWS("s3admin", aws_access_key_id=AWS_ID_KEY, aws_secret_access_key=AWS_SECRET_KEY)

# .env config
aws = AWS("s3admin", env_file="test.env")

# Custom location for credentials files
aws = AWS("s3admin", credentials_file=r"\...\credentials")

# Classic CLI-like auth (boto3 fallback)
aws = AWS("s3admin")

Authentication priority

When creating a session, better-aws resolves credentials in this order — first match wins:

  • Static credentials — aws_access_key_id + aws_secret_access_key parameters passed directly to AWS()

  • Env file — a .env file passed via env_file=. Must contain AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY. Optionally AWS_SESSION_TOKEN and AWS_REGION / AWS_DEFAULT_REGION.

  • Custom credential files — credentials_file and/or config_file pointing to non-default AWS credential file locations

  • boto3 default chain — falls back to the native boto3 credential resolution. The most common case is the credentials file generated by aws configure (~/.aws/credentials on Linux/macOS, %USERPROFILE%\.aws\credentials on Windows). See the full boto3 credential chain for other sources (env vars, IAM roles, etc.).

    For regular use, we recommend installing the AWS CLI and running aws configure once — better-aws will then pick up your credentials automatically with no extra configuration.
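The priority order above can be sketched as a pure function (a hypothetical helper for illustration — not the library's actual internals):

```python
def resolve_auth_mode(
    aws_access_key_id=None,
    aws_secret_access_key=None,
    env_file=None,
    credentials_file=None,
    config_file=None,
):
    """Return which auth mode wins, mirroring the documented priority order."""
    if aws_access_key_id and aws_secret_access_key:
        return "static"            # 1) explicit keys always win
    if env_file:
        return "env_file"          # 2) .env file
    if credentials_file or config_file:
        return "custom_files"      # 3) non-default credential file locations
    return "boto3_default_chain"   # 4) fall back to boto3's own resolution

# Static keys beat an env file passed at the same time:
print(resolve_auth_mode(aws_access_key_id="AKIA...",
                        aws_secret_access_key="...",
                        env_file="test.env"))
# -> static
```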


2) Configure your S3 "workspace"

Call aws.s3.config() once to set defaults for all subsequent operations. The main arguments:

  • bucket: default bucket
  • key_prefix: optional "root folder" — all keys are resolved relative to it
  • output_type: tabular load() output ("pandas" / "polars")
  • file_type: default format for DataFrame uploads without extension ("parquet" / "csv" / "xlsx")
  • overwrite: default overwrite policy

aws.s3.config(bucket="my-bucket", key_prefix="research", output_type="polars", file_type="parquet", overwrite=False)
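How key_prefix composes with keys can be sketched like this (illustrative only, assuming simple prefix-joining as described above):

```python
from posixpath import join as s3_join

def resolve_key(key: str, key_prefix: str = "") -> str:
    """Resolve a user-supplied key relative to the configured key_prefix."""
    return s3_join(key_prefix, key) if key_prefix else key

print(resolve_key("raw/prices.parquet", key_prefix="research"))
# -> research/raw/prices.parquet
```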

3) Read from S3

Two ways to read from S3:

  • download() = S3 → local files (returns Path or List[Path])
  • load() = S3 → Python objects (JSON → dict, tabular → DataFrame)

path = aws.s3.download("reports/report.pdf", to="downloads/")

cfg = aws.s3.load("configs/pipeline.json")              # -> dict
df  = aws.s3.load("raw/prices.csv")                    # -> pandas/polars (by config)
dfs = aws.s3.load(["raw/a.parquet", "raw/b.parquet"])  # -> List[DataFrame]

Batch native: load() and download() accept a single key or a list of keys.

Both methods support glob patterns including recursive **:

# All CSVs directly under raw/
aws.s3.download("raw/*.csv", to="downloads/")

# All parquets recursively
dfs = aws.s3.load("data/**/*.parquet")

# Preserve the full S3 path structure locally (default: preserve relative to the glob root)
aws.s3.download("data/2023/*.csv", to="downloads/", preserve_prefix=True)
# -> downloads/data/2023/file.csv  (instead of downloads/file.csv)
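The preserve_prefix behavior can be sketched as a small path-mapping function (hypothetical helper — the library's real logic may differ):

```python
from pathlib import Path, PurePosixPath

def local_dest(key: str, glob_root: str, to: str, preserve_prefix: bool) -> Path:
    """Map an S3 key to its local download path, per preserve_prefix."""
    if preserve_prefix:
        # Recreate the full S3 key path under the destination
        return Path(to) / PurePosixPath(key)
    # Keep only the structure below the glob root
    rel = PurePosixPath(key).relative_to(glob_root)
    return Path(to) / rel

print(local_dest("data/2023/file.csv", "data/2023", "downloads", preserve_prefix=True).as_posix())
# -> downloads/data/2023/file.csv
print(local_dest("data/2023/file.csv", "data/2023", "downloads", preserve_prefix=False).as_posix())
# -> downloads/file.csv
```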

4) Write to S3

upload() supports:

  • local file path or glob pattern → copied as-is, structure preserved
  • dict → JSON
  • bytes → raw payload
  • pandas/polars DataFrame → CSV/Parquet/Excel (based on key extension or default file_type)

aws.s3.upload("local/report.pdf", "reports/report.pdf")
aws.s3.upload({"run_id": 1}, "configs/run")                          # -> configs/run.json
aws.s3.upload(df, "processed/table")                                 # -> processed/table.parquet
aws.s3.upload([df, df], ["processed/a.parquet", "processed/b.parquet"])

# Glob upload: preserve local structure under a single S3 prefix
aws.s3.upload("exports/*.csv", "s3-prefix/exports/")

upload() returns the final S3 key(s).

Batch native: upload() accepts a single or list of src / key pairs.
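The extension-based format inference can be sketched as follows (a simplified illustration, not the library's actual dispatch code):

```python
from pathlib import PurePosixPath

TABULAR = {".csv", ".parquet", ".xlsx", ".xls"}

def infer_format(key: str, default_file_type: str = "parquet") -> tuple:
    """Infer (final_key, format) from the key extension, falling back to config."""
    ext = PurePosixPath(key).suffix
    if ext in TABULAR:
        return key, ext.lstrip(".")
    if ext == ".json":
        return key, "json"
    if not ext:
        # No extension: append the configured default file_type
        return f"{key}.{default_file_type}", default_file_type
    return key, "bytes"

print(infer_format("processed/table"))    # -> ('processed/table.parquet', 'parquet')
print(infer_format("configs/run.json"))   # -> ('configs/run.json', 'json')
```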


5) Transfer trees

transfer() moves or copies entire file trees between local filesystems and S3, or between two S3 locations. It auto-infers the direction from the source and destination.

# Local -> S3 (move by default, deletes local files after upload)
aws.s3.transfer("exports/", "s3://my-bucket/archives/exports/")

# S3 -> local (move: deletes the S3 objects after download)
aws.s3.transfer("raw/2023/", "local/backup/2023/", move=True)

# S3 -> S3 (copy within or across buckets)
aws.s3.transfer("s3://bucket-a/data/", "s3://bucket-b/data/", move=False)

# Glob patterns are supported
aws.s3.transfer("raw/**/*.parquet", "archive/parquet/")

# Use explicit buckets when needed
aws.s3.transfer("data/", "archive/", bucket_src="prod-bucket", bucket_dst="archive-bucket")

transfer() preserves relative directory structure at the destination. Pass move=False to copy instead of move.


6) Utilities

# Check existence
aws.s3.exists("raw/prices.parquet")                    # -> bool

# List objects — returns List[dict] with key, size, last_modified, etag, storage_class
aws.s3.list(prefix="raw/", with_meta=True)

# List keys only
keys = aws.s3.list(prefix="raw/", with_meta=False)    # -> List[str]

# Delete (glob patterns supported, force=True required for patterns)
aws.s3.delete(["tmp/a.parquet", "tmp/b.parquet"])
aws.s3.delete("tmp/**", force=True)

# Pretty-print S3 prefix as a tree (sorted by size)
aws.s3.tree(prefix="data/", max_depth=3, folders_first=True)

7) Object serialization

better-aws can serialize arbitrary Python objects (e.g. scikit-learn models) directly to/from S3 using pickle, joblib, or skops.

Security: Requires allow_unsafe_serialization=True in config(). Deserializing untrusted data is unsafe by design.

aws.s3.config(
    bucket="my-bucket",
    allow_unsafe_serialization=True,
    object_base_format="joblib",    # "pickle" | "joblib" | "skops"
    joblib_compress=3,
)

from sklearn.ensemble import RandomForestClassifier
model = RandomForestClassifier().fit(X_train, y_train)

aws.s3.upload(model, "models/rf_classifier")      # -> models/rf_classifier.joblib
model = aws.s3.load("models/rf_classifier.joblib")

Supported extensions: .pkl / .pickle, .joblib / .jl, .skops.


8) Logging

  • verbose=False → no package logs
  • verbose=True → a few info messages (minimal, no spam)
  • Pass your own logger to unify output with your app (e.g., Rich handler)

import logging
from rich.logging import RichHandler
from better_aws import AWS

logger = logging.getLogger("myapp")
logger.setLevel(logging.INFO)
logger.handlers = [RichHandler(rich_tracebacks=True)]
logger.propagate = False

# Custom logger
aws = AWS(profile="s3admin", region="eu-west-3", logger=logger, verbose=True)

# No logs
aws = AWS(profile="s3admin", region="eu-west-3", verbose=False)

# Minimal "print-like" logs
aws = AWS(profile="s3admin", region="eu-west-3", verbose=True)

API reference

AWS

AWS(
    profile=None,               # AWS profile name
    region=None,                # AWS region
    logger=None,                # Optional logging.Logger
    verbose=False,              # Enable info-level logs
    retries=3,                  # Max retry attempts (botocore standard mode)
    connect_timeout_s=10,       # Connection timeout in seconds
    read_timeout_s=300,         # Read timeout in seconds
    *,
    credentials_file=None,      # Path to a custom credentials file
    config_file=None,           # Path to a custom config file
    env_file=None,              # Path to a .env file with AWS credentials
    aws_access_key_id=None,     # Static access key ID
    aws_secret_access_key=None, # Static secret access key
    aws_session_token=None,     # Optional session token
)
| Method / attribute | Returns | Description |
|---|---|---|
| aws.s3 | S3 | S3 service wrapper (lazy-loaded) |
| aws.identity(print_info=False) | dict | Get caller identity via STS (Arn, Account, UserId). Optionally logs it. |
| aws.info(msg, *args) | None | Log a message if verbose=True |
| aws.reset_session() | None | Clear the cached boto3 session (forces re-auth on next call) |

S3

config()

Sets defaults for all subsequent S3 operations. Must be called before using any S3 method that requires a bucket.

aws.s3.config(
    bucket=None,                        # Default S3 bucket
    *,
    key_prefix="",                      # Prefix prepended to all keys
    output_type="pandas",               # Tabular load output: "pandas" | "polars"
    file_type="parquet",                # Default upload format: "csv" | "parquet" | "xlsx" | "xls" | serialization formats
    overwrite=True,                     # Allow overwriting existing objects
    encoding="utf-8",                   # Encoding for text-based I/O (JSON, CSV)
    csv_sep=",",                        # CSV column separator
    csv_index=False,                    # Include pandas index in CSV uploads
    parquet_index=None,                 # Include pandas index in parquet uploads (None = pandas default)
    excel_index=False,                  # Include pandas index in Excel uploads
    allow_unsafe_serialization=False,   # Enable pickle/joblib/skops serialization
    object_base_format="pickle",        # Default format for Python objects: "pickle" | "joblib" | "skops"
    pickle_protocol=pickle.HIGHEST_PROTOCOL,  # Pickle protocol version
    joblib_compress=3,                  # Joblib compression level (0–9)
    small_payload_threshold=5242880,    # Max in-memory payload size (bytes) before switching to temp-file upload
    multipart_threshold_mb=5,           # File size threshold to trigger multipart upload/download
    multipart_chunksize_mb=5,           # Chunk size for multipart transfers
    max_concurrency=8,                  # Max parallel threads for managed transfers
    use_threads=True,                   # Enable threading for managed transfers
    delete_batch_size=1000,             # Max objects per delete_objects call (S3 hard limit: 1000)
)

list()

aws.s3.list(
    prefix="",          # Filter keys by prefix
    *,
    bucket=None,        # Override default bucket
    limit=None,         # Max number of objects to return
    recursive=True,     # If False, list only direct children (non-recursive)
    with_meta=True,     # Include metadata in results
) -> List[dict] | List[str]

Returns a list of dicts when with_meta=True (fields: key, size, last_modified, etag, storage_class), or a list of key strings when with_meta=False.
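The metadata dicts are easy to post-process. A sketch with sample entries shaped like the documented return value (last_modified and etag omitted for brevity):

```python
# Sample entries mimicking aws.s3.list(with_meta=True) output
sample = [
    {"key": "raw/a.parquet", "size": 1_048_576, "storage_class": "STANDARD"},
    {"key": "raw/b.parquet", "size": 524_288, "storage_class": "STANDARD"},
]

total_mb = sum(obj["size"] for obj in sample) / 1024 / 1024
biggest = max(sample, key=lambda o: o["size"])["key"]
print(f"{total_mb:.1f} MB across {len(sample)} objects; largest: {biggest}")
# -> 1.5 MB across 2 objects; largest: raw/a.parquet
```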


exists()

aws.s3.exists(
    key,            # S3 object key
    *,
    bucket=None,    # Override default bucket
) -> bool

Returns True if the object exists, False otherwise.


load()

aws.s3.load(
    key,                    # str, List[str], or glob pattern
    *,
    bucket=None,            # Override default bucket
    output_type=None,       # Override default output type: "pandas" | "polars"
) -> Any | List[Any]

Loads one or more S3 objects into Python objects. Format is inferred from the key extension:

| Extension | Output |
|---|---|
| .json | dict |
| .csv, .parquet, .xlsx, .xls | DataFrame (pandas or polars per output_type) |
| .pkl, .pickle, .joblib, .jl, .skops | Python object (requires allow_unsafe_serialization=True) |
| anything else | bytes |

Supports glob patterns (*, ?, **). Returns a single object for a single key, a list otherwise.
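The dispatch table above can be sketched as a small selector function (illustrative only — not the library's actual code):

```python
from pathlib import PurePosixPath

def loader_for(key: str) -> str:
    """Pick a deserializer from the key extension, per the table above."""
    ext = PurePosixPath(key).suffix.lower()
    if ext == ".json":
        return "dict"
    if ext in {".csv", ".parquet", ".xlsx", ".xls"}:
        return "dataframe"
    if ext in {".pkl", ".pickle", ".joblib", ".jl", ".skops"}:
        return "object"  # requires allow_unsafe_serialization=True
    return "bytes"

print(loader_for("raw/prices.csv"))    # -> dataframe
print(loader_for("models/rf.joblib"))  # -> object
```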


download()

aws.s3.download(
    key,                    # str, List[str], or glob pattern
    to=None,                # Local destination path or directory (default: current directory)
    *,
    preserve_prefix=False,  # If True, recreate the full S3 path locally.
                            # If False, preserve structure relative to the glob root.
    bucket=None,            # Override default bucket
) -> Path | List[Path]

Downloads one or more S3 objects to the local filesystem. Supports glob patterns including ** for recursive matching. Parent directories are created automatically.


upload()

aws.s3.upload(
    src,            # UploadInput or List[UploadInput]
                    # Supported types: str/Path (file or glob), dict, bytes, pd.DataFrame, pl.DataFrame
    key,            # str or List[str] — destination S3 key(s) or single prefix for glob sources
    *,
    bucket=None,    # Override default bucket
    overwrite=None, # Override default overwrite setting
) -> str | List[str]

Uploads one or more objects to S3. The serialization format is inferred from the key extension, or falls back to file_type from config. Returns the final S3 key(s).


delete()

aws.s3.delete(
    key,            # str, List[str], or glob pattern
    *,
    force=False,    # Required when using glob patterns
    bucket=None,    # Override default bucket
) -> None

Deletes one or more S3 objects. Glob patterns (including **) are supported but require force=True as a safety guard. Deletions are batched in groups of up to delete_batch_size (default: 1000).
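The batching behavior is easy to picture with a simple chunking helper (a sketch, not the library's implementation):

```python
def batched(keys, batch_size=1000):
    """Split keys into delete_objects-sized batches (S3 hard limit: 1000)."""
    for i in range(0, len(keys), batch_size):
        yield keys[i:i + batch_size]

keys = [f"tmp/{i}.parquet" for i in range(2500)]
print([len(b) for b in batched(keys)])
# -> [1000, 1000, 500]
```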


transfer()

aws.s3.transfer(
    src,                # str — source path, S3 key, glob pattern, or s3:// URI
    dst,                # str — destination path, S3 prefix, or s3:// URI
    *,
    move=True,          # If True, delete source after successful transfer
    bucket_src=None,    # Override source bucket for S3 sources
    bucket_dst=None,    # Override destination bucket for S3 destinations
) -> str | List[str] | Path | List[Path]

Transfers file trees between local filesystems and S3, or between two S3 locations. The transfer direction is inferred automatically:

| Source | Destination | Mode |
|---|---|---|
| local path / glob | S3 key or URI | local → S3 |
| S3 key / URI | local path | S3 → local |
| S3 key / URI | S3 key or URI | S3 → S3 |

Relative directory structure is always preserved at the destination. Returns the destination path(s) created.
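The direction inference can be sketched for the explicit-URI case (simplified: per the docs, bare keys also count as S3 when a default bucket is configured, which this toy version does not handle):

```python
def infer_direction(src: str, dst: str) -> str:
    """Infer the transfer mode from s3:// URIs, per the table above."""
    src_s3, dst_s3 = src.startswith("s3://"), dst.startswith("s3://")
    if src_s3 and dst_s3:
        return "s3->s3"
    if src_s3:
        return "s3->local"
    return "local->s3"

print(infer_direction("s3://bucket-a/data/", "s3://bucket-b/data/"))  # -> s3->s3
print(infer_direction("exports/", "s3://my-bucket/archives/"))        # -> local->s3
```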


tree()

aws.s3.tree(
    prefix="",              # S3 prefix to display
    *,
    bucket=None,            # Override default bucket
    show_full_path=True,    # Show full S3 key vs. basename only
    max_depth=None,         # Max depth to display (None = unlimited)
    max_children=None,      # Max children per node (None = unlimited)
    folders_first=True,     # Display folders before files at each level
    limit=None,             # Max number of S3 objects to include
) -> None

Pretty-prints the S3 object tree under a given prefix using rich, sorted by total size at each level.


License

MIT License

Copyright (c) 2026 better-aws Contributors

See LICENSE file for details.
