better-aws

Minimal AWS boto3 wrapper

A minimal, production-minded wrapper around boto3 focused on S3 and tabular data (CSV/Parquet/Excel).

  • S3-first: the handful of operations you use 90% of the time
  • Batch-native and glob-ready: same methods for single keys, lists, or glob patterns (*, **)
  • Ergonomic I/O: load() → Python objects, download() → local files, transfer() → move trees between local and S3
  • Logging-friendly: standalone "print-like" logs or plug into your app logger
  • Auth-ready: designed to support multiple auth modes (profile, custom files, static creds, .env)

Install

pip install better-aws

For object serialization support (pickle/joblib/skops):

pip install better-aws[objects]

Development (uv)

git clone https://github.com/thibault-charbonnier/better-aws.git
cd better-aws
uv sync

Quickstart

from better_aws import AWS

# 1) Create a session (boto3 will use the default credential chain unless you add other auth modes)
aws = AWS(profile="s3admin", region="eu-west-3", verbose=True)

# Optional sanity check
aws.identity(print_info=True)

# 2) Configure S3 defaults
aws.s3.config(
    bucket="my-bucket",
    key_prefix="my-project",   # optional: all keys are relative to this prefix
    output_type="pandas",      # tabular loads -> pandas (or "polars")
    file_type="parquet",       # default tabular format for dataframe uploads without extension
    overwrite=True,
)

# 3) List / load / upload
keys = aws.s3.list(prefix="raw/", limit=10)

df = aws.s3.load("raw/prices.parquet")     # -> pandas DataFrame (by config)
df["ret"] = df["close"].pct_change()

aws.s3.upload(df, "processed/prices_with_returns")  # -> parquet by default (by config)

# 4) Verify existence
print(aws.s3.exists("processed/prices_with_returns.parquet"))

Core features

1) Authentication

better-aws is built to keep auth clean and modular:

  • AWS profile / default chain (AWS CLI-style)
  • static credentials (Python args)
  • custom credentials_file / optional config_file
  • .env (dotenv)

# Static credentials
aws = AWS("s3admin", aws_access_key_id=AWS_ID_KEY, aws_secret_access_key=AWS_SECRET_KEY)

# .env config
aws = AWS("s3admin", env_file="test.env")

# Custom location for credentials files
aws = AWS("s3admin", credentials_file=r"\...\credentials")

# Classic CLI-like auth (boto3 fallback)
aws = AWS("s3admin")

Authentication priority

When creating a session, better-aws resolves credentials in this order — first match wins:

  • Static credentials — aws_access_key_id + aws_secret_access_key parameters passed directly to AWS()

  • Env file — a .env file passed via env_file=. Must contain AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY. Optionally AWS_SESSION_TOKEN and AWS_REGION / AWS_DEFAULT_REGION.

  • Custom credential files — credentials_file and/or config_file pointing to non-default AWS credential file locations

  • boto3 default chain — falls back to the native boto3 credential resolution. The most common case is the credentials file generated by aws configure (~/.aws/credentials on Linux/macOS, %USERPROFILE%\.aws\credentials on Windows). See the full boto3 credential chain for other sources (env vars, IAM roles, etc.).

    For regular use, we recommend installing the AWS CLI and running aws configure once — better-aws will then pick up your credentials automatically with no extra configuration.
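The priority order above can be sketched as a pure function (a hypothetical helper for illustration — not the library's actual internals):

```python
def resolve_auth_mode(
    aws_access_key_id=None,
    aws_secret_access_key=None,
    env_file=None,
    credentials_file=None,
    config_file=None,
):
    """Return which auth mode wins, mirroring the documented priority order."""
    if aws_access_key_id and aws_secret_access_key:
        return "static"            # 1) explicit keys always win
    if env_file:
        return "env_file"          # 2) .env file
    if credentials_file or config_file:
        return "custom_files"      # 3) non-default credential file locations
    return "boto3_default_chain"   # 4) fall back to boto3's own resolution

# Static keys beat an env file passed at the same time:
print(resolve_auth_mode(aws_access_key_id="AKIA...",
                        aws_secret_access_key="...",
                        env_file="test.env"))
# -> static
```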


2) Configure your S3 "workspace"

Call aws.s3.config() once to set defaults for all subsequent operations. The main arguments:

  • bucket: default bucket
  • key_prefix: optional "root folder" — all keys are resolved relative to it
  • output_type: tabular load() output ("pandas" / "polars")
  • file_type: default format for DataFrame uploads without extension ("parquet" / "csv" / "xlsx")
  • overwrite: default overwrite policy

aws.s3.config(bucket="my-bucket", key_prefix="research", output_type="polars", file_type="parquet", overwrite=False)
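How key_prefix composes with keys can be sketched like this (illustrative only, assuming simple prefix-joining as described above):

```python
from posixpath import join as s3_join

def resolve_key(key: str, key_prefix: str = "") -> str:
    """Resolve a user-supplied key relative to the configured key_prefix."""
    return s3_join(key_prefix, key) if key_prefix else key

print(resolve_key("raw/prices.parquet", key_prefix="research"))
# -> research/raw/prices.parquet
```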

3) Read from S3

Two ways to read from S3:

  • download() = S3 → local files (returns Path or List[Path])
  • load() = S3 → Python objects (JSON → dict, tabular → DataFrame)

path = aws.s3.download("reports/report.pdf", to="downloads/")

cfg = aws.s3.load("configs/pipeline.json")              # -> dict
df  = aws.s3.load("raw/prices.csv")                    # -> pandas/polars (by config)
dfs = aws.s3.load(["raw/a.parquet", "raw/b.parquet"])  # -> List[DataFrame]

Batch native: load() and download() accept a single key or a list of keys.

Both methods support glob patterns including recursive **:

# All CSVs directly under raw/
aws.s3.download("raw/*.csv", to="downloads/")

# All parquets recursively
dfs = aws.s3.load("data/**/*.parquet")

# Preserve the full S3 path structure locally (default: preserve relative to the glob root)
aws.s3.download("data/2023/*.csv", to="downloads/", preserve_prefix=True)
# -> downloads/data/2023/file.csv  (instead of downloads/file.csv)
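The preserve_prefix behavior can be sketched as a small path-mapping function (hypothetical helper — the library's real logic may differ):

```python
from pathlib import Path, PurePosixPath

def local_dest(key: str, glob_root: str, to: str, preserve_prefix: bool) -> Path:
    """Map an S3 key to its local download path, per preserve_prefix."""
    if preserve_prefix:
        # Recreate the full S3 key path under the destination
        return Path(to) / PurePosixPath(key)
    # Keep only the structure below the glob root
    rel = PurePosixPath(key).relative_to(glob_root)
    return Path(to) / rel

print(local_dest("data/2023/file.csv", "data/2023", "downloads", preserve_prefix=True).as_posix())
# -> downloads/data/2023/file.csv
print(local_dest("data/2023/file.csv", "data/2023", "downloads", preserve_prefix=False).as_posix())
# -> downloads/file.csv
```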

4) Write to S3

upload() supports:

  • local file path or glob pattern → copied as-is, structure preserved
  • dict → JSON
  • bytes → raw payload
  • pandas/polars DataFrame → CSV/Parquet/Excel (based on key extension or default file_type)

aws.s3.upload("local/report.pdf", "reports/report.pdf")
aws.s3.upload({"run_id": 1}, "configs/run")                          # -> configs/run.json
aws.s3.upload(df, "processed/table")                                 # -> processed/table.parquet
aws.s3.upload([df, df], ["processed/a.parquet", "processed/b.parquet"])

# Glob upload: preserve local structure under a single S3 prefix
aws.s3.upload("exports/*.csv", "s3-prefix/exports/")

upload() returns the final S3 key(s).

Batch native: upload() accepts a single or list of src / key pairs.
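The extension-based format inference can be sketched as follows (a simplified illustration, not the library's actual dispatch code):

```python
from pathlib import PurePosixPath

TABULAR = {".csv", ".parquet", ".xlsx", ".xls"}

def infer_format(key: str, default_file_type: str = "parquet") -> tuple:
    """Infer (final_key, format) from the key extension, falling back to config."""
    ext = PurePosixPath(key).suffix
    if ext in TABULAR:
        return key, ext.lstrip(".")
    if ext == ".json":
        return key, "json"
    if not ext:
        # No extension: append the configured default file_type
        return f"{key}.{default_file_type}", default_file_type
    return key, "bytes"

print(infer_format("processed/table"))    # -> ('processed/table.parquet', 'parquet')
print(infer_format("configs/run.json"))   # -> ('configs/run.json', 'json')
```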


5) Transfer trees

transfer() moves or copies entire file trees between local filesystems and S3, or between two S3 locations. It auto-infers the direction from the source and destination.

# Local -> S3 (move by default, deletes local files after upload)
aws.s3.transfer("exports/", "s3://my-bucket/archives/exports/")

# S3 -> local (move: deletes the S3 objects after download)
aws.s3.transfer("raw/2023/", "local/backup/2023/", move=True)

# S3 -> S3 (copy within or across buckets)
aws.s3.transfer("s3://bucket-a/data/", "s3://bucket-b/data/", move=False)

# Glob patterns are supported
aws.s3.transfer("raw/**/*.parquet", "archive/parquet/")

# Use explicit buckets when needed
aws.s3.transfer("data/", "archive/", bucket_src="prod-bucket", bucket_dst="archive-bucket")

transfer() preserves relative directory structure at the destination. Pass move=False to copy instead of move.


6) Utilities

# Check existence
aws.s3.exists("raw/prices.parquet")                    # -> bool

# List objects — returns List[dict] with key, size, last_modified, etag, storage_class
aws.s3.list(prefix="raw/", with_meta=True)

# List keys only
keys = aws.s3.list(prefix="raw/", with_meta=False)    # -> List[str]

# Delete (glob patterns supported, force=True required for patterns)
aws.s3.delete(["tmp/a.parquet", "tmp/b.parquet"])
aws.s3.delete("tmp/**", force=True)

# Pretty-print S3 prefix as a tree (sorted by size)
aws.s3.tree(prefix="data/", max_depth=3, folders_first=True)

7) Object serialization

better-aws can serialize arbitrary Python objects (e.g. scikit-learn models) directly to/from S3 using pickle, joblib, or skops.

Security: Requires allow_unsafe_serialization=True in config(). Deserializing untrusted data is unsafe by design.

aws.s3.config(
    bucket="my-bucket",
    allow_unsafe_serialization=True,
    object_base_format="joblib",    # "pickle" | "joblib" | "skops"
    joblib_compress=3,
)

from sklearn.ensemble import RandomForestClassifier
model = RandomForestClassifier().fit(X_train, y_train)

aws.s3.upload(model, "models/rf_classifier")      # -> models/rf_classifier.joblib
model = aws.s3.load("models/rf_classifier.joblib")

Supported extensions: .pkl / .pickle, .joblib / .jl, .skops.


8) Logging

  • verbose=False → no package logs
  • verbose=True → a few info messages (minimal, no spam)
  • Pass your own logger to unify output with your app (e.g., Rich handler)

import logging
from rich.logging import RichHandler
from better_aws import AWS

logger = logging.getLogger("myapp")
logger.setLevel(logging.INFO)
logger.handlers = [RichHandler(rich_tracebacks=True)]
logger.propagate = False

# Custom logger
aws = AWS(profile="s3admin", region="eu-west-3", logger=logger, verbose=True)

# No logs
aws = AWS(profile="s3admin", region="eu-west-3", verbose=False)

# Minimal "print-like" logs
aws = AWS(profile="s3admin", region="eu-west-3", verbose=True)

API reference

AWS

AWS(
    profile=None,               # AWS profile name
    region=None,                # AWS region
    logger=None,                # Optional logging.Logger
    verbose=False,              # Enable info-level logs
    retries=3,                  # Max retry attempts (botocore standard mode)
    connect_timeout_s=10,       # Connection timeout in seconds
    read_timeout_s=300,         # Read timeout in seconds
    *,
    credentials_file=None,      # Path to a custom credentials file
    config_file=None,           # Path to a custom config file
    env_file=None,              # Path to a .env file with AWS credentials
    aws_access_key_id=None,     # Static access key ID
    aws_secret_access_key=None, # Static secret access key
    aws_session_token=None,     # Optional session token
)
| Method / attribute | Returns | Description |
|---|---|---|
| aws.s3 | S3 | S3 service wrapper (lazy-loaded) |
| aws.identity(print_info=False) | dict | Get caller identity via STS (Arn, Account, UserId). Optionally logs it. |
| aws.info(msg, *args) | None | Log a message if verbose=True |
| aws.reset_session() | None | Clear the cached boto3 session (forces re-auth on next call) |

S3

config()

Sets defaults for all subsequent S3 operations. Must be called before using any S3 method that requires a bucket.

aws.s3.config(
    bucket=None,                        # Default S3 bucket
    *,
    key_prefix="",                      # Prefix prepended to all keys
    output_type="pandas",               # Tabular load output: "pandas" | "polars"
    file_type="parquet",                # Default upload format: "csv" | "parquet" | "xlsx" | "xls" | serialization formats
    overwrite=True,                     # Allow overwriting existing objects
    encoding="utf-8",                   # Encoding for text-based I/O (JSON, CSV)
    csv_sep=",",                        # CSV column separator
    csv_index=False,                    # Include pandas index in CSV uploads
    parquet_index=None,                 # Include pandas index in parquet uploads (None = pandas default)
    excel_index=False,                  # Include pandas index in Excel uploads
    allow_unsafe_serialization=False,   # Enable pickle/joblib/skops serialization
    object_base_format="pickle",        # Default format for Python objects: "pickle" | "joblib" | "skops"
    pickle_protocol=pickle.HIGHEST_PROTOCOL,  # Pickle protocol version
    joblib_compress=3,                  # Joblib compression level (0–9)
    small_payload_threshold=5242880,    # Max in-memory payload size (bytes) before switching to temp-file upload
    multipart_threshold_mb=5,           # File size threshold to trigger multipart upload/download
    multipart_chunksize_mb=5,           # Chunk size for multipart transfers
    max_concurrency=8,                  # Max parallel threads for managed transfers
    use_threads=True,                   # Enable threading for managed transfers
    delete_batch_size=1000,             # Max objects per delete_objects call (S3 hard limit: 1000)
)

list()

aws.s3.list(
    prefix="",          # Filter keys by prefix
    *,
    bucket=None,        # Override default bucket
    limit=None,         # Max number of objects to return
    recursive=True,     # If False, list only direct children (non-recursive)
    with_meta=True,     # Include metadata in results
) -> List[dict] | List[str]

Returns a list of dicts when with_meta=True (fields: key, size, last_modified, etag, storage_class), or a list of key strings when with_meta=False.
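The metadata dicts are easy to post-process. A sketch with sample entries shaped like the documented return value (last_modified and etag omitted for brevity):

```python
# Sample entries mimicking aws.s3.list(with_meta=True) output
sample = [
    {"key": "raw/a.parquet", "size": 1_048_576, "storage_class": "STANDARD"},
    {"key": "raw/b.parquet", "size": 524_288, "storage_class": "STANDARD"},
]

total_mb = sum(obj["size"] for obj in sample) / 1024 / 1024
biggest = max(sample, key=lambda o: o["size"])["key"]
print(f"{total_mb:.1f} MB across {len(sample)} objects; largest: {biggest}")
# -> 1.5 MB across 2 objects; largest: raw/a.parquet
```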


exists()

aws.s3.exists(
    key,            # S3 object key
    *,
    bucket=None,    # Override default bucket
) -> bool

Returns True if the object exists, False otherwise.


load()

aws.s3.load(
    key,                    # str, List[str], or glob pattern
    *,
    bucket=None,            # Override default bucket
    output_type=None,       # Override default output type: "pandas" | "polars"
) -> Any | List[Any]

Loads one or more S3 objects into Python objects. Format is inferred from the key extension:

| Extension | Output |
|---|---|
| .json | dict |
| .csv, .parquet, .xlsx, .xls | DataFrame (pandas or polars per output_type) |
| .pkl, .pickle, .joblib, .jl, .skops | Python object (requires allow_unsafe_serialization=True) |
| anything else | bytes |

Supports glob patterns (*, ?, **). Returns a single object for a single key, a list otherwise.
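The dispatch table above can be sketched as a small selector function (illustrative only — not the library's actual code):

```python
from pathlib import PurePosixPath

def loader_for(key: str) -> str:
    """Pick a deserializer from the key extension, per the table above."""
    ext = PurePosixPath(key).suffix.lower()
    if ext == ".json":
        return "dict"
    if ext in {".csv", ".parquet", ".xlsx", ".xls"}:
        return "dataframe"
    if ext in {".pkl", ".pickle", ".joblib", ".jl", ".skops"}:
        return "object"  # requires allow_unsafe_serialization=True
    return "bytes"

print(loader_for("raw/prices.csv"))    # -> dataframe
print(loader_for("models/rf.joblib"))  # -> object
```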


download()

aws.s3.download(
    key,                    # str, List[str], or glob pattern
    to=None,                # Local destination path or directory (default: current directory)
    *,
    preserve_prefix=False,  # If True, recreate the full S3 path locally.
                            # If False, preserve structure relative to the glob root.
    bucket=None,            # Override default bucket
) -> Path | List[Path]

Downloads one or more S3 objects to the local filesystem. Supports glob patterns including ** for recursive matching. Parent directories are created automatically.


upload()

aws.s3.upload(
    src,            # UploadInput or List[UploadInput]
                    # Supported types: str/Path (file or glob), dict, bytes, pd.DataFrame, pl.DataFrame
    key,            # str or List[str] — destination S3 key(s) or single prefix for glob sources
    *,
    bucket=None,    # Override default bucket
    overwrite=None, # Override default overwrite setting
) -> str | List[str]

Uploads one or more objects to S3. The serialization format is inferred from the key extension, or falls back to file_type from config. Returns the final S3 key(s).


delete()

aws.s3.delete(
    key,            # str, List[str], or glob pattern
    *,
    force=False,    # Required when using glob patterns
    bucket=None,    # Override default bucket
) -> None

Deletes one or more S3 objects. Glob patterns (including **) are supported but require force=True as a safety guard. Deletions are batched in groups of up to delete_batch_size (default: 1000).
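The batching behavior is easy to picture with a simple chunking helper (a sketch, not the library's implementation):

```python
def batched(keys, batch_size=1000):
    """Split keys into delete_objects-sized batches (S3 hard limit: 1000)."""
    for i in range(0, len(keys), batch_size):
        yield keys[i:i + batch_size]

keys = [f"tmp/{i}.parquet" for i in range(2500)]
print([len(b) for b in batched(keys)])
# -> [1000, 1000, 500]
```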


transfer()

aws.s3.transfer(
    src,                # str — source path, S3 key, glob pattern, or s3:// URI
    dst,                # str — destination path, S3 prefix, or s3:// URI
    *,
    move=True,          # If True, delete source after successful transfer
    bucket_src=None,    # Override source bucket for S3 sources
    bucket_dst=None,    # Override destination bucket for S3 destinations
) -> str | List[str] | Path | List[Path]

Transfers file trees between local filesystems and S3, or between two S3 locations. The transfer direction is inferred automatically:

| Source | Destination | Mode |
|---|---|---|
| local path / glob | S3 key or URI | local → S3 |
| S3 key / URI | local path | S3 → local |
| S3 key / URI | S3 key or URI | S3 → S3 |

Relative directory structure is always preserved at the destination. Returns the destination path(s) created.
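The direction inference can be sketched for the explicit-URI case (simplified: per the docs, bare keys also count as S3 when a default bucket is configured, which this toy version does not handle):

```python
def infer_direction(src: str, dst: str) -> str:
    """Infer the transfer mode from s3:// URIs, per the table above."""
    src_s3, dst_s3 = src.startswith("s3://"), dst.startswith("s3://")
    if src_s3 and dst_s3:
        return "s3->s3"
    if src_s3:
        return "s3->local"
    return "local->s3"

print(infer_direction("s3://bucket-a/data/", "s3://bucket-b/data/"))  # -> s3->s3
print(infer_direction("exports/", "s3://my-bucket/archives/"))        # -> local->s3
```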


tree()

aws.s3.tree(
    prefix="",              # S3 prefix to display
    *,
    bucket=None,            # Override default bucket
    show_full_path=True,    # Show full S3 key vs. basename only
    max_depth=None,         # Max depth to display (None = unlimited)
    max_children=None,      # Max children per node (None = unlimited)
    folders_first=True,     # Display folders before files at each level
    limit=None,             # Max number of S3 objects to include
) -> None

Pretty-prints the S3 object tree under a given prefix using rich, sorted by total size at each level.


License

MIT License

Copyright (c) 2026 better-aws Contributors

See LICENSE file for details.
