better-aws
Minimal AWS boto3 wrapper
A minimal, production-minded wrapper around boto3 focused on S3 and tabular data (CSV/Parquet/Excel).
- S3-first: the handful of operations you use 90% of the time
- Batch-native and glob-ready: same methods for single keys, lists, or glob patterns (*, **)
- Ergonomic I/O: load() → Python objects, download() → local files, transfer() → move trees between local and S3
- Logging-friendly: standalone "print-like" logs or plug into your app logger
- Auth-ready: designed to support multiple auth modes (profile, custom files, static creds, .env)
Install
pip install better-aws
For object serialization support (pickle/joblib/skops):
pip install better-aws[objects]
Development (uv)
git clone https://github.com/thibault-charbonnier/better-aws.git
cd better-aws
uv sync
Quickstart
from better_aws import AWS
# 1) Create a session (boto3 will use the default credential chain unless you add other auth modes)
aws = AWS(profile="s3admin", region="eu-west-3", verbose=True)
# Optional sanity check
aws.identity(print_info=True)
# 2) Configure S3 defaults
aws.s3.config(
bucket="my-bucket",
key_prefix="my-project", # optional: all keys are relative to this prefix
output_type="pandas", # tabular loads -> pandas (or "polars")
file_type="parquet", # default tabular format for dataframe uploads without extension
overwrite=True,
)
# 3) List / load / upload
keys = aws.s3.list(prefix="raw/", limit=10)
df = aws.s3.load("raw/prices.parquet") # -> pandas DataFrame (by config)
df["ret"] = df["close"].pct_change()
aws.s3.upload(df, "processed/prices_with_returns") # -> parquet by default (by config)
# 4) Verify existence
print(aws.s3.exists("processed/prices_with_returns.parquet"))
Core features
1) Authentication
better-aws is built to keep auth clean and modular:
- AWS profile / default chain (AWS CLI-style)
- static credentials (Python args)
- custom credentials_file / optional config_file
- .env file (dotenv)
# Static credentials
aws = AWS("s3admin", aws_access_key_id=AWS_ID_KEY, aws_secret_access_key=AWS_SECRET_KEY)
# .env config
aws = AWS("s3admin", env_file="test.env")
# Custom location for credentials files
aws = AWS("s3admin", credentials_file=r"\...\credentials")
# Classic CLI-like auth (boto3 fallback)
aws = AWS("s3admin")
Authentication priority
When creating a session, better-aws resolves credentials in this order — first match wins:
1. Static credentials — aws_access_key_id + aws_secret_access_key parameters passed directly to AWS()
2. Env file — a .env file passed via env_file=. Must contain AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY. Optionally AWS_SESSION_TOKEN and AWS_REGION / AWS_DEFAULT_REGION.
3. Custom credential files — credentials_file and/or config_file pointing to non-default AWS credential file locations
4. boto3 default chain — falls back to the native boto3 credential resolution. The most common case is the credentials file generated by aws configure (~/.aws/credentials on Linux/macOS, %USERPROFILE%\.aws\credentials on Windows). See the full boto3 credential chain for other sources (env vars, IAM roles, etc.). For regular use, we recommend installing the AWS CLI and running aws configure once — better-aws will then pick up your credentials automatically with no extra configuration.
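For the env-file mode, a minimal `test.env` might look like the sketch below. The key names are the ones documented above; the values shown are placeholders, not real credentials:

```shell
# Required by better-aws
AWS_ACCESS_KEY_ID=AKIA_PLACEHOLDER
AWS_SECRET_ACCESS_KEY=secret_placeholder
# Optional
AWS_SESSION_TOKEN=token_placeholder
AWS_REGION=eu-west-3
```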
2) Configure your S3 "workspace"
Call aws.s3.config() once to set defaults for all subsequent operations. The main arguments:
- bucket: default bucket
- key_prefix: optional "root folder" — all keys are resolved relative to it
- output_type: tabular load() output ("pandas" / "polars")
- file_type: default format for DataFrame uploads without extension ("parquet" / "csv" / "xlsx")
- overwrite: default overwrite policy
aws.s3.config(bucket="my-bucket", key_prefix="research", output_type="polars", file_type="parquet", overwrite=False)
3) Read from S3
Two ways to read from S3:
- download() = S3 → local files (returns Path or List[Path])
- load() = S3 → Python objects (JSON → dict, tabular → DataFrame)
path = aws.s3.download("reports/report.pdf", to="downloads/")
cfg = aws.s3.load("configs/pipeline.json") # -> dict
df = aws.s3.load("raw/prices.csv") # -> pandas/polars (by config)
dfs = aws.s3.load(["raw/a.parquet", "raw/b.parquet"]) # -> List[DataFrame]
Batch native:
load() and download() accept a single key or a list of keys.
Both methods support glob patterns including recursive **:
# All CSVs directly under raw/
aws.s3.download("raw/*.csv", to="downloads/")
# All parquets recursively
dfs = aws.s3.load("data/**/*.parquet")
# Preserve the full S3 path structure locally (default: preserve relative to the glob root)
aws.s3.download("data/2023/*.csv", to="downloads/", preserve_prefix=True)
# -> downloads/data/2023/file.csv (instead of downloads/file.csv)
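The two destination layouts can be pictured in plain Python. `local_dest` below is a hypothetical helper, not part of better-aws; it mirrors the documented behavior, assuming the glob root is the static part of the pattern before the first wildcard:

```python
from pathlib import Path, PurePosixPath

def local_dest(key: str, pattern: str, to: str, preserve_prefix: bool = False) -> Path:
    """Map an S3 key to a local path the way download() is documented to."""
    if preserve_prefix:
        # Recreate the full S3 path under the destination directory.
        return Path(to) / PurePosixPath(key)
    # Glob root: leading pattern components that contain no wildcard.
    root = []
    for part in PurePosixPath(pattern).parts:
        if any(ch in part for ch in "*?["):
            break
        root.append(part)
    rel = PurePosixPath(key)
    if root:
        rel = rel.relative_to(PurePosixPath(*root))
    return Path(to) / rel

local_dest("data/2023/file.csv", "data/2023/*.csv", "downloads")
# -> downloads/file.csv
local_dest("data/2023/file.csv", "data/2023/*.csv", "downloads", preserve_prefix=True)
# -> downloads/data/2023/file.csv
```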
4) Write to S3
upload() supports:
- local file path or glob pattern → copied as-is, structure preserved
- dict → JSON
- bytes → raw payload
- pandas/polars DataFrame → CSV/Parquet/Excel (based on key extension or default file_type)
aws.s3.upload("local/report.pdf", "reports/report.pdf")
aws.s3.upload({"run_id": 1}, "configs/run") # -> configs/run.json
aws.s3.upload(df, "processed/table") # -> processed/table.parquet
aws.s3.upload([df, df], ["processed/a.parquet", "processed/b.parquet"])
# Glob upload: preserve local structure under a single S3 prefix
aws.s3.upload("exports/*.csv", "s3-prefix/exports/")
upload() returns the final S3 key(s).
Batch native:
upload() accepts a single src/key pair or a list of pairs.
5) Transfer trees
transfer() moves or copies entire file trees between local filesystems and S3, or between two S3 locations. It auto-infers the direction from the source and destination.
# Local -> S3 (move by default, deletes local files after upload)
aws.s3.transfer("exports/", "s3://my-bucket/archives/exports/")
# S3 -> local (move: deletes the S3 objects after download)
aws.s3.transfer("raw/2023/", "local/backup/2023/", move=True)
# S3 -> S3 (copy within or across buckets)
aws.s3.transfer("s3://bucket-a/data/", "s3://bucket-b/data/", move=False)
# Glob patterns are supported
aws.s3.transfer("raw/**/*.parquet", "archive/parquet/")
# Use explicit buckets when needed
aws.s3.transfer("data/", "archive/", bucket_src="prod-bucket", bucket_dst="archive-bucket")
transfer() preserves relative directory structure at the destination. Pass move=False to copy instead of move.
6) Utilities
# Check existence
aws.s3.exists("raw/prices.parquet") # -> bool
# List objects — returns List[dict] with key, size, last_modified, etag, storage_class
aws.s3.list(prefix="raw/", with_meta=True)
# List keys only
keys = aws.s3.list(prefix="raw/", with_meta=False) # -> List[str]
# Delete (glob patterns supported, force=True required for patterns)
aws.s3.delete(["tmp/a.parquet", "tmp/b.parquet"])
aws.s3.delete("tmp/**", force=True)
# Pretty-print S3 prefix as a tree (sorted by size)
aws.s3.tree(prefix="data/", max_depth=3, folders_first=True)
7) Object serialization
better-aws can serialize arbitrary Python objects (e.g. scikit-learn models) directly to/from S3 using pickle, joblib, or skops.
Security: requires allow_unsafe_serialization=True in config(). Deserializing untrusted data is unsafe by design.
aws.s3.config(
bucket="my-bucket",
allow_unsafe_serialization=True,
object_base_format="joblib", # "pickle" | "joblib" | "skops"
joblib_compress=3,
)
from sklearn.ensemble import RandomForestClassifier
model = RandomForestClassifier().fit(X_train, y_train)
aws.s3.upload(model, "models/rf_classifier") # -> models/rf_classifier.joblib
model = aws.s3.load("models/rf_classifier.joblib")
Supported extensions: .pkl / .pickle, .joblib / .jl, .skops.
8) Logging
- verbose=False → no package logs
- verbose=True → a few info messages (minimal, no spam)
- Pass your own logger to unify output with your app (e.g., Rich handler)
import logging
from rich.logging import RichHandler
from better_aws import AWS
logger = logging.getLogger("myapp")
logger.setLevel(logging.INFO)
logger.handlers = [RichHandler(rich_tracebacks=True)]
logger.propagate = False
# Custom logger
aws = AWS(profile="s3admin", region="eu-west-3", logger=logger, verbose=True)
# No logs
aws = AWS(profile="s3admin", region="eu-west-3", verbose=False)
# Minimal "print-like" logs
aws = AWS(profile="s3admin", region="eu-west-3", verbose=True)
API reference
AWS
AWS(
profile=None, # AWS profile name
region=None, # AWS region
logger=None, # Optional logging.Logger
verbose=False, # Enable info-level logs
retries=3, # Max retry attempts (botocore standard mode)
connect_timeout_s=10, # Connection timeout in seconds
read_timeout_s=300, # Read timeout in seconds
*,
credentials_file=None, # Path to a custom credentials file
config_file=None, # Path to a custom config file
env_file=None, # Path to a .env file with AWS credentials
aws_access_key_id=None, # Static access key ID
aws_secret_access_key=None, # Static secret access key
aws_session_token=None, # Optional session token
)
| Method | Returns | Description |
|---|---|---|
| aws.s3 | S3 | S3 service wrapper (lazy-loaded) |
| aws.identity(print_info=False) | dict | Get caller identity via STS (Arn, Account, UserId). Optionally logs it. |
| aws.info(msg, *args) | None | Log a message if verbose=True |
| aws.reset_session() | None | Clear the cached boto3 session (forces re-auth on next call) |
S3
config()
Sets defaults for all subsequent S3 operations. Must be called before using any S3 method that requires a bucket.
aws.s3.config(
bucket=None, # Default S3 bucket
*,
key_prefix="", # Prefix prepended to all keys
output_type="pandas", # Tabular load output: "pandas" | "polars"
file_type="parquet", # Default upload format: "csv" | "parquet" | "xlsx" | "xls" | serialization formats
overwrite=True, # Allow overwriting existing objects
encoding="utf-8", # Encoding for text-based I/O (JSON, CSV)
csv_sep=",", # CSV column separator
csv_index=False, # Include pandas index in CSV uploads
parquet_index=None, # Include pandas index in parquet uploads (None = pandas default)
excel_index=False, # Include pandas index in Excel uploads
allow_unsafe_serialization=False, # Enable pickle/joblib/skops serialization
object_base_format="pickle", # Default format for Python objects: "pickle" | "joblib" | "skops"
pickle_protocol=pickle.HIGHEST_PROTOCOL, # Pickle protocol version
joblib_compress=3, # Joblib compression level (0–9)
small_payload_threshold=5242880, # Max in-memory payload size (bytes) before switching to temp-file upload
multipart_threshold_mb=5, # File size threshold to trigger multipart upload/download
multipart_chunksize_mb=5, # Chunk size for multipart transfers
max_concurrency=8, # Max parallel threads for managed transfers
use_threads=True, # Enable threading for managed transfers
delete_batch_size=1000, # Max objects per delete_objects call (S3 hard limit: 1000)
)
list()
aws.s3.list(
prefix="", # Filter keys by prefix
*,
bucket=None, # Override default bucket
limit=None, # Max number of objects to return
recursive=True, # If False, list only direct children (non-recursive)
with_meta=True, # Include metadata in results
) -> List[dict] | List[str]
Returns a list of dicts when with_meta=True (fields: key, size, last_modified, etag, storage_class), or a list of key strings when with_meta=False.
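Because with_meta=True returns plain dicts, summaries are easy to build client-side. `size_by_folder` is a hypothetical helper, not part of better-aws; it only assumes the documented key and size fields:

```python
from collections import defaultdict

def size_by_folder(objects):
    """Aggregate object sizes by top-level "folder" (first key component).

    `objects` has the List[dict] shape returned by aws.s3.list(with_meta=True).
    """
    sizes = defaultdict(int)
    for obj in objects:
        top = obj["key"].split("/", 1)[0]
        sizes[top] += obj["size"]
    return dict(sizes)

size_by_folder([
    {"key": "raw/a.csv", "size": 10},
    {"key": "raw/b.csv", "size": 5},
    {"key": "processed/a.parquet", "size": 7},
])
# -> {"raw": 15, "processed": 7}
```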
exists()
aws.s3.exists(
key, # S3 object key
*,
bucket=None, # Override default bucket
) -> bool
Returns True if the object exists, False otherwise.
load()
aws.s3.load(
key, # str, List[str], or glob pattern
*,
bucket=None, # Override default bucket
output_type=None, # Override default output type: "pandas" | "polars"
) -> Any | List[Any]
Loads one or more S3 objects into Python objects. Format is inferred from the key extension:
| Extension | Output |
|---|---|
| .json | dict |
| .csv, .parquet, .xlsx, .xls | DataFrame (pandas or polars per output_type) |
| .pkl, .pickle, .joblib, .jl, .skops | Python object (requires allow_unsafe_serialization=True) |
| anything else | bytes |
Supports glob patterns (*, ?, **). Returns a single object for a single key, a list otherwise.
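The extension table above can be sketched as a small dispatch function. This illustrates the documented mapping only, not the library's actual internals:

```python
from pathlib import PurePosixPath

def infer_load_format(key: str) -> str:
    """Illustrate load()'s documented extension-to-format mapping."""
    ext = PurePosixPath(key).suffix.lower()
    if ext == ".json":
        return "json"        # -> dict
    if ext in {".csv", ".parquet", ".xlsx", ".xls"}:
        return "dataframe"   # -> pandas/polars per output_type
    if ext in {".pkl", ".pickle", ".joblib", ".jl", ".skops"}:
        return "object"      # requires allow_unsafe_serialization=True
    return "bytes"           # anything else is returned raw

infer_load_format("configs/pipeline.json")  # -> "json"
infer_load_format("raw/prices.parquet")     # -> "dataframe"
infer_load_format("reports/report.pdf")     # -> "bytes"
```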
download()
aws.s3.download(
key, # str, List[str], or glob pattern
to=None, # Local destination path or directory (default: current directory)
*,
preserve_prefix=False, # If True, recreate the full S3 path locally.
# If False, preserve structure relative to the glob root.
bucket=None, # Override default bucket
) -> Path | List[Path]
Downloads one or more S3 objects to the local filesystem. Supports glob patterns including ** for recursive matching. Parent directories are created automatically.
upload()
aws.s3.upload(
src, # UploadInput or List[UploadInput]
# Supported types: str/Path (file or glob), dict, bytes, pd.DataFrame, pl.DataFrame
key, # str or List[str] — destination S3 key(s) or single prefix for glob sources
*,
bucket=None, # Override default bucket
overwrite=None, # Override default overwrite setting
) -> str | List[str]
Uploads one or more objects to S3. The serialization format is inferred from the key extension, or falls back to file_type from config. Returns the final S3 key(s).
delete()
aws.s3.delete(
key, # str, List[str], or glob pattern
*,
force=False, # Required when using glob patterns
bucket=None, # Override default bucket
) -> None
Deletes one or more S3 objects. Glob patterns (including **) are supported but require force=True as a safety guard. Deletions are batched in groups of up to delete_batch_size (default: 1000).
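The batching behavior is easy to picture: keys are chunked into groups of at most delete_batch_size before each underlying delete call. A minimal sketch, not the library's own code:

```python
def batches(keys, batch_size=1000):
    """Yield chunks of at most batch_size keys, as delete() is documented to do."""
    for i in range(0, len(keys), batch_size):
        yield keys[i:i + batch_size]

chunks = list(batches([f"tmp/{i}.parquet" for i in range(2500)], batch_size=1000))
[len(c) for c in chunks]
# -> [1000, 1000, 500]
```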
transfer()
aws.s3.transfer(
src, # str — source path, S3 key, glob pattern, or s3:// URI
dst, # str — destination path, S3 prefix, or s3:// URI
*,
move=True, # If True, delete source after successful transfer
bucket_src=None, # Override source bucket for S3 sources
bucket_dst=None, # Override destination bucket for S3 destinations
) -> str | List[str] | Path | List[Path]
Transfers file trees between local filesystems and S3, or between two S3 locations. The transfer direction is inferred automatically:
| Source | Destination | Mode |
|---|---|---|
| local path / glob | S3 key or URI | local → S3 |
| S3 key / URI | local path | S3 → local |
| S3 key / URI | S3 key or URI | S3 → S3 |
Relative directory structure is always preserved at the destination. Returns the destination path(s) created.
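For s3:// URIs, the direction table above reduces to a simple check. This is a simplified illustration only: the real transfer() also resolves bare keys against the configured bucket, which this sketch does not model:

```python
def infer_direction(src: str, dst: str) -> str:
    """Classify a transfer by its endpoints (s3:// URIs only)."""
    src_s3 = src.startswith("s3://")
    dst_s3 = dst.startswith("s3://")
    if src_s3 and dst_s3:
        return "s3 -> s3"
    if src_s3:
        return "s3 -> local"
    if dst_s3:
        return "local -> s3"
    return "local -> local"  # not a mode transfer() supports

infer_direction("exports/", "s3://my-bucket/archives/exports/")  # -> "local -> s3"
infer_direction("s3://bucket-a/data/", "s3://bucket-b/data/")    # -> "s3 -> s3"
```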
tree()
aws.s3.tree(
prefix="", # S3 prefix to display
*,
bucket=None, # Override default bucket
show_full_path=True, # Show full S3 key vs. basename only
max_depth=None, # Max depth to display (None = unlimited)
max_children=None, # Max children per node (None = unlimited)
folders_first=True, # Display folders before files at each level
limit=None, # Max number of S3 objects to include
) -> None
Pretty-prints the S3 object tree under a given prefix using rich, sorted by total size at each level.
License
MIT License
Copyright (c) 2026 better-aws Contributors
See LICENSE file for details.