# pos3

**POsitronic S3** — Make using S3 as simple as using local files.
pos3 provides a Pythonic context manager for syncing directories and files with S3. It is designed for data processing pipelines and machine learning workflows where you need to integrate S3 with code that only understands local files.
The main value of `pos3` is enabling you to pass S3 data to third-party libraries or legacy scripts that expect local file paths (e.g., `opencv`, `pandas.read_csv`, or model training scripts). Instead of rewriting their I/O logic to support S3, `pos3` transparently bridges the gap.
## Core Concepts
- **Context Manager**: All operations run within a `with pos3.mirror():` block.
  - **Enter**: Initializes the sync environment (threads, cache).
  - **Body**: You explicitly call `pos3.download()` to fetch files and `pos3.upload()` to register outputs.
  - **Exit**: Uploads registered output paths (mirroring local to S3).
- **Lazy & Efficient**: Only transfers files that have changed (based on size/presence).
- **Local Paths**: All API calls return a `pathlib.Path` to the local file/directory. If you pass a local path instead of an S3 URL, it is passed through unchanged (no copy).
- **Background Sync**: Can optionally upload changes in the background (e.g., every 60s) for long-running jobs.
## Quick Start

The primary API is the `pos3.mirror()` context manager.
```python
import pos3

# 1. Start the context
with pos3.mirror(cache_root='~/.cache/positronic/s3'):
    # 2. Download input
    #    - Downloads s3://bucket/data to the cache
    #    - Deletes local files that don't exist in S3 (mirroring)
    #    - Returns a local Path object
    dataset_path = pos3.download('s3://bucket/data')

    # 3. Sync output (resume & upload)
    #    - Downloads existing checkpoints (to resume)
    #    - Registers the path for background uploads
    checkpoints_path = pos3.sync('s3://bucket/ckpt', interval=60, delete_remote=False)

    # 4. Upload logs (write-only)
    #    - Creates the local directory
    #    - Uploads new files to S3 on exit/interval
    logs_path = pos3.upload('s3://bucket/logs', interval=30)

    # 5. Use standard local file paths
    print(f"Reading from {dataset_path}")    # -> ~/.cache/positronic/s3/bucket/data
    print(f"Writing to {checkpoints_path}")  # -> ~/.cache/positronic/s3/bucket/ckpt
    print(f"Logging to {logs_path}")         # -> ~/.cache/positronic/s3/bucket/logs

    train(dataset_path, checkpoints_path, logs_path)
```
## API Guide
**Note:** All operational methods (`download`, `upload`, `sync`, `ls`) must be called within an active `pos3.mirror()` context. Calling them outside will raise a `RuntimeError`.
### `pos3.mirror(...)` / `@pos3.with_mirror(...)`

Context manager (or decorator) that activates the sync environment.
**Parameters:**

- `cache_root` (default: `'~/.cache/positronic/s3/'`): Base directory for caching downloaded files.
- `show_progress` (default: `True`): Display tqdm progress bars.
- `max_workers` (default: `10`): Threads for parallel S3 operations.
**Decorator Example:**

```python
@pos3.with_mirror(cache_root='/tmp/cache')
def main():
    # The mirror context is active only while main() is running
    data_path = pos3.download('s3://bucket/data')
    train(data_path)

if __name__ == "__main__":
    main()
```
### `pos3.download(remote, local=None, delete=True, exclude=None)`

Registers a path for download and ensures the local copy matches S3 immediately.

- `remote`: S3 URL (e.g., `s3://bucket/key`) or local path.
- `local`: Explicit local destination. Defaults to the standard cache path.
- `delete`: If `True` (default), deletes local files NOT in S3 ("mirror" behavior).
- `exclude`: List of glob patterns to skip.

**Returns:** `pathlib.Path` to the local directory/file.
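The default cache-path mapping can be sketched as follows. `cache_path` is a hypothetical helper written to match the paths shown in the Quick Start; it is not part of the `pos3` API:

```python
from pathlib import Path
from urllib.parse import urlparse


def cache_path(remote: str, cache_root: str = "~/.cache/positronic/s3") -> Path:
    """Map an S3 URL to its default local cache location.

    Local paths are returned unchanged (no copy), mirroring the
    pass-through behaviour described in Core Concepts."""
    parsed = urlparse(remote)
    if parsed.scheme != "s3":
        return Path(remote)
    return Path(cache_root).expanduser() / parsed.netloc / parsed.path.lstrip("/")
```

So `s3://bucket/data` lands at `~/.cache/positronic/s3/bucket/data`, and the same URL always resolves to the same local path across runs, which is what makes the differential sync cache effective.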
### `pos3.upload(remote, local=None, interval=300, delete=True, sync_on_error=False, exclude=None)`

Registers a local path for upload. Uploads on exit and, optionally, in the background.

- `remote`: Destination S3 URL.
- `local`: Local source path. Auto-resolved from the cache path if `None`.
- `interval`: Seconds between background syncs. `None` for exit-only.
- `delete`: If `True` (default), deletes S3 files NOT present locally.
- `sync_on_error`: If `True`, syncs even if the context exits with an exception.
- `exclude`: List of glob patterns to skip.

**Returns:** `pathlib.Path` to the local directory/file.
### `pos3.sync(remote, local=None, interval=300, delete_local=True, delete_remote=True, sync_on_error=False, exclude=None)`

Bi-directional helper: performs `download()` and then registers `upload()`. Useful for jobs that work on existing files, such as resuming training from a checkpoint.

- `delete_local`: Clean up local files during download.
- `delete_remote`: Clean up remote files during upload. Consider setting this to `False` when resuming jobs, to avoid deleting history.

**Returns:** `pathlib.Path` to the local directory/file.
### `pos3.ls(prefix, recursive=False)`

Lists files/objects in a directory or S3 prefix.

- `prefix`: S3 URL or local path.
- `recursive`: If `True`, lists subdirectories recursively.

**Returns:** List of full S3 URLs or local paths.
## Comparison with Libraries

Why use `pos3` instead of other Python libraries?
| Feature | `pos3` | `boto3` | `s3fs` / `fsspec` |
|---|---|---|---|
| Abstraction Level | High (Context Manager) | Low (API Client) | Medium (File System) |
| Sync Logic | Built-in (Differential) | Manual Implementation | `put`/`get` (Recursive) |
| Lifecycle | Automated (Open/Close) | Manual | Manual |
| Background Upload | Yes (Non-blocking) | Manual Threading | No (Blocking) |
| Local I/O Speed | Native (SSD) | Native | Network Bound (Virtual FS) |
| Use Case | ML / Pipelines / 3rd-Party Code | App Development | DataFrames / Interactive |
- **vs `boto3`**: `boto3` is the raw AWS SDK. `pos3` wraps it to provide mirroring logic, threading, and diffing out of the box.
- **vs `s3fs`**: `s3fs` treats S3 as a filesystem. `pos3` treats S3 as a persistence layer for your high-speed local storage, so you always get native I/O performance.
## Advanced Features

### Profiles

Profiles enable accessing multiple S3-compatible endpoints simultaneously within the same context. This is useful when your workflow combines data from different sources:
```python
import pos3
from pos3 import Profile

# Register profiles for different endpoints
pos3.register_profile(
    'nebius-public',
    endpoint='https://storage.eu-north1.nebius.cloud',
    public=True,  # anonymous access, no credentials needed
)
pos3.register_profile(
    'minio-local',
    endpoint='http://localhost:9000',
    region='us-east-1',
)

# Use multiple profiles in the same context
with pos3.mirror():
    # Download a public dataset from Nebius
    dataset = pos3.download('s3://public-data/dataset/', profile='nebius-public')
    # Download a private config from local MinIO
    config = pos3.download('s3://private/config/', profile='minio-local')
    # Upload results to AWS (default boto3 credentials)
    results = pos3.upload('s3://my-aws-bucket/results/')
    train(dataset, config, results)

# You can also use inline Profile objects without registration
custom = Profile(local_name='custom', endpoint='https://custom.example.com', public=True)
with pos3.mirror():
    data = pos3.download('s3://bucket/path', profile=custom)

# Or set a default profile for the entire context
with pos3.mirror(default_profile='nebius-public'):
    data = pos3.download('s3://bucket/path')  # uses nebius-public
```
Each profile has a `local_name` used in the cache path to keep files from different endpoints separate. When registering profiles, `local_name` defaults to the profile name. The default AWS profile uses `_` as its local name.
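One plausible layout for that separation is sketched below. Both `profile_cache_path` and the exact position of `local_name` in the path are assumptions for illustration; the real layout is internal to `pos3`:

```python
from pathlib import Path
from urllib.parse import urlparse


def profile_cache_path(remote: str, local_name: str = "_",
                       cache_root: str = "~/.cache/positronic/s3") -> Path:
    """Namespace the cache by the profile's local_name so that objects with
    the same bucket/key on different endpoints never collide on disk.
    "_" stands for the default AWS profile, as described above."""
    parsed = urlparse(remote)
    return (Path(cache_root).expanduser() / local_name
            / parsed.netloc / parsed.path.lstrip("/"))
```

Under this scheme, `s3://bucket/data` fetched via `nebius-public` and the same URL fetched via `minio-local` occupy two distinct cache directories, so the differential sync never mixes their contents.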