Utilities library for read/write operations and general data cleaning routines
Project description
awsio
A small utilities library to simplify reading/writing and basic data workflows with AWS services (S3, Athena, SSO). Provides lightweight helpers for path handling, parallel reads, simple S3 IO and an SSO-based auth flow.
Key goals:
- Simplify common S3 and Athena interactions.
- Provide sensible defaults and support multiple auth methods (SSO, profiles, env, explicit keys).
- Small, dependency-light helpers for data-lake-style workflows.
Features
- AWSio class for authenticated sessions and common actions:
  - read_s3_file, read_json_from_s3, download_file, read_athena_query
- SSO device-flow helper for short-lived role credentials (auth.py)
- Path helpers: split_bucket_key, date extraction helpers (path.py)
- Parallelized reading utilities using joblib (parallelism.py)
- Utilities to read versioned datasets from S3, including date filtering (versioned.py)
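The path and date helpers listed above can be pictured with a small stand-alone sketch. split_bucket_key mirrors the signature documented under API Highlights, but the body is only an illustration, not the awsio implementation. extract_date and filter_keys_by_date are hypothetical names, and the assumption that keys embed a YYYY-MM-DD date is ours.

```python
import re
from datetime import date
from typing import Iterable, Optional

# Assumed key layout: a YYYY-MM-DD date somewhere in the key,
# e.g. "sales/dt=2024-01-31/part-0.parquet".
_DATE_RE = re.compile(r"(\d{4})-(\d{2})-(\d{2})")


def split_bucket_key(s3_uri: str, type: str = "file") -> tuple[str, str]:
    """Split 's3://bucket/key' into (bucket, key).

    With type='folder', a trailing slash is appended to the key if missing.
    The parameter name `type` follows the documented signature.
    """
    prefix = "s3://"
    if not s3_uri.startswith(prefix):
        raise ValueError(f"not an S3 URI: {s3_uri!r}")
    bucket, _, key = s3_uri[len(prefix):].partition("/")
    if type == "folder" and key and not key.endswith("/"):
        key += "/"
    return bucket, key


def extract_date(key: str) -> Optional[date]:
    """Return the first YYYY-MM-DD date found in the key, or None (hypothetical helper)."""
    match = _DATE_RE.search(key)
    if match is None:
        return None
    try:
        return date(*(int(part) for part in match.groups()))
    except ValueError:  # e.g. "2024-13-45" looks like a date but is not one
        return None


def filter_keys_by_date(keys: Iterable[str], start: date, end: date) -> list[str]:
    """Keep keys whose embedded date falls in [start, end], inclusive (hypothetical helper)."""
    kept = []
    for key in keys:
        d = extract_date(key)
        if d is not None and start <= d <= end:
            kept.append(key)
    return kept
```

Both date bounds are inclusive here, and keys without a recognizable date are dropped; a real loader might instead raise or keep them, depending on the dataset.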
Installation
Install from source (recommended during development):
- Clone the repo:
```shell
git clone https://github.com/jotap123/awsio.git
```
- Create a virtual environment and install:
```shell
python -m venv .venv
source .venv/bin/activate  # or .venv\Scripts\activate on Windows
pip install -r requirements.txt
pip install -e .
```
Or install the released package with pip:
```shell
pip install awsio
```
Quickstart
- Basic usage with the default credential chain (environment variables, shared credentials, IAM role):
```python
from awsio.io import AWSio

reader = AWSio()
bucket, key = 'my-bucket', 'path/to/file.txt'
content = reader.read_s3_file(bucket, key)
print(content)
```
- Use a named profile from your AWS shared credentials:
```python
from awsio.io import AWSio

reader = AWSio(profile_name='dev')  # 'dev' uses the SSO auth flow in this project
df = reader.read_athena_query("SELECT * FROM my_db.my_table LIMIT 10", "s3://my-bucket/athena-results/")
```
- Explicit credentials (not recommended for production):
```python
from awsio.io import AWSio

reader = AWSio(aws_secrets={
    'aws_access_key_id': 'AKIA...',
    'aws_secret_access_key': '...',
    'aws_session_token': '...',  # optional
})
```
- Read a JSON file from S3:
```python
obj = reader.read_json_from_s3('my-bucket', 'config/my.json')
```
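read_athena_query hands back a pandas.DataFrame. As a rough sketch of the step underneath, assuming the rows come from boto3's Athena client (get_query_results), the function below turns one page of the response's ResultSet into a list of dicts. athena_rows_to_records is a hypothetical helper, not part of awsio, and it handles neither pagination (NextToken) nor type casting: Athena returns every value as a string in VarCharValue.

```python
from typing import Any, Optional


def athena_rows_to_records(result_set: dict[str, Any]) -> list[dict[str, Optional[str]]]:
    """Convert an Athena ResultSet page into a list of {column: value} dicts.

    Assumes a SELECT query, where the first row holds the column names.
    """
    rows = result_set.get("Rows", [])
    if not rows:
        return []
    header = [cell.get("VarCharValue") for cell in rows[0]["Data"]]
    records = []
    for row in rows[1:]:
        # NULL cells arrive as empty dicts, so .get() yields None for them.
        values = [cell.get("VarCharValue") for cell in row["Data"]]
        records.append(dict(zip(header, values)))
    return records
```

With pandas installed, pandas.DataFrame(athena_rows_to_records(response["ResultSet"])) gives a string-typed frame; casting columns would rely on ResultSetMetadata.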
API Highlights
- AWSio (session selection)
  - `__init__`(aws_secrets=None, profile_name=None)
    - Picks the authentication strategy: a named profile (with SSO logic for dev), explicit keys, or the default credential chain.
  - read_s3_file(bucket, key, encoding='utf-8')
    - Returns the file contents as str, or as bytes if encoding is None. Raises clear exceptions for common AWS errors.
  - read_json_from_s3(bucket, key)
    - Returns the parsed JSON content of an S3 object.
  - download_file(bucket, key, local_path)
    - Downloads an object to the local filesystem.
  - read_athena_query(query, s3_output)
    - Runs an Athena query and returns the result rows as a pandas.DataFrame.
- auth.authentication(sso_oidc, use_cache=True)
  - Implements the device code flow and caches the short-lived access token in ~/.aws_sso_oidc_cache.json.
- path.split_bucket_key(s3_uri, type='file'|'folder')
  - Splits s3://bucket/key into (bucket, key); with type='folder', it ensures the key ends with a trailing slash.
- versioned.parallel_read / load_history
  - Utilities to read many S3 files (Parquet, CSV, Excel) in parallel, with date filtering.
- parallelism.applyParallel
  - Runs a groupby-apply style function over each group in parallel using joblib.
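The groupby-apply pattern behind applyParallel can be sketched without extra dependencies. The library uses joblib over pandas groupby objects; the sketch below swaps in the standard library's concurrent.futures and plain dict rows instead, so it illustrates the pattern rather than reproducing awsio's code. apply_parallel and its parameters are hypothetical.

```python
from collections import defaultdict
from collections.abc import Callable, Hashable, Iterable
from concurrent.futures import ThreadPoolExecutor
from typing import Any

Row = dict[str, Any]


def apply_parallel(
    rows: Iterable[Row],
    key: str,
    func: Callable[[list[Row]], Any],
    max_workers: int = 4,
) -> dict[Hashable, Any]:
    """Group rows by row[key] and run func on each group concurrently.

    Returns {group_key: func(group_rows)} in first-seen group order.
    """
    groups: dict[Hashable, list[Row]] = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in input order, so they line up with groups.keys().
        results = list(pool.map(func, groups.values()))
    return dict(zip(groups.keys(), results))
```

Threads suit I/O-bound functions such as per-group S3 reads. For CPU-bound pandas work, a process-based pool (joblib's default loky backend, or ProcessPoolExecutor) sidesteps the GIL at the cost of pickling each group.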
Configuration & Environment
- Region: set the AWS_REGION environment variable to change the default region (us-east-1).
- SSO config: the SSO flow reads the START_URL, OIDC_APP_NAME, ACCOUNT_ID and ROLE_NAME environment variables.
- Cache: token cache is saved to ~/.aws_sso_oidc_cache.json by default.
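A minimal sketch of how these settings could feed session creation, assuming the precedence listed for the AWSio constructor (profile, then explicit keys, then the default chain). build_session_kwargs and read_sso_config are hypothetical; the real class also routes the dev profile through the SSO flow, which this sketch leaves out. The returned dict is meant to be unpacked into boto3.Session(**kwargs).

```python
import os
from typing import Optional

# Environment variables the README lists for the SSO flow.
SSO_ENV_VARS = ("START_URL", "OIDC_APP_NAME", "ACCOUNT_ID", "ROLE_NAME")
_KEY_FIELDS = ("aws_access_key_id", "aws_secret_access_key", "aws_session_token")


def build_session_kwargs(
    aws_secrets: Optional[dict[str, str]] = None,
    profile_name: Optional[str] = None,
) -> dict[str, str]:
    """Build boto3.Session keyword arguments (assumed precedence: profile, keys, default chain)."""
    # AWS_REGION overrides the documented default of us-east-1.
    kwargs = {"region_name": os.environ.get("AWS_REGION", "us-east-1")}
    if profile_name:
        kwargs["profile_name"] = profile_name
    elif aws_secrets:
        # Copy only the credential fields that were provided (the session token is optional).
        kwargs.update({k: aws_secrets[k] for k in _KEY_FIELDS if aws_secrets.get(k)})
    # With neither, no credentials are set and boto3 falls back to its default chain.
    return kwargs


def read_sso_config() -> dict[str, str]:
    """Read the SSO settings from the environment, failing fast if any are missing."""
    missing = [name for name in SSO_ENV_VARS if not os.environ.get(name)]
    if missing:
        raise RuntimeError("missing SSO environment variables: " + ", ".join(missing))
    return {name: os.environ[name] for name in SSO_ENV_VARS}
```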
Development
- Tests: add tests under a tests/ folder and run them with pytest.
- Formatting & linting: follow the settings in pyproject.toml and setup.cfg.
- Contributing: open issues and PRs on the repo. Keep changes small and document behavior changes.
Troubleshooting
- No credentials found: make sure AWS credentials are available through environment variables or the shared credentials file, or run in an environment with an attached IAM role.
- SSO flow issues: check the START_URL and OIDC_APP_NAME environment variables; delete the cache file (~/.aws_sso_oidc_cache.json) to force re-authentication.
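The tip to clear the cache can be illustrated with a minimal sketch of an expiring token cache. It assumes the cache file stores an accessToken and an absolute expiresAt epoch timestamp; that JSON layout and the save_token, load_cached_token and clear_token_cache names are hypothetical, and awsio's actual cache format may differ.

```python
import json
import time
from pathlib import Path
from typing import Optional

# Default location documented in the README; the JSON layout below is an assumption.
CACHE_PATH = Path.home() / ".aws_sso_oidc_cache.json"


def save_token(token: str, expires_in: int, path: Path = CACHE_PATH) -> None:
    """Store the token with an absolute expiry time in epoch seconds."""
    path.write_text(json.dumps({"accessToken": token, "expiresAt": time.time() + expires_in}))


def load_cached_token(path: Path = CACHE_PATH, now: Optional[float] = None) -> Optional[str]:
    """Return the cached token, or None if the file is missing, unreadable, or expired."""
    now = time.time() if now is None else now
    try:
        data = json.loads(path.read_text())
    except (OSError, ValueError):  # missing file or invalid JSON
        return None
    if not isinstance(data, dict) or data.get("expiresAt", 0) <= now:
        return None
    return data.get("accessToken")


def clear_token_cache(path: Path = CACHE_PATH) -> None:
    """Delete the cache so the next call has to go through the device flow again."""
    path.unlink(missing_ok=True)
```

Calling clear_token_cache() has the same effect as deleting ~/.aws_sso_oidc_cache.json by hand.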
License
See LICENSE in the repository root.
Contact
- Project: https://github.com/jotap123/awsio
- Issues: https://github.com/jotap123/awsio/issues
Download files
Source Distribution: awsio-0.1.tar.gz
Built Distribution: awsio-0.1-py3-none-any.whl
File details
Details for the file awsio-0.1.tar.gz.
File metadata
- Download URL: awsio-0.1.tar.gz
- Upload date:
- Size: 15.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e4fd195806752935d5c0e0047de1d6b44188008990fdedba344d486f1cb88d7f |
| MD5 | 015754d344328d4740fc008c2df21ce4 |
| BLAKE2b-256 | 2b835c569055c0e15f373543d4a5c97f5d63147022023ac0dfe89b57b211a2a0 |
File details
Details for the file awsio-0.1-py3-none-any.whl.
File metadata
- Download URL: awsio-0.1-py3-none-any.whl
- Upload date:
- Size: 15.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | aa327d8945a998cc9cf5677b7637436bb24bb27a18a38973f70592c6f73854b4 |
| MD5 | c2c11fb20dffe48a46b580ab77de8f0f |
| BLAKE2b-256 | 3c08906859f9f55bcc2368083dbf99e0bda79d8728d8303b7f6dcba9809855ba |