
Utilities library for read/write operations and general data cleaning routines

Project description

awsio

A small utilities library to simplify reading/writing and basic data workflows with AWS services (S3, Athena, SSO). Provides lightweight helpers for path handling, parallel reads, simple S3 IO and an SSO-based auth flow.

Key goals:

  • Simplify common S3 and Athena interactions.
  • Provide sensible defaults and support multiple auth methods (SSO, profiles, env, explicit keys).
  • Small, dependency-light helpers for datalake style workflows.

Features

  • AWSio class for authenticated sessions and common actions:
    • read_s3_file, read_json_from_s3, download_file, read_athena_query
  • SSO device-flow helper for short-lived role credentials (auth.py)
  • Path helpers: split_bucket_key, date extraction helpers (path.py)
  • Parallelized reading utilities using joblib (parallelism.py)
  • Utilities to read versioned data sets from S3 including date filtering (versioned.py)
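The date-filtered, parallel reading described for path.py, parallelism.py and versioned.py can be sketched with the standard library alone. This is a hypothetical illustration, not awsio's implementation. It assumes dates appear in object keys as YYYY-MM-DD or YYYYMMDD, uses concurrent.futures.ThreadPoolExecutor in place of joblib, and takes a caller-supplied read_one function instead of a real S3 reader. The names extract_date, filter_keys_by_date and read_in_parallel are made up for this example.

```python
import re
from concurrent.futures import ThreadPoolExecutor
from datetime import date
from typing import Callable, Iterable, List, Optional, TypeVar

T = TypeVar("T")

# Hypothetical key convention: a date written as YYYY-MM-DD or YYYYMMDD that is
# not glued to other digits, e.g. "dt=2024-01-15/" or "/20240201/".
_DATE_RE = re.compile(r"(?<!\d)(\d{4})-?(\d{2})-?(\d{2})(?!\d)")


def extract_date(key: str) -> Optional[date]:
    """Return the first valid calendar date embedded in an S3 key, else None."""
    for match in _DATE_RE.finditer(key):
        year, month, day = (int(part) for part in match.groups())
        try:
            return date(year, month, day)
        except ValueError:  # eight digits that are not a real date; keep looking
            continue
    return None


def filter_keys_by_date(keys: Iterable[str], start: date, end: date) -> List[str]:
    """Keep keys whose embedded date falls in [start, end], sorted oldest first."""
    dated = [(extract_date(key), key) for key in keys]
    in_range = [(d, key) for d, key in dated if d is not None and start <= d <= end]
    return [key for _, key in sorted(in_range)]


def read_in_parallel(
    keys: Iterable[str], read_one: Callable[[str], T], max_workers: int = 8
) -> List[T]:
    """Call read_one on every key using a thread pool; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(read_one, keys))


keys = [
    "sales/dt=2024-01-01/part-0.parquet",
    "sales/dt=2024-01-15/part-0.parquet",
    "sales/20240201/part-0.parquet",
    "sales/_manifest.json",
]
wanted = filter_keys_by_date(keys, date(2024, 1, 10), date(2024, 2, 28))
# A real reader would fetch and parse each object; len() stands in for it here.
sizes = read_in_parallel(wanted, len)
```

Threads rather than processes are a reasonable fit for this sketch because S3 reads spend most of their time waiting on the network; awsio itself uses joblib for its parallel reads.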

Installation

Install from source (recommended during development):

  1. Clone the repo:
       git clone https://github.com/jotap123/awsio.git
  2. Create a virtual env and install:
       python -m venv .venv
       source .venv/bin/activate   # or .venv\Scripts\activate on Windows
       pip install -r requirements.txt
       pip install -e .

Or install the published package from PyPI: pip install awsio

Quickstart

  1. Basic usage with the default credential chain (env, shared credentials, IAM role):
       from awsio.io import AWSio

       reader = AWSio()
       bucket, key = 'my-bucket', 'path/to/file.txt'
       content = reader.read_s3_file(bucket, key)
       print(content)

  2. Use a named profile from your AWS shared credentials:
       reader = AWSio(profile_name='dev')  # in this project, 'dev' uses the SSO auth flow
       df = reader.read_athena_query(
           "SELECT * FROM my_db.my_table LIMIT 10",
           "s3://my-bucket/athena-results/",
       )

  3. Explicit credentials (not recommended for production):
       reader = AWSio(aws_secrets={
           'aws_access_key_id': 'AKIA...',
           'aws_secret_access_key': '...',
           'aws_session_token': '...',  # optional
       })

  4. Read a JSON file from S3:
       obj = reader.read_json_from_s3('my-bucket', 'config/my.json')
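Calls such as read_s3_file and read_json_from_s3 presumably come down to an S3 GetObject request. The sketch below is an assumption about that behavior, not awsio's code: it uses the boto3 S3 client's get_object(Bucket=..., Key=...), whose response "Body" is a stream with .read(), and mirrors the documented rule that encoding=None returns bytes. The client is passed in (for example boto3.client("s3")) so the functions can be exercised without AWS access; read_object, read_json_object and FakeS3Client are hypothetical names.

```python
import io
import json
from typing import Any, Dict, Optional, Tuple, Union


def read_object(
    s3_client: Any, bucket: str, key: str, encoding: Optional[str] = "utf-8"
) -> Union[str, bytes]:
    """Fetch one S3 object; decode it to str unless encoding is None."""
    response = s3_client.get_object(Bucket=bucket, Key=key)
    raw = response["Body"].read()  # a StreamingBody with boto3; any .read() works
    return raw if encoding is None else raw.decode(encoding)


def read_json_object(s3_client: Any, bucket: str, key: str) -> Any:
    """Fetch one S3 object and parse its contents as JSON."""
    return json.loads(read_object(s3_client, bucket, key))


class FakeS3Client:
    """In-memory stand-in that implements only the get_object call used above."""

    def __init__(self, objects: Dict[Tuple[str, str], bytes]):
        self._objects = objects

    def get_object(self, Bucket: str, Key: str) -> Dict[str, Any]:
        return {"Body": io.BytesIO(self._objects[(Bucket, Key)])}


client = FakeS3Client({
    ("my-bucket", "path/to/file.txt"): b"hello",
    ("my-bucket", "config/my.json"): b'{"env": "dev", "retries": 3}',
})
text = read_object(client, "my-bucket", "path/to/file.txt")
config = read_json_object(client, "my-bucket", "config/my.json")
```

With real credentials, passing boto3.client("s3") in place of FakeS3Client exercises the same code path against S3.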

API Highlights

  • AWSio (session selection)

    • __init__(aws_secrets=None, profile_name=None)
      • Picks the authentication strategy: a named profile (the 'dev' profile goes through the SSO flow), explicit keys, or the default credential chain.
    • read_s3_file(bucket, key, encoding='utf-8')
      • Returns the file contents as str, or as bytes if encoding is None. Raises descriptive exceptions for common AWS errors.
    • read_json_from_s3(bucket, key)
      • Returns parsed JSON from S3.
    • download_file(bucket, key, local_path)
      • Downloads an object to local filesystem.
    • read_athena_query(query, s3_output)
      • Runs an Athena query and returns the result rows as a pandas.DataFrame.
  • auth.authentication(sso_oidc, use_cache=True)

    • Implements the device code flow and caches the short-lived access token in ~/.aws_sso_oidc_cache.json.
  • path.split_bucket_key(s3_uri, type='file'|'folder')

    • Splits s3://bucket/key into (bucket, key); with type='folder', it also ensures the key ends with a trailing slash.
  • versioned.parallel_read / load_history

    • Utilities to read many S3 files (parquet/csv/excel) with date filtering and parallel reads.
  • parallelism.applyParallel

    • Runs groupby-apply style functions in parallel using joblib.
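The splitting described for path.split_bucket_key can be approximated as follows. This is a hypothetical re-implementation for illustration only: the name split_s3_uri, the kind parameter (standing in for type, which shadows a Python builtin), and raising ValueError on malformed input are assumptions rather than awsio's documented behavior.

```python
from typing import Tuple


def split_s3_uri(s3_uri: str, kind: str = "file") -> Tuple[str, str]:
    """Split 's3://bucket/key' into (bucket, key).

    kind="folder" appends a trailing slash to a non-empty key so it can be
    used as a listing prefix; kind="file" returns the key unchanged.
    """
    if kind not in ("file", "folder"):
        raise ValueError(f"kind must be 'file' or 'folder', got {kind!r}")
    scheme = "s3://"
    if not s3_uri.startswith(scheme):
        raise ValueError(f"not an S3 URI: {s3_uri!r}")
    # partition splits on the first '/', so slashes inside the key are kept.
    bucket, _, key = s3_uri[len(scheme):].partition("/")
    if not bucket:
        raise ValueError(f"missing bucket name in {s3_uri!r}")
    if kind == "folder" and key and not key.endswith("/"):
        key += "/"
    return bucket, key
```

For example, split_s3_uri("s3://my-bucket/athena-results", kind="folder") gives ("my-bucket", "athena-results/"), which suits an Athena output location or a listing prefix.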

Configuration & Environment

  • Region: set the AWS_REGION environment variable to change the default region (us-east-1).
  • SSO config: the SSO flow reads the START_URL, OIDC_APP_NAME, ACCOUNT_ID and ROLE_NAME environment variables.
  • Cache: token cache is saved to ~/.aws_sso_oidc_cache.json by default.
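The environment-driven settings above can be gathered and validated in one place. This sketch is illustrative only: the SSOSettings dataclass, the load_settings helper and the choice to raise RuntimeError when SSO variables are missing are hypothetical; only the variable names, the us-east-1 default and the cache path come from this README.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping, Optional


@dataclass(frozen=True)
class SSOSettings:
    region: str
    start_url: str
    oidc_app_name: str
    account_id: str
    role_name: str
    cache_path: Path


def load_settings(env: Optional[Mapping[str, str]] = None) -> SSOSettings:
    """Read SSO settings from the environment, applying the documented region default."""
    env = os.environ if env is None else env
    required = ("START_URL", "OIDC_APP_NAME", "ACCOUNT_ID", "ROLE_NAME")
    missing = [name for name in required if not env.get(name)]
    if missing:
        # Hypothetical policy: fail early instead of partway through the device flow.
        raise RuntimeError("missing SSO environment variables: " + ", ".join(missing))
    return SSOSettings(
        region=env.get("AWS_REGION", "us-east-1"),
        start_url=env["START_URL"],
        oidc_app_name=env["OIDC_APP_NAME"],
        account_id=env["ACCOUNT_ID"],
        role_name=env["ROLE_NAME"],
        cache_path=Path.home() / ".aws_sso_oidc_cache.json",
    )
```

Accepting an optional mapping instead of always reading os.environ keeps the helper easy to test.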

Development

  • Tests: Add tests under a tests/ folder and run using pytest.
  • Formatting & linting: follow pyproject.toml and setup.cfg settings.
  • Contributing: open issues and PRs on the repo. Keep changes small and document behavior changes.

Troubleshooting

  • No credentials found: set the AWS credential environment variables or a shared credentials file, or run in an environment with an attached IAM role.
  • SSO flow issues: check the START_URL and OIDC_APP_NAME environment variables, and delete the cache file (~/.aws_sso_oidc_cache.json) to force re-authentication.
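The cache behavior behind that troubleshooting step can be pictured as: reuse a cached token only while it is unexpired, and delete the file to force a fresh device-code login. The JSON layout ("accessToken", plus "expiresAt" as a Unix timestamp), the 60-second expiry margin and the helper names are assumptions for illustration; awsio's actual cache format may differ.

```python
import json
import time
from pathlib import Path
from typing import Optional

DEFAULT_CACHE = Path.home() / ".aws_sso_oidc_cache.json"


def load_cached_token(cache_path: Path = DEFAULT_CACHE, skew_seconds: int = 60) -> Optional[str]:
    """Return the cached access token if present and not (nearly) expired."""
    try:
        data = json.loads(cache_path.read_text())
    except (FileNotFoundError, json.JSONDecodeError):
        return None  # no cache yet, or a corrupt one: caller should re-authenticate
    # Treat tokens within skew_seconds of expiry as already expired.
    if data.get("expiresAt", 0) - skew_seconds <= time.time():
        return None
    return data.get("accessToken")


def save_token(token: str, expires_in: int, cache_path: Path = DEFAULT_CACHE) -> None:
    """Store a token with an absolute expiry computed from expires_in seconds."""
    payload = {"accessToken": token, "expiresAt": time.time() + expires_in}
    cache_path.write_text(json.dumps(payload))


def clear_cache(cache_path: Path = DEFAULT_CACHE) -> bool:
    """Delete the cache file to force re-authentication; True if a file was removed."""
    try:
        cache_path.unlink()
        return True
    except FileNotFoundError:
        return False
```

Every helper takes the cache path as a parameter, so the logic can be tried against a temporary file without touching the real cache in the home directory.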

License

See LICENSE in the repository root.

Contact

Project: https://github.com/jotap123/awsio
Issues: https://github.com/jotap123/awsio/issues



Download files


Source Distribution

awsio-0.2.0.tar.gz (15.7 kB)

Uploaded Source

Built Distribution


awsio-0.2.0-py3-none-any.whl (15.7 kB)

Uploaded Python 3

File details

Details for the file awsio-0.2.0.tar.gz.

File metadata

  • Download URL: awsio-0.2.0.tar.gz
  • Upload date:
  • Size: 15.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.0

File hashes

Hashes for awsio-0.2.0.tar.gz
Algorithm Hash digest
SHA256 0d39299d295b6495f28a7662eafcfd0dcc7ce5b020aef07f99ec10609250f030
MD5 4a0d9cf04afccdae0a7fa8bd86a83274
BLAKE2b-256 a84f4b78b84617533f2decf82472961d33cfed744b4828a64bc55f7996bb5813


File details

Details for the file awsio-0.2.0-py3-none-any.whl.

File metadata

  • Download URL: awsio-0.2.0-py3-none-any.whl
  • Upload date:
  • Size: 15.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.0

File hashes

Hashes for awsio-0.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 ce0b94a66efc25042c4aa10e560390d434484a01c58ded6f524c1ae5be8fab10
MD5 3699def19cd01c8f41cbf92d85f5b339
BLAKE2b-256 208ebe7f408743722a56c5b6f224f36df5f4ac501c71bd702c8ba97952b41f4b

