aws-simple
A clean, simple Python wrapper around AWS services (S3, Textract, Bedrock).
Features
- Simple API: Clean, intuitive interface without exposing Boto3 complexity
- Environment-based configuration: No credentials or config in code
- Structured Textract output: Transforms raw Textract Blocks into clean, serializable JSON
- Type-safe: Fully typed with Python 3.10+ support
- Production-ready: Works with IAM roles, Docker, CI/CD pipelines
Installation
pip install aws-simple
Or install from source:
pip install -e .
Configuration
All configuration is done via environment variables:
# Required
export AWS_REGION=us-east-1
export AWS_S3_BUCKET=my-bucket-name
# Optional
export AWS_PROFILE=my-profile # For local development
export AWS_BEDROCK_MODEL_ID=anthropic.claude-3-5-sonnet-20241022-v2:0
export AWS_TEXTRACT_REGION=us-east-1
export AWS_BEDROCK_REGION=us-east-1
Or use a .env file (see .env.example).
AWS Credentials
AWS credentials should be configured separately via:
- IAM Role (recommended for production/EC2/ECS/Lambda)
- ~/.aws/credentials file (for local development)
- Environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY (not recommended)
Usage
S3 Operations
from aws_simple import s3
# Upload file
s3.upload_file("document.pdf", "docs/document.pdf")
# Download file
s3.download_file("docs/document.pdf", "/tmp/document.pdf")
# Read object as bytes
content = s3.read_object("docs/document.pdf")
# List objects
files = s3.list_objects(prefix="docs/")
# Check if object exists
exists = s3.object_exists("docs/document.pdf")
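Code that calls these functions can be unit-tested without AWS by swapping in a local test double. The sketch below is a hypothetical `FakeS3` class whose method names mirror the calls shown above. It assumes `list_objects` returns a list of key strings, which the README does not state. Keys are stored as files under a temporary directory.

```python
import shutil
import tempfile
from pathlib import Path


class FakeS3:
    """Hypothetical local stand-in mirroring the aws_simple.s3 calls above.

    Each S3 key is stored as a file under a temporary directory.
    """

    def __init__(self) -> None:
        self.root = Path(tempfile.mkdtemp(prefix="fake-s3-"))

    def _path(self, key: str) -> Path:
        return self.root / key

    def upload_file(self, local_path: str, key: str) -> None:
        target = self._path(key)
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(local_path, target)

    def download_file(self, key: str, local_path: str) -> None:
        shutil.copyfile(self._path(key), local_path)

    def read_object(self, key: str) -> bytes:
        return self._path(key).read_bytes()

    def list_objects(self, prefix: str = "") -> list[str]:
        # Assumption: the real function returns plain key strings.
        keys = (p.relative_to(self.root).as_posix() for p in self.root.rglob("*") if p.is_file())
        return sorted(k for k in keys if k.startswith(prefix))

    def object_exists(self, key: str) -> bool:
        return self._path(key).is_file()
```

In a test, an instance can replace the real module wherever application code looks it up, for example with pytest's `monkeypatch`.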
Textract - Document Extraction
from aws_simple import textract
import json
# Extract from local file (with tables)
doc = textract.extract_text_from_file("invoice.pdf")
# Extract from S3 (with tables)
doc = textract.extract_text_from_s3("docs/invoice.pdf")
# Access structured data
print(doc.full_text) # All text concatenated
print(f"Pages: {len(doc.pages)}")
# Access page details
page = doc.pages[0]
print(f"Lines: {len(page.lines)}")
print(f"Tables: {len(page.tables)}")
# Access lines
for line in page.lines:
    print(f"{line.text} (confidence: {line.confidence})")
# Access tables
for table in page.tables:
    print(f"Table: {table.rows}x{table.columns}")
    print(table.cells)  # 2D matrix of cell values
# Serialize to JSON
doc_json = doc.to_dict()
with open("result.json", "w") as f:
    json.dump(doc_json, f, indent=2)
# Simple text extraction (faster, no tables)
text = textract.extract_text_simple_from_file("document.pdf")
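Because `to_dict()` yields plain lists and dicts, tables can be post-processed without any AWS code. The sketch below turns one serialized table into a list of row dictionaries. It assumes the key names from the output format in this README and treats the first row of `cells` as the header row. The helper name `table_to_records` is hypothetical.

```python
from typing import Any


def table_to_records(table: dict[str, Any]) -> list[dict[str, str]]:
    """Convert one serialized table into row dicts keyed by its first (header) row."""
    cells = table["cells"]
    if not cells:
        return []
    header, *rows = cells
    # zip() pairs each header with the value in the same column.
    return [dict(zip(header, row)) for row in rows]
```

Usage: `for page in doc.to_dict()["pages"]: records = [table_to_records(t) for t in page["tables"]]`.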
Textract Output Format
The library transforms AWS Textract Blocks into a clean JSON structure:
{
"pages": [
{
"page_number": 1,
"width": 1.0,
"height": 1.0,
"lines": [
{
"text": "Invoice #12345",
"confidence": 99.5,
"bounding_box": {"top": 0.1, "left": 0.1, "width": 0.2, "height": 0.05}
}
],
"tables": [
{
"rows": 3,
"columns": 2,
"cells": [
["Item", "Price"],
["Product A", "$10"],
["Product B", "$20"]
],
"confidence": 98.7
}
],
"raw_text": "Invoice #12345\n..."
}
],
"full_text": "All text from all pages concatenated...",
"metadata": {
"document_metadata": {...},
"total_pages": 1
}
}
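For context on what the parser has to do: a Textract response is a flat list of Blocks, each with an `Id` and a `BlockType`. A TABLE block points to its CELL blocks through `CHILD` relationships. Each CELL carries a 1-based `RowIndex` and `ColumnIndex` and points to its WORD blocks in the same way. The following is a minimal sketch of rebuilding the `cells` matrix from that structure. It is not the library's `textract_parser.py`, and it ignores merged cells, selection elements and confidence scores.

```python
from typing import Any

Block = dict[str, Any]


def _child_ids(block: Block) -> list[str]:
    """Collect the Ids from a block's CHILD relationships."""
    ids: list[str] = []
    for rel in block.get("Relationships", []):
        if rel.get("Type") == "CHILD":
            ids.extend(rel.get("Ids", []))
    return ids


def tables_from_blocks(blocks: list[Block]) -> list[list[list[str]]]:
    """Rebuild each TABLE block as a 2D list of cell strings (sketch only)."""
    by_id = {b["Id"]: b for b in blocks}
    tables = []
    for table in (b for b in blocks if b["BlockType"] == "TABLE"):
        cells = [by_id[i] for i in _child_ids(table) if by_id[i]["BlockType"] == "CELL"]
        n_rows = max((c["RowIndex"] for c in cells), default=0)
        n_cols = max((c["ColumnIndex"] for c in cells), default=0)
        matrix = [["" for _ in range(n_cols)] for _ in range(n_rows)]
        for cell in cells:
            words = [by_id[i]["Text"] for i in _child_ids(cell) if by_id[i]["BlockType"] == "WORD"]
            # RowIndex and ColumnIndex are 1-based in Textract responses.
            matrix[cell["RowIndex"] - 1][cell["ColumnIndex"] - 1] = " ".join(words)
        tables.append(matrix)
    return tables
```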
Bedrock - LLM Operations
from aws_simple import bedrock
# Simple text generation
response = bedrock.invoke("Explain AWS Lambda in one sentence")
print(response)
# With system prompt and parameters
response = bedrock.invoke(
prompt="What are the benefits of serverless?",
system_prompt="You are an AWS solutions architect.",
temperature=0.7,
max_tokens=500
)
# Request JSON output
prompt = """
List 3 AWS services with their use cases.
Format: {"services": [{"name": "...", "use_case": "..."}]}
"""
data = bedrock.invoke_json(prompt)
print(data["services"])
# Use different model
response = bedrock.invoke(
"Summarize this text...",
model_id="anthropic.claude-3-5-sonnet-20241022-v2:0"
)
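The README does not say how `invoke_json` turns the model's reply into a dict. Models often wrap JSON in prose or in a fenced json block, so a tolerant parser is a common approach. The following is a hypothetical sketch of one, `parse_json_reply`, and is not the library's implementation. It tries the whole reply first, then a fenced block, then the outermost `{...}` span.

```python
import json
import re
from typing import Any

FENCE = "`" * 3  # built at runtime so this snippet can itself sit inside a code fence
_FENCED = re.compile(FENCE + r"(?:json)?\s*(.*?)\s*" + FENCE, re.DOTALL)


def parse_json_reply(text: str) -> Any:
    """Parse a model reply that should contain JSON; raise ValueError if none is found."""
    candidates = [text.strip()]
    match = _FENCED.search(text)
    if match:
        candidates.append(match.group(1))
    start, end = text.find("{"), text.rfind("}")
    if start != -1 and end > start:
        candidates.append(text[start : end + 1])
    for candidate in candidates:
        try:
            return json.loads(candidate)
        except json.JSONDecodeError:
            continue
    raise ValueError("model reply did not contain valid JSON")
```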
Combined Workflow
from aws_simple import s3, textract, bedrock
import json
# 1. Upload document
s3.upload_file("invoice.pdf", "invoices/2024/inv_001.pdf")
# 2. Extract content
doc = textract.extract_text_from_s3("invoices/2024/inv_001.pdf")
# 3. Analyze with LLM
prompt = f"""
Extract key information from this invoice:
{doc.full_text}
Return JSON with: invoice_number, date, total, vendor
"""
invoice_data = bedrock.invoke_json(prompt)
print(json.dumps(invoice_data, indent=2))
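The model is not guaranteed to return every requested field, so it is worth checking the result before using it downstream. The following is a small sketch with a hypothetical helper, `missing_invoice_fields`, that reports which of the fields named in the prompt are absent, null, or empty.

```python
from typing import Any

REQUIRED_FIELDS = ("invoice_number", "date", "total", "vendor")


def missing_invoice_fields(data: Any) -> list[str]:
    """Return the requested keys the model omitted or left null/empty."""
    if not isinstance(data, dict):
        return list(REQUIRED_FIELDS)
    return [f for f in REQUIRED_FIELDS if data.get(f) in (None, "")]
```

If the list is non-empty, the caller can re-prompt or flag the document for manual review.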
Architecture
aws-simple/
├── config.py # Environment variable configuration
├── exceptions.py # Custom exceptions
├── _clients.py # AWS client factory (internal)
├── s3.py # S3 operations
├── textract.py # Textract operations
├── bedrock.py # Bedrock operations
├── models/ # Data models
│ └── textract.py # TextractDocument, TextractPage, etc.
└── _parsers/ # Internal parsers
└── textract_parser.py # Transforms Blocks → JSON
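The contents of `_clients.py` are not shown. A common pattern for an internal client factory is to create one client per (service, region) pair on first use and reuse it afterwards. The sketch below shows that pattern with a hypothetical `ClientFactory` class. The constructor callable is injectable so the cache can be tested without boto3. The default uses the real `boto3.client(service, region_name=...)` call, and boto3 picks up `AWS_PROFILE` from the environment by itself.

```python
from typing import Any, Callable


def _default_create(service: str, region: str) -> Any:
    # Imported lazily so this module loads (and can be tested) without boto3.
    import boto3

    return boto3.client(service, region_name=region)


class ClientFactory:
    """Hypothetical cache returning one client per (service, region) pair."""

    def __init__(self, create: Callable[[str, str], Any] = _default_create) -> None:
        self._create = create
        self._cache: dict[tuple[str, str], Any] = {}

    def get(self, service: str, region: str) -> Any:
        key = (service, region)
        if key not in self._cache:
            self._cache[key] = self._create(service, region)
        return self._cache[key]
```

Keeping this logic behind an internal module is what allows the public `s3`, `textract` and `bedrock` functions to avoid exposing boto3 objects.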
Design Principles
- No Boto3 in public API: AWS implementation details are hidden
- Environment-based config: All configuration via env vars
- Clean output formats: No raw AWS responses exposed
- Type safety: Full type hints for better IDE support
- Simple error handling: Custom exceptions for each service
- Production-ready: Compatible with Docker, IAM roles, CI/CD
Exceptions
from aws_simple import (
    AWSSimpleError,  # Base exception
    ConfigurationError,  # Missing/invalid configuration
    S3Error,  # S3 operation failures
    TextractError,  # Textract operation failures
    BedrockError,  # Bedrock operation failures
    ClientInitializationError,  # AWS client init failures
)
from aws_simple import textract
try:
    doc = textract.extract_text_from_s3("missing.pdf")
except TextractError as e:
    print(f"Extraction failed: {e}")
Development
# Install with dev dependencies
pip install -e ".[dev]"
# Run tests
pytest
# Type checking
mypy src/
# Linting
ruff check src/
Requirements
- Python ≥ 3.10
- boto3 ≥ 1.34.0
- python-dotenv ≥ 1.0.0
License
MIT
Support
For issues and feature requests, please visit the GitHub repository.
Download files
File details
Details for the file aws_simple-0.1.1b0.tar.gz.
File metadata
- Download URL: aws_simple-0.1.1b0.tar.gz
- Upload date:
- Size: 29.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | fe421801b948283f75dcecde22b280559b177cb1b8bec82a97fb2168ed18031f |
| MD5 | c9d2ae1dc53afee979ba6b5f82e6871c |
| BLAKE2b-256 | e6558fb2daa38d3992520e4a573c1159ffb5d2ee521316b8006861df6c731cae |
File details
Details for the file aws_simple-0.1.1b0-py3-none-any.whl.
File metadata
- Download URL: aws_simple-0.1.1b0-py3-none-any.whl
- Upload date:
- Size: 18.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 23c018e19946e46b28aad8c6def8c409d79f55a837038c2513e63bf74ddcda66 |
| MD5 | e471e99a7738f4d5ef8571b3f9a9f24e |
| BLAKE2b-256 | d0998ca85c54229b168e01b4033881dc3f457f15af6f5379ede09ec5e8a3b19e |