Amazon Redshift adapter for Datus Agent
Project description
datus-redshift
Amazon Redshift database adapter for Datus.
Installation
pip install datus-redshift
This will automatically install the required dependencies:
datus-agentredshift_connector
Usage
The adapter is automatically registered with Datus when installed. Configure your database connection in your Datus configuration:
database:
type: redshift
host: your-cluster.xxx.us-west-2.redshift.amazonaws.com
port: 5439
username: your_username
password: your_password
database: dev
schema: public
Or use programmatically:
from datus_redshift import RedshiftConfig, RedshiftConnector
# Using config object
config = RedshiftConfig(
host="your-cluster.xxx.us-west-2.redshift.amazonaws.com",
username="your_username",
password="your_password",
database="dev",
schema="public",
port=5439,
ssl=True,
)
connector = RedshiftConnector(config)
# Test connection
connector.test_connection()
# Execute query
result = connector.execute_query("SELECT * FROM users LIMIT 10", result_format="list")
print(result.sql_return)
# Get table list
tables = connector.get_tables(schema_name="public")
print(f"Tables: {tables}")
# Get table schema
schema = connector.get_schema(table_name="users")
for column in schema:
print(f"{column['name']}: {column['type']}")
# Close connection when done
connector.close()
IAM Authentication
Recommended: Use IAM role-based authentication instead of embedding static credentials. When running on EC2/ECS/Lambda with an attached IAM role, omit
access_key_idandsecret_access_key-- the SDK will use the instance profile credentials automatically.
config = RedshiftConfig(
host="your-cluster.xxx.us-west-2.redshift.amazonaws.com",
username="your_iam_user",
database="dev",
iam=True,
cluster_identifier="your-cluster-identifier",
region="us-west-2",
# Omit access_key_id/secret_access_key when using IAM role-based auth.
# Only set these for local development with static credentials:
# access_key_id="YOUR_ACCESS_KEY",
# secret_access_key="YOUR_SECRET_KEY",
)
connector = RedshiftConnector(config)
Query Result Formats
# CSV string
result = connector.execute_query("SELECT * FROM my_table", result_format="csv")
# Pandas DataFrame
result = connector.execute_query("SELECT * FROM my_table", result_format="pandas")
# Arrow table (best for large datasets)
result = connector.execute_query("SELECT * FROM my_table", result_format="arrow")
# List of dictionaries
result = connector.execute_query("SELECT * FROM my_table", result_format="list")
Configuration Options
| Option | Type | Default | Description |
|---|---|---|---|
| host | str | (required) | Redshift cluster endpoint |
| username | str | (required) | Username for authentication |
| password | str | (required) | Password for authentication |
| port | int | 5439 | Redshift port |
| database | str | None | Default database |
| schema | str | None | Default schema (uses 'public' if not specified) |
| timeout_seconds | int | 30 | Connection timeout |
| ssl | bool | True | Enable SSL/TLS connection |
| iam | bool | False | Use IAM authentication |
| cluster_identifier | str | None | Cluster ID for IAM auth |
| region | str | None | AWS region for IAM auth |
| access_key_id | str | None | AWS access key for IAM auth |
| secret_access_key | str | None | AWS secret key for IAM auth |
Features
- Query execution via Redshift SQL (SELECT)
- DDL execution (CREATE, ALTER, DROP)
- DML operations (INSERT, UPDATE, DELETE)
- Metadata retrieval (databases, schemas, tables, views, materialized views, columns)
- Sample data extraction
- Multiple result formats (pandas, arrow, csv, list)
- Connection management with SSL/TLS support
- IAM authentication support
- Comprehensive error handling with exception mapping
Testing
Quick Start
cd datus-redshift
# Unit tests (no database required)
uv run pytest tests/unit/ -v
# All unit tests with coverage
uv run pytest tests/unit/ -v --cov=datus_redshift --cov-report=term-missing
Integration Tests (Requires Redshift Cluster)
Integration tests require a running Redshift cluster. Set these environment variables:
| Variable | Required | Default | Description |
|---|---|---|---|
REDSHIFT_HOST |
Yes | - | Redshift cluster endpoint |
REDSHIFT_USERNAME |
Yes | - | Username |
REDSHIFT_PASSWORD |
Yes | - | Password |
REDSHIFT_DATABASE |
No | dev |
Database name |
REDSHIFT_PORT |
No | 5439 |
Port |
REDSHIFT_SCHEMA |
No | public |
Schema |
# Set credentials
export REDSHIFT_HOST="my-cluster.abc123.us-west-2.redshift.amazonaws.com"
export REDSHIFT_USERNAME="admin"
export REDSHIFT_PASSWORD="secret"
# Run all integration tests
uv run pytest tests/integration/ -v
# Run only connector tests
uv run pytest tests/integration/test_connector.py -v
# Run only TPC-H tests
uv run pytest tests/integration/test_tpch.py -v
# Run acceptance tests (core functionality)
uv run pytest tests/ -m acceptance -v
TPC-H Test Data
Integration tests include TPC-H benchmark data for realistic query testing. The tpch_setup fixture (session-scoped) automatically creates 5 tables with sample data:
| Table | Rows | Description |
|---|---|---|
tpch_region |
5 | Standard TPC-H regions |
tpch_nation |
25 | Standard TPC-H nations |
tpch_customer |
10 | Simplified customer data |
tpch_orders |
15 | Simplified order data |
tpch_supplier |
5 | Simplified supplier data |
Tables are created at the start of the test session and dropped after all tests complete.
Initialize TPC-H Data Manually
To create TPC-H data for use with Datus (outside of tests):
# Basic usage (with environment variables)
uv run python scripts/init_tpch_data.py
# Drop existing tables and re-create
uv run python scripts/init_tpch_data.py --drop
# Custom connection
uv run python scripts/init_tpch_data.py \
--host my-cluster.abc123.us-west-2.redshift.amazonaws.com \
--username admin \
--password secret
# Use a custom schema
uv run python scripts/init_tpch_data.py --schema my_test_schema
Test Statistics
- Unit Tests: 53 tests (config validation, connector logic, exception handling, identifiers)
- Integration Tests: 11 tests (connection, metadata, SQL execution, output formats)
- TPC-H Tests: 11 tests (metadata queries, joins, aggregations, multiple formats)
- Total: 75 tests
Test Markers
| Marker | Description |
|---|---|
integration |
Requires running Redshift cluster |
acceptance |
Core functionality validation for CI/CD |
Development
Setup
# From workspace root
uv sync --all-packages
# Or install in editable mode
uv pip install -e .
Code Structure
datus-redshift/
├── datus_redshift/
│ ├── __init__.py # Module initialization and registration
│ ├── config.py # Configuration class (RedshiftConfig)
│ └── connector.py # Main connector implementation (RedshiftConnector)
├── tests/
│ ├── conftest.py # Shared test configuration and markers
│ ├── unit/
│ │ ├── test_config.py # Config validation tests
│ │ └── test_connector_unit.py # Connector unit tests
│ └── integration/
│ ├── conftest.py # Integration fixtures and TPC-H data
│ ├── test_connector.py # Connector integration tests
│ └── test_tpch.py # TPC-H query tests
├── scripts/
│ └── init_tpch_data.py # TPC-H data initialization script
├── pyproject.toml # Package configuration
└── README.md # This file
Code Quality
black datus_redshift tests
isort datus_redshift tests
ruff check datus_redshift tests
Redshift SQL Notes
Redshift is based on PostgreSQL but has some differences:
- Identifiers: Use double quotes for quoting:
"schema"."table" - No IF NOT EXISTS for schemas: Use
CREATE SCHEMAwithoutIF NOT EXISTSin older versions - DECIMAL precision: Use
DECIMAL(15,2)for monetary values - VARCHAR limits: Default VARCHAR is 256 chars; specify length explicitly
- Distribution styles: Use
DISTSTYLE,DISTKEY,SORTKEYfor performance tuning
Troubleshooting
Connection Issues
- Timeout errors: Increase
timeout_secondsin the configuration - SSL errors: Try setting
ssl=Falseif your cluster doesn't require SSL - IAM auth fails: Verify your AWS credentials and cluster identifier are correct
- VPC access: Ensure your network can reach the Redshift endpoint (security groups, VPN)
Query Performance
- Use
result_format="arrow"for large result sets (most efficient) - Always specify schema names to avoid scanning all schemas
- Use LIMIT clauses for exploratory queries
Requirements
- Python >= 3.8
- datus-agent >= 0.2.1
- redshift_connector >= 2.0.0
- pyarrow (installed with datus-agent)
- pandas (installed with datus-agent)
License
Apache License 2.0
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file datus_redshift-0.1.6rc1.tar.gz.
File metadata
- Download URL: datus_redshift-0.1.6rc1.tar.gz
- Upload date:
- Size: 25.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
d550a6990e8c4d42df1ababc7f21f86cf398ac48c19e87f1e9f0d8831db7eae5
|
|
| MD5 |
761595367df28b44c9a28f087024aaed
|
|
| BLAKE2b-256 |
abbeb45bb0bdd8867396bb1ff61331575574f68abc6f5a66c8ac2b63367d4681
|
File details
Details for the file datus_redshift-0.1.6rc1-py3-none-any.whl.
File metadata
- Download URL: datus_redshift-0.1.6rc1-py3-none-any.whl
- Upload date:
- Size: 16.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
6476ac83f3b0f47ce2547f02595ff2c757b0f407750446f5719429018ad193da
|
|
| MD5 |
8d3c68be5a1084afc097a741499d932e
|
|
| BLAKE2b-256 |
ae45e1b94e64971242e0f500513c082ca1525cd31a176800b756ca9ad0600e29
|