Amazon Redshift adapter for Datus Agent

Project description

datus-redshift

Amazon Redshift database adapter for Datus.

Installation

pip install datus-redshift

This will automatically install the required dependencies:

datus-agent
redshift_connector

Usage

The adapter is automatically registered with Datus when installed. Configure your database connection in your Datus configuration:

database:
  type: redshift
  host: your-cluster.xxx.us-west-2.redshift.amazonaws.com
  port: 5439
  username: your_username
  password: your_password
  database: dev
  schema: public

Or use programmatically:

from datus_redshift import RedshiftConfig, RedshiftConnector

# Using config object
config = RedshiftConfig(
    host="your-cluster.xxx.us-west-2.redshift.amazonaws.com",
    username="your_username",
    password="your_password",
    database="dev",
    schema="public",
    port=5439,
    ssl=True,
)
connector = RedshiftConnector(config)

# Test connection
connector.test_connection()

# Execute query
result = connector.execute_query("SELECT * FROM users LIMIT 10", result_format="list")
print(result.sql_return)

# Get table list
tables = connector.get_tables(schema_name="public")
print(f"Tables: {tables}")

# Get table schema
schema = connector.get_schema(table_name="users")
for column in schema:
    print(f"{column['name']}: {column['type']}")

# Close connection when done
connector.close()

IAM Authentication

Recommended: Use IAM role-based authentication instead of embedding static credentials. When running on EC2/ECS/Lambda with an attached IAM role, omit access_key_id and secret_access_key -- the SDK will use the instance profile credentials automatically.

config = RedshiftConfig(
    host="your-cluster.xxx.us-west-2.redshift.amazonaws.com",
    username="your_iam_user",
    database="dev",
    iam=True,
    cluster_identifier="your-cluster-identifier",
    region="us-west-2",
    # Omit access_key_id/secret_access_key when using IAM role-based auth.
    # Only set these for local development with static credentials:
    # access_key_id="YOUR_ACCESS_KEY",
    # secret_access_key="YOUR_SECRET_KEY",
)
connector = RedshiftConnector(config)

Query Result Formats

# CSV string
result = connector.execute_query("SELECT * FROM my_table", result_format="csv")

# Pandas DataFrame
result = connector.execute_query("SELECT * FROM my_table", result_format="pandas")

# Arrow table (best for large datasets)
result = connector.execute_query("SELECT * FROM my_table", result_format="arrow")

# List of dictionaries
result = connector.execute_query("SELECT * FROM my_table", result_format="list")

Configuration Options

Option	Type	Default	Description
host	str	(required)	Redshift cluster endpoint
username	str	(required)	Username for authentication
password	str	(required)	Password for authentication
port	int	5439	Redshift port
database	str	None	Default database
schema	str	None	Default schema (uses 'public' if not specified)
timeout_seconds	int	30	Connection timeout
ssl	bool	True	Enable SSL/TLS connection
iam	bool	False	Use IAM authentication
cluster_identifier	str	None	Cluster ID for IAM auth
region	str	None	AWS region for IAM auth
access_key_id	str	None	AWS access key for IAM auth
secret_access_key	str	None	AWS secret key for IAM auth

Features

Query execution via Redshift SQL (SELECT)
DDL execution (CREATE, ALTER, DROP)
DML operations (INSERT, UPDATE, DELETE)
Metadata retrieval (databases, schemas, tables, views, materialized views, columns)
Sample data extraction
Multiple result formats (pandas, arrow, csv, list)
Connection management with SSL/TLS support
IAM authentication support
Comprehensive error handling with exception mapping

Testing

Quick Start

cd datus-redshift

# Unit tests (no database required)
uv run pytest tests/unit/ -v

# All unit tests with coverage
uv run pytest tests/unit/ -v --cov=datus_redshift --cov-report=term-missing

Integration Tests (Requires Redshift Cluster)

Integration tests require a running Redshift cluster. Set these environment variables:

Variable	Required	Default	Description
`REDSHIFT_HOST`	Yes	-	Redshift cluster endpoint
`REDSHIFT_USERNAME`	Yes	-	Username
`REDSHIFT_PASSWORD`	Yes	-	Password
`REDSHIFT_DATABASE`	No	`dev`	Database name
`REDSHIFT_PORT`	No	`5439`	Port
`REDSHIFT_SCHEMA`	No	`public`	Schema

# Set credentials
export REDSHIFT_HOST="my-cluster.abc123.us-west-2.redshift.amazonaws.com"
export REDSHIFT_USERNAME="admin"
export REDSHIFT_PASSWORD="secret"

# Run all integration tests
uv run pytest tests/integration/ -v

# Run only connector tests
uv run pytest tests/integration/test_connector.py -v

# Run only TPC-H tests
uv run pytest tests/integration/test_tpch.py -v

# Run acceptance tests (core functionality)
uv run pytest tests/ -m acceptance -v

TPC-H Test Data

Integration tests include TPC-H benchmark data for realistic query testing. The tpch_setup fixture (session-scoped) automatically creates 5 tables with sample data:

Table	Rows	Description
`tpch_region`	5	Standard TPC-H regions
`tpch_nation`	25	Standard TPC-H nations
`tpch_customer`	10	Simplified customer data
`tpch_orders`	15	Simplified order data
`tpch_supplier`	5	Simplified supplier data

Tables are created at the start of the test session and dropped after all tests complete.

Initialize TPC-H Data Manually

To create TPC-H data for use with Datus (outside of tests):

# Basic usage (with environment variables)
uv run python scripts/init_tpch_data.py

# Drop existing tables and re-create
uv run python scripts/init_tpch_data.py --drop

# Custom connection
uv run python scripts/init_tpch_data.py \
  --host my-cluster.abc123.us-west-2.redshift.amazonaws.com \
  --username admin \
  --password secret

# Use a custom schema
uv run python scripts/init_tpch_data.py --schema my_test_schema

Test Statistics

Unit Tests: 53 tests (config validation, connector logic, exception handling, identifiers)
Integration Tests: 11 tests (connection, metadata, SQL execution, output formats)
TPC-H Tests: 11 tests (metadata queries, joins, aggregations, multiple formats)
Total: 75 tests

Test Markers

Marker	Description
`integration`	Requires running Redshift cluster
`acceptance`	Core functionality validation for CI/CD

Development

Setup

# From workspace root
uv sync --all-packages

# Or install in editable mode
uv pip install -e .

Code Structure

datus-redshift/
├── datus_redshift/
│   ├── __init__.py       # Module initialization and registration
│   ├── config.py         # Configuration class (RedshiftConfig)
│   └── connector.py      # Main connector implementation (RedshiftConnector)
├── tests/
│   ├── conftest.py       # Shared test configuration and markers
│   ├── unit/
│   │   ├── test_config.py          # Config validation tests
│   │   └── test_connector_unit.py  # Connector unit tests
│   └── integration/
│       ├── conftest.py             # Integration fixtures and TPC-H data
│       ├── test_connector.py       # Connector integration tests
│       └── test_tpch.py            # TPC-H query tests
├── scripts/
│   └── init_tpch_data.py  # TPC-H data initialization script
├── pyproject.toml         # Package configuration
└── README.md              # This file

Code Quality

black datus_redshift tests
isort datus_redshift tests
ruff check datus_redshift tests

Redshift SQL Notes

Redshift is based on PostgreSQL but has some differences:

Identifiers: Use double quotes for quoting: "schema"."table"
No IF NOT EXISTS for schemas: Use CREATE SCHEMA without IF NOT EXISTS in older versions
DECIMAL precision: Use DECIMAL(15,2) for monetary values
VARCHAR limits: Default VARCHAR is 256 chars; specify length explicitly
Distribution styles: Use DISTSTYLE, DISTKEY, SORTKEY for performance tuning

Troubleshooting

Connection Issues

Timeout errors: Increase timeout_seconds in the configuration
SSL errors: Try setting ssl=False if your cluster doesn't require SSL
IAM auth fails: Verify your AWS credentials and cluster identifier are correct
VPC access: Ensure your network can reach the Redshift endpoint (security groups, VPN)

Query Performance

Use result_format="arrow" for large result sets (most efficient)
Always specify schema names to avoid scanning all schemas
Use LIMIT clauses for exploratory queries

Requirements

Python >= 3.8
datus-agent >= 0.2.1
redshift_connector >= 2.0.0
pyarrow (installed with datus-agent)
pandas (installed with datus-agent)

License

Apache License 2.0

Project details

Release history Release notifications | RSS feed

This version

0.1.6rc1 pre-release

Apr 16, 2026

0.1.3

Apr 24, 2026

0.1.2

Mar 20, 2026

0.1.1

Mar 19, 2026

0.1.0

Nov 24, 2025

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

datus_redshift-0.1.6rc1.tar.gz (25.6 kB view details)

Uploaded Apr 16, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

datus_redshift-0.1.6rc1-py3-none-any.whl (16.5 kB view details)

Uploaded Apr 16, 2026 Python 3

File details

Details for the file datus_redshift-0.1.6rc1.tar.gz.

File metadata

Download URL: datus_redshift-0.1.6rc1.tar.gz
Upload date: Apr 16, 2026
Size: 25.6 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.12

File hashes

Hashes for datus_redshift-0.1.6rc1.tar.gz
Algorithm	Hash digest
SHA256	`d550a6990e8c4d42df1ababc7f21f86cf398ac48c19e87f1e9f0d8831db7eae5`
MD5	`761595367df28b44c9a28f087024aaed`
BLAKE2b-256	`abbeb45bb0bdd8867396bb1ff61331575574f68abc6f5a66c8ac2b63367d4681`

See more details on using hashes here.

File details

Details for the file datus_redshift-0.1.6rc1-py3-none-any.whl.

File metadata

Download URL: datus_redshift-0.1.6rc1-py3-none-any.whl
Upload date: Apr 16, 2026
Size: 16.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.12

File hashes

Hashes for datus_redshift-0.1.6rc1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`6476ac83f3b0f47ce2547f02595ff2c757b0f407750446f5719429018ad193da`
MD5	`8d3c68be5a1084afc097a741499d932e`
BLAKE2b-256	`ae45e1b94e64971242e0f500513c082ca1525cd31a176800b756ca9ad0600e29`

See more details on using hashes here.

datus-redshift 0.1.6rc1

Navigation

Verified details

Maintainers

Unverified details

Meta

Project description

datus-redshift

Installation

Usage

IAM Authentication

Query Result Formats

Configuration Options

Features

Testing

Quick Start

Integration Tests (Requires Redshift Cluster)

TPC-H Test Data

Initialize TPC-H Data Manually

Test Statistics

Test Markers

Development

Setup

Code Structure

Code Quality

Redshift SQL Notes

Troubleshooting

Connection Issues

Query Performance

Requirements

License

Project details

Verified details

Maintainers

Unverified details

Meta

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes