# Datus Redshift Adapter

Amazon Redshift adapter for Datus Agent, enabling seamless integration with Redshift data warehouses.
## Features
- ✅ Full support for Redshift databases, schemas, tables, views, and materialized views
- ✅ Efficient metadata retrieval using PostgreSQL system catalogs
- ✅ Multiple authentication methods (username/password and IAM)
- ✅ SSL/TLS secure connections
- ✅ Arrow-based query execution for high performance
- ✅ Comprehensive error handling and logging
## Installation

```bash
pip install datus-redshift
```

Or install from source:

```bash
cd datus-redshift
pip install -e .
```
## Usage

### Basic Connection (Username/Password)

```python
from datus_redshift import RedshiftConnector, RedshiftConfig

# Create configuration
config = RedshiftConfig(
    host="my-cluster.abc123.us-west-2.redshift.amazonaws.com",
    username="admin",
    password="MySecurePassword123",
    database="my_database",
    schema="public",
    port=5439,
    ssl=True
)

# Create connector instance
connector = RedshiftConnector(config)

# Test connection
result = connector.test_connection()
print(result)

# Execute a query
result = connector.execute_query("SELECT * FROM my_table LIMIT 10")
print(result.sql_return)

# Close the connection when done
connector.close()
```
### IAM Authentication

```python
config = RedshiftConfig(
    host="my-cluster.abc123.us-west-2.redshift.amazonaws.com",
    username="iam_user",
    database="my_database",
    iam=True,
    cluster_identifier="my-cluster",
    region="us-west-2",
    access_key_id="YOUR_ACCESS_KEY",
    secret_access_key="YOUR_SECRET_KEY"
)

connector = RedshiftConnector(config)
```
### Using with Datus Agent

Once installed, the Redshift adapter is automatically available in Datus Agent:

```python
from datus.tools.db_tools import get_connector

# Datus automatically uses RedshiftConnector for Redshift connections
connector = get_connector(
    dialect="redshift",
    config={
        "host": "my-cluster.abc123.us-west-2.redshift.amazonaws.com",
        "username": "admin",
        "password": "MySecurePassword123",
        "database": "my_database"
    }
)
```
## Common Operations

### List Databases

```python
databases = connector.get_databases()
print(f"Available databases: {databases}")
```

### List Schemas

```python
schemas = connector.get_schemas()
print(f"Available schemas: {schemas}")
```

### List Tables

```python
tables = connector.get_tables(schema_name="public")
print(f"Tables in public schema: {tables}")
```

### Get Table Schema

```python
schema_info = connector.get_schema(
    schema_name="public",
    table_name="my_table"
)
print(f"Table columns: {schema_info}")
```
### Execute Queries with Different Formats

```python
# Get results as CSV
result = connector.execute_query("SELECT * FROM my_table", result_format="csv")
print(result.sql_return)

# Get results as a pandas DataFrame
result = connector.execute_query("SELECT * FROM my_table", result_format="pandas")
df = result.sql_return

# Get results as an Arrow table (best for large datasets)
result = connector.execute_query("SELECT * FROM my_table", result_format="arrow")
arrow_table = result.sql_return

# Get results as a list of dictionaries
result = connector.execute_query("SELECT * FROM my_table", result_format="list")
rows = result.sql_return
```
### Get Sample Data

```python
# Get sample rows from all tables in a schema
samples = connector.get_sample_rows(
    schema_name="public",
    top_n=5
)

# Get sample rows from specific tables
samples = connector.get_sample_rows(
    schema_name="public",
    tables=["table1", "table2"],
    top_n=10
)
```
### Working with Views and Materialized Views

```python
# List views
views = connector.get_views(schema_name="public")

# List materialized views
mvs = connector.get_materialized_views(schema_name="public")

# Get view definitions with DDL
views_with_ddl = connector.get_views_with_ddl(schema_name="public")
for view in views_with_ddl:
    print(f"View: {view['table_name']}")
    print(f"DDL: {view['definition']}")
```
## Configuration Options

| Parameter | Type | Default | Description |
|---|---|---|---|
| `host` | str | Required | Redshift cluster endpoint |
| `username` | str | Required | Username for authentication |
| `password` | str | Required | Password for authentication |
| `port` | int | 5439 | Redshift port |
| `database` | str | None | Default database to connect to |
| `schema` | str | None | Default schema (uses `public` if not specified) |
| `timeout_seconds` | int | 30 | Connection timeout in seconds |
| `ssl` | bool | True | Enable SSL/TLS connection |
| `iam` | bool | False | Use IAM authentication |
| `cluster_identifier` | str | None | Cluster ID for IAM auth |
| `region` | str | None | AWS region for IAM auth |
| `access_key_id` | str | None | AWS access key for IAM auth |
| `secret_access_key` | str | None | AWS secret key for IAM auth |
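Because most parameters have defaults, a configuration can stay minimal. A sketch with illustrative values, using only the parameters from the table above (everything else falls back to its default):

```python
from datus_redshift import RedshiftConfig

# Only the required parameters are set here; port (5439), ssl (True),
# and timeout_seconds (30) fall back to their documented defaults.
config = RedshiftConfig(
    host="my-cluster.abc123.us-west-2.redshift.amazonaws.com",
    username="admin",
    password="MySecurePassword123",
    database="my_database",
)
```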
## Requirements
- Python >= 3.8
- datus-agent >= 0.2.1
- redshift_connector >= 2.0.0
- pyarrow (installed with datus-agent)
- pandas (installed with datus-agent)
## Development

### Running Tests

```bash
# Install development dependencies
pip install -e ".[dev]"

# Run tests
pytest tests/
```
### Code Structure

```
datus-redshift/
├── datus_redshift/
│   ├── __init__.py       # Module initialization and registration
│   ├── config.py         # Configuration class (RedshiftConfig)
│   └── connector.py      # Main connector implementation (RedshiftConnector)
├── tests/
│   ├── __init__.py
│   └── test_connector.py # Unit tests
├── pyproject.toml        # Package configuration
└── README.md             # This file
```
## Troubleshooting

### Connection Issues

- **Timeout errors**: Increase `timeout_seconds` in the configuration
- **SSL errors**: Try setting `ssl=False` if your cluster doesn't require SSL
- **IAM auth fails**: Verify that your AWS credentials and cluster identifier are correct
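As an illustration of the first two fixes, a configuration sketch (hypothetical values) that raises the connection timeout and disables SSL for a cluster that does not require it:

```python
from datus_redshift import RedshiftConfig

# Troubleshooting sketch: a longer timeout for slow networks, and
# SSL disabled (only do this if your cluster does not require SSL).
config = RedshiftConfig(
    host="my-cluster.abc123.us-west-2.redshift.amazonaws.com",
    username="admin",
    password="MySecurePassword123",
    database="my_database",
    timeout_seconds=120,  # default is 30
    ssl=False,            # default is True
)
```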
### Query Performance

- Use `result_format="arrow"` for large result sets (most efficient)
- Always specify schema names to avoid scanning all schemas
- Use `LIMIT` clauses for exploratory queries
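The schema-qualification and `LIMIT` advice can be combined in a small helper. This function is hypothetical (not part of the adapter) and simply builds the SQL string to pass to `execute_query`:

```python
def preview_query(schema: str, table: str, limit: int = 100) -> str:
    """Build a schema-qualified preview query with a row cap.

    Quoting the identifiers avoids case-folding surprises, and the
    LIMIT keeps exploratory queries cheap.
    """
    if limit < 1:
        raise ValueError("limit must be positive")
    return f'SELECT * FROM "{schema}"."{table}" LIMIT {limit}'

# e.g. connector.execute_query(preview_query("public", "my_table"),
#                              result_format="arrow")
```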
## Contributing
Contributions are welcome! Please see the main Datus-adapters repository for contribution guidelines.
## License
Apache License 2.0 - See LICENSE file for details.
## Support
- GitHub Issues: Report bugs or request features
- Documentation: Datus Agent Docs
- Slack Community: Join Datus Slack
## Related Projects
- datus-snowflake - Snowflake adapter
- datus-mysql - MySQL adapter
- datus-starrocks - StarRocks adapter
- Datus Agent - Main Datus framework