Production-ready SDK for running Alation data quality checks using Soda Core
Alation Data Quality SDK
A production-ready Python SDK for executing Alation data quality checks using Soda Core. Designed for seamless integration into data pipelines, CI/CD workflows, and Airflow DAGs with minimal configuration.
Features
- Simple Configuration: OAuth-based authentication with environment variables
- Production Ready: Comprehensive error handling, logging, and retry logic
- Pipeline Friendly: Built for Airflow and CI/CD integration with proper exit codes
- Automatic Setup: Fetches datasource credentials and check definitions from Alation
- Comprehensive Results: Detailed scan results with actionable recommendations
- Enterprise Grade: JWT token management with automatic refresh
Installation
pip install alation-data-quality-sdk
Prerequisites
Before using the SDK, ensure you have:
- Alation Instance Access: A running Alation instance with data quality monitoring enabled
- OAuth Credentials: Client ID and Secret configured in Alation (Settings > Authentication > OAuth Client Applications). Make sure the client is assigned an admin role that grants data quality access.
- Data Quality Monitor: At least one configured monitor in Alation with defined checks
- Supported Datasource: Connection to Snowflake, Redshift, Databricks, or BigQuery
Configuration
Required Environment Variables
export ALATION_HOST="https://your-instance.alationcloud.com"
export MONITOR_ID="123"
export ALATION_CLIENT_ID="your-client-id"
export ALATION_CLIENT_SECRET="your-client-secret"
export TENANT_ID="your-tenant-id"
How to Obtain Configuration Values
- ALATION_HOST: Your Alation instance URL (e.g., https://company.alationcloud.com)
- MONITOR_ID: Found in the URL of the Alation Data Quality Monitor page (e.g., in .../data_quality/monitor/123, the ID is 123)
- ALATION_CLIENT_ID & ALATION_CLIENT_SECRET: Generated in Alation Settings > Authentication > OAuth Client Applications
- TENANT_ID: Found in Alation under Help (? icon on top right) > About this instance
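Because the monitor ID is the numeric segment at the end of the monitor page URL, it can be extracted programmatically. The following is a minimal sketch, not part of the SDK: the helper name `monitor_id_from_url` is hypothetical, and the pattern assumes the `/data_quality/monitor/<id>` path form described above.

```python
import re
from typing import Optional

# Matches the path form described above: .../data_quality/monitor/<digits>
_MONITOR_PATH = re.compile(r"/data_quality/monitor/(\d+)")


def monitor_id_from_url(url: str) -> Optional[str]:
    """Return the monitor ID embedded in a monitor page URL, or None."""
    match = _MONITOR_PATH.search(url)
    return match.group(1) if match else None


# The leading "..." stands for whatever precedes the path on your instance.
print(monitor_id_from_url(".../data_quality/monitor/123"))
```

The returned string can be exported as MONITOR_ID unchanged.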
Optional Environment Variables
export ALATION_TIMEOUT="30" # Request timeout in seconds (default: 30)
export LOG_LEVEL="INFO" # Logging level: DEBUG, INFO, WARNING, ERROR (default: INFO)
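A pipeline may want to fail fast when a required variable is missing, before the SDK is invoked at all. The sketch below is one possible preflight check using only the standard library; the function name `missing_required_vars` is hypothetical, and the variable names are taken from the list of required environment variables above.

```python
import os
from typing import List, Mapping

# Required variables, as listed in the configuration section above.
REQUIRED_VARS = (
    "ALATION_HOST",
    "MONITOR_ID",
    "ALATION_CLIENT_ID",
    "ALATION_CLIENT_SECRET",
    "TENANT_ID",
)


def missing_required_vars(env: Mapping[str, str]) -> List[str]:
    """Return the required variable names that are unset or blank in env."""
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


missing = missing_required_vars(os.environ)
if missing:
    print("Missing required variables: " + ", ".join(missing))
else:
    print("All required variables are set.")
```

Passing the environment in as a mapping keeps the check easy to exercise with a plain dictionary.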
Quick Start
Python API
from data_quality_sdk import DataQualityRunner

# Initialize and run checks
runner = DataQualityRunner()
result = runner.run_checks()

# Check results
if result['exit_code'] == 0:
    print("✅ All quality checks passed!")
else:
    print(f"❌ Quality checks failed: {result['summary']}")
    for recommendation in result['recommendations']:
        print(f" - {recommendation}")
Command Line Interface
# Run quality checks
alation-dq
# Perform health check
alation-dq --health-check
# Return only exit code (for pipelines)
alation-dq --exit-code-only
Supported Data Sources
The SDK currently supports the following data sources:
- Snowflake
- Amazon Redshift
- Databricks
- Google BigQuery
Additional datasource support may be available through custom configuration. Contact Alation support for details.
Integration Examples
Airflow Integration
import os
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.python import PythonOperator


def run_data_quality_checks(sdk_env, **context):
    # The callable runs in the worker process, so export the rendered
    # settings as environment variables before creating the runner.
    os.environ.update(sdk_env)

    from data_quality_sdk import DataQualityRunner

    runner = DataQualityRunner()
    result = runner.run_checks()
    if result['exit_code'] != 0:
        raise Exception(f"Data quality checks failed: {result['summary']}")
    return result


default_args = {
    'owner': 'data-team',
    'depends_on_past': False,
    'start_date': datetime(2024, 1, 1),
    'email_on_failure': True,
    'email_on_retry': False,
    'retries': 2,
    'retry_delay': timedelta(minutes=5),
}

dag = DAG(
    'data_quality_pipeline',
    default_args=default_args,
    description='Data Quality Pipeline',
    schedule_interval='@daily',
    catchup=False,
)

# Your ETL tasks here
extract_task = BashOperator(
    task_id='extract_data',
    bash_command='your-extract-script.sh',
    dag=dag,
)

transform_task = BashOperator(
    task_id='transform_data',
    bash_command='your-transform-script.sh',
    dag=dag,
)

# Data quality checks
quality_check_task = PythonOperator(
    task_id='data_quality_checks',
    python_callable=run_data_quality_checks,
    # PythonOperator has no env_vars argument. op_kwargs is a templated
    # field, so the Airflow Variables below are rendered at run time.
    op_kwargs={
        'sdk_env': {
            'ALATION_HOST': '{{ var.value.ALATION_HOST }}',
            'MONITOR_ID': '{{ var.value.MONITOR_ID }}',
            'ALATION_CLIENT_ID': '{{ var.value.ALATION_CLIENT_ID }}',
            'ALATION_CLIENT_SECRET': '{{ var.value.ALATION_CLIENT_SECRET }}',
            'TENANT_ID': '{{ var.value.TENANT_ID }}',
        },
    },
    dag=dag,
)

# Load task (only runs if quality checks pass)
load_task = BashOperator(
    task_id='load_data',
    bash_command='your-load-script.sh',
    dag=dag,
)

# Set dependencies
extract_task >> transform_task >> quality_check_task >> load_task
CI/CD Integration
# .github/workflows/data-quality.yml
name: Data Quality Checks
on:
  schedule:
    - cron: "0 6 * * *" # Daily at 6 AM
  workflow_dispatch:
jobs:
  quality-checks:
    runs-on: ubuntu-latest
    steps:
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: "3.9"
      - name: Install SDK
        run: |
          pip install alation-data-quality-sdk
      - name: Run data quality checks
        env:
          ALATION_HOST: ${{ secrets.ALATION_HOST }}
          MONITOR_ID: ${{ secrets.MONITOR_ID }}
          ALATION_CLIENT_ID: ${{ secrets.ALATION_CLIENT_ID }}
          ALATION_CLIENT_SECRET: ${{ secrets.ALATION_CLIENT_SECRET }}
          TENANT_ID: ${{ secrets.TENANT_ID }}
        run: |
          alation-dq --exit-code-only
How It Works
- Authenticate: Uses OAuth client credentials to obtain JWT tokens from Alation
- Fetch Checks: Retrieves check definitions and datasource information using the Monitor ID
- Get Credentials: Obtains datasource connection credentials via Alation's metadata API
- Generate Config: Converts protobuf configuration to Soda Core YAML format
- Execute Scan: Runs Soda Core scan with generated configuration and check definitions
- Report Results: Sends scan results back to Alation and provides detailed local results
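The SDK performs the Generate Config step internally, and its actual conversion logic is not documented here. Purely as an illustration of the kind of output that step produces, the sketch below renders a Soda Core style data source block from a plain dictionary. The function name, the dictionary keys, and the sample values are all hypothetical, and the exact layout Soda Core expects may vary by version and data source type.

```python
import json
from typing import Any, Mapping


def render_soda_datasource(name: str, ds_type: str,
                           connection: Mapping[str, Any]) -> str:
    """Render a 'data_source <name>:' YAML block from connection settings.

    Values are emitted with json.dumps, which yields quoted strings and
    plain numbers/booleans that are also valid YAML scalars.
    """
    lines = [f"data_source {name}:", f"  type: {ds_type}"]
    for key in sorted(connection):
        lines.append(f"  {key}: {json.dumps(connection[key])}")
    return "\n".join(lines) + "\n"


# Hypothetical connection settings, for illustration only.
print(render_soda_datasource(
    "warehouse",
    "snowflake",
    {"account": "acme", "username": "svc_dq"},
))
```

Quoting every string value avoids surprises when a credential contains characters that YAML would otherwise interpret.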
Advanced Usage
Programmatic Configuration
from data_quality_sdk import SDKConfig, DataQualityRunner
config = SDKConfig(
    alation_host="https://my-instance.alationcloud.com",
    monitor_id="123",
    client_id="your-client-id",
    client_secret="your-client-secret",
    tenant_id="your-tenant-id",
    timeout=60,
    log_level="DEBUG",
)
runner = DataQualityRunner(config)
result = runner.run_checks()
Health Check
Before running in production, verify your setup:
health = DataQualityRunner.health_check()
print(f"Status: {health['status']}")
for check, result in health['checks'].items():
    print(f" {check}: {result}")
Understanding Results
The SDK returns detailed results with the following structure:
{
    'exit_code': 1,                 # 0 = success, >0 = issues
    'monitor_id': '123',
    'summary': {
        'total_checks': 10,
        'passed': 8,
        'failed': 2,
        'warnings': 0,
        'errors': 0
    },
    'failed_checks': [...],         # Details of failed checks
    'recommendations': [...],       # Actionable recommendations
    'execution_metadata': {...}     # Runtime information
}
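To show how this structure might be consumed, here is a small sketch that turns a result dictionary into a one-line summary suitable for a log or a chat notification. The helper names `pass_rate` and `format_summary` are hypothetical, and the sample dictionary only mirrors the keys documented above.

```python
def pass_rate(summary: dict) -> float:
    """Fraction of checks that passed; 0.0 when no checks ran."""
    total = summary.get("total_checks", 0)
    return summary.get("passed", 0) / total if total else 0.0


def format_summary(result: dict) -> str:
    """Build a one-line, human-readable summary of a scan result."""
    s = result["summary"]
    return (
        f"monitor {result['monitor_id']}: "
        f"{s['passed']}/{s['total_checks']} passed, "
        f"{s['failed']} failed, {s['warnings']} warnings, "
        f"{s['errors']} errors ({pass_rate(s):.0%} pass rate)"
    )


# Sample data shaped like the documented result structure.
sample = {
    "exit_code": 1,
    "monitor_id": "123",
    "summary": {"total_checks": 10, "passed": 8, "failed": 2,
                "warnings": 0, "errors": 0},
}
print(format_summary(sample))
# prints: monitor 123: 8/10 passed, 2 failed, 0 warnings, 0 errors (80% pass rate)
```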
Exit Codes
- 0: Success, all checks passed
- 1: Quality checks failed (data quality issues)
- 2: Configuration or setup error
- 3: Results upload failed
- 4: Network connectivity error
- 5: Unexpected error
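A wrapper script can translate these codes into readable messages and decide whether a failure is a data problem or an operational one. The sketch below is only an illustration: the helper names and the grouping of codes 2 through 5 as "operational" are assumptions layered on top of the documented code list.

```python
# Meanings copied from the documented exit code list.
EXIT_CODE_MEANINGS = {
    0: "Success, all checks passed",
    1: "Quality checks failed (data quality issues)",
    2: "Configuration or setup error",
    3: "Results upload failed",
    4: "Network connectivity error",
    5: "Unexpected error",
}


def describe_exit_code(code: int) -> str:
    """Return the documented meaning of an exit code, if it has one."""
    return EXIT_CODE_MEANINGS.get(code, f"Unknown exit code {code}")


def is_data_quality_failure(code: int) -> bool:
    """True only when the run completed and the data itself failed checks."""
    return code == 1


def is_operational_failure(code: int) -> bool:
    """True for any non-zero code other than a data quality failure (assumed grouping)."""
    return code not in (0, 1)


for code in (0, 1, 4):
    print(code, describe_exit_code(code))
```

The code passed in would typically come from something like `subprocess.run(["alation-dq", "--exit-code-only"]).returncode`.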
Error Handling
The SDK provides comprehensive error handling with specific exception types:
- AlationAPIError: Issues with Alation API calls
- DatasourceConfigError: Problems with datasource configuration
- SodaScanError: Soda Core execution failures
- UnsupportedDatasourceError: Unsupported datasource types
- NetworkError: Network connectivity issues
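Distinct exception types let a caller treat transient failures differently from permanent ones. As a hedged sketch of that idea, the helper below retries a callable only when it raises a chosen exception type. The SDK's import path for its exceptions is not shown in this document, so `NetworkError` here is a local stand-in class, and `run_with_retries` is a hypothetical helper rather than an SDK function. The SDK already has its own retry logic, so a wrapper like this would only matter for retrying an entire run.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


class NetworkError(Exception):
    """Local stand-in for the SDK's NetworkError (real import path assumed unknown)."""


def run_with_retries(
    fn: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...] = (NetworkError,),
    attempts: int = 3,
    delay_seconds: float = 0.0,
) -> T:
    """Call fn, retrying only on the given exception types.

    Waits delay_seconds * attempt_number between tries and re-raises the
    last exception once the attempts are exhausted.
    """
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == attempts:
                raise
            time.sleep(delay_seconds * attempt)
    raise ValueError("attempts must be at least 1")


# Demonstration with a function that fails twice, then succeeds.
calls = []


def flaky() -> str:
    calls.append(1)
    if len(calls) < 3:
        raise NetworkError("temporary outage")
    return "ok"


print(run_with_retries(flaky), "after", len(calls), "calls")
```

Exceptions outside `retry_on`, such as a configuration error, propagate immediately, which is usually the desired behavior for failures that a retry cannot fix.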
Logging
The SDK provides structured logging with configurable levels:
# Set log level via environment variable
export LOG_LEVEL="DEBUG"
# Or programmatically
from data_quality_sdk.utils.logging import setup_logging
logger = setup_logging("DEBUG")
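If the surrounding application has its own loggers, it may be convenient to drive them from the same LOG_LEVEL variable. The sketch below uses only the standard library `logging` module and does not touch the SDK's `setup_logging`; the helper name `resolve_log_level` is hypothetical, and falling back to the default for unrecognized values is an assumption about desirable behavior, not documented SDK behavior.

```python
import logging
import os
from typing import Mapping

# Levels listed for LOG_LEVEL in the configuration section.
_VALID_LEVELS = {"DEBUG", "INFO", "WARNING", "ERROR"}


def resolve_log_level(env: Mapping[str, str], default: str = "INFO") -> int:
    """Map LOG_LEVEL from env to a logging constant, falling back to default."""
    name = env.get("LOG_LEVEL", default).strip().upper()
    if name not in _VALID_LEVELS:
        name = default
    return getattr(logging, name)


# Configure the application's own root logger from the same variable.
logging.basicConfig(level=resolve_log_level(os.environ))
logging.getLogger("my_pipeline").info("logging configured")
```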
Troubleshooting
Common Issues
- "Invalid OAuth credentials"
  - Verify ALATION_CLIENT_ID and ALATION_CLIENT_SECRET are correct
  - Ensure OAuth application is active in Alation Settings
- "Tenant ID not found"
  - Verify TENANT_ID matches the value in Alation > Help > About this instance
  - Check that your OAuth application has the correct tenant scope
- "Monitor not found"
  - Verify MONITOR_ID exists in Alation Data Quality Monitors
  - Ensure your user has access to the specified monitor
- "Unsupported datasource type"
  - Check that your datasource is one of: Snowflake, Redshift, Databricks, BigQuery
  - Contact Alation support for additional datasource support
- Connection errors
  - Verify ALATION_HOST is correct and accessible
  - Check network connectivity to your Alation instance
  - Ensure your OAuth credentials have not expired
Debug Mode
Enable debug logging to get detailed information:
export LOG_LEVEL="DEBUG"
alation-dq
Support
For issues and questions:
- Check the troubleshooting section above
- Enable debug logging with LOG_LEVEL="DEBUG"
- Contact your Alation Customer Success team
- Visit Alation Community forums at https://help.alation.com
For bug reports, please contact Alation Support with:
- SDK version (pip show alation-data-quality-sdk)
- Python version
- Error messages and logs
- Steps to reproduce
License
Apache License 2.0