Skip to main content

Data exchange agent for migrations and validation

Project description

Snowflake Data Exchange Agent

Python

The Data Exchange Agent is the Worker component of the Cloud Data Migration solution. It connects to source databases (SQL Server, Amazon Redshift, Teradata, PostgreSQL), extracts data, and uploads it to Snowflake stages for ingestion by the Data Migration Orchestrator (snowflake-data-migration-orchestrator).

The same worker process also executes Cloud Data Validation tasks (data_validation) when the orchestrator schedules them. That path relies on the optional snowflake-data-validation package being installed in the worker environment (see the orchestrator documentation for creating validation workflows and JSON configuration).

Installation

pip install snowflake-data-exchange-agent

Python Version: 3.11 or higher

Usage

# Start with a configuration file
data-exchange-agent -c <configuration-file-path>

# Start with default configuration.toml in current directory
data-exchange-agent

# Custom port and parallelism
data-exchange-agent --max-parallel-tasks 8 --port 8080

# Custom base directory for exported files (overrides config)
data-exchange-agent --local-results-directory /mnt/dea-exports

# Debug mode
data-exchange-agent --debug --port 5001

Worker Configuration

The Worker configuration file uses TOML format.

Section Property Type Description
Top Level selected_task_source String Currently should always be set to "snowflake_stored_procedure".
[application] max_parallel_tasks Integer Maximum number of tasks the worker will process in parallel (using threads).
[application] task_fetch_interval Integer Interval (in seconds) between attempts to fetch new tasks from the Orchestrator.
[application] snowflake_database_for_metadata String Optional. Database where the orchestrator deployed the task queue (default SNOWCONVERT_AI). Must match the orchestrator's CUSTOM_SNOWFLAKE_DATABASE_FOR_METADATA if you override it there.
[application] snowflake_schema_for_data_migration_metadata String Optional. Schema for PULL_TASKS / COMPLETE_TASK / FAIL_TASK (default DATA_MIGRATION). Must match the orchestrator's CUSTOM_SNOWFLAKE_SCHEMA_FOR_DATA_MIGRATION_METADATA if overridden.
[application] local_results_directory String Optional. Base directory where each task's exported Parquet or CSV files are written before upload. Each run uses a subfolder task_<id>/<timestamp>. After a successful upload, that timestamp folder and the task_<id> parent (if empty) are removed so stale empty directories do not accumulate. When unset, files go under ~/.data_exchange_agent/result_data. Tilde (~) and relative paths are expanded at load time.
[connections.source.*] Object Configuration for source system connections. The Worker typically requires an ODBC driver. See examples below.
[connections.target.snowflake_connection_name] connection_name String The name of the connection entry in the ~/.snowflake/config.toml file to use.

When selected_task_source is snowflake_stored_procedure, the worker issues CALL statements against the task-queue using application.snowflake_database_for_metadata and application.snowflake_schema_for_data_migration_metadata. These settings are independent of Snowflake connection session defaults (SNOWFLAKE_DATABASE, SNOWFLAKE_SCHEMA in the connection profile).

Example: SQL Server (Standard Authentication)

[connections.source.sqlserver]
username = "username"
password = "password"
database = "database_name"
host = "127.0.0.1"
port = 1433

Example: Amazon Redshift (IAM Authentication)

[connections.source.redshift]
username = "demo-user"
database = "demo_db"
auth_method = "iam-provisioned-cluster"
cluster_id = "my-aws-cluster"
region = "us-west-2"
access_key_id = "your-access-key-id"
secret_access_key = "your-secret-access-key"

Example: Amazon Redshift (Standard Authentication)

[connections.source.redshift]
username = "myuser"
password = "mypassword"
database = "mydatabase"
host = "my-cluster.abcdef123456.us-west-2.redshift.amazonaws.com"
port = 5439
auth_method = "standard"

Example: PostgreSQL (ODBC)

Use a PostgreSQL ODBC driver installed on the worker machine (for example PostgreSQL Unicode). You can set odbc_driver explicitly or let the agent pick a default.

[connections.source.postgresql]
username = "my_user"
password = "my_password"
database = "my_database"
host = "postgres.example.com"
port = 5432
# odbc_driver = "PostgreSQL Unicode"

Example: Teradata

The agent supports two Teradata drivers and automatically selects the best one available:

  1. teradatasql (preferred) -- Pure Python driver. No OS-level ODBC installation required. Install with pip install teradatasql.
  2. ODBC fallback -- If teradatasql is not installed, the agent falls back to pyodbc with the Teradata ODBC driver. Set driver_name to the exact name returned by pyodbc.drivers().

When teradatasql is available, driver_name is ignored and no ODBC driver needs to be installed on the host. Use dbc_name when your Teradata COP / TDPID alias differs from host.

[connections.source.teradata]
host = "your-teradata-host.example.com"
port = 1025
database = "tpcds"
username = "your_username"
password = "your_password"
# driver_name = "Teradata Database ODBC Driver 17.20"  # only needed for ODBC fallback
# dbc_name = "TDPID_ALIAS"  # optional; defaults to host

Note: Only one source connection is needed. The Snowflake target connection should point to a valid entry in your ~/.snowflake/config.toml.

ODBC Driver Auto-Detection

The agent automatically detects the best available ODBC driver for SQL Server connections. If no odbc_driver is specified in the configuration, it will prefer the newest available driver (ODBC Driver 18 > 17 > 13 > 11). If a specific driver is requested but not found, it falls back to the best available driver with a warning.

To manually specify a driver:

[connections.source.sqlserver]
odbc_driver = "ODBC Driver 17 for SQL Server"

ODBC Encryption (SQL Server)

The encrypt and trust_server_certificate parameters are optional. By default, they are omitted from the connection string, allowing the ODBC driver to use its default behavior:

  • ODBC Driver 17 and below: Encryption is disabled by default.
  • ODBC Driver 18 and above: Encryption is mandatory by default.
[connections.source.sqlserver]
username = "sa"
password = "mypassword"
database = "mydb"
host = "my-server.example.com"
port = 1433
encrypt = true
trust_server_certificate = false

For development environments or SQL Servers without encryption support, either omit the encryption parameters or set encrypt = false.

Changelog

v1.10.0

Improvements

  • Result set snapshots validation improvements
  • Add hybrid row validation mode — two-phase MD5 + cell drilldown
  • [DMO][DEW][Cloud Data Validation] Improvements for Data Validation performance
  • [DMO][DEW] Include thread name/id in the logs
  • [DMO][DEW] Improve task queue to support higher number of parallel workers
  • Add DEFAULT normalization templates for various data types

Bug fixes

  • Fix Redshift E2E tests
  • Fix SQL compilation memory exhaustion by batching L2 metrics queries for wide tables
  • [DMO][DEW] Fix issue with incremental sync (watermark) on Redshift
  • [DMO][DEW][SDV] Fix usage of vectorized scanner

v1.9.2

Improvements

  • Log installed dependency versions and the Python runtime version at startup.

v1.9.1

Improvements

  • Cloud data validation tasks read query results in batches instead of loading full result sets into memory.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

snowflake_data_exchange_agent-1.10.0.tar.gz (188.5 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

snowflake_data_exchange_agent-1.10.0-py3-none-any.whl (172.2 kB view details)

Uploaded Python 3

File details

Details for the file snowflake_data_exchange_agent-1.10.0.tar.gz.

File metadata

File hashes

Hashes for snowflake_data_exchange_agent-1.10.0.tar.gz
Algorithm Hash digest
SHA256 b3bdf0c90c5773cdd9c7281a528be7a92b38da68a35eee07e9649e5a42adf899
MD5 9d55ef02e664bba0523f8103bf324c88
BLAKE2b-256 6a14696b0db31afb006e7ac58e9b77b8758082ee1a0202e1223bf208bb9277b4

See more details on using hashes here.

File details

Details for the file snowflake_data_exchange_agent-1.10.0-py3-none-any.whl.

File metadata

File hashes

Hashes for snowflake_data_exchange_agent-1.10.0-py3-none-any.whl
Algorithm Hash digest
SHA256 347b56576a9d335ed6c26ded5837ee10cb31238e47c9c7af5e5fc7359e013f45
MD5 132c71dd49000b9bcd496d4b3de5a099
BLAKE2b-256 dcb82ccb7f4a5d0ef89e912114531e43d0c8013ac584562c3bbc5c4a165f17fe

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page