DCS SDK v1.8.4

SDK for DataChecks
Installation

Python version: >=3.10,<3.13

$ pip install 'dcs-sdk[all-dbs]'

The quotes keep shells such as zsh from treating the square brackets in the extras specifier as a glob pattern.
Supported Databases
Availability Status
| Database | Code Name | Supported |
|---|---|---|
| PostgreSQL | postgres | ✅ |
| Snowflake | snowflake | ✅ |
| Trino | trino | ✅ |
| Databricks | databricks | ✅ |
| Oracle | oracle | ✅ |
| MSSQL | mssql | ✅ |
| MySQL | mysql | ✅ |
| SAP Sybase IQ/ASE | sybase | ✅ |
| File | file | ✅ |
| BigQuery | bigquery | ✅ |
Available Commands
| Option | Short Option | Required | Default | Description | Example |
|---|---|---|---|---|---|
| --config-path | -C | Yes | None | Specify the file path for the configuration | dcs-sdk run --config-path config.yaml --compare comp_name |
| --compare | | Yes | None | Run only a specific comparison, selected by comparison name | dcs-sdk run --config-path config.yaml --compare comp_name |
| --save-json | -j | No | False | Save the results to a JSON file | dcs-sdk run --config-path config.yaml --compare comp_name --save-json |
| --json-path | -jp | No | dcs_report.json | Specify the file path for the JSON file | dcs-sdk run --config-path config.yaml --compare comp_name --save-json --json-path output.json |
| --stats | | No | False | Print statistics about the data diff | dcs-sdk run --config-path config.yaml --compare comp_name --stats |
| --url | | No | None | Specify a URL to send results to a server | dcs-sdk run --config-path config.yaml --compare comp_name --url=https://compare/send/data |
| --html-report | | No | False | Save the comparison table as an HTML report | dcs-sdk run --config-path config.yaml --compare comp_name --html-report |
| --report-path | | No | dcs_report.html | Specify the file path for the HTML report | dcs-sdk run --config-path config.yaml --compare comp_name --html-report --report-path table.html |
| --table | | No | False | Display the comparison in table format | dcs-sdk run --config-path config.yaml --compare comp_name --table |
Example Command [CLI]
$ dcs-sdk --version
$ dcs-sdk --help
$ dcs-sdk run -C example.yaml --compare comparison_one --stats -j -jp output.json --html-report --report-path result.html --table --url=https://compare/send/data
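For database-backed comparisons, a configuration pairs two data sources with a comparison definition. A minimal sketch for PostgreSQL follows; the connection field names are assumptions modeled on the Databricks example later in this document, so check the DataChecks documentation for the exact schema:

```yaml
data_sources:
  - name: source_pg
    type: postgres            # code name from the Supported Databases table
    connection:               # field names below are illustrative, not verified
      host: localhost
      port: 5432
      username: dcs_user
      password: ${PG_PASSWORD}
      database: sales
      schema: public
  - name: target_pg
    type: postgres
    connection:
      host: localhost
      port: 5432
      username: dcs_user
      password: ${PG_PASSWORD}
      database: sales_replica
      schema: public

comparisons:
  orders_diff:
    source:
      data_source: source_pg
      table: orders
    target:
      data_source: target_pg
      table: orders
    key_columns: [id]
    columns: [customer_name, status, amount]
```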
File Comparisons
dcs-sdk supports file-backed comparisons through DuckDB for:

- `.csv`
- `.parquet`
- mixed-format comparisons such as csv ↔ parquet

Supported file datasource types: `file`, `azure_blob`

Notes:

- File paths must point to concrete `.csv` or `.parquet` files or globs.
- Query-backed file comparisons are supported. When `source.query` or `target.query` is provided, the SDK loads the file into DuckDB and compares against the filtered/projected query view.
Local File Example
```yaml
data_sources:
  - name: source_file
    type: file
    file_path: sample_data/parquet/one_source.parquet
  - name: target_file
    type: file
    file_path: sample_data/parquet/two_target.parquet

comparisons:
  parquet_file_diff:
    source:
      data_source: source_file
      table: one_source
    target:
      data_source: target_file
      table: two_target
    key_columns: [id]
    columns: [customer_name, status, amount, region]
```

Run it with:

```
dcs-sdk run -C parquet_file_comparison.yaml --compare parquet_file_diff --stats
```
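Since query-backed file comparisons are supported, one side can also be a SQL view over the file rather than the whole table. A sketch using the same two parquet sources as above (the comparison name and the filter are illustrative, not from the SDK's documentation):

```yaml
comparisons:
  parquet_query_diff:
    source:
      data_source: source_file
      # The SDK loads the file into DuckDB and compares against this
      # filtered/projected query view; the WHERE clause is illustrative.
      query: |
        SELECT id, customer_name, status, amount, region
        FROM one_source
        WHERE status = 'active'
    target:
      data_source: target_file
      table: two_target
    key_columns: [id]
    columns: [customer_name, status, amount, region]
```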
Databricks Query-Backed Comparisons
Databricks comparisons can use either:
- a table name
- a SQL query
For Parquet files stored on Databricks, use a query with `read_files(...)`.
Databricks Table vs Parquet Example
```yaml
data_sources:
  - name: databricks_demo
    type: databricks
    connection:
      host: your-workspace.cloud.databricks.com
      port: 443
      http_path: /sql/1.0/warehouses/your-warehouse
      access_token: ${DATABRICKS_TOKEN}
      catalog: dcs_demo_databricks
      schema: source
      temporary_schema: temp_schema

comparisons:
  databricks_table_vs_parquet:
    source:
      data_source: databricks_demo
      table: source_table
    target:
      data_source: databricks_demo
      query: |
        SELECT *
        FROM read_files(
          '/Volumes/dcs_demo_databricks/source/dcs-test-volumne/two_target.parquet',
          format => 'parquet'
        )
      view_name: datachecks_target_file
      materialization_type: table
    key_columns: [id]
    columns: [customer_name, status, amount, region]
```
Notes:
- Query-backed Databricks comparisons require `temporary_schema`.
- Generated temp views/tables use the `datachecks_` prefix.
- Prefer Unity Catalog volume paths such as `/Volumes/...` for Databricks file queries.
- Legacy DBFS root paths such as `dbfs:/raw/...` are not the recommended path for this flow.
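Conceptually, the key-based diff that these comparisons perform can be illustrated in plain Python. This is an explanatory sketch of the idea behind `key_columns` and `columns`, not the SDK's actual implementation or API:

```python
# Illustrative sketch of a key-based data diff (not the dcs-sdk API).
# Rows are dicts; key_columns identifies matching rows across sides,
# and columns lists the values to compare.

def data_diff(source_rows, target_rows, key_columns, columns):
    def key(row):
        return tuple(row[c] for c in key_columns)

    src = {key(r): r for r in source_rows}
    tgt = {key(r): r for r in target_rows}

    only_in_source = sorted(src.keys() - tgt.keys())
    only_in_target = sorted(tgt.keys() - src.keys())
    # For keys present on both sides, report columns whose values differ.
    mismatches = {
        k: {c: (src[k][c], tgt[k][c]) for c in columns if src[k][c] != tgt[k][c]}
        for k in src.keys() & tgt.keys()
        if any(src[k][c] != tgt[k][c] for c in columns)
    }
    return only_in_source, only_in_target, mismatches


source = [
    {"id": 1, "status": "active", "amount": 10.0},
    {"id": 2, "status": "active", "amount": 20.0},
]
target = [
    {"id": 2, "status": "closed", "amount": 20.0},
    {"id": 3, "status": "active", "amount": 30.0},
]

only_src, only_tgt, diff = data_diff(source, target, ["id"], ["status", "amount"])
print(only_src)  # [(1,)]
print(only_tgt)  # [(3,)]
print(diff)      # {(2,): {'status': ('active', 'closed')}}
```

The `--stats` flag reports summary counts over exactly these three buckets: rows only in the source, rows only in the target, and keyed rows whose compared columns disagree.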