SDK for DataChecks
Project description
DCS CLI v0.3.0
SDK for DataChecks
Installation
Python version
>=3.10,<3.12
$ pip install dcs-cli[all-dbs]
Supported Databases
Availability Status
Database | Code Name | Supported |
---|---|---|
PostgreSQL | postgres |
✅ |
Snowflake | snowflake |
✅ |
Trino | trino |
✅ |
Databricks | databricks |
✅ |
File | file |
✅ |
Available Commands
Option | Short Option | Required | Default | Description | Example |
---|---|---|---|---|---|
--config-path | -C | Yes | None | Specify the file path for the configuration | dcs_cli run --config-path config.yaml --compare comp_name |
--compare | Yes | None | Run only specific comparison using comparison name | dcs_cli run --config-path config.yaml --compare comp_name | |
--save-json | -j | No | False | Save the data into a JSON file | dcs_cli run --config-path config.yaml --compare comp_name --save-json |
--json-path | -jp | No | dcs_report.json | Specify the file path for JSON file | dcs_cli run --config-path config.yaml --compare comp_name --save-json --json-path ouput.json |
--stats | No | False | Print stats about data diff | dcs_cli run --config-path config.yaml --compare comp_name --stats | |
--url | No | None | Specify url to send data to server | dcs_cli run --config-path config.yaml --compare comp_name --url https://comapre/send/data | |
--html-report | No | False | Save table as HTML | dcs_cli run --config-path config.yaml --compare comp_name --html-report | |
--report-path | No | dcs_report.html | Specify the file path for HTML report | dcs_cli run --config-path config.yaml --compare comp_name --html-report --report-path table.html |
Example Command [CLI]
$ dcs_cli --version
$ dcs_cli --help
$ dcs_cli run -C example.yaml --compare comparison_one --stats -j -jp output.json --html-report --report-path result.html --url https://comapre/send/data
Example Configuration
data_sources:
- name: iris_snowflake
type: snowflake
id: f533c099-196f-48da-b231-1d4c380f84bf
workspace: default
connection:
account: bp54281.central-india.azure
username: !ENV ${SNOWFLAKE_USER}
password: !ENV ${SNOWFLAKE_PASS}
database: TEST_DCS
schema: PUBLIC
warehouse: compute_wh
role: accountadmin
- name: pgsql_azure
type: postgres
id: 4679b79a-7174-48fd-9c71-81cf806ef617
workspace: default
connection:
host: !ENV ${POSTGRES_HOST_ONE}
port: !ENV ${POSTGRES_PORT_ONE}
username: !ENV ${POSTGRES_USER_ONE}
password: !ENV ${POSTGRES_PASSWORD_ONE}
database: !ENV ${POSTGRES_DB_ONE}
- name: trino_test
type: trino
id: 9d86df86-6802-4551-a1ce-b98cdf3ec15f
workspace: default
connection:
host: localhost
port: 8080
username: admin
catalog: tpch
schema: sf100
- name: file_source_raw
id: b5a76a0a-1b8f-4222-a31d-a31740f23168
workspace: default
type: file
file_path: "nk.kyc_data/RAW_EMPLOYEE.csv"
- name: file_source_tl
id: 52c1f3c7-fd1e-4f3c-aed3-b01d8e1cfa4d
workspace: default
type: file
file_path: "TL_EMPLOYEE.csv"
- name: databricks_test
type: databricks
id: 6f1fd8d6-5a59-4ba5-be37-aec044b000e7
workspace: default
connection:
host: !ENV ${DATABRICKS_HOST}
port: !ENV ${DATABRICKS_PORT}
catalog: hive_metastore
schema: default
access_token: !ENV ${DATABRICKS_ACCESS_TOKEN}
http_path: !ENV ${DATABRICKS_HTTP_PATH}
comparisons:
# DB TO DB (SNOWFLAKE)
comparison_one:
source:
data_source: iris_snowflake
table: RAW_EMPLOYEE
target:
data_source: iris_snowflake
table: TL_EMPLOYEE
key_columns:
- CUSTID
columns:
- FIRSTNAME
- LASTNAME
- DESIGNATION
- SALARY
# DB TO DB (Postgres Azure)
comparison_two:
source:
data_source: pgsql_azure
table: actor
target:
data_source: pgsql_azure
table: actor2
key_columns:
- actor_id
columns:
- first_name
- last_name
- last_update
columns_mappings:
- source_column: actor_id
target_column: actor_id1
- source_column: first_name
target_column: first_name1
- source_column: last_name
target_column: last_name1
- source_column: last_update
target_column: last_update1
# FILE TO FILE
comparison_three:
source:
data_source: file_source_raw
table: RAW_EMPLOYEE
target:
data_source: file_source_tl
table: TL_EMPLOYEE
key_columns:
- custid
columns:
- FIRSTNAME
- lastname
- designation
- salary
columns_mappings:
- source_column: FIRSTNAME
target_column: firstname
# DB TO DB (Trino)
comparison_trino:
source:
data_source: trino_test
table: nation
target:
data_source: trino_test
table: region
key_columns:
- regionkey
columns:
- name
# DB TO DB (Databricks)
comparison_databricks:
source:
data_source: databricks_test
table: RAW_EMPLOYEE
target:
data_source: databricks_test
table: TL_EMPLOYEE
key_columns:
- custid
columns:
- FIRSTNAME
- lastname
- designation
- salary
columns_mappings:
- source_column: FIRSTNAME
target_column: firstname
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
dcs_sdk-0.3.0.tar.gz
(93.6 kB
view details)
Built Distribution
dcs_sdk-0.3.0-py3-none-any.whl
(140.2 kB
view details)
File details
Details for the file dcs_sdk-0.3.0.tar.gz
.
File metadata
- Download URL: dcs_sdk-0.3.0.tar.gz
- Upload date:
- Size: 93.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.4.1 CPython/3.9.11 Darwin/23.3.0
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 439b5322946f3aa464f4816fe7cc76a0a8bd2bab26873c783c15cbd22598df69 |
|
MD5 | 4452a484febe523851b1a8d202175f2b |
|
BLAKE2b-256 | 8b64d5f80cc70b74f3eac2bb2f5d0225e78bd31b1100a50acfbc1cd1c77125dc |
File details
Details for the file dcs_sdk-0.3.0-py3-none-any.whl
.
File metadata
- Download URL: dcs_sdk-0.3.0-py3-none-any.whl
- Upload date:
- Size: 140.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.4.1 CPython/3.9.11 Darwin/23.3.0
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 85349893d332b9c8ae029f43c6493ca213067139181f1c8bfa047fd71c4701bb |
|
MD5 | 5c1e9e9fc13048a8d5141ec0318c17ae |
|
BLAKE2b-256 | 515d80d1b1b604d3733e4dbe5eed4160e3eb38705e06596332d9ca80a3700495 |