
SDK for DataChecks


DCS SDK v0.3.1


Installation

Requires Python >=3.10,<3.12.

$ pip install "dcs-sdk[all-dbs]"
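Because the package supports only Python >=3.10,<3.12, checking the interpreter before installing can save a failed install. A minimal standard-library sketch; python_supported is a hypothetical helper, not part of dcs-sdk:

```python
import sys


def python_supported(version: tuple[int, int] = sys.version_info[:2]) -> bool:
    """Return True if (major, minor) falls in the documented range >=3.10,<3.12."""
    return (3, 10) <= version < (3, 12)


if __name__ == "__main__":
    major, minor = sys.version_info[:2]
    if python_supported():
        print(f"Python {major}.{minor} can install dcs-sdk")
    else:
        print(f"Python {major}.{minor} is unsupported; dcs-sdk needs >=3.10,<3.12")
```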

Supported Databases

The code name is the value used for type in a data source's configuration (for example, type: snowflake).

Database      Code Name
PostgreSQL    postgres
Snowflake     snowflake
Trino         trino
Databricks    databricks
Oracle        oracle
MSSQL         mssql
File          file
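A typo such as postgresql instead of postgres is easy to make in a config file. The sketch below flags data sources whose type is not one of the code names in the table. It assumes the configuration has already been parsed into a list of dicts; the function name is hypothetical, and dcs_sdk may perform its own validation.

```python
# Code names from the "Supported Databases" table, used as `type` in the config.
SUPPORTED_TYPES = frozenset(
    {"postgres", "snowflake", "trino", "databricks", "oracle", "mssql", "file"}
)


def unsupported_sources(data_sources: list[dict]) -> list[str]:
    """Return the names of data sources whose `type` is not a supported code name."""
    return [
        ds.get("name", "<unnamed>")
        for ds in data_sources
        if ds.get("type") not in SUPPORTED_TYPES
    ]


if __name__ == "__main__":
    sources = [
        {"name": "iris_snowflake", "type": "snowflake"},
        {"name": "pgsql_azure", "type": "postgres"},
        {"name": "legacy_db", "type": "postgresql"},  # hypothetical misspelled entry
    ]
    print(unsupported_sources(sources))  # ['legacy_db']
```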

Available Commands

Options for dcs_sdk run:

Option         Short  Required  Default          Description
--config-path  -C     Yes       None             Path to the configuration file
--compare      n/a    Yes       None             Name of the comparison to run (a key under comparisons in the config)
--save-json    -j     No        False            Save the comparison report as a JSON file
--json-path    -jp    No        dcs_report.json  Path of the JSON report
--stats        n/a    No        False            Print statistics about the data diff
--url          n/a    No        None             Server URL to send the results to
--html-report  n/a    No        False            Save the result table as an HTML report
--report-path  n/a    No        dcs_report.html  Path of the HTML report

Examples:

$ dcs_sdk run --config-path config.yaml --compare comp_name
$ dcs_sdk run --config-path config.yaml --compare comp_name --save-json
$ dcs_sdk run --config-path config.yaml --compare comp_name --save-json --json-path output.json
$ dcs_sdk run --config-path config.yaml --compare comp_name --stats
$ dcs_sdk run --config-path config.yaml --compare comp_name --url=https://comapre/send/data
$ dcs_sdk run --config-path config.yaml --compare comp_name --html-report
$ dcs_sdk run --config-path config.yaml --compare comp_name --html-report --report-path table.html
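When runs are scripted (for example from a scheduler), assembling the argument list programmatically avoids quoting mistakes. The helper below is a hypothetical sketch that uses only the long options from the table. It mirrors the examples by emitting --json-path only together with --save-json and --report-path only together with --html-report; whether the CLI requires that pairing is an assumption, not documented behavior.

```python
from __future__ import annotations

import shlex


def build_run_command(
    config_path: str,
    compare: str,
    *,
    save_json: bool = False,
    json_path: str | None = None,
    stats: bool = False,
    url: str | None = None,
    html_report: bool = False,
    report_path: str | None = None,
) -> list[str]:
    """Assemble a `dcs_sdk run` argument list from the documented long options."""
    argv = ["dcs_sdk", "run", "--config-path", config_path, "--compare", compare]
    if save_json:
        argv.append("--save-json")
        if json_path:  # only meaningful alongside --save-json (assumption)
            argv += ["--json-path", json_path]
    if stats:
        argv.append("--stats")
    if url:
        argv.append(f"--url={url}")
    if html_report:
        argv.append("--html-report")
        if report_path:  # only meaningful alongside --html-report (assumption)
            argv += ["--report-path", report_path]
    return argv


if __name__ == "__main__":
    argv = build_run_command("config.yaml", "comp_name", stats=True)
    print(shlex.join(argv))
    # dcs_sdk run --config-path config.yaml --compare comp_name --stats
```

Once dcs_sdk is installed and on PATH, the resulting list can be passed directly to subprocess.run(argv, check=True), which needs no shell quoting.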

Example Commands (CLI)

$ dcs_sdk --version

$ dcs_sdk --help

$ dcs_sdk run -C example.yaml --compare comparison_one --stats -j -jp output.json --html-report --report-path result.html --url=https://comapre/send/data

Example Configuration

data_sources:
  - name: iris_snowflake
    type: snowflake
    id: f533c099-196f-48da-b231-1d4c380f84bf
    workspace: default
    connection:
      account: bp54281.central-india.azure
      username: !ENV ${SNOWFLAKE_USER}
      password: !ENV ${SNOWFLAKE_PASS}
      database: TEST_DCS
      schema: PUBLIC
      warehouse: compute_wh
      role: accountadmin

  - name: pgsql_azure
    type: postgres
    id: 4679b79a-7174-48fd-9c71-81cf806ef617
    workspace: default
    connection:
      host: !ENV ${POSTGRES_HOST_ONE}
      port: !ENV ${POSTGRES_PORT_ONE}
      username: !ENV ${POSTGRES_USER_ONE}
      password: !ENV ${POSTGRES_PASSWORD_ONE}
      database: !ENV ${POSTGRES_DB_ONE}

  - name: trino_test
    type: trino
    id: 9d86df86-6802-4551-a1ce-b98cdf3ec15f
    workspace: default
    connection:
      host: localhost
      port: 8080
      username: admin
      catalog: tpch
      schema: sf100

  - name: file_source_raw
    id: b5a76a0a-1b8f-4222-a31d-a31740f23168
    workspace: default
    type: file
    file_path: "nk.kyc_data/RAW_EMPLOYEE.csv"

  - name: file_source_tl
    id: 52c1f3c7-fd1e-4f3c-aed3-b01d8e1cfa4d
    workspace: default
    type: file
    file_path: "nk.kyc_data/TL_EMPLOYEE.csv"

  - name: databricks_test
    type: databricks
    id: 6f1fd8d6-5a59-4ba5-be37-aec044b000e7
    workspace: default
    connection:
      host: !ENV ${DATABRICKS_HOST}
      port: !ENV ${DATABRICKS_PORT}
      catalog: hive_metastore
      schema: default
      access_token: !ENV ${DATABRICKS_ACCESS_TOKEN}
      http_path: !ENV ${DATABRICKS_HTTP_PATH}

comparisons:
  # DB TO DB (SNOWFLAKE)
  comparison_one:
    source:
      data_source: iris_snowflake
      table: RAW_EMPLOYEE

    target:
      data_source: iris_snowflake
      table: TL_EMPLOYEE
    key_columns:
      - CUSTID
    columns:
      - FIRSTNAME
      - LASTNAME
      - DESIGNATION
      - SALARY

  # DB TO DB (Postgres Azure)
  comparison_two:
    source:
      data_source: pgsql_azure
      table: actor
    target:
      data_source: pgsql_azure
      table: actor2
    key_columns:
      - actor_id
    columns:
      - first_name
      - last_name
      - last_update
    columns_mappings:
      - source_column: actor_id
        target_column: actor_id1
      - source_column: first_name
        target_column: first_name1
      - source_column: last_name
        target_column: last_name1
      - source_column: last_update
        target_column: last_update1

  # FILE TO FILE
  comparison_three:
    source:
      data_source: file_source_raw
      table: RAW_EMPLOYEE

    target:
      data_source: file_source_tl
      table: TL_EMPLOYEE
    key_columns:
      - custid
    columns:
      - FIRSTNAME
      - lastname
      - designation
      - salary
    columns_mappings:
      - source_column: FIRSTNAME
        target_column: firstname

  # DB TO DB (Trino)
  comparison_trino:
    source:
      data_source: trino_test
      table: nation
    target:
      data_source: trino_test
      table: region
    key_columns:
      - regionkey
    columns:
      - name

  # DB TO DB (Databricks)
  comparison_databricks:
    source:
      data_source: databricks_test
      table: RAW_EMPLOYEE

    target:
      data_source: databricks_test
      table: TL_EMPLOYEE
    key_columns:
      - custid
    columns:
      - FIRSTNAME
      - lastname
      - designation
      - salary
    columns_mappings:
      - source_column: FIRSTNAME
        target_column: firstname
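The example configuration relies on two conventions worth spelling out: values written as !ENV ${NAME} are read from environment variables, and columns_mappings pairs a source column with a differently named target column (in comparison_two, actor_id on the source is actor_id1 on the target). The sketch below illustrates both using only the standard library. It reflects a reading of the example rather than dcs_sdk's implementation: the helper names are hypothetical, the SDK's own !ENV handling may differ (for instance in how it treats unset variables or converts port to a number), and the config is assumed to be already parsed into dicts.

```python
from __future__ import annotations

import os
import re

# Matches ${VAR} placeholders such as those after the !ENV tag in the example config.
_ENV_PATTERN = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_env(value: str, env: dict[str, str] | None = None) -> str:
    """Replace each ${VAR} in value with its value from env (default: os.environ).

    Raises KeyError if a referenced variable is unset, so a missing credential
    fails loudly instead of silently becoming an empty string.
    """
    source = os.environ if env is None else env

    def lookup(match: re.Match[str]) -> str:
        name = match.group(1)
        if name not in source:
            raise KeyError(f"environment variable {name} is not set")
        return source[name]

    return _ENV_PATTERN.sub(lookup, value)


def target_columns(comparison: dict) -> tuple[list[str], list[str]]:
    """Return (key_columns, columns) renamed to their target-side names.

    Columns listed under columns_mappings take their target_column name;
    every other column is assumed to have the same name on both sides.
    """
    mapping = {
        m["source_column"]: m["target_column"]
        for m in comparison.get("columns_mappings", [])
    }
    keys = [mapping.get(c, c) for c in comparison.get("key_columns", [])]
    cols = [mapping.get(c, c) for c in comparison.get("columns", [])]
    return keys, cols


if __name__ == "__main__":
    # comparison_two from the example configuration, as a parsed dict.
    comparison_two = {
        "key_columns": ["actor_id"],
        "columns": ["first_name", "last_name", "last_update"],
        "columns_mappings": [
            {"source_column": "actor_id", "target_column": "actor_id1"},
            {"source_column": "first_name", "target_column": "first_name1"},
            {"source_column": "last_name", "target_column": "last_name1"},
            {"source_column": "last_update", "target_column": "last_update1"},
        ],
    }
    print(target_columns(comparison_two))
    # (['actor_id1'], ['first_name1', 'last_name1', 'last_update1'])

    # Hypothetical host value supplied through an explicit mapping.
    print(expand_env("${POSTGRES_HOST_ONE}", {"POSTGRES_HOST_ONE": "db.example.internal"}))
    # db.example.internal
```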

Download files


Source Distribution

dcs_sdk-0.3.3.tar.gz (93.7 kB)

Uploaded Source

Built Distribution

dcs_sdk-0.3.3-py3-none-any.whl (140.5 kB)

Uploaded Python 3

File details

Details for the file dcs_sdk-0.3.3.tar.gz.

File metadata

  • Download URL: dcs_sdk-0.3.3.tar.gz
  • Upload date:
  • Size: 93.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.3 CPython/3.9.11 Darwin/23.3.0

File hashes

Hashes for dcs_sdk-0.3.3.tar.gz
Algorithm Hash digest
SHA256 b46b47b6fcc23d2ace3e4f24bd41ee0dcb4b6bd6342f19af8ed0c02cc4eca310
MD5 9094320feb3045b679856d75934986f4
BLAKE2b-256 35231ad5275065c53151d6786a0e38f1b2ecd3d659c80c023884863281eafd04


File details

Details for the file dcs_sdk-0.3.3-py3-none-any.whl.

File metadata

  • Download URL: dcs_sdk-0.3.3-py3-none-any.whl
  • Upload date:
  • Size: 140.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.3 CPython/3.9.11 Darwin/23.3.0

File hashes

Hashes for dcs_sdk-0.3.3-py3-none-any.whl
Algorithm Hash digest
SHA256 517814936ac8797cb85b50de836dc125b68537246b8d18b4b826bd699621fcff
MD5 ccb64c5f2fcecc1a735bef4156c6687d
BLAKE2b-256 1d3cb09f24cbadebdaea437c23efd616be26e1b50b59f2f7025cad1f7fe7cee1
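The SHA256 digests above let you confirm that a downloaded release file is intact. A minimal standard-library sketch: the file names and digests are copied from this page, while the function names are illustrative.

```python
from __future__ import annotations

import hashlib
from pathlib import Path

# Published SHA256 digests for the dcs-sdk 0.3.3 release files.
EXPECTED_SHA256 = {
    "dcs_sdk-0.3.3.tar.gz": "b46b47b6fcc23d2ace3e4f24bd41ee0dcb4b6bd6342f19af8ed0c02cc4eca310",
    "dcs_sdk-0.3.3-py3-none-any.whl": "517814936ac8797cb85b50de836dc125b68537246b8d18b4b826bd699621fcff",
}


def sha256_of(path: str | Path, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA256 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: str | Path) -> bool:
    """True if the file's digest matches the published one for its file name."""
    expected = EXPECTED_SHA256[Path(path).name]  # KeyError for unknown file names
    return sha256_of(path) == expected


if __name__ == "__main__":
    import sys

    for arg in sys.argv[1:]:
        status = "OK" if verify_download(arg) else "MISMATCH"
        print(f"{status}  {arg}")
```

pip can also enforce this at install time through its hash-checking mode (--require-hashes, with --hash=sha256:... entries in a requirements file). In that mode every requirement, including dependencies, must be pinned and carry a hash.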

