SDK for DataChecks

DCS SDK v1.8.8

Installation

Python version >=3.10,<3.13

$ pip install "dcs-sdk[all-dbs]"

Supported Databases

Availability Status

Database | Code Name | Supported
PostgreSQL | postgres |
Snowflake | snowflake |
Trino | trino |
Databricks | databricks |
Oracle | oracle |
MSSQL | mssql |
MySQL | mysql |
SAP Sybase IQ/ASE | sybase |
SAP HANA | sap_hana |
File | file |
BigQuery | bigquery |

Available Commands

Option | Short Option | Required | Default | Description | Example
--config-path | -C | Yes | None | Specify the path to the configuration file | dcs-sdk run --config-path config.yaml --compare comp_name
--compare |  | Yes | None | Run only the comparison with the given name | dcs-sdk run --config-path config.yaml --compare comp_name
--save-json | -j | No | False | Save the results to a JSON file | dcs-sdk run --config-path config.yaml --compare comp_name --save-json
--json-path | -jp | No | dcs_report.json | Specify the path of the JSON file | dcs-sdk run --config-path config.yaml --compare comp_name --save-json --json-path output.json
--stats |  | No | False | Print statistics about the data diff | dcs-sdk run --config-path config.yaml --compare comp_name --stats
--url |  | No | None | Specify the URL of the server to send data to | dcs-sdk run --config-path config.yaml --compare comp_name --url=https://comapre/send/data
--html-report |  | No | False | Save the comparison table as an HTML report | dcs-sdk run --config-path config.yaml --compare comp_name --html-report
--report-path |  | No | dcs_report.html | Specify the path of the HTML report | dcs-sdk run --config-path config.yaml --compare comp_name --html-report --report-path table.html
--table |  | No | False | Display the comparison in table format | dcs-sdk run --config-path config.yaml --compare comp_name --html-report --report-path table.html --table

Example Command [CLI]

$ dcs-sdk --version

$ dcs-sdk --help

$ dcs-sdk run -C example.yaml --compare comparison_one --stats -j -jp output.json --html-report --report-path result.html --table --url=https://comapre/send/data
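
For scripted runs, the same command could be assembled and launched from Python. The following is a minimal, unofficial sketch: build_dcs_command and run_comparison are hypothetical helpers (not part of dcs-sdk), they use only the flags listed in the table above, and they assume the dcs-sdk executable is on PATH.

```python
import shlex
import subprocess


def build_dcs_command(config_path, compare, *, stats=False, json_path=None,
                      report_path=None, table=False, url=None):
    """Assemble a `dcs-sdk run` argument list (hypothetical helper)."""
    cmd = ["dcs-sdk", "run", "--config-path", config_path, "--compare", compare]
    if stats:
        cmd.append("--stats")
    if json_path is not None:
        # --json-path only matters when --save-json is also given
        cmd += ["--save-json", "--json-path", json_path]
    if report_path is not None:
        # --report-path only matters when --html-report is also given
        cmd += ["--html-report", "--report-path", report_path]
    if table:
        cmd.append("--table")
    if url is not None:
        cmd.append(f"--url={url}")
    return cmd


def run_comparison(config_path, compare, **options):
    """Run the CLI; subprocess raises CalledProcessError on a non-zero exit."""
    cmd = build_dcs_command(config_path, compare, **options)
    return subprocess.run(cmd, check=True)


# Print the command line without executing it.
print(shlex.join(build_dcs_command("example.yaml", "comparison_one",
                                   stats=True, json_path="output.json")))
```

Passing the arguments as a list avoids shell quoting problems with paths and URLs.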

File Comparisons

dcs-sdk supports file-backed comparisons through DuckDB for:

  • .csv
  • .parquet
  • mixed-format comparisons such as csv ↔ parquet

Supported file datasource types:

  • file
  • azure_blob

Notes:

  • File paths must point to concrete .csv or .parquet files or globs.
  • Query-backed file comparisons are supported. When source.query or target.query is provided, the SDK loads the file into DuckDB and compares against the filtered/projected query view.
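
The path rule in the notes above can be checked before a run. This is a minimal sketch, assuming a hypothetical helper check_file_path (not an SDK function) and assuming the extension match is case-sensitive, which the source does not state.

```python
from pathlib import PurePosixPath

SUPPORTED_SUFFIXES = {".csv", ".parquet"}


def check_file_path(file_path: str) -> str:
    """Return the format implied by a path or glob, or raise ValueError.

    Hypothetical pre-flight check: it inspects only the final extension,
    so globs such as data/*.parquet pass and bare directories do not.
    """
    suffix = PurePosixPath(file_path).suffix
    if suffix not in SUPPORTED_SUFFIXES:
        raise ValueError(f"{file_path!r} must end in .csv or .parquet")
    return suffix.lstrip(".")


source_format = check_file_path("sample_data/csv/one_source.csv")
target_format = check_file_path("sample_data/parquet/*.parquet")
# Differing formats are fine: mixed csv/parquet comparisons are supported.
print(source_format, target_format)
```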

Local File Example

data_sources:
  - name: source_file
    type: file
    file_path: sample_data/parquet/one_source.parquet

  - name: target_file
    type: file
    file_path: sample_data/parquet/two_target.parquet

comparisons:
  parquet_file_diff:
    source:
      data_source: source_file
      table: one_source
    target:
      data_source: target_file
      table: two_target
    key_columns: [id]
    columns: [customer_name, status, amount, region]

Run it with:

$ dcs-sdk run -C parquet_file_comparison.yaml --compare parquet_file_diff --stats

Databricks Query-Backed Comparisons

Databricks comparisons can use either:

  • a table name
  • a SQL query

For Parquet files stored on Databricks, use a query with read_files(...).

Databricks Table vs Parquet Example

data_sources:
  - name: databricks_demo
    type: databricks
    connection:
      host: your-workspace.cloud.databricks.com
      port: 443
      http_path: /sql/1.0/warehouses/your-warehouse
      access_token: ${DATABRICKS_TOKEN}
      catalog: dcs_demo_databricks
      schema: source
    temporary_schema: temp_schema

comparisons:
  databricks_table_vs_parquet:
    source:
      data_source: databricks_demo
      table: source_table
    target:
      data_source: databricks_demo
      query: |
        SELECT *
        FROM read_files(
          '/Volumes/dcs_demo_databricks/source/dcs-test-volumne/two_target.parquet',
          format => 'parquet'
        )
      view_name: datachecks_target_file
      materialization_type: table
    key_columns: [id]
    columns: [customer_name, status, amount, region]
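
The example passes the access token as ${DATABRICKS_TOKEN}. Assuming such placeholders are resolved from environment variables at run time (the source does not describe the mechanism), a pre-flight check for unset variables might look like this sketch. missing_env_vars is a hypothetical helper, not part of dcs-sdk.

```python
import os
import re

# Matches ${NAME} placeholders such as ${DATABRICKS_TOKEN}.
PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def missing_env_vars(config_text: str, env=None) -> list[str]:
    """List placeholder names in the config text that are unset or empty."""
    env = os.environ if env is None else env
    names = sorted(set(PLACEHOLDER.findall(config_text)))
    return [name for name in names if not env.get(name)]


sample = "access_token: ${DATABRICKS_TOKEN}\ncatalog: dcs_demo_databricks\n"
print(missing_env_vars(sample, env={}))
```

Running the check before invoking the CLI keeps the token out of the YAML file while still reporting a clear error when it is missing.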

Notes:

  • Query-backed Databricks comparisons require temporary_schema.
  • Generated temp views/tables use the datachecks_ prefix.
  • Prefer Unity Catalog volume paths such as /Volumes/... for Databricks file queries.
  • Legacy DBFS root paths such as dbfs:/raw/... are not recommended for this flow.
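
The first and last notes can be turned into a simple lint over a parsed configuration. The sketch below assumes the YAML has already been loaded into a Python dict with the same shape as the example above. lint_databricks_config is a hypothetical helper, not an SDK function.

```python
def lint_databricks_config(config: dict) -> list[str]:
    """Return warnings for query-backed Databricks comparisons."""
    sources = {ds["name"]: ds for ds in config.get("data_sources", [])}
    problems = []
    for name, comparison in config.get("comparisons", {}).items():
        for side in ("source", "target"):
            spec = comparison.get(side, {})
            ds = sources.get(spec.get("data_source"), {})
            query = spec.get("query")
            # Only query-backed sides on Databricks data sources are checked.
            if ds.get("type") != "databricks" or not query:
                continue
            if not ds.get("temporary_schema"):
                problems.append(
                    f"{name}.{side}: query used but data source "
                    f"{ds['name']!r} has no temporary_schema")
            if "dbfs:/" in query:
                problems.append(
                    f"{name}.{side}: query reads a dbfs:/ path; "
                    "prefer /Volumes/...")
    return problems


config = {
    "data_sources": [
        {"name": "databricks_demo", "type": "databricks"},  # no temporary_schema
    ],
    "comparisons": {
        "demo": {
            "source": {"data_source": "databricks_demo", "table": "source_table"},
            "target": {
                "data_source": "databricks_demo",
                "query": "SELECT * FROM read_files('dbfs:/raw/t.parquet', format => 'parquet')",
            },
        }
    },
}
for problem in lint_databricks_config(config):
    print(problem)
```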
