
SDK for DataChecks


DCS SDK v1.8.3


Installation

Requires Python >=3.10,<3.13

$ pip install dcs-sdk[all-dbs]

Supported Databases

All of the databases below are currently supported:

Database             Code Name
PostgreSQL           postgres
Snowflake            snowflake
Trino                trino
Databricks           databricks
Oracle               oracle
MSSQL                mssql
MySQL                mysql
SAP Sybase IQ/ASE    sybase
File                 file
BigQuery             bigquery
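
For the SQL-backed databases above, a data source is declared in YAML. As an illustrative sketch only (the connection keys are assumptions modelled on the Databricks example later in this document; consult the configuration reference for the exact schema), a postgres entry might look like:

```yaml
data_sources:
  - name: source_pg
    type: postgres            # code name from the table above
    connection:               # key names below are assumptions, not verified
      host: localhost
      port: 5432
      username: dcs_user
      password: ${PG_PASSWORD}   # env-var interpolation, as with ${DATABRICKS_TOKEN} below
      database: analytics
```

Keeping credentials in environment variables mirrors the ${DATABRICKS_TOKEN} pattern used in the Databricks example below.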

Available Commands

  • --config-path (-C): required. Path to the configuration file.
    Example: dcs-sdk run --config-path config.yaml --compare comp_name
  • --compare: required. Run only the specified comparison, by name.
    Example: dcs-sdk run --config-path config.yaml --compare comp_name
  • --save-json (-j): optional; default: False. Save the results to a JSON file.
    Example: dcs-sdk run --config-path config.yaml --compare comp_name --save-json
  • --json-path (-jp): optional; default: dcs_report.json. Path for the JSON output file.
    Example: dcs-sdk run --config-path config.yaml --compare comp_name --save-json --json-path output.json
  • --stats: optional; default: False. Print statistics about the data diff.
    Example: dcs-sdk run --config-path config.yaml --compare comp_name --stats
  • --url: optional; default: None. URL of a server to send the results to.
    Example: dcs-sdk run --config-path config.yaml --compare comp_name --url=https://compare/send/data
  • --html-report: optional; default: False. Save the comparison table as an HTML report.
    Example: dcs-sdk run --config-path config.yaml --compare comp_name --html-report
  • --report-path: optional; default: dcs_report.html. Path for the HTML report file.
    Example: dcs-sdk run --config-path config.yaml --compare comp_name --html-report --report-path table.html
  • --table: optional; default: False. Display the comparison in table format.
    Example: dcs-sdk run --config-path config.yaml --compare comp_name --html-report --report-path table.html --table

Example Command [CLI]

$ dcs-sdk --version

$ dcs-sdk --help

$ dcs-sdk run -C example.yaml --compare comparison_one --stats -j -jp output.json --html-report --report-path result.html --table --url=https://compare/send/data

File Comparisons

dcs-sdk supports file-backed comparisons through DuckDB for:

  • .csv
  • .parquet
  • mixed-format comparisons such as csv ↔ parquet

Supported file datasource types:

  • file
  • azure_blob

Notes:

  • File paths must point to concrete .csv or .parquet files or globs.
  • Query-backed file comparisons are supported. When source.query or target.query is provided, the SDK loads the file into DuckDB and compares against the filtered/projected query view.
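
As a sketch of that query-backed flow (assuming, as in the local file example below, that each file is exposed under its filename stem; the query dialect is DuckDB SQL):

```yaml
comparisons:
  filtered_parquet_diff:
    source:
      data_source: source_file      # file data sources as declared in the local file example
      table: one_source
    target:
      data_source: target_file
      # DuckDB SQL over the loaded file; the diff runs against this
      # filtered/projected view rather than the raw file.
      query: |
        SELECT id, customer_name, status, amount, region
        FROM two_target
        WHERE status IS NOT NULL
    key_columns: [id]
    columns: [customer_name, status, amount, region]
```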

Local File Example

data_sources:
  - name: source_file
    type: file
    file_path: sample_data/parquet/one_source.parquet

  - name: target_file
    type: file
    file_path: sample_data/parquet/two_target.parquet

comparisons:
  parquet_file_diff:
    source:
      data_source: source_file
      table: one_source
    target:
      data_source: target_file
      table: two_target
    key_columns: [id]
    columns: [customer_name, status, amount, region]

Run it with:

dcs-sdk run -C parquet_file_comparison.yaml --compare parquet_file_diff --stats
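
A mixed-format comparison (csv ↔ parquet) follows the same pattern; this sketch assumes a CSV file with the same columns (file paths are illustrative):

```yaml
data_sources:
  - name: source_csv
    type: file
    file_path: sample_data/csv/one_source.csv

  - name: target_parquet
    type: file
    file_path: sample_data/parquet/two_target.parquet

comparisons:
  csv_vs_parquet_diff:
    source:
      data_source: source_csv
      table: one_source
    target:
      data_source: target_parquet
      table: two_target
    key_columns: [id]
    columns: [customer_name, status, amount, region]
```

Run it the same way: dcs-sdk run -C mixed_file_comparison.yaml --compare csv_vs_parquet_diff --stats (the YAML filename here is illustrative).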

Databricks Query-Backed Comparisons

Databricks comparisons can use either:

  • a table name
  • a SQL query

For Parquet files stored on Databricks, use a query with read_files(...).

Databricks Table vs Parquet Example

data_sources:
  - name: databricks_demo
    type: databricks
    connection:
      host: your-workspace.cloud.databricks.com
      port: 443
      http_path: /sql/1.0/warehouses/your-warehouse
      access_token: ${DATABRICKS_TOKEN}
      catalog: dcs_demo_databricks
      schema: source
    temporary_schema: temp_schema

comparisons:
  databricks_table_vs_parquet:
    source:
      data_source: databricks_demo
      table: source_table
    target:
      data_source: databricks_demo
      query: |
        SELECT *
        FROM read_files(
          '/Volumes/dcs_demo_databricks/source/dcs-test-volumne/two_target.parquet',
          format => 'parquet'
        )
      view_name: datachecks_target_file
      materialization_type: table
    key_columns: [id]
    columns: [customer_name, status, amount, region]

Notes:

  • Query-backed Databricks comparisons require temporary_schema.
  • Generated temp views/tables use the datachecks_ prefix.
  • Prefer Unity Catalog volume paths such as /Volumes/... for Databricks file queries.
  • Legacy DBFS root paths such as dbfs:/raw/... are not the recommended path for this flow.
