SDK for DataChecks

DCS SDK v1.8.4

Installation

Python version >=3.10,<3.13

$ pip install "dcs-sdk[all-dbs]"
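
The version constraint above can be checked before installing. The following is a minimal sketch; `python_version_supported` is a hypothetical helper written for illustration and is not part of dcs-sdk.

```python
import sys


def python_version_supported(version_info=sys.version_info) -> bool:
    """Return True when the interpreter satisfies >=3.10,<3.13."""
    major_minor = tuple(version_info[:2])
    return (3, 10) <= major_minor < (3, 13)


if __name__ == "__main__":
    # Report whether the running interpreter falls inside the documented range.
    print("supported" if python_version_supported() else "unsupported")
```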

Supported Databases

Availability Status

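| Database | Code Name | Supported |
| --- | --- | --- |
| PostgreSQL | postgres | |
| Snowflake | snowflake | |
| Trino | trino | |
| Databricks | databricks | |
| Oracle | oracle | |
| MSSQL | mssql | |
| MySQL | mysql | |
| SAP Sybase IQ/ASE | sybase | |
| File | file | |
| BigQuery | bigquery | |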

Available Commands

| Option | Short Option | Required | Default | Description | Example |
| --- | --- | --- | --- | --- | --- |
| --config-path | -C | Yes | None | Path to the configuration file | dcs-sdk run --config-path config.yaml --compare comp_name |
| --compare | | Yes | None | Run only the comparison with the given name | dcs-sdk run --config-path config.yaml --compare comp_name |
| --save-json | -j | No | False | Save the data to a JSON file | dcs-sdk run --config-path config.yaml --compare comp_name --save-json |
| --json-path | -jp | No | dcs_report.json | Path of the JSON output file | dcs-sdk run --config-path config.yaml --compare comp_name --save-json --json-path output.json |
| --stats | | No | False | Print stats about the data diff | dcs-sdk run --config-path config.yaml --compare comp_name --stats |
| --url | | No | None | URL of the server to send the data to | dcs-sdk run --config-path config.yaml --compare comp_name --url=https://comapre/send/data |
| --html-report | | No | False | Save the comparison table as HTML | dcs-sdk run --config-path config.yaml --compare comp_name --html-report |
| --report-path | | No | dcs_report.html | Path of the HTML report file | dcs-sdk run --config-path config.yaml --compare comp_name --html-report --report-path table.html |
| --table | | No | False | Display the comparison in table format | dcs-sdk run --config-path config.yaml --compare comp_name --html-report --report-path table.html --table |

Example Command [CLI]

$ dcs-sdk --version

$ dcs-sdk --help

$ dcs-sdk run -C example.yaml --compare comparison_one --stats -j -jp output.json --html-report --report-path result.html --table --url=https://comapre/send/data
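
When the CLI is driven from a script, it can help to assemble the argument list in one place. The sketch below uses only the options listed in the table above; `build_run_command` is a hypothetical helper, not something the SDK provides, and it only builds the command without running it.

```python
import shlex


def build_run_command(config_path, compare, *, stats=False, save_json=False,
                      json_path=None, html_report=False, report_path=None,
                      table=False, url=None):
    """Assemble the argv list for `dcs-sdk run` from the documented options."""
    cmd = ["dcs-sdk", "run", "--config-path", config_path, "--compare", compare]
    if stats:
        cmd.append("--stats")
    if save_json:
        cmd.append("--save-json")
    if json_path:
        cmd += ["--json-path", json_path]
    if html_report:
        cmd.append("--html-report")
    if report_path:
        cmd += ["--report-path", report_path]
    if table:
        cmd.append("--table")
    if url:
        cmd.append(f"--url={url}")
    return cmd


cmd = build_run_command("example.yaml", "comparison_one", stats=True,
                        save_json=True, json_path="output.json")
# Print a shell-safe rendering of the command for logging or copy-paste.
print(shlex.join(cmd))
```

To execute the command, pass the list to `subprocess.run(cmd, check=True)` in an environment where dcs-sdk is installed.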

File Comparisons

dcs-sdk supports file-backed comparisons through DuckDB for:

  • .csv
  • .parquet
  • mixed-format comparisons such as csv ↔ parquet

Supported file datasource types:

  • file
  • azure_blob

Notes:

  • File paths must point to concrete .csv or .parquet files or globs.
  • Query-backed file comparisons are supported. When source.query or target.query is provided, the SDK loads the file into DuckDB and compares against the filtered/projected query view.
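
The path rule in the notes above can be checked before a run. This is a minimal sketch: `is_supported_file_path` is a hypothetical helper, and it assumes the rule amounts to "the path or glob ends in .csv or .parquet"; the SDK's own validation may differ.

```python
from pathlib import PurePosixPath

# Extensions named in the notes above.
SUPPORTED_SUFFIXES = {".csv", ".parquet"}


def is_supported_file_path(path: str) -> bool:
    """Return True if the path (or glob) ends in .csv or .parquet."""
    return PurePosixPath(path).suffix in SUPPORTED_SUFFIXES


for candidate in ("sample_data/parquet/one_source.parquet",
                  "exports/*.csv",
                  "sample_data/parquet/"):
    print(candidate, is_supported_file_path(candidate))
```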

Local File Example

data_sources:
  - name: source_file
    type: file
    file_path: sample_data/parquet/one_source.parquet

  - name: target_file
    type: file
    file_path: sample_data/parquet/two_target.parquet

comparisons:
  parquet_file_diff:
    source:
      data_source: source_file
      table: one_source
    target:
      data_source: target_file
      table: two_target
    key_columns: [id]
    columns: [customer_name, status, amount, region]

Run it with:

dcs-sdk run -C parquet_file_comparison.yaml --compare parquet_file_diff --stats
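
For several file pairs, the configuration above can be generated rather than written by hand. The sketch below renders the same layout as the example; `render_file_comparison` is a hypothetical helper, and using the file stem as the `table` value is an assumption inferred from the example (one_source.parquet paired with table one_source), not documented behavior.

```python
from pathlib import PurePosixPath


def render_file_comparison(name, source_path, target_path, key_columns, columns):
    """Render a YAML config for two local files, following the example layout."""
    # Assumption: the table name matches the file stem, as in the example above.
    src_table = PurePosixPath(source_path).stem
    tgt_table = PurePosixPath(target_path).stem
    keys = ", ".join(key_columns)
    cols = ", ".join(columns)
    return f"""data_sources:
  - name: source_file
    type: file
    file_path: {source_path}

  - name: target_file
    type: file
    file_path: {target_path}

comparisons:
  {name}:
    source:
      data_source: source_file
      table: {src_table}
    target:
      data_source: target_file
      table: {tgt_table}
    key_columns: [{keys}]
    columns: [{cols}]
"""


config_text = render_file_comparison(
    "parquet_file_diff",
    "sample_data/parquet/one_source.parquet",
    "sample_data/parquet/two_target.parquet",
    key_columns=["id"],
    columns=["customer_name", "status", "amount", "region"],
)
print(config_text)
```

Writing `config_text` to a file such as parquet_file_comparison.yaml gives a configuration that can be passed to `dcs-sdk run -C`.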

Databricks Query-Backed Comparisons

Databricks comparisons can use either:

  • a table name
  • a SQL query

For Parquet files stored on Databricks, use a query with read_files(...).

Databricks Table vs Parquet Example

data_sources:
  - name: databricks_demo
    type: databricks
    connection:
      host: your-workspace.cloud.databricks.com
      port: 443
      http_path: /sql/1.0/warehouses/your-warehouse
      access_token: ${DATABRICKS_TOKEN}
      catalog: dcs_demo_databricks
      schema: source
    temporary_schema: temp_schema

comparisons:
  databricks_table_vs_parquet:
    source:
      data_source: databricks_demo
      table: source_table
    target:
      data_source: databricks_demo
      query: |
        SELECT *
        FROM read_files(
          '/Volumes/dcs_demo_databricks/source/dcs-test-volumne/two_target.parquet',
          format => 'parquet'
        )
      view_name: datachecks_target_file
      materialization_type: table
    key_columns: [id]
    columns: [customer_name, status, amount, region]

Notes:

  • Query-backed Databricks comparisons require temporary_schema.
  • Generated temp views/tables use the datachecks_ prefix.
  • Prefer Unity Catalog volume paths such as /Volumes/... for Databricks file queries.
  • Legacy DBFS root paths such as dbfs:/raw/... are not recommended for this flow.
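
The first note above lends itself to a quick pre-flight check. The sketch below scans an already parsed configuration (for example, a dict loaded from the YAML file) and reports query-backed sides whose Databricks data source has no temporary_schema. `find_missing_temporary_schema` is a hypothetical helper, not part of the SDK.

```python
def find_missing_temporary_schema(config: dict) -> list[str]:
    """Return "comparison.side" entries that use a query on a Databricks
    data source without temporary_schema."""
    sources = {ds["name"]: ds for ds in config.get("data_sources", [])}
    problems = []
    for comp_name, comp in config.get("comparisons", {}).items():
        for side in ("source", "target"):
            spec = comp.get(side, {})
            ds = sources.get(spec.get("data_source"), {})
            if (
                "query" in spec
                and ds.get("type") == "databricks"
                and not ds.get("temporary_schema")
            ):
                problems.append(f"{comp_name}.{side}")
    return problems


# Trimmed-down version of the example above, with temporary_schema left out.
example = {
    "data_sources": [{"name": "databricks_demo", "type": "databricks"}],
    "comparisons": {
        "databricks_table_vs_parquet": {
            "source": {"data_source": "databricks_demo", "table": "source_table"},
            "target": {"data_source": "databricks_demo", "query": "SELECT 1"},
        }
    },
}
print(find_missing_temporary_schema(example))
```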
