SDK for DataChecks

DCS SDK v1.9.0

Installation

Python version >=3.10,<3.13

$ pip install "dcs-sdk[all-dbs]"
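
The version range above can be checked before installing. The following is a minimal sketch; the helper name `is_supported_python` is hypothetical and is not part of dcs-sdk.

```python
import sys


def is_supported_python(version=None):
    """Return True if the version satisfies >=3.10,<3.13 (the range stated above).

    `version` is a tuple such as (3, 11); it defaults to the running interpreter.
    """
    major_minor = tuple((version or sys.version_info)[:2])
    return (3, 10) <= major_minor < (3, 13)


if __name__ == "__main__":
    print("supported" if is_supported_python() else "unsupported")
```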

Supported Databases

Availability Status

| Database | Code Name |
|---|---|
| PostgreSQL | postgres |
| Snowflake | snowflake |
| Trino | trino |
| Databricks | databricks |
| Oracle | oracle |
| MSSQL | mssql |
| MySQL | mysql |
| SAP Sybase IQ/ASE | sybase |
| SAP HANA | sap_hana |
| File | file |
| BigQuery | bigquery |

Available Commands

| Option | Short Option | Required | Default | Description | Example |
|---|---|---|---|---|---|
| --config-path | -C | Yes | None | Path to the configuration file | dcs-sdk run --config-path config.yaml --compare comp_name |
| --compare | | Yes | None | Run only the comparison with the given name | dcs-sdk run --config-path config.yaml --compare comp_name |
| --save-json | -j | No | False | Save the data to a JSON file | dcs-sdk run --config-path config.yaml --compare comp_name --save-json |
| --json-path | -jp | No | dcs_report.json | File path for the JSON file | dcs-sdk run --config-path config.yaml --compare comp_name --save-json --json-path output.json |
| --stats | | No | False | Print statistics about the data diff | dcs-sdk run --config-path config.yaml --compare comp_name --stats |
| --url | | No | None | URL of the server to send the data to | dcs-sdk run --config-path config.yaml --compare comp_name --url=https://comapre/send/data |
| --html-report | | No | False | Save the comparison table as HTML | dcs-sdk run --config-path config.yaml --compare comp_name --html-report |
| --report-path | | No | dcs_report.html | File path for the HTML report | dcs-sdk run --config-path config.yaml --compare comp_name --html-report --report-path table.html |
| --table | | No | False | Display the comparison in table format | dcs-sdk run --config-path config.yaml --compare comp_name --html-report --report-path table.html --table |

Example Command [CLI]

$ dcs-sdk --version

$ dcs-sdk --help

$ dcs-sdk run -C example.yaml --compare comparison_one --stats -j -jp output.json --html-report --report-path result.html --table --url=https://comapre/send/data
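
When the CLI is driven from a script, it can help to assemble the argument list in one place. The sketch below uses only the options documented above; the function `build_run_command` and its parameter names are hypothetical and are not part of dcs-sdk.

```python
import shlex


def build_run_command(config_path, compare, *, stats=False, json_path=None,
                      html_report_path=None, table=False, url=None):
    """Assemble an argv list for `dcs-sdk run` from the documented options."""
    argv = ["dcs-sdk", "run", "--config-path", config_path, "--compare", compare]
    if stats:
        argv.append("--stats")
    if json_path is not None:
        # --json-path only makes sense together with --save-json
        argv += ["--save-json", "--json-path", json_path]
    if html_report_path is not None:
        # --report-path only makes sense together with --html-report
        argv += ["--html-report", "--report-path", html_report_path]
    if table:
        argv.append("--table")
    if url is not None:
        argv.append(f"--url={url}")
    return argv


cmd = build_run_command("example.yaml", "comparison_one",
                        stats=True, json_path="output.json")
print(shlex.join(cmd))
```

With dcs-sdk installed, the list can be passed to `subprocess.run(cmd, check=True)`; passing a list rather than a shell string avoids quoting problems in paths.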

File Comparisons

dcs-sdk supports file-backed comparisons through DuckDB for:

  • .csv
  • .parquet
  • mixed-format comparisons such as csv ↔ parquet

Supported file datasource types:

  • file
  • azure_blob

Notes:

  • File paths must point to concrete .csv or .parquet files or globs.
  • Query-backed file comparisons are supported. When source.query or target.query is provided, the SDK loads the file into DuckDB and compares against the filtered/projected query view.
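
The file path rule above can be checked before a run. This is a minimal pre-flight sketch, not the SDK's own validation: the helper name `has_supported_suffix` is hypothetical, and matching the suffix exactly as written (case-sensitive) is an assumption.

```python
from pathlib import PurePosixPath

SUPPORTED_SUFFIXES = {".csv", ".parquet"}  # the formats listed above


def has_supported_suffix(file_path):
    """Return True if the path (or glob) ends in .csv or .parquet."""
    return PurePosixPath(file_path).suffix in SUPPORTED_SUFFIXES


for candidate in ("sample_data/parquet/one_source.parquet",
                  "data/*.csv",
                  "sample_data/parquet"):
    print(candidate, has_supported_suffix(candidate))
```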

Local File Example

data_sources:
  - name: source_file
    type: file
    file_path: sample_data/parquet/one_source.parquet

  - name: target_file
    type: file
    file_path: sample_data/parquet/two_target.parquet

comparisons:
  parquet_file_diff:
    source:
      data_source: source_file
      table: one_source
    target:
      data_source: target_file
      table: two_target
    key_columns: [id]
    columns: [customer_name, status, amount, region]

Run it with:

dcs-sdk run -C parquet_file_comparison.yaml --compare parquet_file_diff --stats
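
To try the same flow without Parquet files, the sketch below writes two small CSV files and a matching configuration using only the Python standard library. The row values, the comparison name `csv_file_diff`, the output file name, and the choice of a table name equal to the file stem (mirroring the example above) are assumptions made for illustration.

```python
import csv
import tempfile
from pathlib import Path

COLUMNS = ["id", "customer_name", "status", "amount", "region"]


def write_csv(path, rows):
    """Write a header row followed by the given data rows."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(COLUMNS)
        writer.writerows(rows)


def make_fixture(directory):
    """Create two CSV files that differ in one row, plus a config pointing at them."""
    directory = Path(directory)
    source = directory / "one_source.csv"
    target = directory / "two_target.csv"
    write_csv(source, [[1, "Ada", "active", "10.50", "EU"],
                       [2, "Bo", "active", "7.00", "US"]])
    write_csv(target, [[1, "Ada", "active", "10.50", "EU"],
                       [2, "Bo", "closed", "7.00", "US"]])  # status differs for id 2
    config = f"""\
data_sources:
  - name: source_file
    type: file
    file_path: {source.as_posix()}

  - name: target_file
    type: file
    file_path: {target.as_posix()}

comparisons:
  csv_file_diff:
    source:
      data_source: source_file
      table: one_source
    target:
      data_source: target_file
      table: two_target
    key_columns: [id]
    columns: [customer_name, status, amount, region]
"""
    config_path = directory / "csv_file_comparison.yaml"
    config_path.write_text(config)
    return config_path


if __name__ == "__main__":
    out = make_fixture(tempfile.mkdtemp())
    print(f"dcs-sdk run -C {out} --compare csv_file_diff --stats")
```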

Databricks Query-Backed Comparisons

Databricks comparisons can use either:

  • a table name
  • a SQL query

For Parquet files stored on Databricks, use a query with read_files(...).

Databricks Table vs Parquet Example

data_sources:
  - name: databricks_demo
    type: databricks
    connection:
      host: your-workspace.cloud.databricks.com
      port: 443
      http_path: /sql/1.0/warehouses/your-warehouse
      access_token: ${DATABRICKS_TOKEN}
      catalog: dcs_demo_databricks
      schema: source
    temporary_schema: temp_schema

comparisons:
  databricks_table_vs_parquet:
    source:
      data_source: databricks_demo
      table: source_table
    target:
      data_source: databricks_demo
      query: |
        SELECT *
        FROM read_files(
          '/Volumes/dcs_demo_databricks/source/dcs-test-volumne/two_target.parquet',
          format => 'parquet'
        )
      view_name: datachecks_target_file
      materialization_type: table
    key_columns: [id]
    columns: [customer_name, status, amount, region]

Notes:

  • Query-backed Databricks comparisons require temporary_schema.
  • Generated temp views/tables use the datachecks_ prefix.
  • Prefer Unity Catalog volume paths such as /Volumes/... for Databricks file queries.
  • Legacy DBFS root paths such as dbfs:/raw/... are not recommended for this flow.
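
The first note can be checked on a configuration that has already been parsed into a dictionary. The following is a sketch under stated assumptions: the function name `missing_temporary_schema` is hypothetical, the dictionary mirrors the YAML layout shown above, and the check is not the SDK's own validation.

```python
def missing_temporary_schema(config):
    """Return "comparison.side" labels for query-backed Databricks sides
    whose data source does not define temporary_schema."""
    sources = {ds["name"]: ds for ds in config.get("data_sources", [])}
    problems = []
    for name, comparison in config.get("comparisons", {}).items():
        for side in ("source", "target"):
            spec = comparison.get(side, {})
            data_source = sources.get(spec.get("data_source"), {})
            if (
                "query" in spec
                and data_source.get("type") == "databricks"
                and not data_source.get("temporary_schema")
            ):
                problems.append(f"{name}.{side}")
    return problems


# Same shape as the example above, but with temporary_schema left out on purpose.
config = {
    "data_sources": [{"name": "databricks_demo", "type": "databricks"}],
    "comparisons": {
        "databricks_table_vs_parquet": {
            "source": {"data_source": "databricks_demo", "table": "source_table"},
            "target": {"data_source": "databricks_demo", "query": "SELECT 1"},
        }
    },
}
print(missing_temporary_schema(config))
```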
