DCS SDK v1.9.1

SDK for DataChecks

Installation

Requires Python >=3.10,<3.13.

$ pip install dcs-sdk[all-dbs]
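
Note: in shells that treat square brackets as glob characters (zsh, for example), quote the extras specifier:

$ pip install 'dcs-sdk[all-dbs]'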

Supported Databases

Database            Code Name
PostgreSQL          postgres
Snowflake           snowflake
Trino               trino
Databricks          databricks
Oracle              oracle
MSSQL               mssql
MySQL               mysql
SAP Sybase IQ/ASE   sybase
SAP HANA            sap_hana
File                file
BigQuery            bigquery

All databases listed above are currently supported.

Available Options

The following options apply to dcs-sdk run:

Option          Short   Required   Default           Description
--config-path   -C      Yes        None              Path to the configuration file
--compare               Yes        None              Run only the comparison with the given name
--save-json     -j      No         False             Save the comparison data to a JSON file
--json-path     -jp     No         dcs_report.json   File path for the JSON output
--stats                 No         False             Print statistics about the data diff
--url                   No         None              URL of a server to send the comparison data to
--html-report           No         False             Save the comparison table as an HTML report
--report-path           No         dcs_report.html   File path for the HTML report
--table                 No         False             Display the comparison in table format

Each option is combined with dcs-sdk run --config-path <file> --compare <name>, for example:

$ dcs-sdk run --config-path config.yaml --compare comp_name --save-json --json-path output.json

Example Commands [CLI]

$ dcs-sdk --version

$ dcs-sdk --help

$ dcs-sdk run -C example.yaml --compare comparison_one --stats -j -jp output.json --html-report --report-path result.html --table --url=https://compare/send/data

File Comparisons

dcs-sdk supports file-backed comparisons through DuckDB for:

  • .csv
  • .parquet
  • mixed-format comparisons such as csv ↔ parquet

Supported file datasource types:

  • file
  • azure_blob
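
For the azure_blob type, a data source entry might look like the sketch below. This is a minimal illustration only: the connection field names are assumptions, not taken from this README; only the type value azure_blob is.

data_sources:
  - name: blob_source
    type: azure_blob
    # ASSUMPTION: these connection keys are illustrative placeholders,
    # not confirmed field names from the SDK
    connection:
      account_name: mystorageaccount
      account_key: ${AZURE_STORAGE_KEY}
      container: datasets
    file_path: exports/one_source.parquet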

Notes:

  • File paths must point to concrete .csv or .parquet files, or to glob patterns.
  • Query-backed file comparisons are supported. When source.query or target.query is provided, the SDK loads the file into DuckDB and compares against the filtered/projected query view (a sketch follows the local file example below).

Local File Example

data_sources:
  - name: source_file
    type: file
    file_path: sample_data/parquet/one_source.parquet

  - name: target_file
    type: file
    file_path: sample_data/parquet/two_target.parquet

comparisons:
  parquet_file_diff:
    source:
      data_source: source_file
      table: one_source
    target:
      data_source: target_file
      table: two_target
    key_columns: [id]
    columns: [customer_name, status, amount, region]

Run it with:

dcs-sdk run -C parquet_file_comparison.yaml --compare parquet_file_diff --stats
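
A query-backed variant of the same comparison is sketched below. Assumptions: the query can reference the file by its registered table name (one_source), and the query field works for file sources the same way it does in the Databricks example later in this document; the filter itself is purely illustrative.

comparisons:
  parquet_filtered_diff:
    source:
      data_source: source_file
      # ASSUMPTION: the loaded file is addressable by its registered table name
      query: |
        SELECT id, customer_name, status, amount, region
        FROM one_source
        WHERE region = 'EU'
    target:
      data_source: target_file
      table: two_target
    key_columns: [id]
    columns: [customer_name, status, amount, region]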

Databricks Query-Backed Comparisons

Databricks comparisons can use either:

  • a table name
  • a SQL query

For Parquet files stored on Databricks, use a query with read_files(...).

Databricks Table vs Parquet Example

data_sources:
  - name: databricks_demo
    type: databricks
    connection:
      host: your-workspace.cloud.databricks.com
      port: 443
      http_path: /sql/1.0/warehouses/your-warehouse
      access_token: ${DATABRICKS_TOKEN}
      catalog: dcs_demo_databricks
      schema: source
    temporary_schema: temp_schema

comparisons:
  databricks_table_vs_parquet:
    source:
      data_source: databricks_demo
      table: source_table
    target:
      data_source: databricks_demo
      query: |
        SELECT *
        FROM read_files(
          '/Volumes/dcs_demo_databricks/source/dcs-test-volume/two_target.parquet',
          format => 'parquet'
        )
      view_name: datachecks_target_file
      materialization_type: table
    key_columns: [id]
    columns: [customer_name, status, amount, region]

Notes:

  • Query-backed Databricks comparisons require temporary_schema.
  • Generated temp views/tables use the datachecks_ prefix.
  • Prefer Unity Catalog volume paths such as /Volumes/... for Databricks file queries.
  • Legacy DBFS root paths such as dbfs:/raw/... are not the recommended path for this flow.
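
To run the example, export the access token referenced by ${DATABRICKS_TOKEN} and invoke the CLI. The configuration filename below is illustrative:

$ export DATABRICKS_TOKEN=your-token
$ dcs-sdk run -C databricks_comparison.yaml --compare databricks_table_vs_parquet --stats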
