Skip to main content

A testing and auditing tool inspired by dbt, for those not using dbt.

Project description

SchemaQL

A testing and auditing tool inspired by dbt, for those not using dbt.

Installation

  1. Fork and clone to repo to a local folder.
  2. Create a new Python virtual environment in this folder and activate it.
  3. In the local repo folder, run pip install -r requirements to install the dependent packages in this virtualenv.
  4. In the local repo folder, run pip install -e . to install a dev version locally.

You should now be able to run schemaql -h from the command line.

schemaql -h
usage: schemaql [-h] [-p None] [-c config.yml] [-x connections.yml] [action]

positional arguments:
  action                Action ('test', or 'generate')

optional arguments:
  -h, --help            show this help message and exit
  -p None, --project None
                        Project
  -c config.yml, --config-file config.yml
                        Config file
  -x connections.yml, --connections-file connections.yml
                        Connections file

Configuration

SchemaQL needs to *.yml for configuration:

In connections.yml you define how to connect to one or more of your data warehouse connections like so:

project-1-snowflake:
type: snowflake
  account: <my_account>
  user: <my_user>
  password: <my_password>
  database: <my_database>
  # optional:
  warehouse: <my_warehouse>
  role: <my_role>
  schema: <my_initial_schema>

project-2-bigquery:
type: bigquery
  database: <my_bigquery_project>
  credentials_path: <path_to_my_service_account_credentials.json>
  supports_multi_insert: False

In config.yml, you define the following:

  • Logging, where output is a directory relative to the project path.
logs:
  output: logs
  • a collector which is a definition for how you want test results to be collected. Currently supported are:
    • json
    • csv
    • database

For json and csv, you only need to provide an output path.

collector:
  type: csv
  output: output 

For database collection, you need to provide the name of a connection (from connections.yml) and a destination table via output:

collector:
  type: database
  connection: project-1-snowflake
  output: test_results

The collector connection does not need to the same connection, or even the same connection type as the project connection. So, tests could be run against BigQuery but tests results could be collected using Snowflake or Postgres.

  • projects which is a combination of a connection and a list of which databases and schemas you want to work with. If you don't define any schemas within the database key, all schemas will be processed.
projects:
  project-1:
    connection: project-1-snowflake
    schema:
      database_1:
      - schema_1
      - schema_2
  project-2:
    connection: project-2-bigquery
    schema:
      my_bg_project_1
      my_bg_project_2
      - data_set_1
      - data_set_2

Usage

Generate

SchemaQL runs tests against schema information contained in yml files. You can either write these from scratch, use your existing dbt schema files, or use schemaql to generate them.

Generates schema files for all projects:

schemaql generate

Generates schema files for "my_project" only:

schemaql generate -p my_project

Test

schemaql test

Test "my_project" only:

schemaql test -p my_project

Built-in Tests

Schema Tests

not_null

Checks if column values are NULL

models:
  - name: my_table
    columns:
      - name: col_1
        tests:
          - not_null

relationships

Checks if column values match values from column in other entity

models:
  - name: my_table
    columns:
      - name: col_1
        tests:
          - relationships:
              to: my_other_table
              field: col_1

unique

Checks if column values are unique

models:
  - name: my_table
    columns:
      - name: col_1
        tests:
          - unique

Data Tests

accepted_values

Checks if column values match a predefined list of accepted values

models:
  - name: my_table
    columns:
      - name: day_of_week
        description: 
        tests:
          - accepted_values:
            values: ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']

at_least_one

Checks if column has at least one value

models:
  - name: my_table
    columns:
      - name: col_1
        tests:
          - at_least_one

equal_expression (TBD)

frequency (TBD)

not_constant

Checks if column has at more than one value

models:
  - name: my_table
    columns:
      - name: col_1
        tests:
          - not_constant

recency (TBD)

unique_rows

Checks if table rows are unique. If columns are not specified, uses all columns.

models:
  - name: my_table
    tests:
      - unique_rows:
          columns: [col_1, col_2, col_3]

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

schemaql-0.0.4.tar.gz (16.8 kB view details)

Uploaded Source

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

schemaql-0.0.4-py3-none-any.whl (37.9 kB view details)

Uploaded Python 3

schemaql-0.0.4-py2.py3-none-any.whl (37.9 kB view details)

Uploaded Python 2Python 3

File details

Details for the file schemaql-0.0.4.tar.gz.

File metadata

  • Download URL: schemaql-0.0.4.tar.gz
  • Upload date:
  • Size: 16.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/2.0.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.6.0 requests-toolbelt/0.9.1 tqdm/4.38.0 CPython/3.7.4

File hashes

Hashes for schemaql-0.0.4.tar.gz
Algorithm Hash digest
SHA256 93e196831c0ed241db65145d81d2054815331adbd16d6f73fece8ca8c3f0484a
MD5 0e3cd6de97d4370e4881df5393469e4c
BLAKE2b-256 d558622ae59a8e82ba50a74700de0ecf56c291be38a7870f31c8c8ac918040a5

See more details on using hashes here.

File details

Details for the file schemaql-0.0.4-py3-none-any.whl.

File metadata

  • Download URL: schemaql-0.0.4-py3-none-any.whl
  • Upload date:
  • Size: 37.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/2.0.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.6.0 requests-toolbelt/0.9.1 tqdm/4.38.0 CPython/3.7.4

File hashes

Hashes for schemaql-0.0.4-py3-none-any.whl
Algorithm Hash digest
SHA256 2ca8df84c3b4a442f1cf98c7b837b32767850c94e4b722e01ede7110e5de8e2d
MD5 80b1ccb338347a7ce9998edb423bead0
BLAKE2b-256 f9b044b93aba8e0a14991d9338988149089605588ff7143b3e72323fd328d13a

See more details on using hashes here.

File details

Details for the file schemaql-0.0.4-py2.py3-none-any.whl.

File metadata

  • Download URL: schemaql-0.0.4-py2.py3-none-any.whl
  • Upload date:
  • Size: 37.9 kB
  • Tags: Python 2, Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/2.0.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.6.0 requests-toolbelt/0.9.1 tqdm/4.38.0 CPython/3.7.4

File hashes

Hashes for schemaql-0.0.4-py2.py3-none-any.whl
Algorithm Hash digest
SHA256 e98ca07ddb9105bbea83a7dfe933fc26b2434e11cd4eb3ba69f4de45cd723655
MD5 35ce066d7c23aab86aad502316758844
BLAKE2b-256 962172ea27bcf09bdd04b043f2ecaac82e00b671251c6e270b216877a50fbae5

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page