A tool to compare data from different sources.

Project description

A utility to compare tables, especially useful for validating data in migration projects.

Meta

License Apache-2.0 Codestyle Black

Connection Profiles

The connection profiles file is a YAML file that stores credentials and other details needed to connect to the databases/data sources.

It must be named profiles.yml and placed under the $HOME/.tulona directory. Create a directory named .tulona under your home directory and place profiles.yml in it.
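
The directory setup described above can be scripted. The following is a minimal sketch, not part of tulona itself; the helper name ensure_profiles_dir is hypothetical, and it only creates the directory and reports where profiles.yml is expected to live.

```python
from pathlib import Path


def ensure_profiles_dir(home: Path) -> Path:
    """Create <home>/.tulona if needed and return the expected profiles.yml path.

    Hypothetical helper for illustration only; tulona does not ship it.
    """
    tulona_dir = home / ".tulona"
    # parents/exist_ok make the call safe to repeat
    tulona_dir.mkdir(parents=True, exist_ok=True)
    return tulona_dir / "profiles.yml"
```

Calling ensure_profiles_dir(Path.home()) returns the path where profiles.yml should be placed; the file itself still has to be written by hand.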

This is what a sample profiles.yml looks like:

integration_project: # project_name
  profiles:
    pgdb:
      type: postgres
      host: localhost
      port: 5432
      database: postgres
      username: postgres
      password: postgres
    mydb:
      type: mysql
      host: localhost
      port: 3306
      database: db
      username: user
      password: password
    snowflake:
      type: snowflake
      account: snowflake_account
      warehouse: dev_x_small
      role: dev_role
      database: dev_stage
      schema: user_schema
      user: dev_user
      private_key: 'rsa_key.p8'
      private_key_passphrase: 444444
    mssql:
      type: mssql
      connection_string: 'DRIVER={ODBC Driver 18 for SQL Server};SERVER=dagger;DATABASE=test;UID=user;PWD=password'

Project Config File

The project config file stores the properties of the tables to be compared. It must be named tulona-project.yml. The file can be placed anywhere; the directory that contains it is treated as the project root directory, which means the output folder, where all results are stored, is created under that directory. It's a good idea to create an empty directory and store tulona-project.yml in it.

This is what a tulona-project.yml file looks like:

version: '2.0'
name: integration_project
config-version: 1

outdir: output # the folder comparison result is written into

datasources:
  employee_postgres:
    connection_profile: pgdb
    database: postgres
    schema: public
    table: employee
    primary_key: employee_id
    exclude_columns:  # optional
      - name
    compare_column: Employee_ID  # conditional optional
  employee_mysql:
    connection_profile: mydb
    database: db
    schema: db
    table: employee
    primary_key: employee_id
    exclude_columns:  # optional
      - phone_number
    compare_column: Employee_ID  # conditional optional
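
In the two samples, each datasource's connection_profile matches a profile name listed under the project name in profiles.yml. Assuming that is how the two files are linked (an inference from the samples, not a documented rule), a quick pre-flight check can be sketched as below. The function find_missing_profiles is hypothetical and works on plain dicts, so any YAML loader can be used to produce its inputs.

```python
def find_missing_profiles(project: dict, profiles: dict) -> dict:
    """Map datasource name -> connection_profile for profiles that are not defined.

    Hypothetical helper; assumes profiles are nested under the project name,
    as in the sample profiles.yml.
    """
    defined = profiles.get(project["name"], {}).get("profiles", {})
    return {
        ds_name: ds["connection_profile"]
        for ds_name, ds in project.get("datasources", {}).items()
        if ds["connection_profile"] not in defined
    }


# Dicts mirroring the two sample files (trimmed to the relevant keys)
profiles = {
    "integration_project": {
        "profiles": {"pgdb": {"type": "postgres"}, "mydb": {"type": "mysql"}}
    }
}
project = {
    "name": "integration_project",
    "datasources": {
        "employee_postgres": {"connection_profile": "pgdb", "table": "employee"},
        "employee_mysql": {"connection_profile": "mydb", "table": "employee"},
    },
}

print(find_missing_profiles(project, profiles))  # prints {}
```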

Features

Executing tulona, tulona -h, or tulona --help lists the available commands. All commands take one mandatory parameter, --datasources, a comma-separated list of datasource names from the project config file (tulona-project.yml).

Tulona has the following commands available:

  • ping: To test connectivity to the databases for the datasources. Sample commands:

    • To ping one datasource, pass its name to the --datasources parameter:

      tulona ping --datasources employee_postgres

    • More than one datasource can be passed to the --datasources parameter, separated by commas:

      tulona ping --datasources employee_postgres,employee_mysql

  • profile: To extract and compare metadata of two sources/tables. It includes metadata from information_schema related to the tables and some column-level metrics (min, max, average, count & distinct_count). Sample commands:

    • Profiling without the --compare flag. It writes metadata and metrics about the different sources/tables to separate sheets/tabs in the excel file (not a comparison view):

      tulona profile --datasources employee_postgres,employee_mysql

    • Profiling with the --compare flag. It produces a side-by-side comparison view:

      tulona profile --compare --datasources employee_postgres,employee_mysql

  • compare-data: To compare sample data from two sources/tables. It creates a side-by-side view of all common columns from both sources/tables (like: id_ds1 <-> id_ds2) and highlights mismatched values in the output excel file. By default it compares 20 common rows from both tables (subject to availability), but the number can be overridden with the command line argument --sample-count. Sample commands:

    • Command without the --sample-count parameter:

      tulona compare-data --datasources employee_postgres,employee_mysql

    • Command with the --sample-count parameter:

      tulona compare-data --sample-count 50 --datasources employee_postgres,employee_mysql

  • compare-column: To compare columns from two sources/tables. This is especially useful when you want to see whether all the rows from one table/source are present in the other by comparing the primary/unique key. The result is an excel file listing the extra primary/unique keys on each side. If both sides have the same set of primary/unique keys, which essentially means they have the same rows, the excel file will be empty. Sample command:

    • The column(s) to compare are specified in the tulona-project.yml file as part of the datasource configs, with the compare_column property:

      tulona compare-column --datasources employee_postgres,employee_mysql

  • compare: To prepare a comparison report for everything together. To execute this command, replace the command name in any of the commands above with compare. It prepares all the comparisons and writes them into different sheets of a single excel file. Sample command:

    tulona compare --datasources employee_postgres,employee_mysql
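
The idea behind compare-column, reporting the keys that exist on only one side, can be illustrated with a set difference. This is a conceptual sketch only, not tulona's implementation; the function extra_keys and the sample key values are made up for illustration.

```python
def extra_keys(keys_a, keys_b):
    """Return (keys only in A, keys only in B), each sorted.

    Hypothetical helper illustrating the comparison; both results are empty
    when the two sides hold the same set of keys.
    """
    set_a, set_b = set(keys_a), set(keys_b)
    return sorted(set_a - set_b), sorted(set_b - set_a)


# Made-up employee_id values for the two datasources
postgres_ids = [1, 2, 3, 4]
mysql_ids = [2, 3, 4, 5]

only_postgres, only_mysql = extra_keys(postgres_ids, mysql_ids)
print(only_postgres)  # [1]
print(only_mysql)     # [5]
```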

For debug-level logs, add the -v or --verbose flag to any command. For example:

tulona ping -v --datasources employee_postgres

To know more about any specific command, execute tulona <command> -h.
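
When tulona is driven from a script, the command line can be assembled programmatically. The sketch below uses only the commands and flags shown above (--datasources, --compare, --sample-count, -v); the function build_tulona_command is a hypothetical wrapper, and it does not check which flags a given command accepts.

```python
import shlex


def build_tulona_command(command, datasources, verbose=False,
                         compare=False, sample_count=None):
    """Build the argument list for a tulona invocation (hypothetical helper)."""
    args = ["tulona", command]
    if verbose:
        args.append("-v")
    if compare:
        args.append("--compare")
    if sample_count is not None:
        args += ["--sample-count", str(sample_count)]
    # --datasources takes a comma-separated list of datasource names
    args += ["--datasources", ",".join(datasources)]
    return args


cmd = build_tulona_command(
    "compare-data", ["employee_postgres", "employee_mysql"], sample_count=50
)
print(shlex.join(cmd))
# tulona compare-data --sample-count 50 --datasources employee_postgres,employee_mysql
```

The returned list can be passed directly to subprocess.run(cmd, check=True) from the project root directory.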

Development Environment Setup

  • For a live (editable) installation, execute pip install --editable core.

Build wheel executable

  • Execute python -m build.

Install wheel executable file

  • Execute pip install <wheel-file.whl>
