Utility for comparing results between data sources

Comparator

Comparator is a utility for comparing the results of queries run against two databases. Future development will include support for APIs, static files, and more.

Installation

pip install comparator

Usage

Overview

from spackl import db

import comparator as cpt

conf = db.Config()
l = db.Postgres(**conf.default)
r = db.Postgres(**conf.other_db)
query = 'SELECT * FROM my_table ORDER BY 1'

c = cpt.Comparator(l, query, r)
c.run_comparisons()
# [('basic_comp', True)]

Included Comparisons

Some basic comparisons are included. Import them as constants and pass them in the comps argument.

from comparator.comps import BASIC_COMP, LEN_COMP

c = cpt.Comparator(l, query, r, comps=[BASIC_COMP, LEN_COMP])
c.run_comparisons()
# [('basic_comp', True), ('len_comp', True)]

Queries and Exceptions

It’s possible to run a different query against each database. You can also raise an exception when a comparison fails, if that’s your speed.

lq = 'SELECT * FROM my_table ORDER BY 1'
rq = 'SELECT id, uuid, name FROM reporting.my_table ORDER BY 1'
comparisons = [BASIC_COMP, LEN_COMP]

c = cpt.Comparator(l, lq, r, rq, comps=comparisons)

for result in c.compare():
    if not result:
        raise Exception('{} check failed!'.format(result.name))
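The loop above stops at the first failing check. If you would rather see every failure in a single error, you can collect the names first. This is a sketch: raise_on_failures is a hypothetical helper, and FakeResult is a stand-in for the objects c.compare() yields, assuming only the name attribute and truthiness used above.

```python
class FakeResult(object):
    """Hypothetical stand-in for a result yielded by c.compare().

    Assumes only what the README shows: a name attribute and truthiness.
    """

    def __init__(self, name, result):
        self.name = name
        self.result = result

    def __bool__(self):
        return bool(self.result)


def raise_on_failures(results):
    """Check every result, then raise once, listing all failed comparison names."""
    failures = [result.name for result in results if not result]
    if failures:
        raise AssertionError('Checks failed: {}'.format(', '.join(failures)))
    return True


# With a real Comparator you would pass c.compare() instead of this list.
fake_results = [FakeResult('basic_comp', True), FakeResult('len_comp', False)]
try:
    raise_on_failures(fake_results)
except AssertionError as exc:
    print(exc)  # Checks failed: len_comp
```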

Custom Comparisons

You’ll probably want to define your own comparison checks. You can do so by defining functions that accept left and right args, which correspond to the results of the queries against your “left” and “right” data source, respectively. Perform whatever magic you like, and return a boolean (or not… your choice).

def left_is_longer(left, right):
    # Return True if left contains more rows than right
    return len(left) > len(right)


def totals_are_equal(left, right):
    # Return True if the second column (row[1]) sums to the same total on both sides
    sl, sr = 0, 0
    for row in left:
        sl += int(row[1])
    for row in right:
        sr += int(row[1])
    return sl == sr


c = cpt.Comparator(l, query, r, comps=[left_is_longer, totals_are_equal])
c.run_comparisons()
# [('left_is_longer', False), ('totals_are_equal', True)]
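Because comparison callables are plain functions, you can check them against in-memory rows before pointing them at real databases. The rows below are made-up sample data; the two functions are repeated from above so the snippet runs on its own, with totals_are_equal written more compactly using sum().

```python
def left_is_longer(left, right):
    # True if left contains more rows than right
    return len(left) > len(right)


def totals_are_equal(left, right):
    # True if the second column (row[1]) sums to the same total on both sides
    return sum(int(row[1]) for row in left) == sum(int(row[1]) for row in right)


# Made-up sample rows: (id, amount) tuples, with amounts as strings
# to exercise the int() conversion
left_rows = [(1, '10'), (2, '20'), (3, '30')]
right_rows = [(1, '25'), (2, '35'), (3, '0'), (4, '0')]

print(left_is_longer(left_rows, right_rows))    # False
print(totals_are_equal(left_rows, right_rows))  # True
```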

Access Comparator and Query Results

The results of both queries and comparisons can be checked with standard operators, as well as for “truthiness” (e.g. failures = [result.name for result in c.compare() if not result]).

Comparisons do not need to return a boolean. If a comparison returns some other value, you can compare the result object directly or read the raw value from its result attribute.

def len_diff(left, right):
    return len(left) - len(right)


c = cpt.Comparator(l, query, r, comps=len_diff)
res = c.run_comparisons()[0]
if res == 0:
    print('They match')
elif res < 0:
    print('Left is shorter by {}'.format(abs(res.result)))
else:
    print('Left is longer by {}'.format(res.result))
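One way a result object can support ==, <, > and truthiness while still exposing name and result is to delegate those operations to the wrapped value. The class below is a hypothetical sketch of that idea, not comparator’s actual result class; ComparisonResult is an invented name.

```python
import functools


@functools.total_ordering
class ComparisonResult(object):
    """Hypothetical wrapper: a comparison name plus the raw value it returned.

    Operators and truthiness delegate to the wrapped value; comparator's
    real result class may be implemented differently.
    """

    def __init__(self, name, result):
        self.name = name
        self.result = result

    def __eq__(self, other):
        return self.result == other

    def __lt__(self, other):
        # total_ordering derives <=, >, >= from __lt__ and __eq__
        return self.result < other

    def __bool__(self):
        return bool(self.result)

    def __repr__(self):
        return repr((self.name, self.result))


def len_diff(left, right):
    return len(left) - len(right)


res = ComparisonResult('len_diff', len_diff([1, 2], [1, 2, 3, 4, 5]))
if res == 0:
    print('They match')
elif res < 0:
    print('Left is shorter by {}'.format(abs(res.result)))  # Left is shorter by 3
else:
    print('Left is longer by {}'.format(res.result))
```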

It’s recommended that you use the spackl package for instantiating your “left” and “right” data source objects (pip install spackl). This package was originally part of comparator, and provides the following functionality:

Query results are contained in the QueryResult class, which provides simple yet powerful ways to look up and access the output of a query. Data can be retrieved as a dict, list, JSON string, or pandas DataFrame. Rows and columns can be accessed by index, attribute, or key. Iterating over a QueryResult yields QueryResultRow objects, which support the same lookups as well as standard comparison operators (<, >, ==, etc.).

from spackl import db

conf = db.Config()
pg = db.Postgres(**conf.default)
res = pg.query(query_string)

res          # [{'a': 1, 'b': 2, 'c': 3}, {'a': 4, 'b': 5, 'c': 6}, {'a': 7, 'b': 8, 'c': 9}]

res.a        # (1, 4, 7)
res['a']     # (1, 4, 7)
res[0]       # QueryResultRow : (1, 2, 3)

res[0].a     # 1
res[0]['a']  # 1
res[0][0]    # 1

res.dict()   # {'a': (1, 4, 7), 'b': (2, 5, 8), 'c': (3, 6, 9)}
res.list()   # [(1, 2, 3), (4, 5, 6), (7, 8, 9)]
res.first()  # QueryResultRow : (1, 2, 3)
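To unit-test comparison callables without a live database, you could feed them a lightweight stand-in that mimics a few of the access patterns shown above. FakeQueryResult is a hypothetical helper, not part of spackl; its rows are namedtuples, so row.a and row[0] work but row['a'] does not.

```python
from collections import namedtuple


class FakeQueryResult(object):
    """Hypothetical stand-in for spackl's QueryResult, for offline tests only.

    Mimics res.a, res['a'], res[0], len(res), row iteration,
    and the list()/dict() outputs shown in the README.
    """

    def __init__(self, columns, rows):
        self._columns = tuple(columns)
        self._row_type = namedtuple('Row', self._columns)
        self._rows = [self._row_type(*row) for row in rows]

    def __len__(self):
        return len(self._rows)

    def __iter__(self):
        return iter(self._rows)

    def __getitem__(self, key):
        # Integer keys return a row; string keys return a whole column as a tuple.
        if isinstance(key, int):
            return self._rows[key]
        if key not in self._columns:
            raise KeyError(key)
        return tuple(getattr(row, key) for row in self._rows)

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails, so column names land here.
        if not name.startswith('_') and name in self._columns:
            return self[name]
        raise AttributeError(name)

    def list(self):
        return [tuple(row) for row in self._rows]

    def dict(self):
        return {col: self[col] for col in self._columns}


res = FakeQueryResult(['a', 'b', 'c'], [(1, 2, 3), (4, 5, 6), (7, 8, 9)])
print(res.a)       # (1, 4, 7)
print(res[0].a)    # 1
print(res.list())  # [(1, 2, 3), (4, 5, 6), (7, 8, 9)]
```

A comparison such as totals_are_equal from earlier can then be run against two FakeQueryResult objects, since it only iterates rows and indexes them by position.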

These result sets can be used to great effect in comparison callables. For example, retrieving a query result as a pandas DataFrame opens up the full range of pandas operations for checking or manipulating a single query’s output.
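As an illustration of a DataFrame-based check, the function below counts rows that appear on only one side by doing an outer merge with pandas’ indicator=True option. It takes DataFrames directly, so it assumes each query result has already been converted (the spackl method for that conversion is not shown in this README), and rows_only_on_one_side is a hypothetical name.

```python
import pandas as pd


def rows_only_on_one_side(left_df, right_df):
    """Count rows present in one DataFrame but not the other.

    Merges on all shared columns. Duplicate rows would multiply in the
    merge, so they are dropped from each side first.
    """
    merged = left_df.drop_duplicates().merge(
        right_df.drop_duplicates(), how='outer', indicator=True
    )
    # _merge is 'left_only', 'right_only', or 'both' for each merged row
    return int((merged['_merge'] != 'both').sum())


left = pd.DataFrame({'id': [1, 2, 3], 'amount': [10, 20, 30]})
right = pd.DataFrame({'id': [1, 2, 4], 'amount': [10, 20, 40]})
print(rows_only_on_one_side(left, right))  # 2
```

Returning a count rather than a boolean means the result can be checked with == 0, or inspected for how far apart the two sources are.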

Support is being added to spackl to allow for querying from files and APIs using the same methods, allowing for easy comparison between many disparate data sources. Stay tuned.

CHANGELOG

0.4.0 (2019-03-09)

  • BREAKING - All source modules and methods have been stripped out

  • Functionality has been moved to the spackl package (pip install spackl)

  • The comparator package will expect spackl to be used for all left and right data sources

0.4.0rc3 (2018-12-05)

  • Adds better transaction handling in the PostgresDb class

  • Cleans up calls to connect() in the Db classes

0.4.0rc2 (2018-12-05)

  • BREAKING - QueryPair arguments order has changed (QueryPair(left, lquery, right, rquery))

  • QueryPair, Comparator, and ComparatorSet no longer require a “right” Db

0.4.0rc1 (2018-11-07)

  • DEPRECATED - the from_list method on ComparatorSet

  • adds the QueryPair class

  • BREAKING - Comparator and ComparatorSet are instantiated with QueryPair objects

  • BREAKING - ComparatorSet.from_dict() requires the dict as the first argument

  • BREAKING - QueryResult.keys() and QueryResult.values() both return generators

  • the rquery passed to a QueryPair can be formatted with the lquery query result

  • adds the QueryResultCol class

  • adds the append, pop, extend, and filter methods on QueryResult

  • downgrades pandas version requirement to >=0.22.0

  • improves docstrings on QueryResult methods

  • adds slice handling to QueryResult

  • adds empty property to QueryResult

0.3.2 (2018-10-04)

  • adds MANIFEST.in for readme and changes

0.3.1 (2018-10-03)

  • adds creds_file to possible BigQueryDb init kwargs

0.3.0 (2018-10-03)

  • DEPRECATED - the query_df method on BaseDb and subclasses

  • DEPRECATED - the output kwarg for Comparator results

  • adds the execute method on BaseDb and subclasses

  • adds the QueryResult and QueryResultRow classes

  • adds the ComparatorSet class

  • adds list_tables and delete_table methods to BigQueryDb

  • cleans up some Python 2/3 compatibility using six

0.2.1 (2018-09-19)

  • officially support Python 2.7, 3.6, and 3.7

0.2.0 (2018-09-18)

  • adds query_df methods for returning pandas DataFrames

  • adds output kwarg to Comparator to allow calling the query_df method

0.1.0 (2018-09-12)

  • initial release
