Utility for comparing results between data sources
COMPARATOR
Comparator is a utility for comparing the results of queries run against two databases. Future development will include support for APIs, static files, and more.
Installation
pip install comparator
Usage
Overview
from spackl import db
import comparator as cpt
conf = db.Config()
l = db.Postgres(**conf.default)
r = db.Postgres(**conf.other_db)
query = 'SELECT * FROM my_table ORDER BY 1'
c = cpt.Comparator(l, query, r)
c.run_comparisons()
[('basic_comp', True)]
Included Comparisons
There are some basic comparisons included, and they can be imported and passed using constants.
from comparator.comps import BASIC_COMP, LEN_COMP
c = cpt.Comparator(l, query, r, comps=[BASIC_COMP, LEN_COMP])
c.run_comparisons()
[('basic_comp', True), ('len_comp', True)]
Queries and Exceptions
It’s possible to run different queries against each database. You can raise exceptions if that’s your speed.
lq = 'SELECT * FROM my_table ORDER BY 1'
rq = 'SELECT id, uuid, name FROM reporting.my_table ORDER BY 1'
comparisons = [BASIC_COMP, LEN_COMP]
c = cpt.Comparator(l, lq, r, rq, comps=comparisons)
for result in c.compare():
    if not result:
        raise Exception('{} check failed!'.format(result.name))
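If you would rather see every failed check before stopping, the failures can be collected first and raised together. This is only a sketch: it uses plain (name, passed) tuples as stand-ins for the comparison results, and ComparisonFailed and raise_on_failures are hypothetical names, not part of comparator.

```python
class ComparisonFailed(Exception):
    """Hypothetical exception type carrying the names of failed checks."""

    def __init__(self, failed):
        self.failed = list(failed)
        super().__init__(
            '{} check(s) failed: {}'.format(len(self.failed), ', '.join(self.failed))
        )


def raise_on_failures(results):
    # `results` is an iterable of (name, passed) pairs (a stand-in shape,
    # not comparator's real result objects).
    failed = [name for name, passed in results if not passed]
    if failed:
        raise ComparisonFailed(failed)


sample = [('basic_comp', True), ('len_comp', False), ('totals_are_equal', False)]
try:
    raise_on_failures(sample)
except ComparisonFailed as exc:
    message = str(exc)
    print(message)  # 2 check(s) failed: len_comp, totals_are_equal
```

With real results, the same idea should apply by reading result.name and testing each result for truthiness, as in the loop above.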
Custom Comparisons
You’ll probably want to define your own comparison checks. You can do so by defining functions that accept left and right args, which correspond to the results of the queries against your “left” and “right” data source, respectively. Perform whatever magic you like, and return a boolean (or not… your choice).
def left_is_longer(left, right):
    # Return True if left contains more rows than right
    return len(left) > len(right)

def totals_are_equal(left, right):
    # Return True if the second column sums to the same value on both sides
    sl, sr = 0, 0
    for row in left:
        sl += int(row[1])
    for row in right:
        sr += int(row[1])
    return sl == sr

c = cpt.Comparator(l, query, r, comps=[left_is_longer, totals_are_equal])
c.run_comparisons()
[('left_is_longer', False), ('totals_are_equal', True)]
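Since a comparison is just a callable taking left and right, it can be tried out without a database. The sketch below assumes plain lists of tuples as stand-ins for query results, and make_total_check is a hypothetical helper that builds a totals check for any column index.

```python
# Stand-in data: plain lists of tuples instead of real query results (assumption).
left_rows = [(1, 10), (2, 20), (3, 30)]
right_rows = [(1, 25), (2, 35)]


def left_is_longer(left, right):
    # True if left contains more rows than right
    return len(left) > len(right)


def make_total_check(col):
    """Hypothetical factory: build a comparison that sums column `col`."""
    def totals_are_equal(left, right):
        left_total = sum(int(row[col]) for row in left)
        right_total = sum(int(row[col]) for row in right)
        return left_total == right_total
    return totals_are_equal


checks = [left_is_longer, make_total_check(1)]
# Mimic the (name, result) pairs shown above by calling each check directly.
results = [(check.__name__, check(left_rows, right_rows)) for check in checks]
print(results)  # [('left_is_longer', True), ('totals_are_equal', True)]
```

Testing checks this way keeps them honest before they are passed to a Comparator through comps.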
Access Comparator and Query Results
The results of both queries and comparisons can be checked using standard operators, as well as for “truthiness” (ex: failures = [result.name for result in c.compare() if not result]).
Comparisons do not always need to return a boolean. Accessing the resulting value of such a comparison is simple.
def len_diff(left, right):
    return len(left) - len(right)

c = cpt.Comparator(l, query, r, comps=len_diff)
res = c.run_comparisons()[0]
if res == 0:
    print('They match')
elif res < 0:
    # res.result is negative here, so report its magnitude
    print('Left is shorter by {}'.format(abs(res.result)))
else:
    print('Left is longer by {}'.format(res.result))
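To see why a result can be compared with ==, <, and tested for truthiness while still exposing .result, here is a minimal sketch of a wrapper that behaves this way. SketchResult is a hypothetical stand-in written for illustration (Python 3 only); it is not comparator's actual result class.

```python
import functools


@functools.total_ordering
class SketchResult:
    """Hypothetical stand-in for a comparison result; not comparator's class."""

    def __init__(self, name, result):
        self.name = name
        self.result = result

    def __bool__(self):
        # Truthiness follows the wrapped value, so `if not res:` works.
        return bool(self.result)

    def __eq__(self, other):
        # Compare the wrapped value directly against the other operand.
        return self.result == other

    def __lt__(self, other):
        # total_ordering derives <=, >, and >= from __eq__ and __lt__.
        return self.result < other


def len_diff(left, right):
    return len(left) - len(right)


res = SketchResult(len_diff.__name__, len_diff([1, 2], [1, 2, 3, 4]))
if res == 0:
    print('They match')
elif res < 0:
    print('Left is shorter by {}'.format(abs(res.result)))  # Left is shorter by 2
else:
    print('Left is longer by {}'.format(res.result))
```

Note that a non-zero difference is truthy under this scheme, so a non-boolean check like len_diff reads more clearly with explicit operators than with a bare truthiness test.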
It’s recommended that you use the spackl package for instantiating your “left” and “right” data source objects (pip install spackl). This package was originally part of comparator, and provides the following functionality:
Query results are contained in the QueryResult class, which provides simple yet powerful ways to look up and access the output of the query. Data can be retrieved as a dict, list, json string, or pandas DataFrame. Rows/columns can be accessed by index, attribute, or key. Iterating on the QueryResult returns a QueryResultRow, which has the same lookup functionality, as well as standard operators (<, >, ==, etc.).
from spackl import db
conf = db.Config()
pg = db.Postgres(**conf.default)
res = pg.query(query_string)
res # [{'a': 1, 'b': 2, 'c': 3}, {'a': 4, 'b': 5, 'c': 6}, {'a': 7, 'b': 8, 'c': 9}]
res.a # (1, 4, 7)
res['a'] # (1, 4, 7)
res[0] # QueryResultRow : (1, 2, 3)
res[0].a # 1
res[0]['a'] # 1
res[0][0] # 1
res.dict() # {'a': (1, 4, 7), 'b': (2, 5, 8), 'c': (3, 6, 9)}
res.list() # [(1, 2, 3), (4, 5, 6), (7, 8, 9)]
res.first() # QueryResultRow : (1, 2, 3)
These result sets can be used to great effect in comparison callables. For example, accessing the result of a query as a pandas DataFrame allows for an endless variety of checks/manipulations to be done on a single query output.
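A column-wise check is one example of what such a callable might look like. The sketch below avoids pandas and instead assumes both sides are plain dicts shaped like the res.dict() output shown above; mismatched_columns is a hypothetical function name, and how you convert real results to this shape is up to you.

```python
def mismatched_columns(left, right):
    """Return the sorted names of columns whose values differ between sides.

    Assumes `left` and `right` are column-oriented dicts, e.g. {'a': (1, 4, 7)}.
    A column present on only one side counts as a mismatch.
    """
    names = set(left) | set(right)
    return sorted(name for name in names if left.get(name) != right.get(name))


# Stand-in data in the same shape as the res.dict() example.
left = {'a': (1, 4, 7), 'b': (2, 5, 8), 'c': (3, 6, 9)}
right = {'a': (1, 4, 7), 'b': (2, 5, 0), 'd': (3, 6, 9)}

diff = mismatched_columns(left, right)
print(diff)  # ['b', 'c', 'd']
```

An empty list is falsy, so a check written this way reports which columns disagree and still reads naturally when all columns match.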
Support is being added to spackl to allow for querying from files and APIs using the same methods, allowing for easy comparison between many disparate data sources. Stay tuned.
CHANGELOG
0.4.0 (2019-03-09)
BREAKING - All source modules and methods have been stripped out
Functionality has been moved to the spackl package (pip install spackl)
The comparator package will expect spackl to be used for all left and right data sources
0.4.0rc3 (2018-12-05)
Adds better transaction handling in the PostgresDb class
Cleans up calls to connect() in the Db classes
0.4.0rc2 (2018-12-05)
BREAKING - QueryPair arguments order has changed (QueryPair(left, lquery, right, rquery))
QueryPair, Comparator, and ComparatorSet no longer require a “right” Db
0.4.0rc1 (2018-11-07)
DEPRECATED - the from_list method on ComparatorSet
adds the QueryPair class
BREAKING - Comparator and ComparatorSet are instantiated with QueryPair objects
BREAKING - ComparatorSet.from_dict() requires the dict as the first argument
BREAKING - QueryResult.keys() and QueryResult.values() both return generators
the rquery passed to a QueryPair can be formatted with the lquery query result
adds the QueryResultCol class
adds the append, pop, extend, and filter methods on QueryResult
downgrades pandas version requirement to >=0.22.0
improves docstrings on QueryResult methods
adds slice handling to QueryResult
adds empty property to QueryResult
0.3.2 (2018-10-04)
adds MANIFEST.in for readme and changes
0.3.1 (2018-10-03)
adds creds_file to possible BigQueryDb init kwargs
0.3.0 (2018-10-03)
DEPRECATED - the query_df method on BaseDb and subclasses
DEPRECATED - the output kwarg for Comparator results
adds the execute method on BaseDb and subclasses
adds the QueryResult and QueryResultRow classes
adds the ComparatorSet class
adds list_tables and delete_table methods to BigQueryDb
cleans up some python 2/3 compatibility using six
0.2.1 (2018-09-19)
officially support Python 2.7, 3.6, and 3.7
0.2.0 (2018-09-18)
adds query_df methods for returning pandas DataFrames
adds output kwarg to Comparator to allow calling the query_df method
0.1.0 (2018-09-12)
initial release