Some tools for working with data

License: Apache 2.0


datools is a collection of Python-based tools for working with data in relational databases. It contains several utilities for smoothing the rough edges of SQL, but its most mature component is datools.diff, an algorithm that's best explained in a blog post and a Jupyter Notebook.

To learn more, read the docs or reach out.

Database support

datools generates SQL for its operations, but each database has its own nuances. datools may already run on your database today. To give you some certainty about where it is known to work, all tests in the test suite run against the following databases:

Database             Evaluated by test suite
SQLite               Since v0.1.2
DuckDB               Since v0.1.4
PostgreSQL           Since v0.1.5
Redshift, Snowflake  You provide an instance, I'll make the tests pass


0.1.5 (2022-04-13)

  • Support for PostgreSQL! The test suite now runs against PostgreSQL, and datools.explanations.diff now allows you to ask "why" about data stored in Postgres. Get excited!
  • datools.sqlalchemy_utils.grouping_sets_query will now generate a GROUPING SETS query for databases that support grouping sets (e.g., Postgres, DuckDB) or the equivalent UNION ALL version for databases without grouping sets support (e.g., SQLite). For more, check out the example in the docs.
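
To make the UNION ALL fallback concrete, here is a minimal sketch of the general technique using only the standard-library sqlite3 module. It is not the datools implementation and does not use the datools API: the helper name union_all_grouping_sets, the sales table, and its columns are all hypothetical, chosen only for illustration.

```python
import sqlite3


def union_all_grouping_sets(table, grouping_sets, aggregate):
    """Build a UNION ALL query that emulates GROUP BY GROUPING SETS.

    Hypothetical helper for illustration; not part of datools.
    Identifiers are interpolated directly, so pass only trusted names.
    """
    # Collect every grouping column once, preserving first-seen order.
    all_columns = []
    for group in grouping_sets:
        for column in group:
            if column not in all_columns:
                all_columns.append(column)

    branches = []
    for group in grouping_sets:
        # Columns outside this grouping set are padded with NULL so that
        # every branch returns the same column list.
        select_list = [
            column if column in group else f"NULL AS {column}"
            for column in all_columns
        ]
        select_list.append(aggregate)
        query = f"SELECT {', '.join(select_list)} FROM {table}"
        if group:
            query += f" GROUP BY {', '.join(group)}"
        branches.append(query)
    return "\nUNION ALL\n".join(branches)


# Hypothetical sample data in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("east", "apple", 10), ("east", "pear", 5), ("west", "apple", 7)],
)

sql = union_all_grouping_sets(
    "sales", [("region",), ("product",)], "SUM(amount) AS total"
)
rows = conn.execute(sql).fetchall()
for row in rows:
    print(row)
```

On a database with grouping sets support, the same result would come from a single query of the form SELECT region, product, SUM(amount) AS total FROM sales GROUP BY GROUPING SETS ((region), (product)).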

0.1.4 (2022-02-27)

  • Python 3.10 support.
  • Updated the test suite to run against multiple databases, expanding from SQLite only to SQLite and DuckDB.
  • As a result of the last bullet, ensured the code runs against DuckDB in addition to SQLite.
  • First stab at documentation (

0.1.3 (2021-12-31)

  • Introduced mypy to linting and CI to ensure code that makes it to main has proper types.
  • Created first working example of DIFF working on a real-world dataset as a Jupyter notebook. This example partially replicates the Scorpion paper when only moteid/sensorids are considered.
  • Separated the on_columns argument of diff into on_column_values (columns for which you want to generate equality predicates as explanations) and on_column_ranges (columns for which you want to generate range predicates as explanations after bucketing the ranges into 15 equi-sized buckets).
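
As a rough illustration of what bucketing a column into range predicates can look like, the sketch below splits a column's observed range into 15 buckets and renders one predicate per bucket. It assumes "equi-sized" means equal width, which the changelog does not confirm, and it is not the datools implementation: the function name equal_width_range_predicates and the temperature column are hypothetical.

```python
def equal_width_range_predicates(column, values, num_buckets=15):
    """Return one SQL range predicate per equal-width bucket.

    Hypothetical helper for illustration; assumes equal-width buckets.
    """
    low, high = float(min(values)), float(max(values))
    if low == high:
        # A constant column has no range to split.
        return [f"{column} = {low}"]

    width = (high - low) / num_buckets
    # Use the exact maximum as the final edge to avoid float drift.
    edges = [low + i * width for i in range(num_buckets)] + [high]

    predicates = []
    for i in range(num_buckets):
        # Only the last bucket includes its upper bound, so every value
        # falls into exactly one bucket.
        upper_op = "<=" if i == num_buckets - 1 else "<"
        predicates.append(
            f"{column} >= {edges[i]} AND {column} {upper_op} {edges[i + 1]}"
        )
    return predicates


# Hypothetical column name and sample values.
predicates = equal_width_range_predicates("temperature", [0, 12.5, 30])
for predicate in predicates:
    print(predicate)
```

An equality predicate, by contrast, would test a single value of a column listed in on_column_values, such as a specific sensor id.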

0.1.2 (2021-11-07)

  • First release of DIFF algorithm implementation.

0.1.0 (2021-05-09)

  • First release on PyPI.
