Cape - a system for explaining outliers in aggregation results through counterbalancing.

These details have not been verified by PyPI

Project links

Homepage

Development Status
- 4 - Beta
Environment
- Console
Intended Audience
- Developers
- End Users/Desktop
License
- OSI Approved :: Apache Software License
Operating System
- OS Independent
Programming Language
- Python
Topic
- Scientific/Engineering :: Information Analysis

Project description

Cape

As an initial contribution we have developed a novel system called Cape (Counterbalancing with Aggregation Patterns for Explanations) for explaining outliers in aggregation queries through counterbalancing. That is, explanations are outliers in the opposite direction of the outlier of interest. Cape (Counterbalancing with Aggregation Patterns for Explanations) is a system that explains outliers (surprisingly low or high) aggregation function results for group-by queries in SQL. The user provides the system with a query and a surprising outcome for this query. Cape then uses patterns discovered over the input data of the query in an offline pattern mining step to explain the outlier by finding an related outlier in the opposite direction that counterbalances the outlier of interest.

Installation

Prerequisites

Cape requires python 3 and uses python's tkinter for its graphical UI. For example, on ubuntu you can install the prerequisites with:

sudo apt-get install python3 python3-pip python3-tk

Install with pip

Make sure you have pip installed (see previous step).

pip3 install capexplain

Install from github

Alternatively, you can directly install from source. For that you need python3 and the setuptools package. You probably would want to use a virtual environment for that. If you have python3 and pip installed you can install/update the setuptools package by running:

</code></pre>
<p>To install Cape, clone the github repository and use <code>setup.py</code> to install.</p>
<pre lang="shell"><code>git clone git@github.com:IITDBGroup/cape.git capexplain
cd capexplain
python3 setup.py install

Usage

Cape provides a single binary capexplain that support multiple subcommands. The general form is:

capexplain COMMAND [OPTIONS]

Options are specific to each subcommand. Use capexplain help to see a list of supported commands and capexplain help COMMAND get more detailed help for a subcommand.

Overview

Cape currently only supports PostgreSQL as a backend database (version 9 or higher). To use Cape to explain an aggregation outlier, you first have to let cape find patterns for the table over which you are aggregating. This an offline step that only has to be executed only once for each table (unless you want to re-run pattern mining with different parameter settings). Afterwards, you can either use the commandline or Cape's UI to request explanations for an outlier in an aggregation query result.

Mining Patterns

Use capexplain mine [OPTIONS] to mine patterns. Cape will store the discovered patterns in the database. The "mined" patterns will be stored in a created schema called pattern, and the pattern tables generated after running mine command are pattern.{target_table}_global and pattern.{target_table}_local. At the minimum you have to tell Cape how to connect to the database you want to use and which table it should generate patterns for. Run capexplain help mine to get a list of all supported options for the mine command. The options needed to specify the target table and database connection are:

-h ,--host <arg>               - database connection host IP address (DEFAULT: 127.0.0.1)
-u ,--user <arg>               - database connection user (DEFAULT: postgres)
-p ,--password <arg>           - database connection password
-d ,--db <arg>                 - database name (DEFAULT: postgres)
-P ,--port <arg>               - database connection port (DEFAULT: 5432)
-t ,--target-table <arg>       - mine patterns for this table

For instance, if you run a postgres server locally (default) with user postgres (default), password test, and want to mine patterns for a table employees in database mydb, then run:

capexplain mine -p test -d mydb -t employees

Mining algorithm parameters

Cape's mining algorithm takes the following arguments:

--gof-const <arg>              - goodness-of-fit threshold for constant regression (DEFAULT: 0.1)
--gof-linear <arg>             - goodness-of-fit threshold for linear regression (DEFAULT: 0.1)
--confidence <arg>             - global confidence threshold
-r ,--regpackage <arg>         - regression analysis package to use {'statsmodels', 'sklearn'} (DEFAULT: statsmodels)
--local-support <arg>          - local support threshold (DEFAULT: 10)
--global-support <arg>         - global support thresh (DEFAULT: 100)
-f ,--fd-optimizations <arg>   - activate functional dependency detection and optimizations (DEFAULT: False)
-a ,--algorithm <arg>          - algorithm to use for pattern mining {'naive', 'optimized', 'naive_alternative'} (DEFAULT: optimized)
--show-progress <arg>          - show progress meters (DEFAULT: True)
--manual-config                - manually configure numeric-like string fields (treat fields as string or numeric?) (DEFAULT: False)

Running our "crime" data example

We included a subset of the "Chicago Crime" dataset (https://data.cityofchicago.org/Public-Safety/Crimes-2001-to-present/) in our repository for user to play with. To import this dataset in your postgres databse, under /testdb directory, run the following command template:

psql -h <host> -U <user name> -d <local database name where you want to store our example table> < ~/cape/testdb/crime_demonstration.sql

then run the capexplain commands accordingly to explore this example.

Explaining Outliers

To explain an aggregation outlier use capexplain explain [OPTIONS].

-l ,--log <arg>                - select log level {DEBUG,INFO,WARNING,ERROR} (DEFAULT: ERROR)
--help                         - show this help message
-h ,--host <arg>               - database connection host IP address (DEFAULT: 127.0.0.1)
-u ,--user <arg>               - database connection user (DEFAULT: postgres)
-p ,--password <arg>           - database connection password
-d ,--db <arg>                 - database name (DEFAULT: postgres)
-P ,--port <arg>               - database connection port (DEFAULT: 5432)
--ptable <arg>                 - table storing aggregate regression patterns
--qtable <arg>                 - table storing aggregation query result
--ufile <arg>                  - file storing user question
-o ,--ofile <arg>              - file to write output to
-a ,--aggcolumn <arg>          - column that was input to the aggregation function

for explain option, besides the common options, user should give --ptable,the pattern.{target_table} and --qtable, the target_table. Also, we currently only allow user pass question through a .txt file, user need to put the question in the following format:

attribute1, attribute 2, attribute3...., direction
value1,value2,value3...., high/low

please refer to input.txt to look at an example.

Starting the Explanation Explorer GUI

Cape comes with a graphical UI for running queries, selecting outliers of interest, and exploring patterns that are relevant for an outlier and browsing explanations generated by the system. You need to specify the Postgres server to connect to. The explorer can only generate explanations for queries over tables for which patterns have mined beforehand using capexplain mine. Here is our demo video : (https://www.youtube.com/watch?v=gWqhIUrcwz8)

$ capexplain help gui
capexplain gui [OPTIONS]:
	Open the Cape graphical explanation explorer.

SUPPORTED OPTIONS:
-l ,--log <arg>                - select log level {DEBUG,INFO,WARNING,ERROR} (DEFAULT: ERROR)
--help                         - show this help message
-h ,--host <arg>               - database connection host IP address (DEFAULT: 127.0.0.1)
-u ,--user <arg>               - database connection user (DEFAULT: postgres)
-p ,--password <arg>           - database connection password
-d ,--db <arg>                 - database name (DEFAULT: postgres)
-P ,--port <arg>               - database connection port (DEFAULT: 5432)

For instance, if you run a postgres server locally (default) with user postgres (default), password test, and database mydb, then run:

capexplain gui -p test -d mydb

Links

Cape is developed by researchers at Illinois Institute of Technology and Duke University. For more information and publications see the Cape project page http://www.cs.iit.edu/~dbgroup/projects/cape.html.

Project details

These details have not been verified by PyPI

Project links

Homepage

Development Status
- 4 - Beta
Environment
- Console
Intended Audience
- Developers
- End Users/Desktop
License
- OSI Approved :: Apache Software License
Operating System
- OS Independent
Programming Language
- Python
Topic
- Scientific/Engineering :: Information Analysis

Release history Release notifications | RSS feed

This version

0.4

Aug 24, 2019

0.3

Jun 13, 2019

0.2

Jun 13, 2019

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

capexplain-0.4.tar.gz (77.5 kB view details)

Uploaded Aug 24, 2019 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

capexplain-0.4-py3-none-any.whl (99.6 kB view details)

Uploaded Aug 24, 2019 Python 3

File details

Details for the file capexplain-0.4.tar.gz.

File metadata

Download URL: capexplain-0.4.tar.gz
Upload date: Aug 24, 2019
Size: 77.5 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.20.1 setuptools/41.2.0 requests-toolbelt/0.8.0 tqdm/4.29.0 CPython/3.6.6

File hashes

Hashes for capexplain-0.4.tar.gz
Algorithm	Hash digest
SHA256	`075b8e5e94fdc655a43feecc6a92da546042ce0ef0642e8e7264d00cdec9585a`
MD5	`3d8538b313c71302831e7c3a6ab84891`
BLAKE2b-256	`c6614bcbe934d46fac7e5555dd2255634ab35c3e33c6ecbf3638fe5ae2c02378`

See more details on using hashes here.

File details

Details for the file capexplain-0.4-py3-none-any.whl.

File metadata

Download URL: capexplain-0.4-py3-none-any.whl
Upload date: Aug 24, 2019
Size: 99.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.20.1 setuptools/41.2.0 requests-toolbelt/0.8.0 tqdm/4.29.0 CPython/3.6.6

File hashes

Hashes for capexplain-0.4-py3-none-any.whl
Algorithm	Hash digest
SHA256	`0efe895e57e3fe20434f1a5f710eb70c340bf40dc6b72ccf6eb24ea085da67f8`
MD5	`400604749bb8f396cd0ea3f2e84f3882`
BLAKE2b-256	`62e93840a1ec6029031d9388fa79737506aa0b5aa2fd6997c3d892d9e12b395d`

See more details on using hashes here.

capexplain 0.4

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

Cape

Installation

Prerequisites

Install with pip

Install from github

Usage

Overview

Mining Patterns

Mining algorithm parameters

Running our "crime" data example

Explaining Outliers

Starting the Explanation Explorer GUI

Links

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes