Skip to main content

Advanced Python Association Rule Visualization Library

Project description

Introduction

Advanced Python Association Rule Visualization Library

Summary

Loosely based on ARulesViz for R and the ideas described in this paper.

Association Rules Mining (ARM) produces Association Rules (AR) from mined Item Sets in a DataBase (DB). Most ARM libraries represent these output rules textually using the Antecedent (predictor), Consequent (predicted) and Descriptive Metadata (Support, Confidence, Lift, etc.) This presents a problem since typical DBs can yield 100-1000s of rules, forcing us to either apply filtration criterion or devise more efficient visualizations.

While filtration is the most common and effective approach, dedicated visualizations are also valuable, especially in data exploration scenarios where the characteristics of interesting data not may not be known in advance

Generating Rules

PyARMViz is designed to run on Association Rules like the ones produced by Efficient-Apriori

The library includes a set of synthetic retail transaction data for testing and demonstration purposes. This data includes both transactions sets (to be run through a compatible ARM workflow of your choice) or a rules set which can be input directly into the library.

from PyARMViz import datasets
rules = datasets.load_shopping_rules()

#Visualizations

The visualizations in this library can be divided into two families based on the data they display about the individual Association Rules

Rule Metadata Visualizations

Rule Metadata visualizations focus entirely on the descriptive metadata of each rule and do not consider the antecedents and consequents.

This makes this visualization less helpful for discovering interesting regions in the original data, but enables distributional evaluation of rules to develop better filtration criterion.

These visualization also tend to be among the most scalable because they do not need to display multiple entity types (rule v. antecedent/precedent) simultaneously.

Confidence/Lift/Support Scatterplot

The three primary descriptive metadata for rules are Confidence (percentage of time the antecedent leads to the consequent versus the support of the antecedent), the Support (percentage of times the antecedent leads to the consequent versus the size of the database) and the Lift (percentage of times the antecedent leads to the consequent versus the support of the consequent)

All three of these are commonly used filtration criterion for ARs, and by graphing their distribution, this visualization can help spot outlier and visually show the distribution of rules that would pass a given set of criterion.

These new criterion can then be applied against the existing rules (as shown below), or upstream in the analysts ARM workflow

from PyARMViz import datasets
from PyARMViz import PyARMViz

rules = datasets.load_shopping_rules()
PyARMViz.metadata_scatter_plot(rules)

Rule Entity Visualizations

These visualizations include the Antecedent and Consequent Entities of the Rule (ex. Antecedent:Eggs,Flour -> Consequent:Milk) with some portion of its descriptive metadata.

These visualizations are useful for identifying the rules which are adjacent through their terminating entities, and potentially complex structures such as hubs or chains formed by those adjacencies. These complex structures, in turn, can indicate regions of interest within the data which can be extracted and inspected more closely.

Parallel Plots

Parallel plots are a popular choice for large scale visualization of sets which highlights common elements in those sets.

All parallel plots work by taking multiple ordered sets of fixed length, assigning an axis to each "place" in the set (ex. first item, second item, etc.). Assigning each value found in that place to a location on the axis, and drawing a line between the locations assigned to each value for each set.

We can input our Association Rules into this visualization by converting each rule into an ordered set by appending the consequents to the antecedents (ex. antecedent1, antacedent2, consequent1) and visualizing them.

A major disadvantage of these diagrams for this purpose is that ARs are not of a fixed length. Currently the library overcomes this by breaking the rules down by length and creating a different plot for each. This was chosen because Plotly does not allow us to input, forcing us to use some awkward padding techniques that compromise readability.

Parallel Cordinate Plot

Parallel coordinate plots are the more common, popular and supported version of parallel plots.

Parallel Category Plot

The less popular, less well documented and (arguably) more appropriate choice for this application is Parallel Category plots, sometimes called ribbon or Alluvial plots. These are essentially the same as Parallel Coordinate plots except for how they allocate space on the axis for each value.

While coordinate plots allocate a specific point per axis for each value in order to accomodate a theoretically infinite number of values, category plots will allocate a segment of the axis based on the number of values.

Optimally this subdivision will be based on the total number of sets that contain that value, giving us a way to demonstrate the frequency of individual and combinatorial values in the database.

The downside of this diagram, at least in the Plotly implementation, is that it provides less opportunity for visual highlighting of the characteristics of the individual role (size, color, brightness).

Network Diagrams

This group of diagrams (my favorite) work by turning the rules into a directional network graph using the NetworkX libary.

This allows us to leverage one of several potential graph visualizations to show the adjacency (shared Antecedents or Consequents) of rules and more complex structures (chains, hubs) formed by that adjacency.

Plotly Network Diagram

This version uses the Plotly network diagram visualization to visualize the directional network graph.

It has the advantage of being self contained in the browser and requiring no additional dependencies aside from the base Plotly.

It has the disadvantage of limited arrangement algorithm and visual highlighting options when compared to dedicated graph visualization software.

from PyARMViz import datasets
from PyARMViz import PyARMViz

rules = datasets.load_shopping_rules()
adjacency_graph_plotly(rules)

Gephi Network Diagram Export

Network diagrams provide one of the most flexible, scalable and powerful visualizations in this category but can result in highly interconnected graphs that are difficult and computationally expensive to visualize. One solution is to use a dedicated, open source graph visualization tool like Gephi, which provides a rich set of arrangement and visual highlighting options unavailable in a purely Javascript solution such as Plotly

THe downside is that we will need to export the directional network graph from NetworkX to some form that Gephi can use, in this case the GEFX file format. After the file is output by this function, simply install Gephi, open it and load the output file on disc.

Note that if an explicit destination location and filename is not provided, the function defaults to "rule.gexf" in the current working directory of the Python script calling the function.

from PyARMViz import datasets
from PyARMViz import PyARMViz

rules = datasets.load_shopping_rules()
adjacency_graph_gephi(rules)

Installation

From Github

  1. In CLI (with Git setup locally) clone to local directory git clone https://github.com/Mazeofthemind/PyARMViz.git
  2. Navigate into the root directory of the cloned project cd PyARMViz
  3. Execute Python build and install (may require sudo or alternate Python/PIP psudonym) pip install .

From PyPi (Currently only Testing)

pip install --index-url https://test.pypi.org/simple/ PyARMViz

Build

This project is currently built under Poetry a newer Python build tool leverage virtual environments

git clone https://github.com/Mazeofthemind/PyARMViz.git
pip install poetry
cd PyARMViz
python -m poetry build
python -m poetry publish
'''

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

PyARMViz-0.1.4.tar.gz (660.0 kB view details)

Uploaded Source

Built Distribution

PyARMViz-0.1.4-py2.py3-none-any.whl (658.2 kB view details)

Uploaded Python 2 Python 3

File details

Details for the file PyARMViz-0.1.4.tar.gz.

File metadata

  • Download URL: PyARMViz-0.1.4.tar.gz
  • Upload date:
  • Size: 660.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.0.5 CPython/3.8.2 Linux/5.4.0-29-generic

File hashes

Hashes for PyARMViz-0.1.4.tar.gz
Algorithm Hash digest
SHA256 7cfba1177818c1c14aa6ba67b7fe74e2676ce20937af7c395b2fe00a4473ef8c
MD5 65ee901f069f5daf663da2f5824144c8
BLAKE2b-256 2ee60bcd77ff4b799cca05f9b450e8cc86c92c263afed8eb782609740b0dbfd9

See more details on using hashes here.

File details

Details for the file PyARMViz-0.1.4-py2.py3-none-any.whl.

File metadata

  • Download URL: PyARMViz-0.1.4-py2.py3-none-any.whl
  • Upload date:
  • Size: 658.2 kB
  • Tags: Python 2, Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.0.5 CPython/3.8.2 Linux/5.4.0-29-generic

File hashes

Hashes for PyARMViz-0.1.4-py2.py3-none-any.whl
Algorithm Hash digest
SHA256 5bba48a1b0c790955ecd080cc11f1469a02e0fe84268dc9f40a97f4b1f04bde0
MD5 ec2cc90849e79874e328e185ad4f435e
BLAKE2b-256 fc61f19449baf5ff0b0ecbfe99f8f1ad7f8c324175d6f03b0d145357756e1032

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page