Python interface to the R package arules
arulespy is a Python module available from PyPI. The arules module in arulespy provides an easy-to-install Python interface to the R package arules for association rule mining, built with rpy2.
The R arules package implements a comprehensive infrastructure for representing, manipulating and analyzing transaction data and patterns using frequent itemsets and association rules. The package also provides a wide range of interest measures and mining algorithms including the code of Christian Borgelt’s popular and efficient C implementations of the association mining algorithms Apriori and Eclat, and optimized C/C++ code for mining and manipulating association rules using sparse matrix representation.
The arulesViz module provides plot() for visualizing association rules using the R package arulesViz.
arulespy provides Python classes for
- Transactions: Convert pandas dataframes into transaction data
- Rules: Association rules
- Itemsets: Itemsets
with Python-style slicing and len().
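Transaction data often starts out as lists of items rather than as a dataframe. The following is a minimal sketch, using only the standard library, of turning such lists into one boolean record per transaction, the same shape as the boolean dataframe used in the Example section. The helper name baskets_to_boolean_rows is hypothetical and not part of arulespy.

```python
def baskets_to_boolean_rows(baskets):
    """Turn lists of items into one boolean record per transaction.

    Hypothetical helper for illustration; it is not part of arulespy.
    """
    # Collect every distinct item and fix a stable column order.
    items = sorted({item for basket in baskets for item in basket})
    # One dict per transaction: True where the item occurs in the basket.
    rows = [{item: (item in basket) for item in items} for basket in baskets]
    return items, rows


baskets = [["A", "B", "C"], ["A"], ["A", "B", "C"], ["A"], ["A", "B", "C"]]
items, rows = baskets_to_boolean_rows(baskets)

# With pandas installed, pd.DataFrame(rows) should yield a boolean dataframe
# with one column per item, suitable as input for arules.transactions().
for row in rows:
    print(row)
```

The records are kept as plain dicts so that the column order is explicit and the conversion to a dataframe stays a single call.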
Most arules functions are interfaced with conversion from the R data structures to Python. Documentation is available in Python via help(). Detailed online documentation for the R package is available here.
Low-level arules functions can also be used directly in the form arules.r.<arules R function>(). The result will be an rpy2 data type. Transactions, itemsets and rules can be converted manually to Python classes using the helper function a2p().
Installation
arulespy is based on the Python package rpy2, which requires an R installation. Here are the installation steps:
- Install the latest version of R from https://www.r-project.org/
- Install required libraries/set path depending on your OS:
  - libcurl is needed by R package curl.
    - Ubuntu: sudo apt-get install libcurl4-openssl-dev
    - macOS: brew install curl
    - Windows: no installation necessary
  - Environment variable R_HOME needs to be set for Windows
- Install arulespy which will automatically install rpy2 and pandas.
  pip install arulespy
- Optional: Set the environment variable R_LIBS to decide where R packages are stored. If not set then R will determine a suitable location.
The most likely source of installation problems is rpy2. Run python -m rpy2.situation to check whether R and R's libraries are found.
Details can be found here.
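Before digging into rpy2 itself, it can help to look at the environment variables mentioned in the installation steps. The snippet below is a small sketch that uses only the Python standard library; the function name r_environment_report is hypothetical, and the check does not replace python -m rpy2.situation, which inspects the rpy2 setup directly.

```python
import os
import shutil


def r_environment_report(env=None):
    """Collect basic hints about how R might be located on this machine.

    Hypothetical helper for illustration; it is not part of arulespy or rpy2.
    """
    env = os.environ if env is None else env
    return {
        # R_HOME needs to be set on Windows (see the installation steps).
        "R_HOME": env.get("R_HOME"),
        # R_LIBS is optional; if unset, R picks a library location itself.
        "R_LIBS": env.get("R_LIBS"),
        # Full path of the R executable if it is found on the search path.
        "R_on_PATH": shutil.which("R", path=env.get("PATH")),
    }


if __name__ == "__main__":
    for name, value in r_environment_report().items():
        print(f"{name}: {value if value else '(not set / not found)'}")
```

If R_HOME is unset and R is not found on the search path, that is a hint that the R installation is the first thing to check.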
Example
from arulespy import arules
import pandas as pd

df = pd.DataFrame(
    [
        [True, True, True],
        [True, False, False],
        [True, True, True],
        [True, False, False],
        [True, True, True]
    ],
    columns=list('ABC'))

# convert dataframe to transactions
trans = arules.transactions(df)

# mine association rules
rules = arules.apriori(trans,
                       parameter=arules.parameters({"supp": 0.1, "conf": 0.8}),
                       control=arules.parameters({"verbose": False}))

# display the rules
rules.as_df()
     LHS  RHS  support  confidence  coverage      lift  count
1     {}  {A}      1.0         1.0       1.0  1.000000      5
2    {B}  {C}      0.6         1.0       0.6  1.666667      3
3    {C}  {B}      0.6         1.0       0.6  1.666667      3
4    {B}  {A}      0.6         1.0       0.6  1.000000      3
5    {C}  {A}      0.6         1.0       0.6  1.000000      3
6  {B,C}  {A}      0.6         1.0       0.6  1.000000      3
7  {A,B}  {C}      0.6         1.0       0.6  1.666667      3
8  {A,C}  {B}      0.6         1.0       0.6  1.666667      3
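To make the columns of this output easier to read, the measures can be recomputed by hand for the five example transactions. The following is a plain-Python sketch using the standard definitions of support, confidence, coverage and lift; it does not call arules, and the function names support and rule_measures are hypothetical.

```python
# The five example transactions, written as sets of item labels.
transactions = [
    {"A", "B", "C"},
    {"A"},
    {"A", "B", "C"},
    {"A"},
    {"A", "B", "C"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def rule_measures(lhs, rhs, transactions):
    """Standard interest measures for the rule lhs -> rhs."""
    lhs, rhs = set(lhs), set(rhs)
    both = lhs | rhs
    supp = support(both, transactions)      # P(lhs and rhs)
    coverage = support(lhs, transactions)   # P(lhs)
    confidence = supp / coverage            # P(rhs | lhs)
    lift = confidence / support(rhs, transactions)
    count = sum(both <= t for t in transactions)
    return {
        "support": supp,
        "confidence": confidence,
        "coverage": coverage,
        "lift": lift,
        "count": count,
    }


# Rule 2 of the output, {B} -> {C}.
print(rule_measures({"B"}, {"C"}, transactions))
```

For {B} -> {C}, both items occur together in 3 of 5 transactions (support 0.6), every transaction with B also contains C (confidence 1.0), and C on its own has support 0.6, so the lift is 1.0 / 0.6 ≈ 1.667, matching row 2 above.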
Complete examples:
References
- Michael Hahsler, Sudheer Chelluboina, Kurt Hornik, and Christian Buchta. The arules R-package ecosystem: Analyzing interesting patterns from large transaction datasets. Journal of Machine Learning Research, 12:1977-1981, 2011.
- Michael Hahsler, Bettina Grün and Kurt Hornik. arules - A Computational Environment for Mining Association Rules and Frequent Item Sets. Journal of Statistical Software, 14(15), 2005.
- Michael Hahsler. A Probabilistic Comparison of Commonly Used Interest Measures for Association Rules, 2015, URL: https://mhahsler.github.io/arules/docs/measures.
- Michael Hahsler. An R Companion for Introduction to Data Mining: Chapter 5, 2021, URL: https://mhahsler.github.io/Introduction_to_Data_Mining_R_Examples/book/