Python interface to the R package arules
arulespy is a Python module available from PyPI. The arules module in arulespy provides an easy to install Python interface to the R package arules for association rule mining built with rpy2.
The R arules package implements a comprehensive infrastructure for representing, manipulating and analyzing transaction data and patterns using frequent itemsets and association rules. The package also provides a wide range of interest measures and mining algorithms including the code of Christian Borgelt’s popular and efficient C implementations of the association mining algorithms Apriori and Eclat, and optimized C/C++ code for mining and manipulating association rules using sparse matrix representation.
The arulesViz module provides plot() for visualizing association rules using the R package arulesViz.
arulespy provides Python classes for
- Transactions: Convert pandas dataframes into transaction data
- Rules: Association rules
- Itemsets: Itemsets
- ItemMatrix: sparse matrix representation of sets of items

with Python-style slicing and len().
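Since Transactions converts pandas dataframes, and the example later in this document uses a boolean dataframe with one column per item, it can help to see how raw item lists map to that layout. The following is a minimal sketch using only the standard library; the helper name baskets_to_boolean_rows is hypothetical and not part of arulespy.

```python
def baskets_to_boolean_rows(baskets):
    """Turn lists of items into one boolean row per basket (hypothetical helper)."""
    # Fix a column order: one column per distinct item, sorted by name.
    items = sorted({item for basket in baskets for item in basket})
    # Each row marks which of the items occur in the corresponding basket.
    rows = [[item in set(basket) for item in items] for basket in baskets]
    return items, rows


baskets = [["A", "B", "C"], ["A"], ["B", "C"]]
items, rows = baskets_to_boolean_rows(baskets)

print(items)
for row in rows:
    print(row)

# With pandas installed, the rows can be wrapped in a dataframe:
# df = pd.DataFrame(rows, columns=items)
```

The resulting rows have the same shape as the boolean lists passed to pd.DataFrame in the Example section.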
Most arules functions are interfaced as methods for the four classes with conversion from the R data structures to Python. Documentation is available in Python via help(). Detailed documentation for the R package is available online.
Low-level arules functions can also be used directly in the form R.<arules R function>(). The result will be an rpy2 data type. Transactions, itemsets and rules can be converted manually to Python classes using the helper function a2p().
To cite the Python module ‘arulespy’ in publications use:
Michael Hahsler. ARULESPY: Exploring association rules and frequent itemsets in Python. arXiv:2305.15263 [cs.DB], May 2023. DOI: 10.48550/arXiv.2305.15263
Installation
arulespy is based on the Python package rpy2, which requires an R installation. Here are the installation steps:
- Install the latest version of R (>4.0) from https://www.r-project.org/
- Install required libraries on your OS:
  - libcurl is needed by R package curl.
    - Ubuntu: sudo apt-get install libcurl4-openssl-dev
    - MacOS: brew install curl
    - Windows: no installation necessary, but read the Windows section below.
- Install arulespy, which will automatically install rpy2 and pandas:
  pip install arulespy
- Optional: Set the environment variable R_LIBS_USER to decide where R packages are stored (see libPaths() for details). If not set then R will determine a suitable location.
- Optional: arulespy will install the needed R packages when it is imported for the first time. This may take a while. R packages can also be preinstalled. Start R and run
  install.packages(c("arules", "arulesViz"))
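The optional R_LIBS_USER step can also be done from Python. The sketch below assumes that the variable has to be in place before rpy2 starts R, so it is set before arulespy is imported. The helper name use_r_library_dir and the directory location are hypothetical, chosen only for demonstration.

```python
import os
import tempfile


def use_r_library_dir(path):
    """Create `path` if needed and point R_LIBS_USER at it (hypothetical helper)."""
    os.makedirs(path, exist_ok=True)
    os.environ["R_LIBS_USER"] = path
    return path


# Hypothetical location for demonstration; any writable directory should work.
lib_dir = os.path.join(tempfile.gettempdir(), "arulespy-r-libs")
use_r_library_dir(lib_dir)
print(os.environ["R_LIBS_USER"])

# Import arulespy only after the variable is set, for example:
# from arulespy.arules import Transactions
```

For regular use, a permanent directory is the better choice, since a temporary one may be cleaned up and force the R packages to be reinstalled.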
The most likely issue is that rpy2 does not find R or R's shared library. This will lead the Python kernel to die or exit without explanation when the package arulespy is imported. Check python -m rpy2.situation to see if R and R's libraries are found. If you use IPython notebooks, you can include the following code block in your notebook to check:
from rpy2 import situation
for row in situation.iter_info():
    print(row)
The output should include a line saying Loading R library from rpy2: OK.
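If importing rpy2 itself crashes the kernel, a check that does not touch rpy2 can narrow things down. This is a rough sketch using only the standard library. It assumes that the R_HOME environment variable and an R executable on the PATH are the places worth inspecting first; the function name r_lookup_hints is hypothetical.

```python
import os
import shutil


def r_lookup_hints():
    """Collect basic hints about where R might be found (hypothetical helper)."""
    return {
        # Explicitly configured R installation directory, if any.
        "R_HOME": os.environ.get("R_HOME"),
        # Full path of the R executable found on the PATH, if any.
        "R_on_PATH": shutil.which("R"),
    }


hints = r_lookup_hints()
for name, value in hints.items():
    print(f"{name}: {value if value else 'not set / not found'}")
```

If both entries come back empty, R is probably not installed or not visible to the Python process, which would be consistent with the symptoms described above.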
Note for Windows users
rpy2 currently does not fully support Windows and the installation is somewhat tricky. I was able to use it with the following setup:
- Windows 10
- rpy2 version 3.5.14
- Python version 3.10.12
- R version 4.3.1
I use the following code to set the environment variables needed by Windows before I import from arulespy:
from rpy2 import situation
import os
r_home = situation.r_home_from_registry()
r_bin = r_home + '\\bin\\x64\\'
os.environ['R_HOME'] = r_home
os.environ['PATH'] = r_bin + ";" + os.environ['PATH']
os.add_dll_directory(r_bin)
for row in situation.iter_info():
    print(row)
The output should include a line saying Loading R library from rpy2: OK.
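When the same notebook has to run on Windows and on other systems, the setup above can be wrapped in a platform check. The following is a sketch under that assumption, not a tested recipe for every Windows configuration. The function name configure_r_for_windows is hypothetical; the rpy2 call is the one used in the code above.

```python
import os
import sys


def configure_r_for_windows():
    """Apply the Windows-only R setup; do nothing elsewhere (hypothetical helper)."""
    if sys.platform != "win32":
        return False
    # Imported here so that other platforms never execute the Windows code path.
    from rpy2 import situation

    r_home = situation.r_home_from_registry()
    r_bin = os.path.join(r_home, "bin", "x64")
    os.environ["R_HOME"] = r_home
    os.environ["PATH"] = r_bin + os.pathsep + os.environ["PATH"]
    os.add_dll_directory(r_bin)
    return True


configured = configure_r_for_windows()
print("Windows R setup applied:", configured)

# Import arulespy after this call, for example:
# from arulespy.arules import Transactions
```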
More information on installing rpy2 can be found in the rpy2 documentation.
Example
from arulespy.arules import Transactions, apriori, parameters
import pandas as pd

# define the data as a pandas dataframe
df = pd.DataFrame(
    [
        [True, True, True],
        [True, False, False],
        [True, True, True],
        [True, False, False],
        [True, True, True]
    ],
    columns=list('ABC'))

# convert dataframe to transactions
trans = Transactions.from_df(df)

# mine association rules
rules = apriori(trans,
                parameter=parameters({"supp": 0.1, "conf": 0.8}),
                control=parameters({"verbose": False}))

# display the rules as a pandas dataframe
rules.as_df()
| | LHS | RHS | support | confidence | coverage | lift | count |
|---|---|---|---|---|---|---|---|
| 1 | {} | {A} | 0.8 | 0.8 | 1 | 1 | 8 |
| 2 | {} | {C} | 0.8 | 0.8 | 1 | 1 | 8 |
| 3 | {B} | {A} | 0.4 | 0.8 | 0.5 | 1 | 4 |
| 4 | {B} | {C} | 0.5 | 1 | 0.5 | 1.25 | 5 |
| 5 | {A,B} | {C} | 0.4 | 1 | 0.4 | 1.25 | 4 |
| 6 | {B,C} | {A} | 0.4 | 0.8 | 0.5 | 1 | 4 |
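To make the columns of the table concrete, the sketch below computes support, confidence, coverage, lift and count for a rule in plain Python, using the standard definitions of these measures. The counts in the table imply 10 transactions, whereas the dataframe above lists 5 rows, so the sketch uses a hypothetical 10-transaction dataset chosen to be consistent with the table. It does not call arulespy, and the function names are illustrative only.

```python
# Hypothetical dataset: 10 transactions chosen to be consistent with the table.
transactions = [
    {"A", "B", "C"}, {"A"}, {"A", "B", "C"}, {"A"}, {"A", "B", "C"},
    {"A", "C"}, {"A", "B", "C"}, {"C"}, {"B", "C"}, {"A", "C"},
]


def count(itemset):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def rule_measures(lhs, rhs):
    """Interest measures for the rule lhs -> rhs."""
    n = len(transactions)
    count_both = count(lhs | rhs)
    count_lhs = count(lhs)
    count_rhs = count(rhs)
    confidence = count_both / count_lhs
    return {
        "support": count_both / n,          # share of transactions with lhs and rhs
        "confidence": confidence,           # share of lhs transactions that also have rhs
        "coverage": count_lhs / n,          # support of the lhs alone
        "lift": confidence * n / count_rhs, # confidence divided by support of the rhs
        "count": count_both,                # absolute number of matching transactions
    }


print(rule_measures({"B"}, {"C"}))
print(rule_measures(set(), {"A"}))
```

For {B} -> {C} this gives support 0.5, confidence 1.0, coverage 0.5, lift 1.25 and count 5, which agrees with row 4 of the table.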
References
- Michael Hahsler. ARULESPY: Exploring association rules and frequent itemsets in Python. arXiv:2305.15263 [cs.DB], May 2023. DOI: 10.48550/arXiv.2305.15263
- Michael Hahsler, Sudheer Chelluboina, Kurt Hornik, and Christian Buchta. The arules R-package ecosystem: Analyzing interesting patterns from large transaction datasets. Journal of Machine Learning Research, 12:1977-1981, 2011.
- Michael Hahsler, Bettina Grün and Kurt Hornik. arules - A Computational Environment for Mining Association Rules and Frequent Item Sets. Journal of Statistical Software, 14(15), 2005. DOI: 10.18637/jss.v014.i15
- Michael Hahsler. A Probabilistic Comparison of Commonly Used Interest Measures for Association Rules, 2015, URL: https://mhahsler.github.io/arules/docs/measures
- Michael Hahsler. An R Companion for Introduction to Data Mining: Chapter 5, 2021, URL: https://mhahsler.github.io/Introduction_to_Data_Mining_R_Examples/book/