Python interface to the R package arules
arulespy is a Python module available from PyPI. The arules module in arulespy provides an easy to install Python interface to the R package arules for association rule mining built with rpy2.
The R arules package implements a comprehensive infrastructure for representing, manipulating and analyzing transaction data and patterns using frequent itemsets and association rules. The package also provides a wide range of interest measures and mining algorithms including the code of Christian Borgelt’s popular and efficient C implementations of the association mining algorithms Apriori and Eclat, and optimized C/C++ code for mining and manipulating association rules using sparse matrix representation.
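To make the idea of frequent itemset mining concrete, here is a minimal pure-Python sketch of a level-wise (Apriori-style) search. It is an illustration only, not the C code that arules wraps; the function name frequent_itemsets and the toy grocery data are assumptions made for this example.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for every itemset with support >= min_support.

    Illustrative level-wise search; real miners use far more efficient
    data structures.
    """
    n = len(transactions)
    items = sorted({item for t in transactions for item in t})
    frequent = {}
    candidates = [frozenset([item]) for item in items]  # level 1: single items
    size = 1
    while candidates:
        # Count the fraction of transactions that contain each candidate.
        level = {}
        for cand in candidates:
            support = sum(1 for t in transactions if cand <= t) / n
            if support >= min_support:
                level[cand] = support
        frequent.update(level)
        # Join surviving itemsets into candidates that are one item larger,
        # then prune candidates that have an infrequent subset.
        size += 1
        joined = {a | b for a in level for b in level if len(a | b) == size}
        candidates = [
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, size - 1))
        ]
    return frequent


# Hypothetical toy data: four shopping baskets.
baskets = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]
result = frequent_itemsets(baskets, min_support=0.5)
for itemset, support in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), support)
```

The pruning step is what keeps the search tractable: an itemset can only be frequent if all of its subsets are frequent, so {bread, milk, butter} is never counted because {milk, butter} already failed the threshold.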
The arulesViz module provides plot() for visualizing association rules using the R package arulesViz.
arulespy provides Python classes for
- Transactions: Convert pandas dataframes into transaction data
- Rules: Association rules
- Itemsets: Itemsets
- ItemMatrix: sparse matrix representation of sets of items
with Python-style slicing and len().
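The idea behind a sparse representation of item sets with slicing and len() can be sketched in plain Python. The class SparseItemRows below is hypothetical and written only for illustration; it is not how arulespy or arules implement ItemMatrix, which delegates to R's sparse matrix code.

```python
class SparseItemRows:
    """Hypothetical illustration: each row stores only the column indices
    of the items that are present, instead of a full row of booleans."""

    def __init__(self, item_labels, rows):
        self.item_labels = list(item_labels)
        self.rows = [tuple(r) for r in rows]

    @classmethod
    def from_bool_rows(cls, item_labels, bool_rows):
        # Keep the index j of every True entry in each row.
        rows = [[j for j, present in enumerate(row) if present] for row in bool_rows]
        return cls(item_labels, rows)

    def __len__(self):
        # Number of item sets (rows), mirroring len() on a collection.
        return len(self.rows)

    def __getitem__(self, key):
        if isinstance(key, slice):
            # Slicing returns a new object of the same type.
            return SparseItemRows(self.item_labels, self.rows[key])
        # Integer indexing returns the labels of the items in that row.
        return {self.item_labels[j] for j in self.rows[key]}


m = SparseItemRows.from_bool_rows(
    "ABC",
    [[True, True, True], [True, False, False], [True, True, True]],
)
print(len(m), m[1], len(m[0:2]))
```

Storing only the present items pays off when each transaction contains a small fraction of all possible items, which is the typical case for transaction data.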
Most arules functions are interfaced as methods for the four classes with conversion from the R data structures to Python.
Documentation is available in Python via help(). Detailed online documentation for the R package is available here.
Low-level arules functions can also be used directly in the form R.<arules R function>(). The result will be an rpy2 data type.
Transactions, itemsets and rules can be converted manually to Python classes using the helper function a2p().
Installation
arulespy is based on the Python package rpy2, which requires an R installation. Here are the installation steps:
- Install the latest version of R from https://www.r-project.org/
- Install required libraries/set path depending on your OS:
  - libcurl is needed by R package curl.
    - Ubuntu: sudo apt-get install libcurl4-openssl-dev
    - MacOS: brew install curl
    - Windows: no installation necessary
  - Environment variable R_HOME may need to be set for Windows
- Install arulespy, which will automatically install rpy2 and pandas:
  pip install arulespy
- Optional: Set the environment variable R_LIBS_USER to decide where R packages are stored (see libPaths() for details). If not set, R will determine a suitable location.
- Optional: arulespy will install the needed R packages when it is imported for the first time. This may take a while. R packages can also be preinstalled. Start R and run
  install.packages(c("arules", "arulesViz"))
The most likely issue is that rpy2 does not find R. This causes the Python kernel to die or exit without explanation when the package arulespy is imported. Run python -m rpy2.situation to check whether R and R's libraries are found. Details can be found here.
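Before importing arulespy, it can help to look at the settings mentioned above from Python itself. The following is a small sketch using only the standard library; the helper name r_lookup_hints is hypothetical, and the assumption that R is located through R_HOME or an R executable on PATH should be confirmed with python -m rpy2.situation.

```python
import os
import shutil


def r_lookup_hints(environ=None):
    """Collect the environment settings that are relevant for finding R.

    Hypothetical helper for troubleshooting; it does not import rpy2.
    """
    env = os.environ if environ is None else environ
    return {
        # Explicit location of the R installation, if set.
        "R_HOME": env.get("R_HOME"),
        # Where R packages are stored, if overridden.
        "R_LIBS_USER": env.get("R_LIBS_USER"),
        # Full path of the R executable on PATH, or None if not found.
        "R_on_PATH": shutil.which("R", path=env.get("PATH")),
    }


if __name__ == "__main__":
    for name, value in r_lookup_hints().items():
        print(f"{name}: {value if value is not None else '(not set / not found)'}")
```

If both R_HOME and R_on_PATH come back empty, setting R_HOME to the R installation directory before starting Python is a reasonable first thing to try.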
Example
from arulespy.arules import Transactions, apriori, parameters
import pandas as pd

# define the data as a pandas dataframe
df = pd.DataFrame(
    [
        [True, True, True],
        [True, False, False],
        [True, True, True],
        [True, False, False],
        [True, True, True]
    ],
    columns=list('ABC'))

# convert dataframe to transactions
trans = Transactions.from_df(df)

# mine association rules
rules = apriori(trans,
                parameter=parameters({"supp": 0.1, "conf": 0.8}),
                control=parameters({"verbose": False}))

# display the rules as a pandas dataframe
rules.as_df()
| | LHS | RHS | support | confidence | coverage | lift | count |
|---|---|---|---|---|---|---|---|
| 1 | {} | {A} | 0.8 | 0.8 | 1 | 1 | 8 |
| 2 | {} | {C} | 0.8 | 0.8 | 1 | 1 | 8 |
| 3 | {B} | {A} | 0.4 | 0.8 | 0.5 | 1 | 4 |
| 4 | {B} | {C} | 0.5 | 1 | 0.5 | 1.25 | 5 |
| 5 | {A,B} | {C} | 0.4 | 1 | 0.4 | 1.25 | 4 |
| 6 | {B,C} | {A} | 0.4 | 0.8 | 0.5 | 1 | 4 |
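The columns of this table are related by the standard definitions: coverage is the support of the LHS, confidence is support divided by coverage, and lift is confidence divided by the support of the RHS. The sketch below recomputes confidence and lift from the values copied from the table; it uses plain Python rather than arulespy, and the variable names are chosen for this illustration.

```python
import math

# (LHS, RHS, support, confidence, coverage, lift) copied from the table above.
rows = [
    ("{}", "{A}", 0.8, 0.8, 1.0, 1.0),
    ("{}", "{C}", 0.8, 0.8, 1.0, 1.0),
    ("{B}", "{A}", 0.4, 0.8, 0.5, 1.0),
    ("{B}", "{C}", 0.5, 1.0, 0.5, 1.25),
    ("{A,B}", "{C}", 0.4, 1.0, 0.4, 1.25),
    ("{B,C}", "{A}", 0.4, 0.8, 0.5, 1.0),
]

# A rule with an empty LHS has support equal to the support of its RHS alone.
rhs_support = {rhs: supp for lhs, rhs, supp, *_ in rows if lhs == "{}"}


def derived(support, coverage, rhs_supp):
    """Recompute confidence and lift from support, coverage and RHS support."""
    confidence = support / coverage
    lift = confidence / rhs_supp
    return confidence, lift


checks = []
for lhs, rhs, supp, conf, cov, lift in rows:
    conf_calc, lift_calc = derived(supp, cov, rhs_support[rhs])
    ok = math.isclose(conf_calc, conf) and math.isclose(lift_calc, lift)
    checks.append(ok)
    print(f"{lhs} => {rhs}: confidence={conf_calc:.2f}, lift={lift_calc:.2f}, matches table: {ok}")
```

A lift of 1 means the LHS and RHS occur together about as often as expected under independence, while a lift above 1, as for the rules with {C} on the right, indicates a positive association.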
Complete examples:
References
- Michael Hahsler. ARULESPY: Exploring association rules and frequent itemsets in Python. arXiv:2305.15263 [cs.DB], May 2023.
- Michael Hahsler, Sudheer Chelluboina, Kurt Hornik, and Christian Buchta. The arules R-package ecosystem: Analyzing interesting patterns from large transaction datasets. Journal of Machine Learning Research, 12:1977-1981, 2011.
- Michael Hahsler, Bettina Grün and Kurt Hornik. arules - A Computational Environment for Mining Association Rules and Frequent Item Sets. Journal of Statistical Software, 14(15), 2005.
- Hahsler, Michael. A Probabilistic Comparison of Commonly Used Interest Measures for Association Rules, 2015, URL: https://mhahsler.github.io/arules/docs/measures.
- Michael Hahsler. An R Companion for Introduction to Data Mining: Chapter 5, 2021, URL: https://mhahsler.github.io/Introduction_to_Data_Mining_R_Examples/book/