
Python interface to the R package arules


arulespy is a Python module available from PyPI. The arules module in arulespy provides an easy-to-install Python interface, built with rpy2, to the R package arules for association rule mining.

The R arules package implements a comprehensive infrastructure for representing, manipulating and analyzing transaction data and patterns using frequent itemsets and association rules. The package also provides a wide range of interest measures and mining algorithms including the code of Christian Borgelt’s popular and efficient C implementations of the association mining algorithms Apriori and Eclat, and optimized C/C++ code for mining and manipulating association rules using sparse matrix representation.

The arulesViz module provides plot() for visualizing association rules using the R package arulesViz.

arulespy provides Python classes for

  • Transactions: Convert pandas dataframes into transaction data
  • Rules: Association rules
  • Itemsets: Itemsets
  • ItemMatrix: sparse matrix representation of sets of items.

with Python-style slicing and len().
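As a rough illustration of what "transaction data" means here, the following is a conceptual sketch in plain Python. It does not use arulespy or R, and the helper name rows_to_transactions and the sample values are made up for this example. It assumes a table of boolean columns, where each row becomes the set of column names that are True.

```python
# Conceptual sketch only: plain Python, no arulespy or R involved.
def rows_to_transactions(columns, rows):
    """Turn each boolean row into the set of column names that are True."""
    return [
        {name for name, present in zip(columns, row) if present}
        for row in rows
    ]


columns = ["A", "B", "C"]  # hypothetical item names
rows = [
    [True, True, False],
    [True, False, False],
    [False, True, True],
]

transactions = rows_to_transactions(columns, rows)
print(transactions)
print(len(transactions))   # number of transactions
print(transactions[0:2])   # slicing selects a subset of transactions
```

The arulespy classes wrap R objects rather than Python lists, so this only mirrors the idea of len() and slicing, not the actual implementation.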

Most arules functions are interfaced as methods of the four classes, with conversion from the R data structures to Python. Documentation is available in Python via help(). Detailed online documentation for the R package is available here.

Low-level arules functions can also be used directly in the form R.<arules R function>(). The result will be an rpy2 data type. Transactions, itemsets and rules can be converted manually to Python classes using the helper function a2p().

To cite the Python module ‘arulespy’ in publications use:

Michael Hahsler. ARULESPY: Exploring association rules and frequent itemsets in Python. arXiv:2305.15263 [cs.DB], May 2023. DOI: 10.48550/arXiv.2305.15263

Installation

arulespy is based on the Python package rpy2, which requires an R installation. Here are the installation steps:

  1. Install the latest version of R (>4.0) from https://www.r-project.org/

  2. Install required libraries on your OS:

    • libcurl is needed by R package curl.
      • Ubuntu: sudo apt-get install libcurl4-openssl-dev
      • macOS: brew install curl
      • Windows: no installation necessary, but read the Windows section below.
  3. Install arulespy, which will automatically install rpy2 and pandas.

    pip install arulespy
    
  4. Optional: Set the environment variable R_LIBS_USER to decide where R packages are stored (see libPaths() for details). If not set then R will determine a suitable location.

  5. Optional: arulespy will install the needed R packages when it is imported for the first time. This may take a while. R packages can also be preinstalled. Start R and run install.packages(c("arules", "arulesViz"))
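For step 4, the variable can also be set from Python. This is a minimal sketch and not part of arulespy: the helper name use_r_library_dir and the directory are hypothetical. It assumes that R reads R_LIBS_USER when it starts, so the variable has to be set before arulespy (and therefore R) is loaded, and that the directory should already exist when R starts.

```python
import os
import tempfile
from pathlib import Path


def use_r_library_dir(path):
    """Create `path` if needed and point R_LIBS_USER at it."""
    lib_dir = Path(path).expanduser()
    lib_dir.mkdir(parents=True, exist_ok=True)
    os.environ["R_LIBS_USER"] = str(lib_dir)
    return lib_dir


# Hypothetical location; a temporary directory keeps this sketch free of
# lasting side effects. A real setup would use a persistent directory.
lib_dir = use_r_library_dir(Path(tempfile.mkdtemp()) / "r-libs")
print(os.environ["R_LIBS_USER"])

# Only after this point: from arulespy.arules import ...
```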

The most likely issue is that rpy2 cannot find R or R's shared library. In that case the Python kernel dies or exits without explanation when arulespy is imported. Run python -m rpy2.situation to check whether R and R's libraries are found. In an IPython notebook, the following code block performs the same check:

from rpy2 import situation

for row in situation.iter_info():
    print(row)

The output should include a line saying Loading R library from rpy2: OK.
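If that line is missing, a dependency-free pre-check can help narrow things down. The sketch below uses only the standard library and assumes that rpy2 usually locates R through the R_HOME environment variable or an R executable on PATH. The function name r_discovery_hints is made up for this example.

```python
import os
import shutil


def r_discovery_hints():
    """Collect the two places where an R installation is commonly found."""
    return {
        "R_HOME": os.environ.get("R_HOME"),
        "R on PATH": shutil.which("R"),
    }


hints = r_discovery_hints()
for name, value in hints.items():
    print(f"{name}: {value or 'not found'}")
```

If both entries are reported as not found, setting R_HOME or adding R's bin directory to PATH is a reasonable first thing to try.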

Note for Windows users

rpy2 currently does not fully support Windows and the installation is somewhat tricky. I was able to use it with the following setup:

  • Windows 10
  • rpy2 version 3.5.14
  • Python version 3.10.12
  • R version 4.3.1

I use the following code to set the environment variables needed on Windows before I import from arulespy:

from rpy2 import situation
import os

r_home = situation.r_home_from_registry()
r_bin = r_home + '\\bin\\x64\\'
os.environ['R_HOME'] = r_home
os.environ['PATH'] =  r_bin + ";" + os.environ['PATH']
os.add_dll_directory(r_bin)

for row in situation.iter_info():
    print(row)

The output should include a line saying Loading R library from rpy2: OK.
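The same steps can be wrapped in a small helper that takes the R home directory as an argument, so the logic can be tried out without reading the registry. This is only a sketch: the function name configure_r_environment and the install path are hypothetical, and the bin\x64 layout is taken from the snippet above.

```python
import os


def configure_r_environment(r_home):
    """Set R_HOME and expose R's 64-bit bin directory to this process."""
    r_bin = os.path.join(r_home, "bin", "x64")
    os.environ["R_HOME"] = r_home
    os.environ["PATH"] = r_bin + os.pathsep + os.environ.get("PATH", "")
    # os.add_dll_directory exists only on Windows and fails for missing
    # directories, so both conditions are checked first.
    if hasattr(os, "add_dll_directory") and os.path.isdir(r_bin):
        os.add_dll_directory(r_bin)
    return r_bin


# Hypothetical install path for illustration; on Windows the real value
# would come from situation.r_home_from_registry() as shown above.
r_bin = configure_r_environment(r"C:\Program Files\R\R-4.3.1")
print(r_bin)
```

As in the original snippet, this has to run before anything is imported from arulespy.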

More information on installing rpy2 can be found here.

Example

from arulespy.arules import Transactions, apriori, parameters
import pandas as pd

# define the data as a pandas dataframe
df = pd.DataFrame(
    [
        [True, True, True],
        [True, False, False],
        [True, True, True],
        [True, False, False],
        [True, True, True]
    ],
    columns=list('ABC'))

# convert dataframe to transactions
trans = Transactions.from_df(df)

# mine association rules
rules = apriori(trans,
                parameter=parameters({"supp": 0.1, "conf": 0.8}),
                control=parameters({"verbose": False}))

# display the rules as a pandas dataframe
rules.as_df()

   LHS    RHS  support  confidence  coverage  lift  count
1  {}     {A}  0.8      0.8         1         1     8
2  {}     {C}  0.8      0.8         1         1     8
3  {B}    {A}  0.4      0.8         0.5       1     4
4  {B}    {C}  0.5      1           0.5       1.25  5
5  {A,B}  {C}  0.4      1           0.4       1.25  4
6  {B,C}  {A}  0.4      0.8         0.5       1     4
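To make the columns concrete, here is a plain-Python sketch of the standard definitions of support, coverage, confidence, lift and count. It does not call arulespy. The ten transactions are hypothetical (they are not the dataframe used above) and were chosen so that the rule {B} => {C} yields the same measures as row 4 of the table.

```python
# Ten hypothetical transactions, each a set of items.
transactions = (
    [{"A", "B", "C"}] * 4
    + [{"B", "C"}]
    + [{"A", "C"}] * 2
    + [{"A"}] * 2
    + [{"C"}]
)


def count(itemset):
    """Number of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def measures(lhs, rhs):
    """Interest measures for the rule lhs => rhs."""
    n = len(transactions)
    support = count(lhs | rhs) / n        # share containing LHS and RHS
    coverage = count(lhs) / n             # share containing the LHS
    confidence = support / coverage       # how often the rule holds
    lift = confidence / (count(rhs) / n)  # confidence relative to RHS support
    return {
        "support": support,
        "confidence": confidence,
        "coverage": coverage,
        "lift": lift,
        "count": count(lhs | rhs),
    }


print(measures({"B"}, {"C"}))
```

A lift above 1 indicates that the left-hand side and right-hand side occur together more often than would be expected if they were independent.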

Complete examples:

References
