Algorithms for Association Rule Mining
Temporal Generalized Association Rules
This library provides four algorithms related to Association Rule mining. The algorithms are:
- vertical_apriori
- vertical_cumulate
- htar
- htgar
These algorithms operate on a transactional dataset that is internally transformed into a vertical format for performance. The dataset MUST follow one of the following formats:
| order_id | product_name |
|---|---|
| 1 | Bread |
| 1 | Milk |
| 2 | Bread |
| 2 | Beer |
| 3 | Eggs |
Or if timestamps are provided:
| order_id | timestamp | product_name |
|---|---|---|
| 1 | 852087600 | Bread |
| 1 | 852087600 | Milk |
| 2 | 854420400 | Bread |
| 2 | 854420400 | Beer |
| 3 | 854420400 | Eggs |
Fields are separated by ",".
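To illustrate what the vertical transformation means (this is a conceptual sketch in plain Python, not the library's internal code), the horizontal order/product rows above are mapped so that each item points to the set of order IDs containing it; support counts then reduce to set sizes and intersections:

```python
from collections import defaultdict

# Horizontal transactional data: (order_id, product_name) rows,
# matching the first table above.
rows = [
    (1, "Bread"),
    (1, "Milk"),
    (2, "Bread"),
    (2, "Beer"),
    (3, "Eggs"),
]

# Vertical format: each item maps to the set of orders it appears in.
vertical = defaultdict(set)
for order_id, product in rows:
    vertical[product].add(order_id)

# e.g. support of {Bread} = |{1, 2}| / 3 transactions
print(dict(vertical))
```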
TGAR
This is the main class that must be instantiated once.
Usage
```python
import TemporalGeneralizedRules

tgar = TemporalGeneralizedRules.TGAR()
```
Vertical Apriori
This algorithm has four parameters:
- filepath: Filepath of the dataset in csv format with the format discussed in the previous section.
- min_supp: Minimum support threshold.
- min_conf: Minimum confidence threshold.
- parallel_count: Optional parameter that enables parallelization in the candidate counting phase of the algorithm, which can speed up execution.
Usage

```python
tgar.apriori("dataset.csv", 0.05, 0.5)
```
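The min_supp and min_conf thresholds work as in classic Apriori. As an illustration in plain Python (independent of this library's API), a rule such as Bread ⇒ Milk is kept only if both its support and its confidence clear the thresholds:

```python
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Beer"},
    {"Eggs"},
]

def support(itemset):
    # Fraction of transactions containing every item of the itemset.
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    # P(consequent | antecedent) = supp(A ∪ C) / supp(A)
    return support(antecedent | consequent) / support(antecedent)

rule_supp = support({"Bread", "Milk"})       # 1/3
rule_conf = confidence({"Bread"}, {"Milk"})  # (1/3) / (2/3) = 0.5
keep = rule_supp >= 0.05 and rule_conf >= 0.5
```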
Vertical Cumulate
This algorithm has five parameters:
- filepath: Filepath of the dataset in csv format with the format discussed in the previous section.
- min_supp: Minimum support threshold.
- min_conf: Minimum confidence threshold.
- min_r: Minimum R-interesting threshold.
- parallel_count: Optional parameter that enables parallelization in the candidate counting phase of the algorithm, which can speed up execution.
Usage

```python
tgar.vertical_cumulate("dataset.csv", 0.05, 0.5, 1.1)
```
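Cumulate mines generalized rules over an item taxonomy (e.g. Wheat Bread is-a Bread), and min_r prunes specialized rules that are not at least R times as strong as expected from an ancestor rule, in the sense of Srikant and Agrawal's R-interestingness measure. A rough sketch of that pruning test, with hypothetical support numbers:

```python
# Hypothetical supports for a rule "Wheat Bread -> Milk" and its
# generalized ancestor rule "Bread -> Milk" in the taxonomy.
ancestor_rule_supp = 0.30   # supp(Bread -> Milk)
supp_wheat_bread = 0.10     # supp(Wheat Bread)
supp_bread = 0.20           # supp(Bread)

# Expected support of the specialized rule, assuming specialization
# proportional to the item's share of its generalized parent.
expected_supp = ancestor_rule_supp * (supp_wheat_bread / supp_bread)  # 0.15

actual_supp = 0.18
min_r = 1.1

# The specialized rule is R-interesting only if its real support is
# at least min_r times the support expected from the ancestor rule.
is_r_interesting = actual_supp >= min_r * expected_supp  # 0.18 >= 0.165
```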
HTAR
This algorithm has four parameters:
- filepath: Filepath of the dataset in csv format with the format discussed in the previous section.
- min_supp: Minimum support threshold.
- min_conf: Minimum confidence threshold.
- parallel_count: Optional parameter that enables parallelization in the candidate counting phase of the algorithm, which can speed up execution.
Usage

```python
tgar.htar("dataset.csv", 0.05, 0.5)
```
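HTAR mines temporal rules per time period rather than over the whole dataset at once, which is why this algorithm requires the timestamped dataset format. As a conceptual sketch (not the library's internal code), transactions can be bucketed by their Unix timestamp into daily periods before supports are counted within each bucket:

```python
from datetime import datetime, timezone

# (order_id, unix_timestamp) pairs, as in the timestamped table above.
orders = [
    (1, 852087600),  # Jan 1, 1997 (UTC)
    (2, 854420400),  # Jan 28, 1997 (UTC)
    (3, 854420400),  # Jan 28, 1997 (UTC)
]

# Bucket each order into a (year, month, day) period; per-period
# supports are then computed over each bucket independently.
periods = {}
for order_id, ts in orders:
    dt = datetime.fromtimestamp(ts, tz=timezone.utc)
    periods.setdefault((dt.year, dt.month, dt.day), []).append(order_id)
```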
HTGAR
This algorithm has five parameters:
- filepath: Filepath of the dataset in csv format with the format discussed in the previous section.
- min_supp: Minimum support threshold.
- min_conf: Minimum confidence threshold.
- min_r: Minimum R-interesting threshold.
- parallel_count: Optional parameter that enables parallelization in the candidate counting phase of the algorithm, which can speed up execution.
Usage

```python
tgar.htgar("dataset.csv", 0.05, 0.5, 1.1)
```
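HTGAR combines the per-period mining of HTAR with generalized rules over a taxonomy. Conceptually (again a sketch with a hypothetical taxonomy, not the library's code), each transaction is extended with the taxonomy ancestors of its items before mining, so rules at any level of the hierarchy can be found:

```python
# Hypothetical taxonomy: child item -> parent category.
taxonomy = {
    "Wheat Bread": "Bread",
    "Bread": "Food",
    "Milk": "Food",
}

def extend(transaction):
    # Add every taxonomy ancestor of every item: the "extended
    # transaction" trick used by Cumulate-style algorithms.
    items = set(transaction)
    for item in transaction:
        parent = taxonomy.get(item)
        while parent is not None:
            items.add(parent)
            parent = taxonomy.get(parent)
    return items

extended = extend({"Wheat Bread", "Milk"})
# -> {"Wheat Bread", "Bread", "Food", "Milk"}
```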
PyPy
For better performance, we recommend running this package with PyPy, a faster alternative implementation of Python.
Bibliography
The algorithms in this library are based on the following papers: