Algorithms for Association Rule Mining
Temporal Generalized Association Rules
This library provides four algorithms for association rule mining. It can be installed from PyPI with:
```bash
pip install TemporalGeneralizedRules
```
The algorithms are:
- Apriori
- Cumulate
- HTAR
- HTGAR
The algorithms take a transactional dataset, which is internally transformed into a vertical format for efficiency. The dataset must follow this format:
| order_id | product_name |
|---|---|
| 1 | Bread |
| 1 | Milk |
| 2 | Bread |
| 2 | Beer |
| 3 | Eggs |
Or, if timestamps are provided:
| order_id | timestamp | product_name |
|---|---|---|
| 1 | 852087600 | Bread |
| 1 | 852087600 | Milk |
| 2 | 854420400 | Bread |
| 2 | 854420400 | Beer |
| 3 | 854420400 | Eggs |
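The library does not document its internal representation, so the following is only an illustration of what a vertical layout typically means: each product is mapped to the set of order IDs that contain it (its tidset), and the support of an itemset becomes the size of the intersection of its tidsets divided by the number of orders. The helper name `load_vertical` is hypothetical, and it assumes the CSV has a header row naming the columns as in the tables above.

```python
import csv
import io
from collections import defaultdict


def load_vertical(lines):
    """Map each product to the set of order IDs (its tidset) that contain it.

    Hypothetical helper: assumes a header row with `order_id` and
    `product_name` columns; a `timestamp` column, if present, is ignored.
    """
    tidsets = defaultdict(set)
    for row in csv.DictReader(lines):
        tidsets[row["product_name"]].add(row["order_id"])
    return dict(tidsets)


sample = io.StringIO(
    "order_id,product_name\n"
    "1,Bread\n1,Milk\n2,Bread\n2,Beer\n3,Eggs\n"
)
vertical = load_vertical(sample)

# Total number of distinct orders, recovered from the union of all tidsets.
n_orders = len(set().union(*vertical.values()))

# Support of {Bread, Milk}: orders containing both, over all orders.
supp_bread_milk = len(vertical["Bread"] & vertical["Milk"]) / n_orders
print(sorted(vertical["Bread"]))  # ['1', '2']
print(supp_bread_milk)            # 1 of 3 orders
```

To load a real file, pass an open file object instead, e.g. `with open("dataset.csv", newline="") as f: vertical = load_vertical(f)`.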
The taxonomy file uses the following format (do not include a header row):
| child | parent |
|---|---|
| Bread | Dairy |
| Milk | Dairy |
| Beer | Beverage |
Write one child,parent pair per line, with the two fields separated by a comma (",").
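As an illustration of how such a file can be used, the sketch below reads the headerless child,parent rows and extends a transaction with every ancestor of its items, which is the basic idea behind mining generalized rules in Srikant and Agrawal (1997). This is not the library's code: the helper names are hypothetical, and it assumes each child has a single parent and the taxonomy has no cycles.

```python
import csv
import io


def load_taxonomy(lines):
    """Read headerless `child,parent` rows into a child -> parent dict.

    Assumes one parent per child; blank lines are skipped.
    """
    return {row[0].strip(): row[1].strip() for row in csv.reader(lines) if row}


def ancestors(item, parent_of):
    """Walk up the taxonomy and return every ancestor of `item`, nearest first."""
    result = []
    while item in parent_of:
        item = parent_of[item]
        result.append(item)
    return result


def extend_transaction(items, parent_of):
    """Return the transaction plus all ancestors of each of its items."""
    extended = set(items)
    for item in items:
        extended.update(ancestors(item, parent_of))
    return extended


taxonomy = load_taxonomy(io.StringIO("Bread,Dairy\nMilk,Dairy\nBeer,Beverage\n"))
print(sorted(extend_transaction({"Bread", "Beer"}, taxonomy)))
# ['Beer', 'Beverage', 'Bread', 'Dairy']
```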
TGAR
This is the main class. Instantiate it once and call the algorithms below as methods on that instance.
Usage
```python
import TemporalGeneralizedRules

tgar = TemporalGeneralizedRules.TGAR()
```
Apriori
This algorithm has four parameters:
- filepath: Path to the dataset CSV file, in the format described above.
- min_supp: Minimum support threshold.
- min_conf: Minimum confidence threshold.
- parallel_count: Optional; enables parallelization of the candidate counting phase, which can speed up execution.
Usage
```python
tgar.apriori("dataset.csv", 0.05, 0.5)
```
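To make the roles of min_supp and min_conf concrete, here is a minimal level-wise Apriori sketch in the spirit of Agrawal and Srikant (1994), working on tidsets. It is not the library's implementation: the function name `apriori` here is our own, and it assumes both thresholds are fractions between 0 and 1, as the values 0.05 and 0.5 in the usage line suggest.

```python
from itertools import combinations


def apriori(transactions, min_supp, min_conf):
    """Level-wise Apriori over tidsets; returns (frequent itemsets, rules).

    Sketch only: supports are fractions of the number of transactions.
    Each rule is (antecedent, consequent, support, confidence).
    """
    n = len(transactions)
    tidsets = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets.setdefault(frozenset([item]), set()).add(tid)

    # Level 1: frequent single items.
    level = {s: t for s, t in tidsets.items() if len(t) / n >= min_supp}
    frequent = dict(level)
    k = 2
    while level:
        candidates = {}
        for a, b in combinations(list(level), 2):
            union = a | b
            if len(union) != k or union in candidates:
                continue
            # Apriori pruning: every (k-1)-subset must itself be frequent.
            if all(frozenset(sub) in level for sub in combinations(union, k - 1)):
                # The tidset of a union is the intersection of the tidsets.
                candidates[union] = level[a] & level[b]
        level = {s: t for s, t in candidates.items() if len(t) / n >= min_supp}
        frequent.update(level)
        k += 1

    rules = []
    for itemset, tids in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                # Every subset of a frequent itemset is frequent, so this lookup is safe.
                conf = len(tids) / len(frequent[antecedent])
                if conf >= min_conf:
                    rules.append((antecedent, itemset - antecedent, len(tids) / n, conf))
    return frequent, rules


transactions = [{"Bread", "Milk"}, {"Bread", "Beer"}, {"Eggs"}]
frequent, rules = apriori(transactions, min_supp=0.3, min_conf=0.5)
for lhs, rhs, supp, conf in rules:
    print(sorted(lhs), "=>", sorted(rhs), f"supp={supp:.2f} conf={conf:.2f}")
```

On the three example orders this keeps the four single items and the pairs {Bread, Milk} and {Bread, Beer}, and yields four rules (for example, Milk => Bread with confidence 1.0).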
Cumulate
This algorithm has six parameters:
- filepath: Path to the dataset CSV file, in the format described above.
- taxonomy_filepath: Path to the taxonomy CSV file, in the format described above.
- min_supp: Minimum support threshold.
- min_conf: Minimum confidence threshold.
- min_r: Minimum R-interesting threshold.
- parallel_count: Optional; enables parallelization of the candidate counting phase, which can speed up execution.
Usage
```python
# Arguments: filepath, taxonomy_filepath, min_supp, min_conf, min_r
tgar.cumulate("dataset.csv", "taxonomy.csv", 0.05, 0.5, 1.1)
```
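The min_r parameter refers to the R-interesting measure of Srikant and Agrawal (1997): a rule over specialized items is considered interesting only if its support (or confidence) is at least R times the value expected from a more general ancestor rule. The sketch below computes that expected support and a support-based check. The helper names and the supports in the example are hypothetical, and exactly how the library applies min_r is an assumption.

```python
from math import prod


def expected_support(itemset, ancestor_of, support):
    """Expected support of `itemset` given the support of its generalization.

    Formula from Srikant & Agrawal (1997):
        E[Pr(Z)] = Pr(z1)/Pr(z1^) * ... * Pr(zj)/Pr(zj^) * Pr(Z^)
    where Z^ replaces each specialized item zi by its ancestor zi^.
    `ancestor_of` maps only the replaced items to their ancestors;
    `support` maps frozensets to supports (fractions of transactions).
    Assumes no two items in `itemset` share the same ancestor.
    """
    generalized = frozenset(ancestor_of.get(i, i) for i in itemset)
    ratio = prod(
        support[frozenset([i])] / support[frozenset([ancestor_of[i]])]
        for i in itemset
        if i in ancestor_of
    )
    return ratio * support[generalized]


def is_r_interesting(itemset, ancestor_of, support, min_r):
    """Support-based R-interest test; a confidence-based test is analogous."""
    actual = support[frozenset(itemset)]
    return actual >= min_r * expected_support(itemset, ancestor_of, support)


# Hypothetical supports, for illustration only.
support = {
    frozenset({"Milk"}): 0.4,
    frozenset({"Skim milk"}): 0.1,
    frozenset({"Cereal"}): 0.3,
    frozenset({"Milk", "Cereal"}): 0.08,
    frozenset({"Skim milk", "Cereal"}): 0.04,
}
ancestor_of = {"Skim milk": "Milk"}
exp = expected_support({"Skim milk", "Cereal"}, ancestor_of, support)
print(exp)  # (0.1 / 0.4) * 0.08
print(is_r_interesting({"Skim milk", "Cereal"}, ancestor_of, support, 1.1))
```

Here the observed support of {Skim milk, Cereal} is twice what its ancestor {Milk, Cereal} predicts, so it passes a threshold of 1.1.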
HTAR
This algorithm has four parameters:
- filepath: Path to the dataset CSV file, in the format described above.
- min_supp: Minimum support threshold.
- min_conf: Minimum confidence threshold.
- parallel_count: Optional; enables parallelization of the candidate counting phase, which can speed up execution.
Usage
```python
tgar.htar("dataset.csv", 0.05, 0.5)
```
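HTAR is based on the hierarchical granular framework of Hong et al. (2016), where transactions are grouped into time granules at several levels. This README does not say which granules the library uses, so the sketch below simply assumes calendar granules (year, quarter, month) derived from the timestamp column, read as Unix seconds in UTC, and computes an item's support within each month. The helper names are hypothetical and this is not the HTAR algorithm itself, only the time-bucketing step it relies on.

```python
from collections import defaultdict
from datetime import datetime, timezone


def granules(ts):
    """Map a Unix timestamp (seconds, UTC assumed) to year/quarter/month granules."""
    d = datetime.fromtimestamp(ts, tz=timezone.utc)
    return {
        "year": (d.year,),
        "quarter": (d.year, (d.month - 1) // 3 + 1),
        "month": (d.year, d.month),
    }


def support_per_granule(rows, item, level="month"):
    """Fraction of orders in each granule that contain `item`.

    `rows` are (order_id, timestamp, product_name) tuples, mirroring
    the timestamped dataset format above.
    """
    orders = defaultdict(set)     # granule -> all order IDs seen in it
    with_item = defaultdict(set)  # granule -> order IDs containing `item`
    for order_id, ts, product in rows:
        g = granules(ts)[level]
        orders[g].add(order_id)
        if product == item:
            with_item[g].add(order_id)
    return {g: len(with_item[g]) / len(ids) for g, ids in orders.items()}


rows = [
    (1, 852087600, "Bread"), (1, 852087600, "Milk"),
    (2, 854420400, "Bread"), (2, 854420400, "Beer"),
    (3, 854420400, "Eggs"),
]
print(support_per_granule(rows, "Bread"))  # {(1997, 1): 0.6666666666666666}
```

Both example timestamps fall in January 1997 (UTC), so all three orders land in the same monthly granule.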
HTGAR
This algorithm has six parameters:
- filepath: Path to the dataset CSV file, in the format described above.
- taxonomy_filepath: Path to the taxonomy CSV file, in the format described above.
- min_supp: Minimum support threshold.
- min_conf: Minimum confidence threshold.
- min_r: Minimum R-interesting threshold.
- parallel_count: Optional; enables parallelization of the candidate counting phase, which can speed up execution.
Usage
```python
# Arguments: filepath, taxonomy_filepath, min_supp, min_conf, min_r
tgar.htgar("dataset.csv", "taxonomy.csv", 0.05, 0.5, 1.1)
```
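Judging from its parameters, HTGAR combines the taxonomy used by Cumulate with the time granules used by HTAR. The sketch below illustrates that combination under stated assumptions: an order supports a generalized item (such as Dairy) in a given month if it contains that item or any of its descendants, months are UTC calendar months, and the taxonomy maps each child to one parent. The function name is hypothetical and this is not the library's algorithm.

```python
from collections import defaultdict
from datetime import datetime, timezone


def generalized_support_per_month(rows, parent_of, item):
    """Per-month support of `item`, counting descendants as occurrences.

    `rows` are (order_id, timestamp, product_name) tuples; `parent_of`
    maps child -> parent as in the taxonomy file.
    """
    def with_ancestors(product):
        # The product itself plus every ancestor up the taxonomy.
        found = {product}
        while product in parent_of:
            product = parent_of[product]
            found.add(product)
        return found

    orders = defaultdict(set)  # month -> all order IDs
    hits = defaultdict(set)    # month -> order IDs supporting `item`
    for order_id, ts, product in rows:
        d = datetime.fromtimestamp(ts, tz=timezone.utc)
        month = (d.year, d.month)
        orders[month].add(order_id)
        if item in with_ancestors(product):
            hits[month].add(order_id)
    return {m: len(hits[m]) / len(ids) for m, ids in orders.items()}


parent_of = {"Bread": "Dairy", "Milk": "Dairy", "Beer": "Beverage"}
rows = [
    (1, 852087600, "Bread"), (1, 852087600, "Milk"),
    (2, 854420400, "Bread"), (2, 854420400, "Beer"),
    (3, 854420400, "Eggs"),
]
print(generalized_support_per_month(rows, parent_of, "Dairy"))
# {(1997, 1): 0.6666666666666666}
```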
PyPy
For better performance, we recommend running this package under PyPy, an alternative Python implementation with a just-in-time compiler.
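The parallel_count option is not described in detail here. A common scheme for parallel candidate counting is Count Distribution from Agrawal and Shafer (1996, listed in the bibliography): the transactions are partitioned, each worker counts every candidate on its own partition, and the local counts are summed. The sketch below shows that partition-and-merge pattern with hypothetical helper names. It uses a thread pool only for simplicity; for CPU-bound counting in CPython (and PyPy, which also has a global interpreter lock) a process pool would be needed for a real speedup.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_partition(partition, candidates):
    """Count how many transactions in one partition contain each candidate."""
    counts = Counter()
    for transaction in partition:
        for candidate in candidates:
            if candidate <= transaction:  # candidate is a subset of the transaction
                counts[candidate] += 1
    return counts


def parallel_count(transactions, candidates, workers=4):
    """Count Distribution sketch: split the data, count locally, sum the counts."""
    size = max(1, -(-len(transactions) // workers))  # ceiling division
    partitions = [transactions[i:i + size] for i in range(0, len(transactions), size)]
    total = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for local in pool.map(count_partition, partitions, [candidates] * len(partitions)):
            total.update(local)  # Counter.update adds counts rather than replacing them
    return total


transactions = [{"Bread", "Milk"}, {"Bread", "Beer"}, {"Eggs"}]
candidates = [frozenset({"Bread", "Milk"}), frozenset({"Bread", "Beer"}), frozenset({"Milk", "Beer"})]
counts = parallel_count(transactions, candidates, workers=2)
for c in candidates:
    print(sorted(c), counts[c])
```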
Bibliography
The algorithms provided in this library are based on the following papers:
- Rakesh Agrawal and Ramakrishnan Srikant. 1994. Fast Algorithms for Mining Association Rules in Large Databases. In Proceedings of the 20th International Conference on Very Large Data Bases (VLDB '94). Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 487–499. https://dl.acm.org/doi/10.5555/645920.672836
- Ramakrishnan Srikant and Rakesh Agrawal. 1997. Mining generalized association rules. Future Generation Computer Systems 13, 2–3, 161–180. ISSN 0167-739X. https://www.sciencedirect.com/science/article/pii/S0167739X97000198
- Rakesh Agrawal and John C. Shafer. 1996. Parallel mining of association rules. IEEE Transactions on Knowledge and Data Engineering 8, 6 (Dec. 1996), 962–969. doi: 10.1109/69.553164. https://ieeexplore.ieee.org/document/553164
- T.-P. Hong, G.-C. Lan, J.-H. Su, P.-S. Wu, and S.-L. Wang. 2016. Discovery of temporal association rules with hierarchical granular framework. Applied Computing and Informatics 12, 2, 134–141. https://www.sciencedirect.com/science/article/pii/S2210832716000041
Download files
File details
Details for the file TemporalGeneralizedRules-1.0.2.tar.gz.
File metadata
- Download URL: TemporalGeneralizedRules-1.0.2.tar.gz
- Upload date:
- Size: 19.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.10.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | aadab622412c079d7ebe46781220081291d787c0bebc67fd657a14e7bdcef71f |
| MD5 | f6597d882144bcedded4a922d36e3427 |
| BLAKE2b-256 | 9e46732247f1450523ed10114c3a1a01197e974c1d5033fc440c0ed6707af690 |
File details
Details for the file TemporalGeneralizedRules-1.0.2-py3-none-any.whl.
File metadata
- Download URL: TemporalGeneralizedRules-1.0.2-py3-none-any.whl
- Upload date:
- Size: 23.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.10.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 12ee7055c558bc57b2accd8bc33152fbb5a575dc995426e7914529a778a52bb7 |
| MD5 | cabdb70b172a3685d6387ba39534f57b |
| BLAKE2b-256 | f18ca94542771a3fa49b9255d3e6c5aefc714598ceae38c1be07d570bd986aeb |