Open-source Python library for pattern mining
pypatternminer
pypatternminer is an open-source Python library for pattern mining. It provides a broad, research-oriented, and reproducible framework for major pattern-mining families, including:
- Itemset mining
- High-utility itemset mining
- High-utility sequential pattern mining
- Sequential pattern mining
The project is developed with two main goals:
- to provide a comprehensive Python framework covering many influential pattern-mining algorithms in a unified environment;
- to provide publicly available code and testing datasets to support transparent and reproducible research.
Why pypatternminer?
Pattern mining has produced a large number of important algorithms over the past three decades, but the Python ecosystem remains relatively fragmented. pypatternminer aims to help bridge this gap by offering:
- a unified Python environment for multiple pattern-mining families;
- broad algorithmic coverage;
- validation against SPMF reference implementations;
- public access to source code and testing datasets;
- a foundation for teaching, benchmarking, and future research.
Current Coverage
At the current stage, pypatternminer includes 144 implemented entries:
| Category | Implemented |
|---|---|
| Itemset mining | 42 |
| High-utility itemset mining | 70 |
| High-utility sequential pattern mining | 3 |
| Sequential pattern mining | 29 |
| Total | 144 |
Planned future extensions include:
- association rule mining
- sequential rule mining
- sequence prediction
- periodic pattern mining
- episode mining
Validation
A key design principle of pypatternminer is implementation reliability.
Each Python implementation is validated against the corresponding Java implementation in SPMF using:
- the same input datasets,
- the same parameter settings,
- and the same experimental conditions.
The returned pattern sets and associated values, such as support or utility, are compared to ensure matching outputs.
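The comparison step can be sketched as follows. This is a minimal illustration and not the project's actual validation harness. It assumes both implementations write one itemset per line in an SPMF-style text format such as `1 2 3 #SUP: 4` or `1 2 #UTIL: 30`; the function names (`parse_line`, `load_patterns`, `diff_outputs`) are hypothetical. Itemsets are compared as unordered sets, so this sketch does not apply to sequential patterns, where item order matters.

```python
from typing import Dict, FrozenSet, Iterable, List, Tuple

Pattern = FrozenSet[str]


def parse_line(line: str) -> Tuple[Pattern, Dict[str, float]]:
    """Split 'items #KEY: value #KEY: value' into an itemset and its measures."""
    head, *fields = line.strip().split("#")
    items = frozenset(head.split())
    measures = {}
    for field in fields:
        key, _, value = field.partition(":")
        measures[key.strip()] = float(value)
    return items, measures


def load_patterns(lines: Iterable[str]) -> Dict[Pattern, Dict[str, float]]:
    """Parse every non-empty line into a {itemset: measures} mapping."""
    return dict(parse_line(line) for line in lines if line.strip())


def diff_outputs(py_lines: Iterable[str], java_lines: Iterable[str],
                 tol: float = 1e-9) -> Dict[str, List[List[str]]]:
    """Report patterns missing from, extra in, or mismatched in the Python output."""
    py, java = load_patterns(py_lines), load_patterns(java_lines)
    missing = sorted(sorted(p) for p in java.keys() - py.keys())
    extra = sorted(sorted(p) for p in py.keys() - java.keys())
    mismatched = []
    for pattern in py.keys() & java.keys():
        a, b = py[pattern], java[pattern]
        # Same measure names and same values (within tolerance) are required.
        if a.keys() != b.keys() or any(abs(a[k] - b[k]) > tol for k in a):
            mismatched.append(sorted(pattern))
    mismatched.sort()
    return {"missing": missing, "extra": extra, "mismatched": mismatched}
```

In practice the two arguments would be the lines of the Python and Java output files; an empty result in all three lists indicates matching outputs.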
Installation
Install from PyPI
pip install pypatternminer
Install from source
Clone the repository and install the package locally:
git clone https://github.com/taiduydinh/pypatternminer.git
cd pypatternminer
pip install .
If you want to install in editable mode for development:
pip install -e .
Quick Start
Algorithms are currently organized by module. A typical usage pattern is to import the algorithm class from its corresponding module.
Example: Apriori
from pypatternminer.apriori import AlgoApriori
algo = AlgoApriori()
algo.runAlgorithm(
    minsup=0.5,
    input_path="contextPasquier99.txt",
    output_path="output_py.txt"
)
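To make the `minsup` parameter concrete, here is a brute-force sketch of frequent itemset mining on a toy database. It assumes that `minsup=0.5` denotes a relative support threshold (an itemset must occur in at least 50% of the transactions), which is how SPMF's Apriori interprets it; the README itself does not state this. The function `frequent_itemsets` is hypothetical and is not part of pypatternminer.

```python
from itertools import combinations


def frequent_itemsets(transactions, minsup):
    """Return {itemset (sorted tuple): absolute support} for relative threshold minsup."""
    n = len(transactions)
    if n == 0:
        return {}
    items = sorted({item for t in transactions for item in t})
    result = {}
    for size in range(1, len(items) + 1):
        found = False
        for candidate in combinations(items, size):
            count = sum(1 for t in transactions if set(candidate) <= t)
            if count / n >= minsup:
                result[candidate] = count
                found = True
        if not found:
            # No frequent itemset of this size, so no larger one can be frequent.
            break
    return result


# Toy transaction database: one set of item ids per transaction.
toy_db = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}, {1, 2, 3, 5}]
result = frequent_itemsets(toy_db, minsup=0.5)
for itemset, support in sorted(result.items(), key=lambda kv: (len(kv[0]), kv[0])):
    print(itemset, support)
```

This enumeration is exponential in the number of items and only serves to show what the threshold means; Apriori itself prunes candidates level by level.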
Example: LCIM
from pypatternminer.lcim import AlgoLCIM
algo = AlgoLCIM()
algo.runAlgorithm(
    input_file="DB_cost.txt",
    output_file="output_py.txt",
    minUtility=28.0,
    maxcost=10.0,
    minsupp=0.3
)
Package Organization
The project is organized as follows:
pypatternminer/
├── .github/ # GitHub Actions workflows
├── datasets/ # testing datasets
├── pypatternminer/ # source code
│ ├── __init__.py
│ ├── apriori.py
│ ├── aprioriclose.py
│ ├── aprioriinverse.py
│ ├── ...
│ └── lcim.py
├── README.md
└── pyproject.toml
Repository Structure
The repository is designed to support both algorithm development and reproducible experimentation:
- pypatternminer/ contains the Python implementations of the algorithms;
- datasets/ contains testing datasets used for validation and experiments;
- .github/workflows/ contains the release and publishing workflows.
Usage Notes
- Algorithms are currently accessed through their corresponding modules, for example:
from pypatternminer.apriori import AlgoApriori
from pypatternminer.lcim import AlgoLCIM
- Input files, parameters, and output formats may differ across algorithms depending on their original design and reference implementation.
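Because each algorithm lives in its own module, a benchmarking script might resolve algorithm classes by name instead of hard-coding imports. The helper below is a sketch of that idea using the standard library's `importlib`; `load_algorithm` is a hypothetical name and is not provided by pypatternminer. Only the `apriori`/`AlgoApriori` and `lcim`/`AlgoLCIM` pairs are confirmed by this README, so any other module or class name would need to be checked against the source tree.

```python
import importlib


def load_algorithm(module_name, class_name, package="pypatternminer"):
    """Import `package.module_name` and return the attribute `class_name` from it."""
    module = importlib.import_module(f"{package}.{module_name}")
    return getattr(module, class_name)


# Example (requires pypatternminer to be installed):
# AlgoApriori = load_algorithm("apriori", "AlgoApriori")
# algo = AlgoApriori()
```

Since argument names differ between algorithms (for example `input_path` for Apriori versus `input_file` for LCIM), the call to `runAlgorithm` still has to be written per algorithm.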
Contributing
Contributions are welcome.
You can contribute by:
- implementing additional pattern-mining algorithms;
- improving documentation and examples;
- reporting bugs or testing issues;
- adding benchmark datasets and validation cases;
- improving code quality and reproducibility.
If you would like to contribute, please open an issue or submit a pull request.
Citation
If you use pypatternminer in your research, please cite the repository or, if available, the related paper.
A formal citation entry can be added here in future releases.
Vision
pypatternminer is intended to serve as both a practical software library and a research infrastructure for the pattern-mining community. By combining algorithmic breadth, reproducibility, and public accessibility, the project aims to support researchers, educators, and practitioners working on pattern mining and related areas.
File details
Details for the file pypatternminer-0.0.3.tar.gz.
File metadata
- Download URL: pypatternminer-0.0.3.tar.gz
- Upload date:
- Size: 310.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c44df794f52b8a7c1973f0c0947effb80f7aa98dddf047e02642ca380be3721d |
| MD5 | 37f2f3b8461559a0e51bf11f873f619e |
| BLAKE2b-256 | 2ef1d0bb416d49b77cc86e08de64411a86feac992ed4743e369db33bb14e0dc5 |
Provenance
The following attestation bundles were made for pypatternminer-0.0.3.tar.gz:
Publisher: publish.yml on taiduydinh/pypatternminer
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: pypatternminer-0.0.3.tar.gz
  - Subject digest: c44df794f52b8a7c1973f0c0947effb80f7aa98dddf047e02642ca380be3721d
- Sigstore transparency entry: 1193086050
- Sigstore integration time:
- Permalink: taiduydinh/pypatternminer@3420976f5955a8bd7675ed1508ec4d883a3f1faf
- Branch / Tag: refs/tags/v0.0.3
- Owner: https://github.com/taiduydinh
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@3420976f5955a8bd7675ed1508ec4d883a3f1faf
- Trigger Event: push
File details
Details for the file pypatternminer-0.0.3-py3-none-any.whl.
File metadata
- Download URL: pypatternminer-0.0.3-py3-none-any.whl
- Upload date:
- Size: 399.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 36c59d4b2591202ff464bd67f395327b2dcd3c5a400418ac74fbbcf9033b09fa |
| MD5 | d6ef4f90421c0fc12a4e1dfe9d137e19 |
| BLAKE2b-256 | e1fa75c742165edc624bde29efed29a303e516d817f3efa607bb036bb451d928 |
Provenance
The following attestation bundles were made for pypatternminer-0.0.3-py3-none-any.whl:
Publisher: publish.yml on taiduydinh/pypatternminer
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: pypatternminer-0.0.3-py3-none-any.whl
  - Subject digest: 36c59d4b2591202ff464bd67f395327b2dcd3c5a400418ac74fbbcf9033b09fa
- Sigstore transparency entry: 1193086081
- Sigstore integration time:
- Permalink: taiduydinh/pypatternminer@3420976f5955a8bd7675ed1508ec4d883a3f1faf
- Branch / Tag: refs/tags/v0.0.3
- Owner: https://github.com/taiduydinh
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@3420976f5955a8bd7675ed1508ec4d883a3f1faf
- Trigger Event: push