
Open-source Python library for pattern mining


pypatternminer

pypatternminer is an open-source Python library for pattern mining. It provides a broad, research-oriented, and reproducible framework for major pattern-mining families, including:

  • Itemset mining
  • High-utility itemset mining
  • High-utility sequential pattern mining
  • Sequential pattern mining
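In the sequential families, order matters. A sequential pattern is a list of itemsets, and a sequence contains it when each of the pattern's itemsets is a subset of a distinct itemset of the sequence, with the matches appearing in order. The sketch below counts support under this standard definition. It is an independent illustration, not code from pypatternminer, and the function names `contains` and `support` are hypothetical.

```python
def contains(sequence, pattern):
    """Return True if the itemsets of `pattern` occur, in order, in `sequence`.

    Each pattern itemset must be a subset of a distinct sequence itemset, and
    the matched positions must be strictly increasing.
    """
    pos = 0
    for itemset in pattern:
        # Greedily match the earliest sequence itemset that covers this element;
        # matching early never rules out a later match.
        while pos < len(sequence) and not itemset <= sequence[pos]:
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1  # the next pattern itemset must match strictly later
    return True


def support(database, pattern):
    """Number of sequences in `database` that contain `pattern`."""
    return sum(1 for seq in database if contains(seq, pattern))


# Toy sequence database: each sequence is a list of itemsets.
db = [
    [{1}, {1, 2, 3}, {1, 3}, {4}, {3, 6}],
    [{1, 4}, {3}, {2, 3}, {1, 5}],
    [{5, 6}, {1, 2}, {4, 6}, {3}, {2}],
    [{5}, {7}, {1, 6}, {3}, {2}, {3}],
]
print(support(db, [{1}, {3}]))     # 4
print(support(db, [{1, 2}, {3}]))  # 2
```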

The project is developed with two main goals:

  1. to provide a comprehensive Python framework covering many influential pattern-mining algorithms in a unified environment;
  2. to provide publicly available code and testing datasets to support transparent and reproducible research.

Why pypatternminer?

Pattern mining has produced a large number of important algorithms over the past three decades, but the Python ecosystem remains relatively fragmented. pypatternminer aims to help bridge this gap by offering:

  • a unified Python environment for multiple pattern-mining families;
  • broad algorithmic coverage;
  • validation against SPMF reference implementations;
  • public access to source code and testing datasets;
  • a foundation for teaching, benchmarking, and future research.

Current Coverage

At the current stage, pypatternminer includes 144 implemented entries:

Category                                Implemented
Itemset mining                          42
High-utility itemset mining             70
High-utility sequential pattern mining  3
Sequential pattern mining               29
Total                                   144

Planned future extensions include:

  • association rule mining
  • sequential rule mining
  • sequence prediction
  • periodic pattern mining
  • episode mining

Validation

A key design principle of pypatternminer is implementation reliability.

Each Python implementation is validated against the corresponding Java implementation in SPMF using:

  • the same input datasets,
  • the same parameter settings,
  • and the same experimental conditions.

The returned pattern sets and associated values, such as support or utility, are compared to ensure matching outputs.
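A comparison of this kind can be sketched as a small script. The sketch assumes SPMF-style output lines in which the items come first and each measure follows a `#` marker (for example `1 3 #SUP: 3` or `1 2 #UTIL: 30`). The helper names are hypothetical and not part of pypatternminer. Because patterns are keyed by unordered item sets, this version suits itemset-family outputs; sequential patterns would need an order-preserving key.

```python
import math


def parse_output_line(line):
    """Split a line such as '1 3 #SUP: 3' into (item set, {measure: value})."""
    head, *measure_parts = line.split("#")
    items = frozenset(head.split())
    measures = {}
    for part in measure_parts:
        name, _, value = part.partition(":")
        measures[name.strip()] = float(value)
    return items, measures


def load_patterns(lines):
    """Map each pattern (as an unordered item set) to its measures, skipping blank lines."""
    patterns = {}
    for line in lines:
        if line.strip():
            items, measures = parse_output_line(line)
            patterns[items] = measures
    return patterns


def diff_outputs(reference_lines, candidate_lines, rel_tol=1e-9):
    """List the differences between two outputs; an empty list means they match."""
    ref = load_patterns(reference_lines)
    cand = load_patterns(candidate_lines)
    problems = [f"missing: {sorted(p)}" for p in ref.keys() - cand.keys()]
    problems += [f"extra: {sorted(p)}" for p in cand.keys() - ref.keys()]
    for p in ref.keys() & cand.keys():
        r, c = ref[p], cand[p]
        # Compare numerically so that '3' (Java) and '3.0' (Python) count as equal.
        if r.keys() != c.keys() or not all(
            math.isclose(r[m], c[m], rel_tol=rel_tol) for m in r
        ):
            problems.append(f"value mismatch: {sorted(p)} {r} vs {c}")
    return problems
```

To use it, pass the lines read from the SPMF output file and from the Python output file (such as `output_py.txt` in the Quick Start); an empty result means both pattern sets and their values agree.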

Installation

Install from PyPI

pip install pypatternminer

Install from source

Clone the repository and install the package locally:

git clone https://github.com/taiduydinh/pypatternminer.git
cd pypatternminer
pip install .

If you want to install in editable mode for development:

pip install -e .

Quick Start

Algorithms are currently organized by module. A typical usage pattern is to import the algorithm class from its corresponding module.

Example: Apriori

from pypatternminer.apriori import AlgoApriori

algo = AlgoApriori()
algo.runAlgorithm(
    minsup=0.5,
    input_path="contextPasquier99.txt",
    output_path="output_py.txt"
)
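For readers new to Apriori, the following sketch shows what a call like the one above computes: every itemset whose relative support is at least `minsup`, found level by level with candidate pruning. It is a self-contained illustration on an in-memory toy database, not pypatternminer's implementation. It rounds the relative threshold up with `ceil`, which is an assumption about how a value such as `minsup=0.5` is converted to a count.

```python
from itertools import combinations
from math import ceil


def apriori(transactions, minsup):
    """Return {frozenset(itemset): support count} for every frequent itemset.

    `minsup` is a relative threshold (fraction of transactions); this sketch
    converts it to an absolute count by rounding up (an assumption).
    """
    transactions = [frozenset(t) for t in transactions]
    min_count = ceil(minsup * len(transactions))

    # Level 1: count single items and keep the frequent ones.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_count}
    result = dict(frequent)

    k = 2
    while frequent:
        # Join: unite pairs of frequent (k-1)-itemsets into k-itemsets.
        # Prune: keep a candidate only if all its (k-1)-subsets are frequent.
        previous = list(frequent)
        candidates = set()
        for i in range(len(previous)):
            for j in range(i + 1, len(previous)):
                union = previous[i] | previous[j]
                if len(union) == k and all(
                    frozenset(sub) in frequent for sub in combinations(union, k - 1)
                ):
                    candidates.add(union)

        # Count each surviving candidate by scanning the transactions.
        frequent = {}
        for cand in candidates:
            count = sum(1 for t in transactions if cand <= t)
            if count >= min_count:
                frequent[cand] = count
        result.update(frequent)
        k += 1
    return result


# Toy database: one transaction per list.
db = [[1, 3, 4], [2, 3, 5], [1, 2, 3, 5], [2, 5], [1, 2, 3, 5]]
for itemset, count in sorted(apriori(db, 0.5).items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

On this five-transaction toy database, `minsup=0.5` becomes a count of 3, and the sketch returns nine frequent itemsets, the largest being `{2, 3, 5}` with support 3.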

Example: LCIM

from pypatternminer.lcim import AlgoLCIM

algo = AlgoLCIM()
algo.runAlgorithm(
    input_file="DB_cost.txt",
    output_file="output_py.txt",
    minUtility=28.0,
    maxcost=10.0,
    minsupp=0.3
)
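In the high-utility families, the threshold (presumably what `minUtility` sets above) applies to utility rather than support. The utility of an itemset is the sum of its items' utilities over every transaction that contains the whole itemset. The sketch below computes that quantity under this standard definition, assuming SPMF's common utility format `items:transaction utility:item utilities`. It does not parse `DB_cost.txt`, whose cost-annotated format is not described here, and `itemset_utility` and `is_high_utility` are hypothetical helpers rather than pypatternminer functions.

```python
def parse_utility_line(line):
    """Parse 'items:transaction utility:item utilities' into {item: utility}."""
    items_part, _transaction_utility, utilities_part = line.strip().split(":")
    items = [int(x) for x in items_part.split()]
    utilities = [float(x) for x in utilities_part.split()]
    return dict(zip(items, utilities))


def itemset_utility(db_lines, itemset):
    """Sum the utilities of `itemset`'s items over transactions containing all of them."""
    itemset = set(itemset)
    total = 0.0
    for line in db_lines:
        transaction = parse_utility_line(line)
        if itemset.issubset(transaction):  # iterating a dict yields its keys (items)
            total += sum(transaction[item] for item in itemset)
    return total


def is_high_utility(db_lines, itemset, min_utility):
    """True if the itemset's utility reaches the minimum utility threshold."""
    return itemset_utility(db_lines, itemset) >= min_utility


# Toy database in the assumed format: items, transaction utility, item utilities.
db = ["1 2 3:9:2 3 4", "1 3:5:1 4", "2 3:7:5 2"]
print(itemset_utility(db, {1, 3}))  # 11.0
```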

Package Organization

The project is organized as follows:

pypatternminer/
├── .github/                 # GitHub Actions workflows
├── datasets/                # testing datasets
├── pypatternminer/          # source code
│   ├── __init__.py
│   ├── apriori.py
│   ├── aprioriclose.py
│   ├── aprioriinverse.py
│   ├── ...
│   └── lcim.py
├── README.md
└── pyproject.toml

Repository Structure

The repository is designed to support both algorithm development and reproducible experimentation:

  • pypatternminer/ contains the Python implementations of the algorithms;
  • datasets/ contains testing datasets used for validation and experiments;
  • .github/workflows/ contains the release and publishing workflows.

Usage Notes

  • Algorithms are currently accessed through their corresponding modules, for example:
    • from pypatternminer.apriori import AlgoApriori
    • from pypatternminer.lcim import AlgoLCIM
  • Input files, parameters, and output formats may differ across algorithms depending on their original design and reference implementation.
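Since each Python implementation is validated on the same input datasets as SPMF, SPMF's plain-text formats are a reasonable starting point when preparing custom data. The sketch below assumes the two most common ones: a transaction database with one transaction per line as space-separated integer items, and a sequence database in which `-1` closes an itemset and `-2` closes a sequence. Items are written in ascending order as the safe choice. The function names are hypothetical, and individual algorithms may expect other formats, as noted above.

```python
def to_transaction_line(items):
    """One transaction: distinct integer items, sorted ascending, space-separated."""
    return " ".join(str(i) for i in sorted(set(items)))


def to_sequence_line(sequence):
    """One sequence: each itemset's sorted items followed by -1; the line ends with -2."""
    tokens = []
    for itemset in sequence:
        tokens.extend(str(i) for i in sorted(itemset))
        tokens.append("-1")
    tokens.append("-2")
    return " ".join(tokens)


def write_lines(path, lines):
    """Write one database record per line to a UTF-8 text file."""
    with open(path, "w", encoding="utf-8") as f:
        for line in lines:
            f.write(line + "\n")


print(to_transaction_line([3, 1, 4]))     # 1 3 4
print(to_sequence_line([{1}, {3, 1, 2}]))  # 1 -1 1 2 3 -1 -2
```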

Contributing

Contributions are welcome.

You can contribute by:

  • implementing additional pattern-mining algorithms;
  • improving documentation and examples;
  • reporting bugs or testing issues;
  • adding benchmark datasets and validation cases;
  • improving code quality and reproducibility.

If you would like to contribute, please open an issue or submit a pull request.

Citation

If you use pypatternminer in your research, please cite the repository or the related paper if available.

A formal citation entry can be added here in future releases.

Vision

pypatternminer is intended to serve as both a practical software library and a research infrastructure for the pattern-mining community. By combining algorithmic breadth, reproducibility, and public accessibility, the project aims to support researchers, educators, and practitioners working on pattern mining and related areas.



Download files


Source Distribution

pypatternminer-0.0.3.tar.gz (310.9 kB)

Built Distribution


pypatternminer-0.0.3-py3-none-any.whl (399.0 kB)

File details

Details for the file pypatternminer-0.0.3.tar.gz.

File metadata

  • Download URL: pypatternminer-0.0.3.tar.gz
  • Upload date:
  • Size: 310.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for pypatternminer-0.0.3.tar.gz
Algorithm    Hash digest
SHA256       c44df794f52b8a7c1973f0c0947effb80f7aa98dddf047e02642ca380be3721d
MD5          37f2f3b8461559a0e51bf11f873f619e
BLAKE2b-256  2ef1d0bb416d49b77cc86e08de64411a86feac992ed4743e369db33bb14e0dc5


Provenance

The following attestation bundles were made for pypatternminer-0.0.3.tar.gz:

Publisher: publish.yml on taiduydinh/pypatternminer

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file pypatternminer-0.0.3-py3-none-any.whl.

File metadata

  • Download URL: pypatternminer-0.0.3-py3-none-any.whl
  • Upload date:
  • Size: 399.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for pypatternminer-0.0.3-py3-none-any.whl
Algorithm    Hash digest
SHA256       36c59d4b2591202ff464bd67f395327b2dcd3c5a400418ac74fbbcf9033b09fa
MD5          d6ef4f90421c0fc12a4e1dfe9d137e19
BLAKE2b-256  e1fa75c742165edc624bde29efed29a303e516d817f3efa607bb036bb451d928


Provenance

The following attestation bundles were made for pypatternminer-0.0.3-py3-none-any.whl:

Publisher: publish.yml on taiduydinh/pypatternminer

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
