
An implementation of the Aerial neurosymbolic association rule mining algorithm for tabular datasets.


pyaerial: scalable association rule mining


Tested on Ubuntu 24.04 LTS and macOS Monterey 12.6.7.


PyAerial is a Python implementation of the Aerial scalable neurosymbolic association rule miner for tabular data. It uses an under-complete denoising autoencoder to learn a compact representation of tabular data, and extracts a concise set of high-quality association rules with full data coverage.

Unlike traditional exhaustive methods (e.g., Apriori, FP-Growth), Aerial addresses the rule explosion problem by learning neural representations and extracting only the most relevant patterns, making it suitable for large-scale datasets. PyAerial supports GPU acceleration, numerical data discretization, item constraints, classification rule extraction, and rule visualization via the NiaARM library (see Features for the complete list).

Learn more about the architecture, training, and rule extraction in our paper: Neurosymbolic Association Rule Mining from Tabular Data


Installation

Install PyAerial using pip:

pip install pyaerial

Quick Start

from aerial import model, rule_extraction, rule_quality
from ucimlrepo import fetch_ucirepo

# Load a categorical tabular dataset
breast_cancer = fetch_ucirepo(id=14).data.features

# Train an autoencoder on the loaded table
trained_autoencoder = model.train(breast_cancer)

# Extract association rules from the autoencoder
association_rules = rule_extraction.generate_rules(trained_autoencoder)

# Calculate rule quality statistics
if len(association_rules) > 0:
    stats, association_rules = rule_quality.calculate_rule_stats(
        association_rules,
        trained_autoencoder.input_vectors
    )
    print(f"Extracted {stats['rule_count']} rules\n")

    # Display a sample rule
    sample_rule = association_rules[0]
    print(f"Sample Rule: {sample_rule}")

Output:

Extracted 15 rules

Sample Rule: {
    "antecedents": ["inv-nodes__0-2"],
    "consequent": "node-caps__no",
    "support": 0.702,
    "confidence": 0.943,
    "zhangs_metric": 0.69
}

Interpretation: This rule indicates that when the inv-nodes feature falls in the 0-2 range, there is a strong likelihood (94.3% confidence) that node-caps equals no. The antecedent and consequent occur together in 70.2% of the records (support).
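The interpretation above can be produced mechanically from a rule dictionary. The following is a minimal sketch, assuming items follow the "feature__value" naming seen in the sample output; describe_rule is a hypothetical helper written for illustration and is not part of the PyAerial API.

```python
def describe_rule(rule: dict) -> str:
    """Render a rule dict as a readable IF/THEN sentence (illustrative helper)."""

    def pretty(item: str) -> str:
        # Assumption: items are named "feature__value", as in the sample output.
        feature, _, value = item.partition("__")
        return f"{feature} = {value}"

    lhs = " AND ".join(pretty(item) for item in rule["antecedents"])
    rhs = pretty(rule["consequent"])
    return (
        f"IF {lhs} THEN {rhs} "
        f"(support {rule['support']:.1%}, confidence {rule['confidence']:.1%})"
    )


# The sample rule from the Quick Start output.
sample_rule = {
    "antecedents": ["inv-nodes__0-2"],
    "consequent": "node-caps__no",
    "support": 0.702,
    "confidence": 0.943,
    "zhangs_metric": 0.69,
}
print(describe_rule(sample_rule))
# IF inv-nodes = 0-2 THEN node-caps = no (support 70.2%, confidence 94.3%)
```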

Quality metrics explained:

  • Support: Frequency of the rule in the dataset (how often the pattern occurs)
  • Confidence: How often the consequent is true when antecedent is true (rule reliability)
  • Zhang's Metric: Correlation measure between antecedent and consequent (-1 to 1; positive values indicate positive correlation)
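To make these definitions concrete, here is a small pure-Python sketch that computes the three metrics for a single rule on a toy dataset. It is not PyAerial's implementation: rule_metrics, the toy rows, and the item names are made up for illustration, and Zhang's metric is written in its common leverage-based form.

```python
def rule_metrics(rows, antecedents, consequent):
    """Compute support, confidence, and Zhang's metric for one rule.

    rows: list of sets, each holding the "feature__value" items of one record.
    """
    n = len(rows)
    ante = set(antecedents)
    count_a = sum(1 for row in rows if ante <= row)
    count_c = sum(1 for row in rows if consequent in row)
    count_ac = sum(1 for row in rows if ante <= row and consequent in row)

    s_a, s_c, s_ac = count_a / n, count_c / n, count_ac / n
    confidence = s_ac / s_a if s_a else 0.0
    # Zhang's metric: leverage normalised so that the result lies in [-1, 1].
    denom = max(s_ac * (1 - s_a), s_a * (s_c - s_ac))
    zhang = (s_ac - s_a * s_c) / denom if denom else 0.0
    return {"support": s_ac, "confidence": confidence, "zhangs_metric": zhang}


# Toy data (hypothetical): 10 records over two features.
rows = (
    [{"weather__sunny", "drink__cold"}] * 4
    + [{"weather__sunny", "drink__hot"}] * 1
    + [{"weather__rainy", "drink__cold"}] * 2
    + [{"weather__rainy", "drink__hot"}] * 3
)
metrics = rule_metrics(rows, ["weather__sunny"], "drink__cold")
print({name: round(value, 3) for name, value in metrics.items()})
# {'support': 0.4, 'confidence': 0.8, 'zhangs_metric': 0.5}
```

Here the rule holds in 4 of 10 records (support 0.4) and in 4 of the 5 sunny records (confidence 0.8). The positive Zhang's metric reflects that cold drinks are more common on sunny days than on rainy ones.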

Overall statistics across all 15 rules:

{
    "rule_count": 15,
    "average_support": 0.448,
    "average_confidence": 0.881,
    "average_coverage": 0.860,
    "average_zhangs_metric": 0.318
}

Can't get the results you were looking for? See Debugging in our documentation.


Features

PyAerial provides a comprehensive toolkit for association rule mining with advanced capabilities:

  • Scalable Rule Mining - Efficiently mine association rules from large tabular datasets without rule explosion
  • Frequent Itemset Mining - Generate frequent itemsets using the same neural approach
  • ARM with Item Constraints - Focus rule mining on specific features of interest
  • Classification Rules - Extract rules with target class labels for interpretable inference
  • Numerical Data Support - Built-in discretization methods (equal-frequency, equal-width)
  • Customizable Architectures - Fine-tune autoencoder layers and dimensions for optimal performance
  • GPU Acceleration - Leverage CUDA for faster training on large datasets
  • Quality Metrics - Comprehensive rule evaluation (support, confidence, coverage, Zhang's metric)
  • Rule Visualization - Integrate with NiaARM for scatter plots and visual analysis
  • Flexible Training - Adjust epochs, learning rate, batch size, and noise factors

How Aerial Works

Aerial employs a three-stage neurosymbolic pipeline to extract high-quality association rules from tabular data:

1. Data Preparation

Categorical data is one-hot encoded while tracking which encoded columns belong to which feature. Numerical columns must be discretized first (equal-frequency and equal-width methods are available). The encoded values are then transformed into vectors for neural processing.
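The preparation step can be sketched in plain Python as follows. This is an illustration only, not PyAerial's own code: equal_width_bins and one_hot_encode are hypothetical helpers, the table is made up, and only equal-width discretization is shown. The "feature__value" column names mirror the item names in the Quick Start output.

```python
def equal_width_bins(values, n_bins):
    """Map each number to an interval label using equally wide bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 for constant columns
    labels = []
    for v in values:
        idx = min(int((v - lo) / width), n_bins - 1)  # clamp the maximum into the last bin
        left = lo + idx * width
        labels.append(f"{left:g}-{left + width:g}")
    return labels


def one_hot_encode(table):
    """One-hot encode a column-oriented table of categorical values.

    Returns the encoded rows, the "feature__value" column names, and a map
    from each feature to the index range of its one-hot columns.
    """
    columns, feature_slices = [], {}
    for feature, values in table.items():
        start = len(columns)
        columns.extend(f"{feature}__{v}" for v in sorted(set(values)))
        feature_slices[feature] = (start, len(columns))
    index = {name: i for i, name in enumerate(columns)}

    n_rows = len(next(iter(table.values())))
    vectors = []
    for r in range(n_rows):
        vec = [0.0] * len(columns)
        for feature, values in table.items():
            vec[index[f"{feature}__{values[r]}"]] = 1.0
        vectors.append(vec)
    return vectors, columns, feature_slices


# Hypothetical table: one categorical column and one numerical column.
table = {
    "weather": ["sunny", "rainy", "sunny"],
    "age": equal_width_bins([20, 35, 60], n_bins=2),
}
vectors, columns, feature_slices = one_hot_encode(table)
print(columns)         # ['weather__rainy', 'weather__sunny', 'age__20-40', 'age__40-60']
print(feature_slices)  # {'weather': (0, 2), 'age': (2, 4)}
print(vectors[0])      # [0.0, 1.0, 1.0, 0.0]
```

The feature_slices map plays the role of the tracked feature relationships: it records which block of one-hot columns came from which original feature.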

2. Autoencoder Training

An under-complete denoising autoencoder learns a compact representation of the data:

  • Architecture: Layer sizes are configured automatically by logarithmic reduction (base 16), or you can supply custom dimensions
  • Bottleneck design: The encoder compresses the one-hot input down to a dimension equal to the original feature count, forcing the network to learn meaningful associations
  • Denoising mechanism: Random noise during training improves robustness and generalization
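The denoising idea can be illustrated without any deep learning library. The sketch below only shows how a (noisy input, clean target) training pair might be formed and how the bottleneck size relates to the data. Gaussian noise, clipping to [0, 1], and the noise_factor default are assumptions made for illustration, and corrupt is a hypothetical helper rather than a PyAerial function.

```python
import random


def corrupt(vector, noise_factor=0.5, rng=None):
    """Return a noisy copy of a one-hot vector, clipped to [0, 1].

    Assumption: zero-mean Gaussian noise scaled by noise_factor.
    """
    rng = rng or random.Random()
    return [
        min(1.0, max(0.0, x + noise_factor * rng.gauss(0.0, 1.0)))
        for x in vector
    ]


# One encoded record with two features of two categories each (hypothetical).
clean = [0.0, 1.0, 1.0, 0.0]
noisy = corrupt(clean, noise_factor=0.5, rng=random.Random(0))

# A denoising autoencoder is fed the noisy vector and trained to
# reconstruct the clean one, so the training pair is (noisy, clean).
training_pair = (noisy, clean)

# Under-complete bottleneck: 4 one-hot inputs compressed to 2 units,
# one per original feature.
input_dim, bottleneck_dim = len(clean), 2
print(input_dim, bottleneck_dim)  # 4 2
```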

Figure: Rule extraction process using weather and beverage features

3. Rule Extraction

Rules emerge from analyzing the trained autoencoder using test vectors:

  1. Test vectors are created with equal probabilities across categories
  2. Specific features are set to 1 (antecedents) while others remain at baseline
  3. Forward passes through the network produce output probabilities
  4. Rules are extracted when probabilities exceed similarity thresholds
  5. Quality metrics (support, confidence, coverage, Zhang's metric) are calculated
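Steps 1 to 4 can be sketched as follows for single-item antecedents. This is a simplified illustration, not PyAerial's implementation: extract_rules, the threshold defaults, and toy_forward (a hand-written stand-in for a trained autoencoder's forward pass) are all hypothetical.

```python
def extract_rules(forward, feature_slices, columns,
                  ante_threshold=0.5, cons_threshold=0.8):
    """Probe a model with single-item antecedents and collect rules.

    forward: callable mapping an input vector to per-category probabilities.
    feature_slices: {feature: (start, end)} index ranges of one-hot columns.
    """
    # Step 1: baseline vector with equal probabilities within each feature.
    baseline = [0.0] * len(columns)
    for start, end in feature_slices.values():
        for i in range(start, end):
            baseline[i] = 1.0 / (end - start)

    rules = []
    for start, end in feature_slices.values():
        for a in range(start, end):
            # Step 2: mark one category as the antecedent (set it to 1).
            test = list(baseline)
            for i in range(start, end):
                test[i] = 0.0
            test[a] = 1.0
            # Step 3: forward pass through the (stand-in) network.
            probs = forward(test)
            # Step 4: the antecedent itself must be reproduced confidently...
            if probs[a] < ante_threshold:
                continue
            # ...and each category of another feature above the threshold
            # becomes the consequent of a rule.
            for c, p in enumerate(probs):
                if not (start <= c < end) and p >= cons_threshold:
                    rules.append(
                        {"antecedents": [columns[a]], "consequent": columns[c]}
                    )
    return rules


columns = ["weather__rainy", "weather__sunny", "drink__cold", "drink__hot"]
feature_slices = {"weather": (0, 2), "drink": (2, 4)}


def toy_forward(vec):
    """Hand-written stand-in for a trained autoencoder (hypothetical outputs)."""
    if vec[1] == 1.0:   # sunny marked
        return [0.05, 0.95, 0.90, 0.10]
    if vec[0] == 1.0:   # rainy marked
        return [0.90, 0.10, 0.30, 0.70]
    if vec[2] == 1.0:   # cold marked
        return [0.40, 0.60, 0.95, 0.05]
    return [0.55, 0.45, 0.10, 0.90]  # hot marked


print(extract_rules(toy_forward, feature_slices, columns))
# [{'antecedents': ['weather__sunny'], 'consequent': 'drink__cold'}]
```

Only the sunny probe pushes a category of another feature above the consequent threshold, so a single rule is returned. Step 5 would then score each extracted rule against the data.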

Figure: Complete three-stage Aerial pipeline: data preparation → training → rule extraction

Learn more: For detailed explanations of the architecture, theoretical foundations, and experimental results, see our paper: Neurosymbolic Association Rule Mining from Tabular Data


Documentation

For detailed usage examples, API reference, and advanced topics, visit our comprehensive documentation:

📚 Read the full documentation on ReadTheDocs

Documentation includes:

  • Getting Started - Installation and basic usage
  • User Guide - Detailed examples for all features
  • API Reference - Complete function and class documentation
  • Advanced Topics - GPU usage, debugging, visualization
  • How It Works - Understanding Aerial's architecture and algorithm

Citation

If you use PyAerial in your work, please cite our research and software papers:

@InProceedings{pmlr-v284-karabulut25a,
  title         = {Neurosymbolic Association Rule Mining from Tabular Data},
  author        = {Karabulut, Erkan and Groth, Paul and Degeler, Victoria},
  booktitle     = {Proceedings of The 19th International Conference on Neurosymbolic Learning and Reasoning},
  pages         = {565--588},
  year          = {2025},
  editor        = {H. Gilpin, Leilani and Giunchiglia, Eleonora and Hitzler, Pascal and van Krieken, Emile},
  volume        = {284},
  series        = {Proceedings of Machine Learning Research},
  month         = {08--10 Sep},
  publisher     = {PMLR},
  url           = {https://proceedings.mlr.press/v284/karabulut25a.html}
}

@article{pyaerial,
  title         = {PyAerial: Scalable association rule mining from tabular data},
  journal       = {SoftwareX},
  volume        = {31},
  pages         = {102341},
  year          = {2025},
  issn          = {2352-7110},
  doi           = {https://doi.org/10.1016/j.softx.2025.102341},
  author        = {Erkan Karabulut and Paul Groth and Victoria Degeler},
}

Contact

For questions, suggestions, or collaborations, please contact:

Erkan Karabulut 📧 e.karabulut@uva.nl 📧 erkankkarabulut@gmail.com


Contribute

We welcome contributions from the community! Whether you're fixing bugs, adding new features, improving documentation, or sharing feedback, your help is appreciated.

How to contribute:

  • 🐛 Report bugs - Open an issue describing the problem
  • 💡 Suggest features - Share your ideas for improvements
  • 📝 Improve docs - Help us make the documentation clearer
  • 🔧 Submit PRs - Fork the repo and create a pull request
  • 💬 Share feedback - Contact us with your experience using PyAerial

Feel free to open an issue or pull request on GitHub, or reach out directly!


License

This project is licensed under the MIT License - see the LICENSE file for details.
