
Hierarchical Pattern-aided Regression


Powered by Inria

HIPAR (Hierarchical Interpretable Pattern-aided Regression)

HIPAR is a pattern-based method for regression on tabular data. Given a dataset, HIPAR outputs a set of hybrid rules of the form p => y = f(X) that predict a target variable y. Here, p is a conjunctive pattern that characterizes a region of the dataset (e.g., property-type='house' and surface > 50), and f(X) is a linear function on the numerical features of the dataset.
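To make the rule format concrete, here is a minimal sketch of a single hybrid rule p => y = f(X). The `HybridRule` class, the `rooms` feature, and all coefficient values are made up for illustration; this is not HIPAR's internal representation, only one way to picture a pattern paired with a linear model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Mapping


@dataclass
class HybridRule:
    """Illustrative hybrid rule p => y = f(X) (hypothetical, not HIPAR's class)."""
    pattern: Callable[[Mapping[str, object]], bool]  # p: conjunctive condition on a row
    coefficients: Dict[str, float]                   # weights of f on numerical features
    intercept: float                                 # constant term of f

    def covers(self, row: Mapping[str, object]) -> bool:
        # The rule applies only to rows that satisfy the pattern p.
        return self.pattern(row)

    def predict(self, row: Mapping[str, object]) -> float:
        # f(X) = intercept + sum of weight * feature value
        return self.intercept + sum(
            weight * float(row[name]) for name, weight in self.coefficients.items()
        )


# Pattern taken from the example above; the linear coefficients are invented.
rule = HybridRule(
    pattern=lambda r: r["property-type"] == "house" and r["surface"] > 50,
    coefficients={"surface": 2.0, "rooms": 10.0},
    intercept=50.0,
)

row = {"property-type": "house", "surface": 80, "rooms": 4}
if rule.covers(row):
    print(rule.predict(row))  # 50 + 2*80 + 10*4 = 250.0
```

A row that does not satisfy the pattern (for example, a flat) is simply not covered by this rule and would have to be handled by another rule.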

How to use HIPAR

HIPAR's code is still in alpha, but it can be used without major issues.

from hipar import HIPAR
from data import get_simple_housing

hipar = HIPAR(min_support=2, interclass_variance_percentile_threshold=0)
X, y = get_simple_housing()
hipar.fit(X, y)
# Get all rules found during the enumeration phase
print(hipar.all_rules)
# Get the rules selected by HiPaR (used for prediction)
print(hipar.get_selected_rules())
# X_test: unseen data with the same features as X
X_test = ...
print(hipar.predict(X_test))

Experimental Results

The first implementation of HiPaR, including the full experimental evaluation and data, is available here.

Roadmap

Differences from the published version

  • The interclass variance threshold is calculated over the entire set of refinement conditions, not over the set of discretized refinement conditions.
  • We do not check whether a new rule is better than all its parents, only whether it is better than the generating parent. This just sends more rules to the selection phase, but makes the code simpler (I am not confident in the previous implementation of this feature).

Improvements w.r.t. the published version

  • Support for multiple metrics in the enumeration phase. A new rule is compared against its parent on all the metrics passed to the constructor.
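The parent comparison described above can be pictured with the following sketch. It assumes that all metrics are errors (lower is better), that a child rule must be strictly better than its generating parent on every metric, and that both rules are evaluated on the same rows. The function names `rmse`, `mae` and `improves_on_parent` are hypothetical and not part of HIPAR's API; the actual acceptance criterion in the code may differ.

```python
import math
from typing import Callable, Sequence

# A metric maps (true values, predictions) to an error score; lower is better.
Metric = Callable[[Sequence[float], Sequence[float]], float]


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def improves_on_parent(
    y_true: Sequence[float],
    child_pred: Sequence[float],
    parent_pred: Sequence[float],
    metrics: Sequence[Metric],
) -> bool:
    """Assumed criterion: the child beats its generating parent on every metric."""
    return all(m(y_true, child_pred) < m(y_true, parent_pred) for m in metrics)


y_true = [10.0, 20.0, 30.0]
parent_pred = [14.0, 16.0, 34.0]   # off by 4 everywhere: RMSE = 4, MAE = 4
steady_child = [11.0, 19.0, 32.0]  # small errors on every row
spiky_child = [10.0, 20.0, 37.0]   # lower MAE than the parent, but higher RMSE

print(improves_on_parent(y_true, steady_child, parent_pred, [rmse, mae]))  # True
print(improves_on_parent(y_true, spiky_child, parent_pred, [rmse, mae]))   # False
```

The second child shows why using several metrics matters: it improves the mean absolute error but concentrates its error on a single row, so it fails the RMSE check and would not be kept under this assumed criterion.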

Roadmap

  • Make a Python installable package [Urgent]
  • Consider other quality criteria to prune during the enumeration such as the p-values of the linear coefficients.
  • If we need to compare against all the HIPAR-based hybrid methods published in the paper, we will have to reimplement them.
  • Consider alternative discretization approaches for the numerical variables in the conditions.

Publications

  • Luis Galárraga, Olivier Pelgrin, Alexandre Termier. HiPaR: Hierarchical Pattern-aided Regression. Full paper at the Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD 2021), Delhi. [Technical Report]


