Lightning fast rule generation library

Iguanas: A Lightning-Fast Rule Generation Python Library

📚 Full Documentation

What is Iguanas?

Iguanas is a library built on top of Polars that streamlines the full workflow for developing rule-based systems, from raw data to production-ready rules, using Polars' multi-core processing.

Built by the PSP Data Team at PayPal, Iguanas makes rule generation, evaluation, and selection both faster and simpler.

⚡ Key Features

  • 🚀 Lightning Fast: Built on Polars for multi-core parallel processing
  • 🎯 End-to-End: Generate, evaluate, combine, and select rules in one library
  • 📦 Production Ready: Lightweight rule strings that deploy anywhere
  • 🔧 Flexible: Sequential and parallel grid search strategies
  • 🔗 Composable: Chain generation → evaluation → selection with a few function calls
  • 🎓 Easy to Learn: Simple functional API with clear, consistent signatures

🛠️ What Can Iguanas Do?

⚙️ Rule Generation

Generate interpretable rules from labelled datasets using XGBoost tree extraction:

  • rule_grid_search_sequential - Single-process grid search over weight transformations and scale_pos_weight values
  • rule_grid_search_parallel_weights - Parallel grid search parallelised over weight transformations
  • rule_grid_search_parallel_scales - Parallel grid search parallelised over scale_pos_weight values
  • extract_rules - Extract rules from a fitted XGBoost model (with optional monotone constraints)
  • extract_rule_by_max_gain - Extract the highest-gain rule path from a single tree
  • extract_rule_with_monotone_constraints - Extract a rule path respecting monotone constraints

📊 Metrics

Compute classification performance metrics for rule predictions:

  • compute_metrics - Compute a full metrics table (accuracy, precision, recall, F-beta, TP/FP/TN/FN, flagged %) for a set of rules
  • compute_single_metric - Compute a single scalar metric (accuracy, precision, recall or F-beta) — optimised for hot-path evaluation
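
To make the metric definitions concrete, here is a minimal pure-Python sketch of how the quantities listed above relate to one another for a single rule. The helper names `confusion_counts` and `rule_metrics` are hypothetical and are not part of the Iguanas API; the library's own functions operate on Polars data and may differ in naming and output layout.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for two equal-length sequences of 0/1 values."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def rule_metrics(y_true, y_pred, beta=1.0):
    """Compute the usual classification metrics for one rule's predictions."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    n = tp + fp + tn + fn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    b2 = beta * beta
    denom = b2 * precision + recall
    f_beta = (1 + b2) * precision * recall / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / n if n else 0.0,
        "precision": precision,
        "recall": recall,
        "f_beta": f_beta,
        "tp": tp, "fp": fp, "tn": tn, "fn": fn,
        # Share of rows the rule flags, as a percentage.
        "flagged_pct": 100.0 * (tp + fp) / n if n else 0.0,
    }


# Illustrative labels and rule predictions (made up for this sketch).
y_true = [0, 1, 0, 1, 0, 1, 1, 0]
y_pred = [0, 1, 0, 1, 0, 1, 0, 1]
metrics = rule_metrics(y_true, y_pred)
print(metrics)
```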

🔍 Rule Evaluation

Evaluate rules on data and filter by performance:

  • apply_rules - Evaluate rule expressions on a DataFrame and return a boolean prediction matrix
  • apply_and_filter_by_performance - Evaluate rules and filter by user-defined metric thresholds
  • select_diverse_top_rules - Select top-performing rules while removing highly correlated duplicates
  • apply_filter_and_deduplicate_rules - Complete end-to-end pipeline: evaluate → filter → deduplicate
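
The evaluate-then-filter idea can be sketched in plain Python as follows. This is an illustration only: rules are modelled as Python callables rather than Iguanas rule strings, and `apply_rules_sketch`, `precision_recall` and `filter_by_thresholds` are hypothetical helpers. The threshold dictionaries reuse the `name` / `operator` / `value` shape shown in the Quick Start; the set of supported operators here is an assumption.

```python
import operator

# Assumed operator vocabulary for this sketch.
OPS = {">=": operator.ge, ">": operator.gt, "<=": operator.le, "<": operator.lt}


def apply_rules_sketch(rows, rules):
    """Evaluate each rule on each row and return {rule name: list of booleans}."""
    return {name: [bool(pred(row)) for row in rows] for name, pred in rules.items()}


def precision_recall(y_true, flags):
    """Precision and recall of one rule's boolean flags against 0/1 labels."""
    tp = sum(1 for t, f in zip(y_true, flags) if t == 1 and f)
    flagged = sum(flags)
    positives = sum(y_true)
    return {
        "precision": tp / flagged if flagged else 0.0,
        "recall": tp / positives if positives else 0.0,
    }


def filter_by_thresholds(y_true, predictions, thresholds):
    """Keep the rules whose metrics satisfy every threshold."""
    kept = []
    for name, flags in predictions.items():
        scores = precision_recall(y_true, flags)
        if all(OPS[t["operator"]](scores[t["name"]], t["value"]) for t in thresholds):
            kept.append(name)
    return kept


ages = [25, 45, 35, 50, 30, 55, 40, 28]
incomes = [30000, 80000, 50000, 90000, 40000, 95000, 70000, 35000]
rows = [{"age": a, "income": i} for a, i in zip(ages, incomes)]
y = [0, 1, 0, 1, 0, 1, 1, 0]

# Hand-written example rules (not generated by Iguanas).
rules = {
    "age > 40": lambda r: r["age"] > 40,
    "income >= 70000": lambda r: r["income"] >= 70000,
    "age > 26": lambda r: r["age"] > 26,
}

predictions = apply_rules_sketch(rows, rules)
kept = filter_by_thresholds(
    y,
    predictions,
    [
        {"name": "precision", "operator": ">=", "value": 0.6},
        {"name": "recall", "operator": ">=", "value": 0.5},
    ],
)
print(kept)
```

In this toy data the rule `age > 26` flags seven of eight rows, so its precision falls below the 0.6 threshold and it is dropped.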

🔀 Rule Combination

Combine individual rules into compound rules to improve performance:

  • combine_rules_full_search - Exhaustive search over all rule pairs
  • combine_rules_cumulative - Incrementally combine rules with a running candidate
  • combine_rules_greedy - Greedy combination selecting the best pair at each step
  • combine_rules_beam_search - Beam search combination balancing quality and efficiency
  • combine_rules_a_star - A* search combination using a heuristic cost function
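
As a rough illustration of what a greedy combination strategy does, the sketch below starts from the best single rule and repeatedly ORs in whichever remaining rule improves F1 the most, stopping when nothing helps. This is one simple variant written for this README, not the algorithm used by `combine_rules_greedy`; the choice of OR as the combining operator, F1 as the objective, and the helper names are all assumptions.

```python
def f1_score(y_true, flags):
    """F1 of a rule's 0/1 flags against 0/1 labels."""
    tp = sum(1 for t, f in zip(y_true, flags) if t and f)
    fp = sum(1 for t, f in zip(y_true, flags) if not t and f)
    fn = sum(1 for t, f in zip(y_true, flags) if t and not f)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def greedy_or_combination(y_true, predictions):
    """Greedily OR rules together while the combined F1 keeps improving."""
    remaining = dict(predictions)
    best_name = max(remaining, key=lambda n: f1_score(y_true, remaining[n]))
    combined = remaining.pop(best_name)
    chosen, best = [best_name], f1_score(y_true, combined)
    while remaining:
        # Candidate = current combination OR one more rule.
        candidates = {
            name: [a or b for a, b in zip(combined, flags)]
            for name, flags in remaining.items()
        }
        name = max(candidates, key=lambda n: f1_score(y_true, candidates[n]))
        score = f1_score(y_true, candidates[name])
        if score <= best:
            break  # no remaining rule improves the combination
        chosen.append(name)
        combined, best = candidates[name], score
        del remaining[name]
    return chosen, best


# Illustrative labels and rule predictions (made up for this sketch).
y = [1, 1, 1, 1, 0, 0, 0, 0]
preds = {
    "a": [1, 1, 0, 0, 0, 0, 0, 0],
    "b": [0, 0, 1, 1, 0, 0, 0, 0],
    "c": [1, 0, 0, 0, 1, 1, 0, 0],
}
chosen, best = greedy_or_combination(y, preds)
print(chosen, best)
```

A greedy search like this is cheap but can miss combinations that only pay off after an initially unattractive step, which is the gap that wider strategies such as beam search aim to close.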

✂️ Rule Selection

Deduplicate and prune rule sets:

  • filter_rules_by_feature_overlap - Remove rules that share too many features with higher-importance rules
  • filter_correlated_rules - Remove rules whose predictions are highly correlated
  • select_best_rule_per_column_combination - Keep only the best-performing rule for each unique column combination
  • extract_feature_names_from_rule - Parse a rule string and return the feature names it references
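
Correlation-based pruning can be sketched as below: walk through the rules in order of preference and keep a rule only if its predictions are not too correlated with any rule already kept. The helpers `pearson` and `drop_correlated` are hypothetical, and the use of Pearson correlation on 0/1 flags is an assumption about how "correlated" is measured.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def drop_correlated(ranked_predictions, max_corr=0.8):
    """Keep rules, best first, whose |correlation| with every kept rule is <= max_corr."""
    kept = []
    for name, flags in ranked_predictions.items():
        if all(abs(pearson(flags, ranked_predictions[k])) <= max_corr for k in kept):
            kept.append(name)
    return kept


# Rules are assumed to be listed from most to least preferred (made-up data).
ranked = {
    "a": [1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
    "b": [1, 1, 1, 1, 1, 0, 0, 0, 0, 0],  # near-duplicate of "a"
    "c": [0, 0, 0, 0, 0, 1, 1, 1, 0, 0],
}
kept_rules = drop_correlated(ranked, max_corr=0.8)
print(kept_rules)
```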

🔬 Rule Analysis

Inspect and report on rule sets:

  • generate_rule_performance_report - Generate a combined performance and structure report for a rule set
  • parse_conditions - Parse a rule expression into its constituent conditions
  • parse_levels - Parse a rule expression into a structured level-by-level representation
  • rebuild_from_levels - Reconstruct a rule string from a level representation

🖊️ Rule Formatting

Clean up rule expressions for display or logging:

  • simplify_rule - Simplify a rule expression by removing redundant conditions
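
One kind of redundancy that simplification can remove is a looser bound on a feature that is already covered by a tighter one. The sketch below works on conjunctions represented as `(feature, operator, threshold)` tuples, which is a representation chosen for this example and not the Iguanas rule string format; `simplify_conditions` is a hypothetical helper and only handles `>` and `<`.

```python
def simplify_conditions(conditions):
    """Collapse repeated bounds in an AND-ed list of (feature, op, threshold) tuples.

    For '>' the largest threshold is the binding one; for '<' the smallest is.
    The order of first appearance is preserved.
    """
    tightest = {}
    for feature, op, threshold in conditions:
        if op not in (">", "<"):
            raise ValueError(f"unsupported operator in this sketch: {op!r}")
        key = (feature, op)
        if key not in tightest:
            tightest[key] = threshold
        elif op == ">":
            tightest[key] = max(tightest[key], threshold)
        else:
            tightest[key] = min(tightest[key], threshold)
    return [(feature, op, t) for (feature, op), t in tightest.items()]


# "age > 30 AND income < 90000 AND age > 40" reduces to two conditions.
simplified = simplify_conditions(
    [("age", ">", 30), ("income", "<", 90000), ("age", ">", 40)]
)
print(simplified)
```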

📐 Monotone Constraints

Infer feature directionality to guide rule generation:

  • infer_monotone_constraints_from_correlations - Infer monotone constraints (±1) from feature–target correlations
  • infer_monotone_constraints_from_stumps - Infer monotone constraints (±1) from decision stumps
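
The correlation-based approach can be illustrated with a short sketch: take the sign of each feature's correlation with the target and use it as the constraint direction. The helper names are hypothetical, and returning 0 when the correlation is exactly zero or undefined is an assumption made here (0 is how XGBoost's `monotone_constraints` encodes "no constraint"). The `days_since_signup` feature is invented for the example.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def infer_constraints(features, target):
    """Map each feature to +1, -1 or 0 from the sign of its correlation with the target."""
    constraints = {}
    for name, values in features.items():
        r = pearson(values, target)
        constraints[name] = 1 if r > 0 else -1 if r < 0 else 0
    return constraints


features = {
    "age": [25, 45, 35, 50, 30, 55, 40, 28],
    "income": [30000, 80000, 50000, 90000, 40000, 95000, 70000, 35000],
    "days_since_signup": [9, 2, 8, 1, 7, 1, 3, 9],  # hypothetical feature
}
target = [0, 1, 0, 1, 0, 1, 1, 0]
constraints = infer_constraints(features, target)
print(constraints)
```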

⚖️ Sample Weight Transformations

Generate sample weight schedules to steer rule learning:

  • generate_increasing_weights - Weights that increase with feature value (power, log families)
  • generate_decreasing_weights - Weights that decrease with feature value (reciprocal families)
  • generate_weights - Generate both increasing and decreasing weight schedules in one call
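
To show what increasing and decreasing weight schedules look like, here is a minimal sketch using one power transform, one log transform and one reciprocal transform. The exact formulas, the min-max scaling step, and the function names are assumptions for illustration and are not taken from the Iguanas implementation.

```python
import math


def _min_max_scale(values):
    """Scale values to [0, 1]; a constant input maps to all zeros."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0
    return [(v - lo) / span for v in values]


def increasing_weights(values, power=2.0):
    """Weights that grow with the feature value (power and log families)."""
    scaled = _min_max_scale(values)
    return {
        "power": [1.0 + s ** power for s in scaled],
        "log": [1.0 + math.log1p(s) for s in scaled],
    }


def decreasing_weights(values):
    """Weights that shrink as the feature value grows (reciprocal family)."""
    scaled = _min_max_scale(values)
    return {"reciprocal": [1.0 / (1.0 + s) for s in scaled]}


incomes = [30000, 80000, 50000, 90000, 40000, 95000, 70000, 35000]
inc = increasing_weights(incomes)
dec = decreasing_weights(incomes)
print(inc["power"])
print(dec["reciprocal"])
```

Each schedule could then be passed to a model as per-row sample weights, so that rows with high (or low) feature values have more influence on the learned splits.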

🚀 Quick Start

import polars as pl
import numpy as np
from xgboost import XGBClassifier

from iguanas.weight_transformations import generate_weights
from iguanas.rule_generation import rule_grid_search_parallel_weights
from iguanas.rule_evaluation import apply_filter_and_deduplicate_rules

# 1. Load your data
X_train = pl.DataFrame({
    "age":    [25, 45, 35, 50, 30, 55, 40, 28],
    "income": [30000, 80000, 50000, 90000, 40000, 95000, 70000, 35000],
})
y_train = pl.Series([0, 1, 0, 1, 0, 1, 1, 0])

# 2. Generate sample weight transformations
weights = generate_weights(X_train["income"])

# 3. Run a parallel grid search to extract rules
estimator = XGBClassifier(max_depth=2, n_estimators=5, random_state=42)
scale_pos_weights = np.logspace(0, 1, 5)

rules_df = rule_grid_search_parallel_weights(
    estimator, X_train, y_train,
    scale_pos_weights=scale_pos_weights,
    weights_train_vec=weights,
    n_jobs=-1,
)

# 4. Evaluate, filter, and deduplicate rules
R, metrics, selected_rules = apply_filter_and_deduplicate_rules(
    X_train, y_train, rules_df,
    metric_thresholds=[
        {"name": "precision", "operator": ">=", "value": 0.6},
        {"name": "recall",    "operator": ">=", "value": 0.5},
    ],
    max_corr=0.8,
)

print(selected_rules)

📦 Installation

Requires Python 3.10 or higher.

pip install iguanas

Or install from source:

git clone https://github.com/paypal/iguanas.git
cd iguanas
pip install -e .    # Install in editable/development mode

📚 Documentation

For detailed documentation, tutorials, and API reference, visit:

https://paypal.github.io/iguanas/

🎯 Use Cases

Iguanas is well suited to:

  • Fraud Detection - Generate high-precision rules to flag suspicious transactions
  • Risk Scoring - Build interpretable rule sets for credit or operational risk
  • Compliance & Policy - Encode business policies as auditable rule expressions
  • Anomaly Detection - Surface rare but meaningful patterns in labelled data
  • Model Explainability - Extract human-readable rules from gradient boosted models

🏢 Used By

Iguanas powers rule-based systems at:

  • PayPal (internal use)

🤝 Contributing

We welcome contributions! Please check out our contributing guidelines.

📄 License

Iguanas is licensed under the Apache License 2.0. See the LICENSE file for details.

🙏 Credits

Developed by the PSP Data Team at PayPal.


Built by data scientists, for data scientists


Source Distributions

No source distribution files available for this release.

Built Distribution

iguanas-1.0.2-py3-none-any.whl (87.8 kB view details)

Uploaded Python 3

File details

Details for the file iguanas-1.0.2-py3-none-any.whl.

File metadata

  • Download URL: iguanas-1.0.2-py3-none-any.whl
  • Upload date:
  • Size: 87.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.5

File hashes

Hashes for iguanas-1.0.2-py3-none-any.whl
Algorithm Hash digest
SHA256 1a39e0aa75572f46b38be1afe16ef194830704e8e37596ea25745d860aa8a627
MD5 d1c3c4c1f6fc6de1afe27939929dfaa5
BLAKE2b-256 39b90a76609ab7e273160bc2164712d0c9f012dc39cb58508aefca5fd794cc9f
