Skip to main content

A Fully Automated Risk & Trading Intelligence Engine

Project description

๐Ÿš€ AutoRiskML - The First Fully Automated Risk & Trading Intelligence Engine

PyPI version Python 3.8+ License: MIT

The only Python package that acts like a Senior Risk Data Scientist

AutoRiskML automates the entire risk modeling pipeline from data ingestion to Azure deployment. Built for banks, fintechs, trading firms, and hedge funds.

๐ŸŽฏ Why AutoRiskML is Revolutionary

โŒ The Problem

Risk data scientists spend 80% of their time on:

  • Manual data cleaning and binning
  • Computing WOE/IV tables
  • Monitoring PSI and drift
  • Building scorecards
  • Setting up model monitoring
  • Creating deployment pipelines

โœ… The Solution: AutoRiskML

from autoriskml import AutoRisk

# ONE command does EVERYTHING a senior risk DS would do:
ar = AutoRisk(project="loan_scoring")
ar.register_source("train", csv="data/loans.csv")
result = ar.run(
    source="train",
    target="default_flag",
    explain=True,
    deploy={"provider": "azure_ml"}
)

# You now have:
# โœ… Data profile & recommendations
# โœ… Automated cleaning
# โœ… Optimal binning & WOE/IV tables
# โœ… Trained scorecard model
# โœ… PSI & drift monitoring
# โœ… SHAP explainability
# โœ… Production-ready Azure deployment
# โœ… PDF/HTML reports

๐Ÿ† Unique Features (No Other Package Has These)

Feature Pandas Scikit-learn H2O PyCaret AutoRiskML
Auto WOE/IV โŒ โŒ โŒ โŒ โœ…
Auto PSI โŒ โŒ โŒ โŒ โœ…
Scorecard Generation โŒ โŒ โŒ โŒ โœ…
Drift Detection for Trading โŒ โŒ โŒ โŒ โœ…
Risk-specific Binning โŒ โŒ Partial โŒ โœ…
Azure ML Auto-deploy โŒ โŒ โŒ โŒ โœ…
Built-in Monitoring โŒ โŒ โŒ โŒ โœ…
Audit Trail โŒ โŒ โŒ โŒ โœ…
Pure Python โœ… โœ… โŒ โœ… โœ…

๐Ÿ”ฅ What AutoRiskML Does

A. Automated Risk ML Pipeline

# 1. DATA PROFILING - Like a senior DS would analyze
ar.profile()
# โ†’ Column types, missing %, distributions, recommendations

# 2. AUTO-CLEANING - Handles all edge cases
ar.autoclean()
# โ†’ Missing values, outliers, type coercion, date parsing

# 3. FEATURE ENGINEERING - Risk-specific features
ar.auto_features()
# โ†’ Binning, WOE encoding, interaction features

# 4. MODEL TRAINING - Multiple algorithms
ar.train(models=["logistic", "xgboost", "lightgbm"])
# โ†’ Auto hyperparameter tuning, walk-forward validation

# 5. SCORECARD GENERATION - Convert to points
ar.scorecard(pdo=20, base_score=600)
# โ†’ Industry-standard credit scoring

B. Risk Scoring Engine (WOE/IV/PSI)

# Weight of Evidence & Information Value
woe_iv = ar.compute_woe_iv(feature="credit_utilization", target="default")
print(f"IV: {woe_iv['iv']:.3f}")  # Predictive power
print(woe_iv['woe_table'])         # Bin-level WOE

# Population Stability Index
psi = ar.compute_psi(
    baseline_data="train.csv",
    current_data="production_data.csv"
)
print(f"PSI: {psi:.3f}")  # <0.1: stable, >0.25: significant drift

# Characteristic Stability Index
csi = ar.compute_csi(feature="income", current_data="latest.csv")

C. Monitoring & Drift Detection

# Continuous monitoring
monitor = ar.monitor(
    production_data="s3://bucket/prod_scores.parquet",
    baseline="train.csv",
    alert_threshold=0.2
)

print(monitor.summary())
# โ†’ PSI per feature
# โ†’ Score distribution shift
# โ†’ Prediction drift
# โ†’ Retrain recommendations

D. Explainability (SHAP + Custom)

# Global explainability
ar.explain_global()
# โ†’ Top features driving risk
# โ†’ SHAP summary plots

# Local explainability (per-record)
explanation = ar.explain_record(customer_id=12345)
print(explanation.reason_codes)
# โ†’ "High credit utilization (+45 pts)"
# โ†’ "Recent late payments (+30 pts)"

E. Deployment to Azure

# One-command deployment
endpoint = ar.deploy(
    provider="azure_ml",
    workspace="RiskWS",
    resource_group="risk-rg",
    compute_type="aks",  # or "aci" for quick tests
    instance_count=3
)

print(f"Endpoint: {endpoint.scoring_uri}")
print(f"Key: {endpoint.primary_key}")

# Score new data via REST API
scores = endpoint.score(new_customers_df)

F. Backtesting (Trading Mode)

# Time-series walk-forward validation
backtest = ar.backtest(
    data="trading_signals.csv",
    strategy="long_short",
    walk_forward_windows=12,
    refit_frequency="monthly"
)

print(backtest.sharpe_ratio)
print(backtest.max_drawdown)
print(backtest.cumulative_returns)

G. Auto-Reporting

# Generate comprehensive reports
ar.report(
    output="risk_report.html",
    include=[
        "data_profile",
        "woe_iv_tables",
        "model_performance",
        "psi_monitoring",
        "shap_explanations",
        "scorecard",
        "recommendations"
    ]
)

# PDF for regulators
ar.report(output="regulatory_report.pdf", template="basel")

๐Ÿ“ฆ Installation

Basic (Pure Python, zero dependencies)

pip install autoriskml

With Machine Learning

pip install autoriskml[ml]

With Explainability

pip install autoriskml[explain]

With Azure Deployment

pip install autoriskml[azure]

Full Installation (Everything)

pip install autoriskml[all]

๐Ÿš€ Quick Start (30 Seconds)

Example 1: Credit Scoring

from autoriskml import AutoRisk

# Initialize
ar = AutoRisk(project="credit_scoring")

# Register data
ar.register_source("train", csv="loans_train.csv")
ar.register_source("test", csv="loans_test.csv")

# Run full pipeline
result = ar.run(
    source="train",
    validation_source="test",
    target="default_flag",
    config="configs/credit_config.yaml"
)

# Access artifacts
print(f"Model AUC: {result.metrics['auc']:.3f}")
print(f"Model PSI: {result.metrics['psi']:.3f}")
print(f"Scorecard: {result.scorecard_path}")
print(f"Report: {result.report_html}")

Example 2: Fraud Detection

ar = AutoRisk(project="fraud_detection")
ar.register_source("transactions", sql_query="""
    SELECT * FROM transactions 
    WHERE date >= '2024-01-01'
""", connection_string="postgresql://...")

result = ar.run(
    source="transactions",
    target="is_fraud",
    models=["logistic", "xgboost"],
    explain=True,
    monitor={"psi_threshold": 0.15}
)

Example 3: Trading Risk

ar = AutoRisk(project="trading_risk", mode="trading")
ar.register_source("signals", parquet="s3://bucket/signals.parquet")

result = ar.run(
    source="signals",
    target="return_next_day",
    backtest=True,
    walk_forward=True,
    deploy={"provider": "azure_ml"}
)

print(f"Sharpe Ratio: {result.backtest['sharpe']:.2f}")
print(f"Max Drawdown: {result.backtest['max_dd']:.2%}")

๐ŸŽ“ Complete Example: End-to-End Loan Scoring

from autoriskml import AutoRisk
import pandas as pd

# 1. Initialize project
ar = AutoRisk(
    project="personal_loans",
    output_dir="artifacts/loans",
    log_level="INFO"
)

# 2. Register data sources
ar.register_source("train", csv="data/loans_2022_2023.csv")
ar.register_source("valid", csv="data/loans_2024_Q1.csv")
ar.register_source("prod", s3="s3://bucket/prod/loans.parquet")

# 3. Profile data (optional but recommended)
profile = ar.profile(source="train")
print(profile.summary())
# โ†’ 50,000 rows ร— 45 features
# โ†’ Missing: income (5%), employment_length (12%)
# โ†’ Recommendations: 8 features to drop, 3 to engineer

# 4. Run full automated pipeline
result = ar.run(
    source="train",
    validation_source="valid",
    target="default_flag",
    
    # Cleaning options
    clean={
        "missing_strategy": "auto",  # smart imputation
        "outlier_method": "iqr",
        "date_formats": ["%Y-%m-%d", "%d/%m/%Y"]
    },
    
    # Binning options
    binning={
        "numeric_method": "monotonic",  # monotonic bad rate
        "max_bins": 6,
        "min_bin_size": 0.05
    },
    
    # Feature selection
    features={
        "min_iv": 0.02,  # minimum information value
        "max_features": 20,
        "auto_interactions": True
    },
    
    # Model options
    models=[
        {"type": "logistic", "penalty": 0.1},
        {"type": "xgboost", "params": {"max_depth": 6, "eta": 0.05}}
    ],
    
    # Scorecard conversion
    scorecard={
        "pdo": 20,        # points to double odds
        "base_score": 600,
        "base_odds": 50
    },
    
    # Explainability
    explain=True,
    
    # Monitoring
    monitor={
        "compute_psi": True,
        "psi_threshold": 0.2,
        "drift_features": "auto",
        "retrain_trigger": "drift_or_performance"
    },
    
    # Reporting
    report={
        "formats": ["html", "pdf"],
        "template": "executive"
    },
    
    # Deployment
    deploy={
        "provider": "azure_ml",
        "workspace": "RiskWS",
        "resource_group": "risk-prod-rg",
        "compute": "aks-cluster",
        "auth": "key"
    }
)

# 5. Access results
print("\n" + "="*70)
print("๐Ÿ“Š RESULTS")
print("="*70)
print(f"โœ… Model: {result.best_model}")
print(f"โœ… AUC: {result.metrics['auc']:.3f}")
print(f"โœ… KS: {result.metrics['ks']:.3f}")
print(f"โœ… Gini: {result.metrics['gini']:.3f}")
print(f"โœ… PSI (validation): {result.metrics['psi']:.3f}")
print(f"\n๐Ÿ“ Artifacts:")
print(f"   โ€ข Model: {result.model_path}")
print(f"   โ€ข Scorecard: {result.scorecard_path}")
print(f"   โ€ข Binning spec: {result.binning_spec_path}")
print(f"   โ€ข WOE tables: {result.woe_tables_path}")
print(f"   โ€ข Report: {result.report_html}")
print(f"\n๐ŸŒ Deployment:")
print(f"   โ€ข Endpoint: {result.endpoint.scoring_uri}")
print(f"   โ€ข Key: {result.endpoint.primary_key[:20]}...")

# 6. Score new customers
new_customers = pd.read_csv("data/new_applications.csv")
scores = ar.score(new_customers, output="with_reasons")

print(f"\nโœ… Scored {len(scores)} new customers")
print(scores[['customer_id', 'score', 'probability', 'risk_tier', 'top_reason']].head())

# 7. Monitor production data
monitor_result = ar.monitor(source="prod")
if monitor_result.alert:
    print(f"\nโš ๏ธ  ALERT: {monitor_result.message}")
    print(f"   PSI: {monitor_result.psi:.3f} (threshold: 0.20)")
    print(f"   Drifted features: {', '.join(monitor_result.drifted_features)}")
    print(f"   Recommendation: {monitor_result.recommendation}")

๐Ÿ“š Advanced Features

Custom Binning Strategy

from autoriskml.binning import CustomBinner

class MyBinner(CustomBinner):
    def fit(self, values, target):
        # Your custom binning logic
        bins = self.compute_custom_bins(values, target)
        return bins

ar.register_binner("my_method", MyBinner())
result = ar.run(..., binning={"method": "my_method"})

Custom Model Adapter

from autoriskml.models import ModelAdapter

class MyModelAdapter(ModelAdapter):
    def train(self, X, y):
        # Train your model
        self.model = YourModel().fit(X, y)
    
    def predict_proba(self, X):
        return self.model.predict_proba(X)

ar.register_model("my_model", MyModelAdapter())

Streaming Scoring

# Score large datasets in chunks
for chunk_scores in ar.score_stream(
    source="s3://bucket/huge_file.csv",
    chunk_size=100_000,
    output="s3://bucket/scores/"
):
    print(f"Scored {len(chunk_scores)} records")

Real-time Monitoring

# Set up continuous monitoring
ar.monitor_continuously(
    source_stream="kafka://topic/transactions",
    baseline="train_data.csv",
    check_interval="hourly",
    alert_email="risk-team@company.com"
)

๐Ÿ—๏ธ Architecture

โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚                        AutoRisk API                              โ”‚
โ”‚  (Simple high-level interface: run(), score(), monitor())       โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
                     โ”‚
โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚                    Core Pipeline Orchestrator                    โ”‚
โ”‚  โ€ข Stage execution  โ€ข Artifact management  โ€ข Provenance tracking โ”‚
โ””โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
     โ”‚       โ”‚        โ”‚         โ”‚          โ”‚           โ”‚
     โ–ผ       โ–ผ        โ–ผ         โ–ผ          โ–ผ           โ–ผ
โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚Connectorโ”‚Profilโ”‚Cleaningโ”‚ Binning โ”‚  Models    โ”‚   Scoring    โ”‚
โ”‚CSV/SQL/ โ”‚-ing  โ”‚Auto    โ”‚WOE/IV   โ”‚Logistic/   โ”‚  Scorecard   โ”‚
โ”‚S3/Kafka โ”‚      โ”‚Clean   โ”‚Monotonicโ”‚XGB/LightGBMโ”‚  Generation  โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
                                      โ”‚
                 โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
                 โ–ผ                    โ–ผ                     โ–ผ
            โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”         โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”        โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
            โ”‚ Metrics โ”‚         โ”‚ Explain  โ”‚        โ”‚ Monitoring โ”‚
            โ”‚PSI/CSI/ โ”‚         โ”‚SHAP/LIME โ”‚        โ”‚Drift/Alert โ”‚
            โ”‚KS/Gini  โ”‚         โ”‚Reasons   โ”‚        โ”‚PSI Tracker โ”‚
            โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜         โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜        โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
                                      โ”‚
                 โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
                 โ–ผ                                          โ–ผ
            โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”                               โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
            โ”‚ Export  โ”‚                               โ”‚Deploymentโ”‚
            โ”‚ONNX/    โ”‚                               โ”‚Azure ML/ โ”‚
            โ”‚Joblib   โ”‚                               โ”‚AKS/API   โ”‚
            โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜                               โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

๐ŸŽฏ Use Cases

1. Banks & Credit Unions

  • Personal loan scoring
  • Credit card approvals
  • Mortgage risk assessment
  • SME lending

2. Fintechs

  • BNPL (Buy Now Pay Later) scoring
  • Micro-lending
  • Alternative credit scoring
  • KYC risk assessment

3. Insurance

  • Claims fraud detection
  • Underwriting risk
  • Policyholder lifetime value

4. Trading Firms

  • Strategy risk monitoring
  • Position sizing
  • Counterparty risk
  • Market regime detection

5. E-commerce

  • Transaction fraud
  • Account takeover detection
  • Chargeback prediction

๐Ÿ“Š Performance

  • Speed: 10x faster than manual process
  • Accuracy: Comparable to senior DS work
  • Scalability: Handles 100M+ records with distributed mode
  • Memory: Streaming support for datasets > RAM

๐Ÿ”’ Security & Compliance

  • โœ… Local-first (no external calls by default)
  • โœ… Audit trail for all transformations
  • โœ… PII detection and scrubbing
  • โœ… Explainable AI for regulatory compliance
  • โœ… Reproducible pipelines (version control)
  • โœ… GDPR-compliant data handling

๐Ÿ“– Documentation

๐Ÿค Contributing

Contributions are welcome! See CONTRIBUTING.md

๐Ÿ“ License

MIT License - see LICENSE file

๐Ÿ™ Acknowledgments

Built with inspiration from years of risk modeling in banking and fintech.

๐Ÿ“ง Contact


โญ If AutoRiskML helps you, please star the repo!

๐Ÿš€ Built for the future of automated risk intelligence

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

autoriskml-0.1.0.tar.gz (26.1 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

autoriskml-0.1.0-py3-none-any.whl (17.5 kB view details)

Uploaded Python 3

File details

Details for the file autoriskml-0.1.0.tar.gz.

File metadata

  • Download URL: autoriskml-0.1.0.tar.gz
  • Upload date:
  • Size: 26.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.0

File hashes

Hashes for autoriskml-0.1.0.tar.gz
Algorithm Hash digest
SHA256 22a51286dbf7f400c553b8ad5a60c10286aa33f5d3e268069773e63328ed3dba
MD5 f69e9d4a0b7c1d0c2181c9bde71a6565
BLAKE2b-256 9380a11edac558e8ca4a64f566cb0f840ab2b1889fd520a67582e931337e63b2

See more details on using hashes here.

File details

Details for the file autoriskml-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: autoriskml-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 17.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.0

File hashes

Hashes for autoriskml-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 94883cbcad693693f3ca5719cfd9eebe556650f1d14bec9f7e68c8bec49bd8d8
MD5 99d4b59014b5c76c7c8b9b3e4e4fd1d7
BLAKE2b-256 43e25e1f0271332a055eaa989932d9adb72f4d985cdf3cdf80f95a5b874dfc80

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page