A Fully Automated Risk & Trading Intelligence Engine
AutoRiskML - The First Fully Automated Risk & Trading Intelligence Engine
The only Python package that acts like a Senior Risk Data Scientist
AutoRiskML automates the entire risk modeling pipeline from data ingestion to Azure deployment. Built for banks, fintechs, trading firms, and hedge funds.
Why AutoRiskML is Revolutionary
The Problem
Risk data scientists spend 80% of their time on:
- Manual data cleaning and binning
- Computing WOE/IV tables
- Monitoring PSI and drift
- Building scorecards
- Setting up model monitoring
- Creating deployment pipelines
The Solution: AutoRiskML
from autoriskml import AutoRisk
# ONE command does EVERYTHING a senior risk DS would do:
ar = AutoRisk(project="loan_scoring")
ar.register_source("train", csv="data/loans.csv")
result = ar.run(
    source="train",
    target="default_flag",
    explain=True,
    deploy={"provider": "azure_ml"}
)
# You now have:
# ✔ Data profile & recommendations
# ✔ Automated cleaning
# ✔ Optimal binning & WOE/IV tables
# ✔ Trained scorecard model
# ✔ PSI & drift monitoring
# ✔ SHAP explainability
# ✔ Production-ready Azure deployment
# ✔ PDF/HTML reports
Unique Features (No Other Package Has These)
| Feature | Pandas | Scikit-learn | H2O | PyCaret | AutoRiskML |
|---|---|---|---|---|---|
| Auto WOE/IV | ✘ | ✘ | ✘ | ✘ | ✔ |
| Auto PSI | ✘ | ✘ | ✘ | ✘ | ✔ |
| Scorecard Generation | ✘ | ✘ | ✘ | ✘ | ✔ |
| Drift Detection for Trading | ✘ | ✘ | ✘ | ✘ | ✔ |
| Risk-specific Binning | ✘ | ✘ | Partial | ✘ | ✔ |
| Azure ML Auto-deploy | ✘ | ✘ | ✘ | ✘ | ✔ |
| Built-in Monitoring | ✘ | ✘ | ✘ | ✘ | ✔ |
| Audit Trail | ✘ | ✘ | ✘ | ✘ | ✔ |
| Pure Python | ✘ | ✘ | ✘ | ✘ | ✔ |
What AutoRiskML Does
A. Automated Risk ML Pipeline
# 1. DATA PROFILING - Like a senior DS would analyze
ar.profile()
# → Column types, missing %, distributions, recommendations

# 2. AUTO-CLEANING - Handles all edge cases
ar.autoclean()
# → Missing values, outliers, type coercion, date parsing

# 3. FEATURE ENGINEERING - Risk-specific features
ar.auto_features()
# → Binning, WOE encoding, interaction features

# 4. MODEL TRAINING - Multiple algorithms
ar.train(models=["logistic", "xgboost", "lightgbm"])
# → Auto hyperparameter tuning, walk-forward validation

# 5. SCORECARD GENERATION - Convert to points
ar.scorecard(pdo=20, base_score=600)
# → Industry-standard credit scoring
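The `pdo` / `base_score` parameters follow the industry-standard log-odds scorecard scaling, in which `pdo` points double the good:bad odds. A minimal sketch of that math in plain Python (the `scorecard_points` helper is illustrative, not AutoRiskML's internal API):

```python
import math

def scorecard_points(prob_default, pdo=20, base_score=600, base_odds=50):
    """Map a default probability to points: score = offset + factor * ln(odds),
    where factor = pdo / ln(2) so that doubling the odds adds exactly pdo points."""
    factor = pdo / math.log(2)
    offset = base_score - factor * math.log(base_odds)
    odds = (1 - prob_default) / prob_default  # good:bad odds
    return offset + factor * math.log(odds)

# A customer at exactly the base odds (50:1) scores base_score:
print(round(scorecard_points(1 / 51)))  # 600
```

Doubling the odds (e.g. from 50:1 to 100:1) adds exactly `pdo` = 20 points under this scaling.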
B. Risk Scoring Engine (WOE/IV/PSI)
# Weight of Evidence & Information Value
woe_iv = ar.compute_woe_iv(feature="credit_utilization", target="default")
print(f"IV: {woe_iv['iv']:.3f}") # Predictive power
print(woe_iv['woe_table']) # Bin-level WOE
# Population Stability Index
psi = ar.compute_psi(
    baseline_data="train.csv",
    current_data="production_data.csv"
)
print(f"PSI: {psi:.3f}") # <0.1: stable, >0.25: significant drift
# Characteristic Stability Index
csi = ar.compute_csi(feature="income", current_data="latest.csv")
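The metrics in this section have standard, simple definitions; a self-contained NumPy sketch (the function names are illustrative, not the package API):

```python
import numpy as np

def woe_iv(bin_goods, bin_bads):
    """WOE/IV from per-bin good and bad counts.
    WOE_i = ln(%good_i / %bad_i); IV = sum((%good_i - %bad_i) * WOE_i)."""
    pct_good = np.asarray(bin_goods, dtype=float) / np.sum(bin_goods)
    pct_bad = np.asarray(bin_bads, dtype=float) / np.sum(bin_bads)
    woe = np.log(pct_good / pct_bad)
    return woe, float(np.sum((pct_good - pct_bad) * woe))

def psi(expected_pct, actual_pct):
    """PSI over matching bins: sum((actual - expected) * ln(actual / expected))."""
    e = np.asarray(expected_pct, dtype=float)
    a = np.asarray(actual_pct, dtype=float)
    return float(np.sum((a - e) * np.log(a / e)))

# Identical bin distributions give PSI = 0; drift pushes it up.
print(psi([0.25, 0.25, 0.25, 0.25], [0.25, 0.25, 0.25, 0.25]))  # 0.0
```

CSI uses the same formula as PSI, applied to an input feature's bins rather than the score distribution.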
C. Monitoring & Drift Detection
# Continuous monitoring
monitor = ar.monitor(
    production_data="s3://bucket/prod_scores.parquet",
    baseline="train.csv",
    alert_threshold=0.2
)
print(monitor.summary())
# → PSI per feature
# → Score distribution shift
# → Prediction drift
# → Retrain recommendations
D. Explainability (SHAP + Custom)
# Global explainability
ar.explain_global()
# → Top features driving risk
# → SHAP summary plots

# Local explainability (per-record)
explanation = ar.explain_record(customer_id=12345)
print(explanation.reason_codes)
# → "High credit utilization (+45 pts)"
# → "Recent late payments (+30 pts)"
E. Deployment to Azure
# One-command deployment
endpoint = ar.deploy(
    provider="azure_ml",
    workspace="RiskWS",
    resource_group="risk-rg",
    compute_type="aks",  # or "aci" for quick tests
    instance_count=3
)
print(f"Endpoint: {endpoint.scoring_uri}")
print(f"Key: {endpoint.primary_key}")
# Score new data via REST API
scores = endpoint.score(new_customers_df)
F. Backtesting (Trading Mode)
# Time-series walk-forward validation
backtest = ar.backtest(
    data="trading_signals.csv",
    strategy="long_short",
    walk_forward_windows=12,
    refit_frequency="monthly"
)
print(backtest.sharpe_ratio)
print(backtest.max_drawdown)
print(backtest.cumulative_returns)
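The backtest statistics above have standard definitions; a minimal NumPy sketch assuming daily returns and a zero risk-free rate (illustrative helpers, not the package internals):

```python
import numpy as np

def sharpe_ratio(daily_returns, periods_per_year=252):
    """Annualized Sharpe ratio of a daily return series (risk-free rate 0)."""
    r = np.asarray(daily_returns, dtype=float)
    return float(r.mean() / r.std(ddof=1) * np.sqrt(periods_per_year))

def max_drawdown(daily_returns):
    """Largest peak-to-trough loss of the compounded equity curve."""
    equity = np.cumprod(1.0 + np.asarray(daily_returns, dtype=float))
    running_peak = np.maximum.accumulate(equity)
    return float((equity / running_peak - 1.0).min())

returns = [0.01, -0.02, 0.015, -0.01, 0.005]
print(f"Sharpe: {sharpe_ratio(returns):.2f}, Max DD: {max_drawdown(returns):.2%}")
```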
G. Auto-Reporting
# Generate comprehensive reports
ar.report(
    output="risk_report.html",
    include=[
        "data_profile",
        "woe_iv_tables",
        "model_performance",
        "psi_monitoring",
        "shap_explanations",
        "scorecard",
        "recommendations"
    ]
)
# PDF for regulators
ar.report(output="regulatory_report.pdf", template="basel")
Installation
Basic (Pure Python, zero dependencies)
pip install autoriskml
With Machine Learning
pip install autoriskml[ml]
With Explainability
pip install autoriskml[explain]
With Azure Deployment
pip install autoriskml[azure]
Full Installation (Everything)
pip install autoriskml[all]
Quick Start (30 Seconds)
Example 1: Credit Scoring
from autoriskml import AutoRisk
# Initialize
ar = AutoRisk(project="credit_scoring")
# Register data
ar.register_source("train", csv="loans_train.csv")
ar.register_source("test", csv="loans_test.csv")
# Run full pipeline
result = ar.run(
    source="train",
    validation_source="test",
    target="default_flag",
    config="configs/credit_config.yaml"
)
# Access artifacts
print(f"Model AUC: {result.metrics['auc']:.3f}")
print(f"Model PSI: {result.metrics['psi']:.3f}")
print(f"Scorecard: {result.scorecard_path}")
print(f"Report: {result.report_html}")
Example 2: Fraud Detection
ar = AutoRisk(project="fraud_detection")
ar.register_source("transactions", sql_query="""
    SELECT * FROM transactions
    WHERE date >= '2024-01-01'
""", connection_string="postgresql://...")

result = ar.run(
    source="transactions",
    target="is_fraud",
    models=["logistic", "xgboost"],
    explain=True,
    monitor={"psi_threshold": 0.15}
)
Example 3: Trading Risk
ar = AutoRisk(project="trading_risk", mode="trading")
ar.register_source("signals", parquet="s3://bucket/signals.parquet")
result = ar.run(
    source="signals",
    target="return_next_day",
    backtest=True,
    walk_forward=True,
    deploy={"provider": "azure_ml"}
)
print(f"Sharpe Ratio: {result.backtest['sharpe']:.2f}")
print(f"Max Drawdown: {result.backtest['max_dd']:.2%}")
Complete Example: End-to-End Loan Scoring
from autoriskml import AutoRisk
import pandas as pd
# 1. Initialize project
ar = AutoRisk(
    project="personal_loans",
    output_dir="artifacts/loans",
    log_level="INFO"
)
# 2. Register data sources
ar.register_source("train", csv="data/loans_2022_2023.csv")
ar.register_source("valid", csv="data/loans_2024_Q1.csv")
ar.register_source("prod", s3="s3://bucket/prod/loans.parquet")
# 3. Profile data (optional but recommended)
profile = ar.profile(source="train")
print(profile.summary())
# → 50,000 rows × 45 features
# → Missing: income (5%), employment_length (12%)
# → Recommendations: 8 features to drop, 3 to engineer
# 4. Run full automated pipeline
result = ar.run(
    source="train",
    validation_source="valid",
    target="default_flag",

    # Cleaning options
    clean={
        "missing_strategy": "auto",  # smart imputation
        "outlier_method": "iqr",
        "date_formats": ["%Y-%m-%d", "%d/%m/%Y"]
    },

    # Binning options
    binning={
        "numeric_method": "monotonic",  # monotonic bad rate
        "max_bins": 6,
        "min_bin_size": 0.05
    },

    # Feature selection
    features={
        "min_iv": 0.02,  # minimum information value
        "max_features": 20,
        "auto_interactions": True
    },

    # Model options
    models=[
        {"type": "logistic", "penalty": 0.1},
        {"type": "xgboost", "params": {"max_depth": 6, "eta": 0.05}}
    ],

    # Scorecard conversion
    scorecard={
        "pdo": 20,  # points to double odds
        "base_score": 600,
        "base_odds": 50
    },

    # Explainability
    explain=True,

    # Monitoring
    monitor={
        "compute_psi": True,
        "psi_threshold": 0.2,
        "drift_features": "auto",
        "retrain_trigger": "drift_or_performance"
    },

    # Reporting
    report={
        "formats": ["html", "pdf"],
        "template": "executive"
    },

    # Deployment
    deploy={
        "provider": "azure_ml",
        "workspace": "RiskWS",
        "resource_group": "risk-prod-rg",
        "compute": "aks-cluster",
        "auth": "key"
    }
)
# 5. Access results
print("\n" + "=" * 70)
print("RESULTS")
print("=" * 70)
print(f"Model: {result.best_model}")
print(f"AUC: {result.metrics['auc']:.3f}")
print(f"KS: {result.metrics['ks']:.3f}")
print(f"Gini: {result.metrics['gini']:.3f}")
print(f"PSI (validation): {result.metrics['psi']:.3f}")

print("\nArtifacts:")
print(f"  - Model: {result.model_path}")
print(f"  - Scorecard: {result.scorecard_path}")
print(f"  - Binning spec: {result.binning_spec_path}")
print(f"  - WOE tables: {result.woe_tables_path}")
print(f"  - Report: {result.report_html}")

print("\nDeployment:")
print(f"  - Endpoint: {result.endpoint.scoring_uri}")
print(f"  - Key: {result.endpoint.primary_key[:20]}...")
# 6. Score new customers
new_customers = pd.read_csv("data/new_applications.csv")
scores = ar.score(new_customers, output="with_reasons")
print(f"\nScored {len(scores)} new customers")
print(scores[['customer_id', 'score', 'probability', 'risk_tier', 'top_reason']].head())

# 7. Monitor production data
monitor_result = ar.monitor(source="prod")
if monitor_result.alert:
    print(f"\nALERT: {monitor_result.message}")
    print(f"  PSI: {monitor_result.psi:.3f} (threshold: 0.20)")
    print(f"  Drifted features: {', '.join(monitor_result.drifted_features)}")
    print(f"  Recommendation: {monitor_result.recommendation}")
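The AUC, KS, and Gini figures reported by the pipeline are related by standard formulas: Gini = 2 * AUC - 1, and KS is the maximum gap between the cumulative score distributions of bads and goods. A simplified sketch that assumes no tied scores (illustrative, not AutoRiskML's implementation):

```python
import numpy as np

def ks_gini(y_true, y_score):
    """KS statistic and Gini coefficient for a binary risk model (no tied scores)."""
    y = np.asarray(y_true)
    s = np.asarray(y_score, dtype=float)
    order = np.argsort(s)  # sort records by ascending score
    y_sorted = y[order]
    cum_bad = np.cumsum(y_sorted) / y_sorted.sum()
    cum_good = np.cumsum(1 - y_sorted) / (1 - y_sorted).sum()
    ks = float(np.max(np.abs(cum_bad - cum_good)))
    # AUC via the Mann-Whitney rank-sum formula, then Gini = 2 * AUC - 1
    ranks = s.argsort().argsort() + 1
    n_pos = int(y.sum())
    n_neg = len(y) - n_pos
    auc = (ranks[y == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
    return ks, float(2 * auc - 1)
```

A perfectly separating model gives KS = 1 and Gini = 1; a random one gives values near 0.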
Advanced Features
Custom Binning Strategy
from autoriskml.binning import CustomBinner
class MyBinner(CustomBinner):
    def fit(self, values, target):
        # Your custom binning logic
        bins = self.compute_custom_bins(values, target)
        return bins
ar.register_binner("my_method", MyBinner())
result = ar.run(..., binning={"method": "my_method"})
Custom Model Adapter
from autoriskml.models import ModelAdapter
class MyModelAdapter(ModelAdapter):
    def train(self, X, y):
        # Train your model
        self.model = YourModel().fit(X, y)

    def predict_proba(self, X):
        return self.model.predict_proba(X)
ar.register_model("my_model", MyModelAdapter())
Streaming Scoring
# Score large datasets in chunks
for chunk_scores in ar.score_stream(
    source="s3://bucket/huge_file.csv",
    chunk_size=100_000,
    output="s3://bucket/scores/"
):
    print(f"Scored {len(chunk_scores)} records")
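Chunked scoring like this can be reproduced with plain pandas, whose `read_csv` accepts a `chunksize` argument and returns an iterator of DataFrames; a minimal sketch (the `score_csv_in_chunks` helper and `score_fn` callback are illustrative, not the package API):

```python
import pandas as pd

def score_csv_in_chunks(path, score_fn, chunk_size=100_000):
    """Score a large CSV chunk by chunk so memory use stays bounded."""
    for chunk in pd.read_csv(path, chunksize=chunk_size):
        chunk["score"] = score_fn(chunk)
        yield chunk

# Usage: each yielded chunk carries a new "score" column.
# for chunk in score_csv_in_chunks("huge_file.csv", my_score_fn):
#     upload(chunk)
```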
Real-time Monitoring
# Set up continuous monitoring
ar.monitor_continuously(
    source_stream="kafka://topic/transactions",
    baseline="train_data.csv",
    check_interval="hourly",
    alert_email="risk-team@company.com"
)
Architecture
+------------------------------------------------------------------+
|                           AutoRisk API                           |
|     (Simple high-level interface: run(), score(), monitor())     |
+--------------------------------+---------------------------------+
                                 |
+--------------------------------+---------------------------------+
|                    Core Pipeline Orchestrator                    |
| * Stage execution * Artifact management * Provenance tracking    |
+----------+--------+----------+-----------+------------+----------+
           |        |          |           |            |
           v        v          v           v            v
+----------+--------+----------+-----------+------------+----------+
| Connector|Profil- | Cleaning | Binning   | Models     | Scoring  |
| CSV/SQL/ |ing     | Auto     | WOE/IV    | Logistic/  | Scorecard|
| S3/Kafka |        | Clean    | Monotonic | XGB/LightGBM| Generation|
+----------+--------+----------+-----------+------------+----------+
                                 |
          +----------------------+----------------------+
          v                      v                      v
    +-----------+          +------------+         +--------------+
    |  Metrics  |          |  Explain   |         |  Monitoring  |
    | PSI/CSI/  |          | SHAP/LIME  |         | Drift/Alert  |
    | KS/Gini   |          | Reasons    |         | PSI Tracker  |
    +-----------+          +------------+         +--------------+
                                 |
              +------------------+------------------+
              v                                     v
        +-----------+                       +------------+
        |  Export   |                       | Deployment |
        |  ONNX/    |                       | Azure ML/  |
        |  Joblib   |                       | AKS/API    |
        +-----------+                       +------------+
Use Cases
1. Banks & Credit Unions
- Personal loan scoring
- Credit card approvals
- Mortgage risk assessment
- SME lending
2. Fintechs
- BNPL (Buy Now Pay Later) scoring
- Micro-lending
- Alternative credit scoring
- KYC risk assessment
3. Insurance
- Claims fraud detection
- Underwriting risk
- Policyholder lifetime value
4. Trading Firms
- Strategy risk monitoring
- Position sizing
- Counterparty risk
- Market regime detection
5. E-commerce
- Transaction fraud
- Account takeover detection
- Chargeback prediction
Performance
- Speed: 10x faster than the manual workflow
- Accuracy: Comparable to senior DS work
- Scalability: Handles 100M+ records with distributed mode
- Memory: Streaming support for datasets > RAM
Security & Compliance
- Local-first (no external calls by default)
- Audit trail for all transformations
- PII detection and scrubbing
- Explainable AI for regulatory compliance
- Reproducible pipelines (version control)
- GDPR-compliant data handling
Documentation
Contributing
Contributions are welcome! See CONTRIBUTING.md
License
MIT License - see LICENSE file
Acknowledgments
Built with inspiration from years of risk modeling in banking and fintech.
Contact
- Author: Idriss Bado
- Email: idrissbadoolivier@gmail.com
- GitHub: idrissbado
If AutoRiskML helps you, please star the repo!
Built for the future of automated risk intelligence