ScoreFlow
ScoreFlow is a production-ready automated machine learning scoring engine. It takes raw tabular data, trains and evaluates multiple classifiers, selects the best model by AUC and stability, and exposes it as a REST API or exportable batch pipeline — all from a single Python package.
Table of Contents
- Features
- Architecture
- Package Layout
- Installation
- Quickstart
- Modules In Depth
- CLI Reference
- REST API Reference
- Configuration
- Testing
- Project Roadmap
Features
| Capability | Details |
|---|---|
| Multi-model training | Logistic Regression, Random Forest, LightGBM (pluggable interface) |
| Rigorous evaluation | AUC-ROC, KS statistic, lift at deciles, PSI stability |
| Automatic selection | Best model chosen by AUC then KS as a tie-breaker |
| REST API | FastAPI app with /health and /score endpoints |
| Batch pipeline | Export any trained model to a self-contained folder + run it on CSV/Parquet |
| Structured logging | JSON-line logs with request IDs, model version, and timing |
| Reproducibility | Fixed random seeds, versioned artifacts with timestamps |
| Full test suite | 35 tests — unit, API, pipeline, and end-to-end integration |
Architecture
Raw CSV/Parquet
│
▼
┌─────────────┐ schema, splits
│ Data layer │ ──────────────────────────────────────────────┐
└─────────────┘ │
│ │
▼ ▼
┌──────────────┐ ModelArtifact ┌────────────────┐ ┌───────────────┐
│ Model Trainer│ ────────────────▶│ Evaluator │──▶│ Model Selector│
└──────────────┘ └────────────────┘ └───────┬───────┘
│ best artifact
┌────────────────┐ │
│ Scoring API │◀──────────┤
│ (FastAPI) │ │
└────────────────┘ │
▼
┌──────────────────┐
│ Pipeline Exporter │
│ (offline runner) │
└──────────────────┘
Package Layout
ScoreFlow/
├── scoreflow/
│ ├── config.py # Paths, seeds, global defaults
│ ├── logging_config.py # Structured JSON logging setup
│ ├── __main__.py # CLI: `python -m scoreflow`
│ ├── data/
│ │ ├── schema.py # DatasetSchema — feature/target definitions
│ │ ├── loaders.py # CSV/Parquet loaders with validation
│ │ └── splitters.py # Train / val / test split logic
│ ├── models/
│ │ ├── base.py # Abstract BaseModel + ModelArtifact dataclass
│ │ ├── logistic.py # Logistic Regression wrapper
│ │ ├── random_forest.py # Random Forest wrapper
│ │ ├── lightgbm_model.py # LightGBM wrapper
│ │ ├── trainer.py # Trainer orchestrator
│ │ └── registry.py # Hyperparameter registry
│ ├── evaluation/
│ │ ├── metrics.py # AUC, KS, lift, PSI
│ │ ├── evaluator.py # Evaluator — runs all metrics, returns report
│ │ └── selection.py # select_best_model(), persist_evaluation_report()
│ ├── api/
│ │ └── app.py # FastAPI app factory (create_app)
│ └── pipeline/
│ ├── exporter.py # export_pipeline() — serialize artifact to folder
│ └── runner.py # run_pipeline() — load export, score a file
├── tests/
│ ├── test_config.py
│ ├── test_data.py
│ ├── test_models.py
│ ├── test_evaluation.py
│ ├── test_api.py
│ ├── test_pipeline.py
│ └── test_integration.py # Full end-to-end flow
├── pyproject.toml
├── Makefile
└── ROADMAP.md
Installation
Requirements: Python ≥ 3.11
# 1. Clone the repository
git clone https://github.com/your-org/scoreflow.git
cd scoreflow
# 2. Install with development dependencies
pip install -e ".[dev]"
Core dependencies installed automatically: pandas, scikit-learn, numpy, lightgbm, xgboost, pyarrow, scipy, fastapi, uvicorn, joblib.
Quickstart
1 — Train models on your data
import pandas as pd
from scoreflow.data.schema import DatasetSchema
from scoreflow.data.splitters import random_split
from scoreflow.models.trainer import Trainer
df = pd.read_csv("data/my_dataset.csv")
schema = DatasetSchema(target="default") # name of your binary target column
schema.resolve(df) # auto-detect feature columns
splits = random_split(df, target="default", test_size=0.2, val_size=0.1)
trainer = Trainer(schema) # trains LR, RF, LightGBM by default
artifacts = trainer.train(splits)
trainer.save_artifacts(artifacts, "models/")
2 — Evaluate and select the best model
from scoreflow.evaluation import Evaluator, select_best_model, persist_evaluation_report
evaluator = Evaluator(schema)
reports = [evaluator.evaluate(a, splits) for a in artifacts]
best = select_best_model(reports, split="val")
print(f"Best model: {best.model_name} AUC={best.metrics['val']['auc']:.4f}")
persist_evaluation_report(best, "reports/")
3 — Start the scoring API
python -m scoreflow serve --model-dir models/
# Health check
curl http://localhost:8000/health
# Score a record
curl -X POST http://localhost:8000/score \
-H "Content-Type: application/json" \
-d '{"records": [{"feature_a": 1.2, "feature_b": 0.5}]}'
4 — Export and run as a batch pipeline
# Export the best model to a portable folder — in Python, reusing the
# `best` artifact and `schema` from steps 1–2:
from scoreflow.pipeline.exporter import export_pipeline
export_pipeline(best, schema, output_dir="exports/")
# Or use the CLI runner on an exported pipeline
python -m scoreflow run \
--pipeline exports/logistic_regression \
--input new_data.csv \
--output scores.csv
Modules In Depth
Data Ingestion
scoreflow/data/schema.py — DatasetSchema
Holds the target column name, optional ID and date columns, and the resolved list of numeric feature columns. Call schema.resolve(df) once to auto-detect features.
schema = DatasetSchema(
target="default",
id_col="customer_id", # optional, excluded from features
date_col="application_date" # optional, excluded from features
)
schema.resolve(df)
print(schema.features) # ['age', 'income', 'debt_ratio', ...]
scoreflow/data/loaders.py
from scoreflow.data.loaders import load_dataset
df = load_dataset("data/train.csv") # CSV
df = load_dataset("data/train.parquet") # Parquet
Validates: missing rates, data types, required columns.
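The exact checks live inside loaders.py; a minimal sketch of this kind of validation (the 50% missing-rate threshold and the error messages are illustrative assumptions, not ScoreFlow's actual behaviour):

```python
import pandas as pd

def validate_frame(df: pd.DataFrame, required: list[str], max_missing: float = 0.5) -> None:
    """Illustrative checks: required columns exist and no column is mostly null."""
    missing_cols = [c for c in required if c not in df.columns]
    if missing_cols:
        raise ValueError(f"missing required columns: {missing_cols}")
    missing_rates = df.isna().mean()
    too_sparse = missing_rates[missing_rates > max_missing].index.tolist()
    if too_sparse:
        raise ValueError(f"columns exceed {max_missing:.0%} missing: {too_sparse}")

df = pd.DataFrame({"age": [35, None, 28], "income": [52000, 31000, None]})
validate_frame(df, required=["age", "income"])  # passes: each column is only 1/3 missing
```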
scoreflow/data/splitters.py
from scoreflow.data.splitters import random_split, time_split
splits = random_split(df, target="default", test_size=0.2, val_size=0.1, seed=42)
# splits.train / splits.val / splits.test
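The signature suggests a stratified two-stage split; a minimal scikit-learn sketch of the idea (the Splits container is an assumed stand-in for the real return type, and note that val_size is interpreted as a fraction of the full dataset):

```python
from dataclasses import dataclass

import pandas as pd
from sklearn.model_selection import train_test_split

@dataclass
class Splits:  # stand-in for ScoreFlow's split container
    train: pd.DataFrame
    val: pd.DataFrame
    test: pd.DataFrame

def random_split(df, target, test_size=0.2, val_size=0.1, seed=42):
    # First carve off the test set, stratified on the binary target
    rest, test = train_test_split(
        df, test_size=test_size, random_state=seed, stratify=df[target])
    # val_size is a fraction of the full dataset, so rescale it for the remainder
    val_frac = val_size / (1 - test_size)
    train, val = train_test_split(
        rest, test_size=val_frac, random_state=seed, stratify=rest[target])
    return Splits(train=train, val=val, test=test)
```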
Model Training
All models implement a common BaseModel interface: fit(X, y), predict_proba(X), get_params().
| Model | Class | Notes |
|---|---|---|
| Logistic Regression | LogisticModel | L2 regularised, fast baseline |
| Random Forest | RandomForestModel | 200 trees, bagging ensemble |
| LightGBM | LightGBMModel | Gradient boosting, high accuracy |
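The fit / predict_proba / get_params contract translates naturally into an abstract base class plus an artifact record. This is an illustrative reconstruction, not the actual base.py source:

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any

class BaseModel(ABC):
    """Common contract every ScoreFlow classifier wrapper implements."""

    @abstractmethod
    def fit(self, X, y) -> "BaseModel": ...

    @abstractmethod
    def predict_proba(self, X): ...

    @abstractmethod
    def get_params(self) -> dict[str, Any]: ...

@dataclass
class ModelArtifact:  # illustrative shape of the trained-model record
    name: str
    model: BaseModel
    params: dict[str, Any] = field(default_factory=dict)
    trained_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat())
```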
Trainer orchestrator
from scoreflow.models.trainer import Trainer
# Train specific models only
trainer = Trainer(schema, model_names=["logistic_regression", "lightgbm"])
artifacts = trainer.train(splits) # returns List[ModelArtifact]
trainer.save_artifacts(artifacts, "models/")
Each artifact is saved as a sub-directory:
models/
├── logistic_regression/
│ ├── model.joblib # serialised model
│ └── metadata.json # name, timestamp, params
├── random_forest/
│ └── ...
Evaluation & Selection
Metrics computed (on val and test splits):
| Metric | Description |
|---|---|
| auc | Area Under the ROC Curve |
| ks | Kolmogorov–Smirnov statistic |
| lift_10 | Lift at top 10% of scores |
| lift_20 | Lift at top 20% of scores |
| psi | Population Stability Index (train vs val) |
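AUC and lift are standard; KS and PSI are worth spelling out. A minimal numpy sketch of both (the quantile binning for PSI is an assumption — metrics.py may bin differently):

```python
import numpy as np

def ks_statistic(y_true, scores) -> float:
    """Max distance between the score CDFs of positives and negatives."""
    y_true, scores = np.asarray(y_true), np.asarray(scores)
    thresholds = np.unique(scores)
    pos, neg = np.sort(scores[y_true == 1]), np.sort(scores[y_true == 0])
    cdf_pos = np.searchsorted(pos, thresholds, side="right") / len(pos)
    cdf_neg = np.searchsorted(neg, thresholds, side="right") / len(neg)
    return float(np.max(np.abs(cdf_pos - cdf_neg)))

def psi(expected, actual, bins: int = 10) -> float:
    """Population Stability Index between two score distributions."""
    expected, actual = np.asarray(expected), np.asarray(actual)
    edges = np.quantile(expected, np.linspace(0, 1, bins + 1))
    actual = np.clip(actual, edges[0], edges[-1])  # keep actual inside the bin range
    e = np.histogram(expected, bins=edges)[0] / len(expected)
    a = np.histogram(actual, bins=edges)[0] / len(actual)
    e, a = np.clip(e, 1e-6, None), np.clip(a, 1e-6, None)  # avoid log(0)
    return float(np.sum((a - e) * np.log(a / e)))
```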
from scoreflow.evaluation import Evaluator, select_best_model
evaluator = Evaluator(schema)
reports = [evaluator.evaluate(artifact, splits) for artifact in artifacts]
# Primary sort: AUC ↓, tie-break: KS ↓
best = select_best_model(reports, split="val")
Each report is saved as reports/<model_name>_<timestamp>.json.
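The selection rule is simple enough to state as a sort key; a sketch of the idea, treating each report as a plain dict for illustration (the real Evaluator returns richer objects):

```python
def select_best_model(reports: list[dict], split: str = "val") -> dict:
    """Pick the highest AUC; break ties with KS (both descending)."""
    return max(reports, key=lambda r: (r["metrics"][split]["auc"],
                                       r["metrics"][split]["ks"]))
```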
Scoring API
Built with FastAPI. The app factory pattern (create_app()) isolates state between instances, making parallel test execution safe.
Startup behaviour: At startup, the server scans the --model-dir directory and auto-loads the artifact with the newest timestamp. Override with the SCOREFLOW_MODEL_NAME env var.
Endpoints:
| Method | Path | Description |
|---|---|---|
| GET | /health | Returns model name + version, or 503 if no model loaded |
| POST | /score | Scores one or many records, returns probability scores 0–1 |
| GET | /docs | Auto-generated interactive Swagger UI |
Example response from /score:
{
"scores": [0.823, 0.041, 0.671],
"model_name": "lightgbm",
"model_version": "2024-11-01T10:32:11"
}
Error codes: 422 for invalid/empty payload, 503 if no model is loaded.
Exportable Pipeline
Export a trained model to a portable self-contained folder, then run it anywhere — no full ScoreFlow install needed for scoring.
from scoreflow.pipeline.exporter import export_pipeline
pipeline_dir = export_pipeline(
artifact,
schema,
output_dir="exports/"
)
# exports/logistic_regression/
# ├── model.joblib
# └── schema.json
Run the exported pipeline on new data:
from scoreflow.pipeline.runner import run_pipeline
scored_df = run_pipeline(
pipeline_dir="exports/logistic_regression",
input_path="new_customers.csv",
output_path="scores.csv"
)
# scored_df has all original columns + a "score" column (0–1 probability)
Logging
ScoreFlow uses Python's standard logging module with an optional JSON-line formatter for production.
from scoreflow.logging_config import configure_logging
configure_logging(json_format=True) # structured JSON to stdout
configure_logging(json_format=False) # human-readable (default)
JSON log line example:
{"timestamp": "2024-11-01T10:32:11Z", "level": "INFO", "logger": "scoreflow.api.app", "message": "Auto-selected model artifact: lightgbm"}
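Under the hood this is a custom logging.Formatter. A minimal sketch producing lines shaped like the example above (field names mirror the example; this is not the literal logging_config.py source):

```python
import json
import logging
from datetime import datetime, timezone

class JsonFormatter(logging.Formatter):
    """Emit one JSON object per log record."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "timestamp": datetime.fromtimestamp(
                record.created, tz=timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        })

def configure_logging(json_format: bool = True) -> None:
    handler = logging.StreamHandler()
    if json_format:
        handler.setFormatter(JsonFormatter())
    logging.basicConfig(level=logging.INFO, handlers=[handler], force=True)
```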
CLI Reference
# Start the REST API server
python -m scoreflow serve \
--model-dir models/ # directory of model artifacts (default: models/)
--model-name lightgbm # load a specific model (optional)
--host 0.0.0.0 # bind host (default: 0.0.0.0)
--port 8000 # bind port (default: 8000)
# Score a CSV/Parquet file using an exported pipeline
python -m scoreflow run \
--pipeline exports/lightgbm # exported pipeline directory
--input new_data.csv # input file (CSV or Parquet)
--output scores.csv # output file path
The scoreflow console command is also available once the package is installed with pip install -e .
REST API Reference
GET /health
Returns the status of the service and the currently loaded model.
Response 200:
{
"status": "ok",
"model": "lightgbm",
"version": "2024-11-01T10:32:11"
}
Response 503 — no model loaded.
POST /score
Score one or more records. Each record is a flat JSON object of feature_name → value pairs matching the training schema.
Request body:
{
"records": [
{"age": 35, "income": 52000, "debt_ratio": 0.42},
{"age": 28, "income": 31000, "debt_ratio": 0.65}
]
}
Response 200:
{
"scores": [0.134, 0.812],
"model_name": "lightgbm",
"model_version": "2024-11-01T10:32:11"
}
Response 422 — empty records list or invalid feature values.
Response 503 — no model loaded.
Configuration
ScoreFlow is configured via environment variables (all optional):
| Variable | Default | Description |
|---|---|---|
| SCOREFLOW_DATA_DIR | ./data | Root directory for raw data files |
| SCOREFLOW_MODELS_DIR | ./models | Root directory for saved model artifacts |
| SCOREFLOW_REPORTS_DIR | ./reports | Root directory for evaluation reports |
| SCOREFLOW_MODEL_DIR | ./models | Model directory used by the API at startup |
| SCOREFLOW_MODEL_NAME | (auto) | Force-load a specific model sub-directory |
Global training defaults live in scoreflow/config.py:
| Setting | Value | Description |
|---|---|---|
| random_state | 42 | Fixed seed for reproducibility |
| test_size | 0.2 | Fraction of data held out for test |
| cv_folds | 5 | Cross-validation folds |
| primary_metric | roc_auc | Metric used for model selection |
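Those defaults translate naturally into a frozen dataclass; an illustrative shape, not the literal config.py:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TrainingDefaults:
    """Illustrative container for the global training defaults above."""
    random_state: int = 42
    test_size: float = 0.2
    cv_folds: int = 5
    primary_metric: str = "roc_auc"

DEFAULTS = TrainingDefaults()
```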
Testing
# Run the full test suite (35 tests)
python -m pytest tests/ -v
# Run with code coverage
python -m pytest tests/ --cov=scoreflow --cov-report=html
open htmlcov/index.html
# Lint
python -m ruff check scoreflow tests
# Type check
python -m mypy scoreflow
Test coverage by module:
| Test file | Tests | Covers |
|---|---|---|
| test_config.py | 2 | Config defaults and path resolution |
| test_data.py | 10 | Loaders, schema, splitters, validation |
| test_models.py | 6 | All classifiers, trainer, save/load |
| test_evaluation.py | 6 | AUC, KS, lift, PSI, evaluator, selection |
| test_api.py | 6 | FastAPI health + score endpoints |
| test_pipeline.py | 4 | Export structure + batch runner |
| test_integration.py | 1 | Full end-to-end: data → train → eval → select → export → run |
Project Roadmap
See ROADMAP.md for the full phased build plan.
Completed phases:
- ✅ Phase 1 — Foundation & project setup
- ✅ Phase 2 — Data ingestion (loaders, schema, splits, validation)
- ✅ Phase 3 — Model training & selection (LR, RF, LightGBM + trainer + registry)
- ✅ Phase 4 — Evaluation (AUC, KS, lift, PSI, model selection, persisted reports)
- ✅ Phase 5 — Scoring API (FastAPI, /health, /score, model auto-loading)
- ✅ Phase 6 — Exportable pipeline (exporter + batch runner + CLI)
- ✅ Phase 7 — Production hardening (structured JSON logging, integration tests)
Upcoming:
- 🔲 Dockerfile for containerised deployment
- 🔲 Prometheus metrics (request count, latency, error rate)
- 🔲 GitHub Actions CI (lint + test on every push)
License
MIT — see LICENSE.