# pwml

pwml stands for Python Wrappers for Machine Learning.
## Requirements

- Python >= 3.13
- See `pyproject.toml` for the full dependency list
## Installation

```shell
pip install pwml
```
## Modules

### classifiers - Hierarchical Classification

`HierarchicalClassifierModel` trains a tree of sklearn pipelines, one per node in the label hierarchy. Each node's classifier is selected and tuned independently via `GridSearchCV`. Inference cascades top-down through the tree.
Features:

- Configurable text embedding via sentence-transformers (default: `all-MiniLM-L6-v2`, 384-dim)
- One-hot encoding for categorical features
- Numeric normalisation to [0, 1] with configurable OOD policy (`out_of_range='clip'/'warn'/'raise'`)
- Platt calibration with per-class threshold optimization
- Soft routing: descent stops when prediction confidence falls below a configurable threshold
- Batch inference via `predict_dataframe` with pre-computed embeddings
- Per-node inference latency profiling (`profile=True`)
- Model versioning metadata embedded in saved artefacts
- Evaluation and stratified cross-validation
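The per-class threshold step can be illustrated with a small sketch. This is not pwml's implementation; `tune_per_class_thresholds` is a hypothetical helper showing how one threshold per class might be chosen from calibrated probabilities:

```python
import numpy as np

def tune_per_class_thresholds(proba, y_true, grid=np.linspace(0.1, 0.9, 17)):
    """Pick, per class, the probability threshold that maximizes one-vs-rest
    F1 on calibrated probabilities (illustrative sketch only)."""
    thresholds = np.full(proba.shape[1], 0.5)
    for c in range(proba.shape[1]):
        positives = (y_true == c)
        best_f1 = -1.0
        for t in grid:
            pred = proba[:, c] >= t
            tp = np.sum(pred & positives)
            fp = np.sum(pred & ~positives)
            fn = np.sum(~pred & positives)
            denom = 2 * tp + fp + fn
            f1 = 2 * tp / denom if denom else 0.0
            if f1 > best_f1:
                best_f1, thresholds[c] = f1, t
    return thresholds

proba = np.array([[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.3, 0.7]])
y_true = np.array([0, 1, 0, 1])
print(tune_per_class_thresholds(proba, y_true))
```

In practice the thresholds would be tuned on a held-out calibration split rather than on the training data.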
#### Training

```python
from pwml.classifiers import hierarchical as hc
from pwml.classifiers import features as fe

model = hc.HierarchicalClassifierModel(
    model_name='my_model',
    experiment_name='experiment_1',
    input_features=[
        fe.InputFeature(feature_name='Style', feature_type='text'),
        fe.InputFeature(feature_name='Gender', feature_type='text'),
        fe.InputFeature(feature_name='Brand', feature_type='text'),
        fe.InputFeature(feature_name='Price', feature_type='numeric'),
        fe.InputFeature(feature_name='Category', feature_type='category'),
    ],
    output_feature_hierarchy=fe.OutputFeature(
        feature_name='Division',
        child_feature=fe.OutputFeature(feature_name='Class')),
    text_model_name='all-MiniLM-L6-v2')  # or 'all-mpnet-base-v2' for higher quality

model.load_from_dataframe(data=df)
model.save_model(filepath='my_model.pwml')
```
The model trains n+1 classifiers, where n is the number of distinct `Division` values: one classifier for the top-level `Division` prediction, and one per `Division` value for the `Class` prediction within that division.
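For instance, counting the classifiers for a toy dataset (illustrative data, not from the library):

```python
import pandas as pd

# Toy hierarchy: three distinct Division values
df = pd.DataFrame({
    'Division': ['Apparel', 'Apparel', 'Footwear', 'Home'],
    'Class':    ['Denim',   'Knits',   'Sneakers', 'Decor'],
})

# One top-level Division classifier + one Class classifier per Division
n_classifiers = 1 + df['Division'].nunique()
print(n_classifiers)  # 4
```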
#### Single-sample inference

```python
model = hc.HierarchicalClassifierModel.load_model(filepath='my_model.pwml')

result = model.predict(
    input={'Style': 'slim fit jeans', 'Gender': 'men', 'Brand': 'Acme',
           'Price': 49.99, 'Category': 'Bottoms'},
    min_routing_confidence=0.6)

# result is a list of dicts, one per hierarchy level:
# [{'feature_name': 'Division', 'value': 'Apparel', 'confidence': 0.91},
#  {'feature_name': 'Class', 'value': 'Denim', 'confidence': 0.78}]
```
#### Batch inference

```python
predictions_df = model.predict_dataframe(data=df)
# Returns df with extra columns: Division_predicted, Division_confidence,
# Class_predicted, Class_confidence

# With per-node latency profiling
predictions_df, latency = model.predict_dataframe(data=df, profile=True)
# latency: {'Division': 0.0012, 'Division/Apparel': 0.0009, ...}
```
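Since the latency result is a plain dict keyed by node path, it can be inspected directly; for example, ranking nodes from slowest to fastest (the values below are illustrative):

```python
# Illustrative per-node latency values, keyed by node path in the hierarchy
latency = {'Division': 0.0012, 'Division/Apparel': 0.0009,
           'Division/Footwear': 0.0021}

slowest_first = sorted(latency.items(), key=lambda kv: kv[1], reverse=True)
print(slowest_first[0])  # ('Division/Footwear', 0.0021)
```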
#### Evaluation and cross-validation

```python
metrics, predictions_df = model.evaluate(data=df)

summary, per_fold = model.cross_validate(data=df, n_splits=5, search_n_jobs=4)
print(summary)  # {'Division': {'mean': 0.87, 'std': 0.02}, 'Class': {'mean': 0.74, 'std': 0.04}}
```
### timeseries - Time Series Utilities

#### Data augmentation
```python
from pwml.timeseries import dataaugmentationhelpers as dah

# Split data before calling prepare_data to avoid scaler leakage
train_df = df.iloc[:split]
test_df = df.iloc[split:]

X_train, y_train, index, scaler_in, scaler_out, n_samples = dah.prepare_data(
    data=train_df,
    lags_in=[1, 7],
    cols_in=['feature_a', 'feature_b'],
    steps_in=14,
    cols_out=['target'],
    steps_out=7,
    augmentation_factor=3,
    noise_std=0.05)

# Pass pre-fit scalers for the test set to prevent leakage
X_test, y_test, _, _, _, _ = dah.prepare_data(
    data=test_df,
    lags_in=[1, 7],
    cols_in=['feature_a', 'feature_b'],
    steps_in=14,
    cols_out=['target'],
    steps_out=7,
    scaler_in=scaler_in,
    scaler_out=scaler_out)
```
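Conceptually, the augmentation step slides a window over the series and then appends `augmentation_factor - 1` noise-perturbed copies of each window. The sketch below illustrates that idea only; it is not pwml's implementation and ignores lags, multiple columns, and scaling:

```python
import numpy as np

def augment_windows(series, steps_in, steps_out, factor, noise_std, seed=0):
    """Slide a window over the series, then add `factor - 1` noisy copies
    of each (X, y) pair (illustrative sketch, not pwml's implementation)."""
    rng = np.random.default_rng(seed)
    X, y = [], []
    for start in range(len(series) - steps_in - steps_out + 1):
        x_win = series[start:start + steps_in]
        y_win = series[start + steps_in:start + steps_in + steps_out]
        X.append(x_win)
        y.append(y_win)
        for _ in range(factor - 1):
            # Perturb inputs only; targets stay clean
            X.append(x_win + rng.normal(0.0, noise_std, size=steps_in))
            y.append(y_win)
    return np.array(X), np.array(y)

series = np.arange(30, dtype=float)
X, y = augment_windows(series, steps_in=14, steps_out=7, factor=3, noise_std=0.05)
print(X.shape, y.shape)  # (30, 14) (30, 7)
```

With 30 points, 10 base windows fit, and a factor of 3 triples the sample count to 30.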
#### Prophet helpers

```python
from pwml.timeseries import prophethelpers as ph

# Summarise regressor coefficients for a fitted Prophet model
coefs_df = ph.regressor_coefficients(m)

# Plot regressor importance (beta coefficients)
ph.plot_regressors_importance(m, title='Regressor importance')
```
#### Visualization

```python
from pwml.timeseries import visualizationhelpers as vh

vh.plot_time_series(
    title='Forecast',
    training=train_df,
    testing=test_df,
    prediction=forecast_df,
    confidence=forecast_df)

vh.plot_time_series_dist(data=residuals, title='Residual distribution')
vh.plot_seasonal_decomposition(data=series, period=52)
vh.plot_autocorrelation(data=series, lags=50)
```
### utilities

| Module | Purpose |
|---|---|
| `graphichelpers` | `GraphicsStatics`: matplotlib/seaborn style initialization, color/linestyle palette, `style_plot` |
| `mssqlhelpers` | `execute(proc_name, conn_params, proc_params, commit=True)` - call a stored procedure, returns a DataFrame |
| `neptunehelpers` | `ExperimentTracker` protocol + `NeptuneExperimentManager` - vendor-neutral experiment tracking (Neptune adapter requires `pip install pwml[neptune]`) |
| `driftmonitor` | `DriftMonitor` - compute PSI and Jensen-Shannon divergence between reference and live distributions; integrates with any `ExperimentTracker` |
| `httphelpers` | Image download utilities |
| `imagehelpers` | PIL image helpers (resize, crop, batch conversion) |
| `filehelpers` | Pickle serialization helpers |
| `classificationhelpers` | `MulticlassClassifierOptimizer` - Platt calibration + per-class threshold tuning |
| `commonhelpers` | Miscellaneous utilities |
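As a rough illustration of what a PSI drift check involves (this sketch is not `DriftMonitor`'s API), the index compares binned fractions of a reference sample against a live sample:

```python
import numpy as np

def psi(reference, live, n_bins=10):
    """Population Stability Index using quantile bins derived from the
    reference sample (illustrative sketch, not DriftMonitor's API)."""
    # Interior bin edges at reference quantiles; outer bins are open-ended
    inner = np.quantile(reference, np.linspace(0, 1, n_bins + 1)[1:-1])
    ref_frac = np.bincount(np.searchsorted(inner, reference),
                           minlength=n_bins) / len(reference)
    live_frac = np.bincount(np.searchsorted(inner, live),
                            minlength=n_bins) / len(live)
    eps = 1e-6  # avoid log(0) for empty bins
    ref_frac = np.clip(ref_frac, eps, None)
    live_frac = np.clip(live_frac, eps, None)
    return float(np.sum((live_frac - ref_frac) * np.log(live_frac / ref_frac)))

rng = np.random.default_rng(0)
ref = rng.normal(0, 1, 5000)
print(psi(ref, rng.normal(0, 1, 5000)) < 0.1)   # True: same distribution
print(psi(ref, rng.normal(1, 1, 5000)) > 0.25)  # True: one-sigma mean shift
```

A common rule of thumb treats PSI below 0.1 as stable and above 0.25 as a significant shift.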
### examples - Runnable Examples

#### Model Hosting (`examples/modelhosting.py`)

A Flask REST API that serves one or more pre-trained `HierarchicalClassifierModel` instances.

```shell
python examples/modelhosting.py \
    --host 0.0.0.0 \
    --port 5000 \
    --models "v1/division|/path/to/model.pwml"
```

Each loaded model is exposed at `/api/<model-id>` (POST). For production, use a WSGI server such as gunicorn:

```shell
gunicorn -w 4 -b 0.0.0.0:5000 "modelhosting:Statics.g_app"
```
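A client can then POST the same input dict that `model.predict` accepts. The request below only builds the call without sending it; the endpoint path and payload schema are assumptions inferred from the `/api/<model-id>` pattern and the predict example above, not a documented contract:

```python
import json
import urllib.request

# Hypothetical payload mirroring model.predict's input dict
payload = {'Style': 'slim fit jeans', 'Gender': 'men', 'Brand': 'Acme',
           'Price': 49.99, 'Category': 'Bottoms'}

req = urllib.request.Request(
    'http://localhost:5000/api/v1/division',
    data=json.dumps(payload).encode('utf-8'),
    headers={'Content-Type': 'application/json'},
    method='POST')

# With the server running: urllib.request.urlopen(req).read() returns the
# prediction JSON for each hierarchy level
print(req.get_method(), req.full_url)
```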
#### Streamlit Web App (`examples/webapp/app.py`)

An interactive demo app covering data exploration, batch predictions with confidence heatmaps, per-level accuracy charts, per-node latency profiling, and concept drift monitoring with PSI gauges.

```shell
pip install streamlit
streamlit run examples/webapp/app.py
```
#### Experiment tracking

```python
from pwml.utilities import neptunehelpers as nh

with nh.NeptuneExperimentManager(
        log=True,
        project_name='workspace/project',
        experiment_name='run_001',
        experiment_params={'lr': 0.01, 'epochs': 100},
        experiment_tags=['baseline']) as em:
    em.set_experiment_property('dataset_version', 'v3')
    em.log_data(data=results_df, name='results')
    em.log_chart(figure=fig, name='loss_curve')
```

Requires neptune >= 1.0 (`pip install pwml[neptune]`). Set the `NEPTUNE_API_TOKEN` environment variable before running.
### VS Code Tasks

The project includes pre-configured VS Code tasks (`.vscode/tasks.json`) for common development workflows. Run them via Terminal > Run Task.
| Task | Description | Port |
|---|---|---|
| Jupyter: Start Lab Server | Starts a token-free JupyterLab server | 8888 |
| Jupyter: Start Notebook Server | Starts a classic Jupyter Notebook server (via nbclassic) | 8889 |
| Streamlit: Start Demo App | Launches the interactive pwml demo web application | 8501 |
| Test: Run All Notebooks | Runs all example notebooks as tests via nbmake | - |
| Test: Run Notebook (prompt) | Runs a single notebook by name | - |
When using the devcontainer, ports 8501, 8888, and 8889 are automatically forwarded to the host.
## Download files
### Source distribution

Details for the file `pwml-2.0.0.tar.gz`:

- Size: 63.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.12

| Algorithm | Hash digest |
|---|---|
| SHA256 | `cd0be2f4c43d5a19778f57ff7d713eaf50e75e3566c025a7faf06d2005bf5b6e` |
| MD5 | `b698a487c8742b4b9216e346dbcd749f` |
| BLAKE2b-256 | `39746848f1da2ab81e2352d217a5490160ef1d519284de08b6b1d5181cba31f2` |
### Built distribution

Details for the file `pwml-2.0.0-py3-none-any.whl`:

- Size: 68.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.12

| Algorithm | Hash digest |
|---|---|
| SHA256 | `331a8d132159df36347c771a11881a4229832b9362a5eb57518b270ca6a9e66f` |
| MD5 | `21d4ebde56752d348d4e6e4d7d5701ef` |
| BLAKE2b-256 | `f270c4a6205da5054cc8606c7343373e5658f8fd10ed82371cdf85e18c3ac21c` |