
pwml

pwml stands for Python Wrappers for Machine Learning



Requirements

  • Python >= 3.13
  • See pyproject.toml for the full dependency list

Installation

pip install pwml

Modules

classifiers - Hierarchical Classification

HierarchicalClassifierModel trains a tree of sklearn pipelines, one per node in the label hierarchy. Each node's classifier is selected and tuned independently via GridSearchCV. Inference cascades top-down through the tree.

Features:

  • Configurable text embedding via sentence-transformers (default: all-MiniLM-L6-v2 384-dim)
  • One-hot encoding for categorical features
  • Numeric normalization to [0, 1] with a configurable OOD policy (out_of_range='clip'/'warn'/'raise')
  • Platt calibration with per-class threshold optimization
  • Soft routing: descent stops when prediction confidence falls below a configurable threshold
  • Batch inference via predict_dataframe with pre-computed embeddings
  • Per-node inference latency profiling (profile=True)
  • Model versioning metadata embedded in saved artefacts
  • Evaluation and stratified cross-validation
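
Soft routing can be pictured with a toy cascade: at each node the most confident class is taken, and descent stops as soon as that confidence drops below the threshold. The sketch below is illustrative only; the node keys, class names, and probabilities are hypothetical and this is not the library's internal API.

```python
# Toy illustration of soft routing: walk down the hierarchy, stopping
# as soon as the winning class's confidence drops below the threshold.
def route(tree, node, min_confidence, path=None):
    path = path or []
    label, confidence = max(tree[node].items(), key=lambda kv: kv[1])
    path.append((label, confidence))
    if confidence < min_confidence:
        return path  # too uncertain: stop descending, keep the partial path
    child = f"{node}/{label}"
    if child not in tree:
        return path  # leaf level reached
    return route(tree, child, min_confidence, path)

# Hypothetical per-node class probabilities
tree = {
    "Division": {"Apparel": 0.91, "Footwear": 0.09},
    "Division/Apparel": {"Denim": 0.78, "Knitwear": 0.22},
}

print(route(tree, "Division", min_confidence=0.6))
# [('Apparel', 0.91), ('Denim', 0.78)]
print(route(tree, "Division", min_confidence=0.95))
# [('Apparel', 0.91)]
```

With a high threshold the second call returns only the Division-level prediction, mirroring how min_routing_confidence truncates the predicted path.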

Training

from pwml.classifiers import hierarchical as hc
from pwml.classifiers import features as fe

model = hc.HierarchicalClassifierModel(
    model_name='my_model',
    experiment_name='experiment_1',
    input_features=[
        fe.InputFeature(feature_name='Style',    feature_type='text'),
        fe.InputFeature(feature_name='Gender',   feature_type='text'),
        fe.InputFeature(feature_name='Brand',    feature_type='text'),
        fe.InputFeature(feature_name='Price',    feature_type='numeric'),
        fe.InputFeature(feature_name='Category', feature_type='category'),
    ],
    output_feature_hierarchy=fe.OutputFeature(
        feature_name='Division',
        child_feature=fe.OutputFeature(feature_name='Class')),
    text_model_name='all-MiniLM-L6-v2')  # or 'all-mpnet-base-v2' for higher quality

model.load_from_dataframe(data=df)
model.save_model(filepath='my_model.pwml')

The model trains n+1 classifiers where n is the number of distinct Division values: one classifier for the top-level Division prediction, and one per Division value for the Class prediction within that division.
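
For example, with three distinct Division values the model fits four classifiers. A minimal sketch of the count (the rows here are made up for illustration):

```python
# Count the classifiers trained for a two-level hierarchy:
# 1 for the top level + 1 per distinct top-level value.
rows = [
    {"Division": "Apparel", "Class": "Denim"},
    {"Division": "Apparel", "Class": "Knitwear"},
    {"Division": "Footwear", "Class": "Sneakers"},
    {"Division": "Accessories", "Class": "Belts"},
]
divisions = {row["Division"] for row in rows}
n_classifiers = 1 + len(divisions)
print(n_classifiers)  # 4: one Division model + one Class model per division
```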

Single-sample inference

model = hc.HierarchicalClassifierModel.load_model(filepath='my_model.pwml')

result = model.predict(
    input={'Style': 'slim fit jeans', 'Gender': 'men', 'Brand': 'Acme', 'Price': 49.99, 'Category': 'Bottoms'},
    min_routing_confidence=0.6)

# result is a list of dicts, one per hierarchy level:
# [{'feature_name': 'Division', 'value': 'Apparel', 'confidence': 0.91},
#  {'feature_name': 'Class',    'value': 'Denim',   'confidence': 0.78}]

Batch inference

predictions_df = model.predict_dataframe(data=df)
# Returns df with extra columns: Division_predicted, Division_confidence, Class_predicted, Class_confidence

# With per-node latency profiling
predictions_df, latency = model.predict_dataframe(data=df, profile=True)
# latency: {'Division': 0.0012, 'Division/Apparel': 0.0009, ...}
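
Since the latency dict's keys encode each node's path in the hierarchy, per-level totals can be derived by grouping on path depth. A small sketch with illustrative numbers:

```python
from collections import defaultdict

# Hypothetical per-node latencies keyed by hierarchy path (seconds)
latency = {
    "Division": 0.0012,
    "Division/Apparel": 0.0009,
    "Division/Footwear": 0.0011,
}

# Sum latency per hierarchy depth: depth 1 = top level, depth 2 = children
per_level = defaultdict(float)
for path, seconds in latency.items():
    per_level[path.count("/") + 1] += seconds

print(dict(per_level))  # total seconds spent at each level of the tree
```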

Evaluation and cross-validation

metrics, predictions_df = model.evaluate(data=df)

summary, per_fold = model.cross_validate(data=df, n_splits=5, search_n_jobs=4)
print(summary)  # {'Division': {'mean': 0.87, 'std': 0.02}, 'Class': {'mean': 0.74, 'std': 0.04}}

timeseries - Time Series Utilities

Data augmentation

from pwml.timeseries import dataaugmentationhelpers as dah

# Split data before calling prepare_data to avoid scaler leakage
train_df = df.iloc[:split]
test_df  = df.iloc[split:]

X_train, y_train, index, scaler_in, scaler_out, n_samples = dah.prepare_data(
    data=train_df,
    lags_in=[1, 7],
    cols_in=['feature_a', 'feature_b'],
    steps_in=14,
    cols_out=['target'],
    steps_out=7,
    augmentation_factor=3,
    noise_std=0.05)

# Pass pre-fit scalers for the test set to prevent leakage
X_test, y_test, _, _, _, _ = dah.prepare_data(
    data=test_df,
    lags_in=[1, 7],
    cols_in=['feature_a', 'feature_b'],
    steps_in=14,
    cols_out=['target'],
    steps_out=7,
    scaler_in=scaler_in,
    scaler_out=scaler_out)
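
The augmentation step can be pictured as jittering each training window with Gaussian noise. The numpy sketch below is a conceptual stand-in, assuming augmentation_factor multiplies the sample count and noise_std is the standard deviation of the added noise; it is not the prepare_data implementation.

```python
import numpy as np

def augment(X, y, factor=3, noise_std=0.05, seed=0):
    """Replicate each (window, target) pair `factor` times,
    adding Gaussian noise to the input copies only."""
    rng = np.random.default_rng(seed)
    X_copies = [X] + [
        X + rng.normal(0.0, noise_std, size=X.shape) for _ in range(factor - 1)
    ]
    return np.concatenate(X_copies), np.tile(y, (factor, 1))

X = np.random.rand(10, 14, 2)   # 10 windows, 14 timesteps, 2 features
y = np.random.rand(10, 7)       # 7-step-ahead targets
X_aug, y_aug = augment(X, y)
print(X_aug.shape, y_aug.shape)  # (30, 14, 2) (30, 7)
```

Note the targets are tiled unchanged: only the inputs are perturbed, so the model sees noisy variants of each window mapped to the same ground truth.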

Prophet helpers

from pwml.timeseries import prophethelpers as ph

# Summarise regressor coefficients for a fitted Prophet model
coefs_df = ph.regressor_coefficients(m)

# Plot regressor importance (beta coefficients)
ph.plot_regressors_importance(m, title='Regressor importance')

Visualization

from pwml.timeseries import visualizationhelpers as vh

vh.plot_time_series(
    title='Forecast',
    training=train_df,
    testing=test_df,
    prediction=forecast_df,
    confidence=forecast_df)

vh.plot_time_series_dist(data=residuals, title='Residual distribution')

vh.plot_seasonal_decomposition(data=series, period=52)

vh.plot_autocorrelation(data=series, lags=50)

utilities

  • graphichelpers - GraphicsStatics: matplotlib/seaborn style initialization, color/linestyle palette, style_plot
  • mssqlhelpers - execute(proc_name, conn_params, proc_params, commit=True): calls a stored procedure and returns a DataFrame
  • neptunehelpers - ExperimentTracker protocol and NeptuneExperimentManager for vendor-neutral experiment tracking (the Neptune adapter requires pip install pwml[neptune])
  • driftmonitor - DriftMonitor: computes PSI and Jensen-Shannon divergence between reference and live distributions; integrates with any ExperimentTracker
  • httphelpers - image download utilities
  • imagehelpers - PIL image helpers (resize, crop, batch conversion)
  • filehelpers - pickle serialization helpers
  • classificationhelpers - MulticlassClassifierOptimizer: Platt calibration with per-class threshold tuning
  • commonhelpers - miscellaneous utilities
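
The PSI metric itself is simple to state: bin both samples on the same edges, then sum (p - q) * ln(p / q) over the bins. Below is a minimal numpy sketch of the metric for intuition, not the DriftMonitor API:

```python
import numpy as np

def psi(reference, live, bins=10, eps=1e-6):
    """Population Stability Index between two samples.
    Rule of thumb: < 0.1 stable, 0.1-0.25 moderate shift, > 0.25 major shift."""
    edges = np.histogram_bin_edges(reference, bins=bins)
    p, _ = np.histogram(reference, bins=edges)
    q, _ = np.histogram(live, bins=edges)
    p = p / p.sum() + eps  # smooth to avoid log(0) and division by zero
    q = q / q.sum() + eps
    return float(np.sum((p - q) * np.log(p / q)))

rng = np.random.default_rng(0)
ref = rng.normal(0, 1, 5000)
print(psi(ref, rng.normal(0, 1, 5000)))    # near 0: same distribution
print(psi(ref, rng.normal(0.5, 1, 5000)))  # clearly larger: mean has drifted
```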

examples - Runnable Examples

Model Hosting (examples/modelhosting.py)

A Flask REST API that serves one or more pre-trained HierarchicalClassifierModel instances.

python examples/modelhosting.py \
    --host 0.0.0.0 \
    --port 5000 \
    --models "v1/division|/path/to/model.pwml"

Each loaded model is exposed at /api/<model-id> (POST). For production, use a WSGI server such as gunicorn:

gunicorn -w 4 -b 0.0.0.0:5000 "modelhosting:Statics.g_app"

Streamlit Web App (examples/webapp/app.py)

An interactive demo app covering data exploration, batch predictions with confidence heatmaps, per-level accuracy charts, per-node latency profiling, and concept drift monitoring with PSI gauges.

pip install streamlit
streamlit run examples/webapp/app.py

Experiment tracking

from pwml.utilities import neptunehelpers as nh

with nh.NeptuneExperimentManager(
        log=True,
        project_name='workspace/project',
        experiment_name='run_001',
        experiment_params={'lr': 0.01, 'epochs': 100},
        experiment_tags=['baseline']) as em:

    em.set_experiment_property('dataset_version', 'v3')
    em.log_data(data=results_df, name='results')
    em.log_chart(figure=fig, name='loss_curve')

Requires neptune >= 1.0 (pip install pwml[neptune]). Set the NEPTUNE_API_TOKEN environment variable before running.


VS Code Tasks

The project includes pre-configured VS Code tasks (.vscode/tasks.json) for common development workflows. Run them via Terminal > Run Task.

  • Jupyter: Start Lab Server - starts a token-free JupyterLab server (port 8888)
  • Jupyter: Start Notebook Server - starts a classic Jupyter Notebook server via nbclassic (port 8889)
  • Streamlit: Start Demo App - launches the interactive pwml demo web application (port 8501)
  • Test: Run All Notebooks - runs all example notebooks as tests via nbmake
  • Test: Run Notebook (prompt) - runs a single notebook by name

When using the devcontainer, ports 8501, 8888, and 8889 are automatically forwarded to the host.
