
Pwml

Pwml stands for Python Wrappers for Machine Learning



Requirements

  • Python >= 3.8
  • See setup.py for the full dependency list

Installation

pip install Pwml

Modules

classifiers - Hierarchical Classification

HierarchicalClassifierModel trains a tree of sklearn pipelines, one per node in the label hierarchy. Each node's classifier is selected and tuned independently via GridSearchCV. Inference cascades top-down through the tree.

Features:

  • Text embedding via sentence-transformers (all-MiniLM-L6-v2 384-dim, all-mpnet-base-v2 768-dim)
  • One-hot encoding for categorical features
  • Numeric passthrough
  • Platt calibration with per-class threshold optimization
  • Soft routing: descent stops when prediction confidence falls below a configurable threshold
  • Batch inference via predict_dataframe
  • Evaluation and stratified cross-validation

Training

from pwml.classifiers import hierarchical as hc
from pwml.classifiers import features as fe

model = hc.HierarchicalClassifierModel(
    model_name='my_model',
    experiment_name='experiment_1',
    input_features=[
        fe.InputFeature(feature_name='Style',    feature_type='text'),
        fe.InputFeature(feature_name='Gender',   feature_type='text'),
        fe.InputFeature(feature_name='Brand',    feature_type='text'),
        fe.InputFeature(feature_name='Price',    feature_type='numeric'),
        fe.InputFeature(feature_name='Category', feature_type='category'),
    ],
    output_feature_hierarchy=fe.OutputFeature(
        feature_name='Division',
        child_feature=fe.OutputFeature(feature_name='Class')))

model.load_from_dataframe(data=df)
model.save_model(filepath='my_model.joblib')

The model trains n+1 classifiers where n is the number of distinct Division values: one classifier for the top-level Division prediction, and one per Division value for the Class prediction within that division.
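
The classifier count can be illustrated with a small sketch (plain Python, not Pwml code): with three distinct Division values in the training data, the model fits four classifiers.

```python
# Illustrative sketch (not Pwml internals): count the classifiers a
# two-level hierarchy requires -- one root Division model plus one
# Class model per distinct Division value.
training_rows = [
    {"Division": "Apparel",     "Class": "Denim"},
    {"Division": "Apparel",     "Class": "Tops"},
    {"Division": "Footwear",    "Class": "Sneakers"},
    {"Division": "Accessories", "Class": "Belts"},
]

divisions = {row["Division"] for row in training_rows}
n_classifiers = 1 + len(divisions)  # root model + one per division
```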

Single-sample inference

model = hc.HierarchicalClassifierModel.load_model(filepath='my_model.joblib')

result = model.predict(
    input={'Style': 'slim fit jeans', 'Gender': 'men', 'Brand': 'Acme', 'Price': 49.99, 'Category': 'Bottoms'},
    min_routing_confidence=0.6)

# result is a list of dicts, one per hierarchy level:
# [{'feature_name': 'Division', 'value': 'Apparel', 'confidence': 0.91},
#  {'feature_name': 'Class',    'value': 'Denim',   'confidence': 0.78}]
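
A caller can mirror the soft-routing behaviour when consuming the result: keep levels in order until one falls below the confidence threshold. This is a sketch built on the result shape shown above; the threshold handling here is client-side illustration, not library code.

```python
# Walk the per-level predictions and keep only levels whose confidence
# clears the routing threshold; stop descending at the first miss.
result = [
    {"feature_name": "Division", "value": "Apparel", "confidence": 0.91},
    {"feature_name": "Class",    "value": "Denim",   "confidence": 0.78},
]

MIN_CONFIDENCE = 0.6

accepted = {}
for level in result:
    if level["confidence"] < MIN_CONFIDENCE:
        break  # soft routing: do not trust deeper levels
    accepted[level["feature_name"]] = level["value"]
```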

Batch inference

predictions_df = model.predict_dataframe(data=df)
# Returns df with extra columns: Division_predicted, Division_confidence, Class_predicted, Class_confidence
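
The `<feature>_predicted` / `<feature>_confidence` column convention makes post-filtering straightforward. A minimal sketch, assuming the column names shown above and using a made-up frame in place of real predictions:

```python
import pandas as pd

# Flag low-confidence rows from predict_dataframe output for manual review.
predictions_df = pd.DataFrame({
    "Division_predicted":  ["Apparel", "Footwear"],
    "Division_confidence": [0.91, 0.42],
    "Class_predicted":     ["Denim", "Sneakers"],
    "Class_confidence":    [0.78, 0.35],
})

needs_review = predictions_df[predictions_df["Division_confidence"] < 0.6]
```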

Evaluation and cross-validation

metrics, predictions_df = model.evaluate(data=df)

summary, per_fold = model.cross_validate(data=df, n_splits=5)
print(summary)  # {'Division': {'mean': 0.87, 'std': 0.02}, 'Class': {'mean': 0.74, 'std': 0.04}}
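
The summary shape above (per-level mean and standard deviation over folds) can be reproduced from raw fold scores with the standard library. The fold values below are invented for illustration; only the aggregation pattern matters.

```python
from statistics import mean, stdev

# Aggregate per-fold accuracies into the {'mean': ..., 'std': ...} shape
# returned by cross_validate. Fold scores here are illustrative.
per_fold = {
    "Division": [0.85, 0.89, 0.86, 0.88, 0.87],
    "Class":    [0.70, 0.78, 0.73, 0.75, 0.74],
}

summary = {
    level: {"mean": round(mean(scores), 2), "std": round(stdev(scores), 2)}
    for level, scores in per_fold.items()
}
```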

timeseries - Time Series Utilities

Data augmentation

from pwml.timeseries import dataaugmentationhelpers as dah

# Split data before calling prepare_data to avoid scaler leakage
train_df = df.iloc[:split]
test_df  = df.iloc[split:]

X_train, y_train, index, scaler_in, scaler_out, n_samples = dah.prepare_data(
    data=train_df,
    lags_in=[1, 7],
    cols_in=['feature_a', 'feature_b'],
    steps_in=14,
    cols_out=['target'],
    steps_out=7,
    augmentation_factor=3,
    noise_std=0.05)

# Pass pre-fit scalers for the test set to prevent leakage
X_test, y_test, _, _, _, _ = dah.prepare_data(
    data=test_df,
    lags_in=[1, 7],
    cols_in=['feature_a', 'feature_b'],
    steps_in=14,
    cols_out=['target'],
    steps_out=7,
    scaler_in=scaler_in,
    scaler_out=scaler_out)
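
The core reshaping behind `prepare_data` is sliding-window supervised framing: each sample pairs `steps_in` consecutive input steps with the following `steps_out` target steps. The NumPy sketch below illustrates that framing on a toy univariate series; it is not Pwml's implementation and omits lags, scaling, and augmentation.

```python
import numpy as np

# Sliding-window framing: steps_in inputs -> steps_out targets.
def make_windows(series, steps_in, steps_out):
    X, y = [], []
    for start in range(len(series) - steps_in - steps_out + 1):
        X.append(series[start:start + steps_in])
        y.append(series[start + steps_in:start + steps_in + steps_out])
    return np.array(X), np.array(y)

series = np.arange(10.0)  # toy series: 0.0 .. 9.0
X, y = make_windows(series, steps_in=4, steps_out=2)
```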

Prophet helpers

from pwml.timeseries import prophethelpers as ph

# Summarise regressor coefficients for a fitted Prophet model
coefs_df = ph.regressor_coefficients(m)

# Plot regressor importance (beta coefficients)
ph.plot_regressors_importance(m, title='Regressor importance')

Visualization

from pwml.timeseries import visualizationhelpers as vh

vh.plot_time_series(
    title='Forecast',
    training=train_df,
    testing=test_df,
    prediction=forecast_df,
    confidence=forecast_df)

vh.plot_time_series_dist(data=residuals, title='Residual distribution')

vh.plot_seasonal_decomposition(data=series, period=52)

vh.plot_autocorrelation(data=series, lags=50)

utilities - Helper Modules

  • graphichelpers - GraphicsStatics: matplotlib/seaborn style initialization, color/linestyle palette, style_plot
  • mssqlhelpers - execute(proc_name, conn_params, proc_params, commit=True): calls a stored procedure and returns a DataFrame
  • neptunehelpers - ExperimentManager, a Neptune experiment tracking wrapper (neptune-client >= 1.0)
  • httphelpers - image download utilities
  • imagehelpers - PIL image helpers (resize, crop, batch conversion)
  • filehelpers - pickle serialization helpers
  • classificationhelpers - MulticlassClassifierOptimizer: Platt calibration plus per-class threshold tuning
  • commonhelpers - miscellaneous utilities

samples - Model Hosting

samples/modelhosting.py is a Flask REST API that serves one or more pre-trained HierarchicalClassifierModel instances.

python samples/modelhosting.py \
    --host 0.0.0.0 \
    --port 5000 \
    --models "v1/division|/path/to/model.joblib"

Each loaded model is exposed at /api/<model-id> (POST). Debug mode is controlled by the FLASK_DEBUG environment variable (default: off). For production, use a WSGI server such as gunicorn:

gunicorn -w 4 -b 0.0.0.0:5000 "modelhosting:Statics.g_app"
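
A client can then POST feature values to the endpoint. The sketch below builds such a request with the standard library; the input field names mirror the predict() example above, but the exact request/response schema of modelhosting.py is an assumption.

```python
import json
from urllib import request

# Build a POST request for the hosted model at /api/<model-id>.
def build_request(host, model_id, features):
    url = f"http://{host}/api/{model_id}"
    body = json.dumps(features).encode("utf-8")
    return request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST")

req = build_request(
    "localhost:5000", "v1/division",
    {"Style": "slim fit jeans", "Gender": "men", "Brand": "Acme",
     "Price": 49.99, "Category": "Bottoms"})
# urllib.request.urlopen(req) would perform the call against a running server.
```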

Experiment tracking (Neptune)

from pwml.utilities import neptunehelpers as nh

with nh.ExperimentManager(
        log=True,
        project_name='workspace/project',
        experiment_name='run_001',
        experiment_params={'lr': 0.01, 'epochs': 100},
        experiment_tags=['baseline']) as em:

    em.set_experiment_property('dataset_version', 'v3')
    em.log_data_to_neptune(data=results_df, name='results')
    em.log_chart_to_neptune(figure=fig, name='loss_curve')

Requires neptune-client >= 1.0. Set the NEPTUNE_API_TOKEN environment variable before running.
