
srx-lib-ml

Reusable analytics and ML utilities for transport risk detection and streaming dashboards.

Reusable, production-grade pandas and scikit-learn utilities extracted from multiple SRX analytics apps. The library is domain-agnostic: pass your column names, not ours. Focus areas:

  • fast data loading and feature engineering for trip/order/event data
  • anomaly detection and clustering using robust defaults
  • risk profiling across vehicles/assets, routes, and temporal dimensions
  • optional Streamlit-friendly helpers for dashboards

Quick start

pip install srx-lib-ml        # or, from a checkout: pip install -e ./srx-lib-ml

import pandas as pd
from srx_lib_ml import features, anomaly, geo, risk, routes

# 1) Normalize your columns to a canonical schema
mapping = {
    "Start Time": "start_time",
    "Stop Time": "end_time",
    "Distance (Km)": "distance_km",
    "Avg Speed": "avg_speed_kmh",
    "StartLat": "start_lat",
    "StartLon": "start_lon",
    "StopLat": "stop_lat",
    "StopLon": "stop_lon",
}
df = features.standardize_columns(pd.read_csv("journeys.csv"), mapping)

# 2) Feature engineering + anomaly scoring
df = features.enrich_journey_frame(df)
df = anomaly.detect_isolation_forest(df)

# 3) Geo clustering (DBSCAN in degrees)
df = geo.assign_dbscan_clusters(df, lat_col="start_lat", lon_col="start_lon", label_col="start_cluster")

# 4) Route/vehicle rollups with your chosen ids
vehicle_profile = risk.vehicle_risk_profile(df, vehicle_id_col="VID", vehicle_name_col="VName", distance_col="distance_km")
route_profile = risk.route_risk_analysis(df, start_label_col="Start Location", stop_label_col="Stop Location", vehicle_id_col="VID", distance_col="distance_km")

# Procurement-style routing (haversine) with fully custom columns
orders = routes.parse_latlon_column(pd.read_csv("orders.csv"), source_col="Pickup Location", lat_col="pickup_lat", lon_col="pickup_lon")
orders["pickup_zone"] = routes.dbscan_haversine(orders.dropna(subset=["pickup_lat", "pickup_lon"]), lat_col="pickup_lat", lon_col="pickup_lon")
orders = routes.add_route_pairs(orders, origin_col="pickup_zone", dest_col="dropoff_zone", route_col="route_id")
perf = routes.actor_route_performance(
    orders,
    route_col="route_id",
    actor_col="Partner",
    id_col="External Id",
    success_flag_col="Pickup Actual Time",
    distance_col="Distance (KM)",
)
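
To see what steps 1 and 2 amount to without the library, here is a minimal plain-pandas sketch of the same idea (illustrative only: the real enrich_journey_frame derives more fields, such as rule-based flags; the toy column names are hypothetical):

```python
import pandas as pd

# Toy journey data in a vendor-specific schema (hypothetical column names).
raw = pd.DataFrame({
    "Start Time": ["2024-01-01 08:00", "2024-01-01 09:30"],
    "Stop Time": ["2024-01-01 08:45", "2024-01-01 11:00"],
    "Distance (Km)": [30.0, 90.0],
})

# Step 1: rename into a canonical schema, as standardize_columns does.
mapping = {"Start Time": "start_time", "Stop Time": "end_time",
           "Distance (Km)": "distance_km"}
df = raw.rename(columns=mapping)

# Step 2: derive duration and average speed, the kind of features
# enrich_journey_frame adds on top of the canonical columns.
df["start_time"] = pd.to_datetime(df["start_time"])
df["end_time"] = pd.to_datetime(df["end_time"])
df["duration_h"] = (df["end_time"] - df["start_time"]).dt.total_seconds() / 3600
df["avg_speed_kmh"] = df["distance_km"] / df["duration_h"]

print(df[["distance_km", "duration_h", "avg_speed_kmh"]])
```

Because the mapping is just a dict, swapping in another vendor's export only means editing the mapping, not the pipeline.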

The modules stay parameterized so they can be reused across transport, procurement, logistics, or other journey/order/event datasets: pass your own column names instead of recoding.
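
The rollup pattern behind helpers like actor_route_performance is an ordinary groupby over caller-supplied columns. A hedged sketch in plain pandas (the column names and the toy data are hypothetical; the library's version adds distance statistics and minimum-order filtering):

```python
import pandas as pd

orders = pd.DataFrame({
    "route_id": ["A->B", "A->B", "A->B", "C->D"],
    "actor": ["p1", "p1", "p2", "p1"],
    "delivered": [True, False, True, True],
})

# Per-actor, per-route rollup: order counts and success rate.
perf = (
    orders.groupby(["route_id", "actor"])["delivered"]
    .agg(total_orders="size",
         success_rate_pct=lambda s: 100.0 * s.mean())
    .reset_index()
)
print(perf)
```

Because the grouping keys are parameters, the same rollup works whether your ids are vehicles, partners, or warehouses.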

Module guide

  • features
    • standardize_columns(df, mapping): rename columns into a canonical schema.
    • ensure_columns(df, required): add missing columns as NaN.
    • enrich_journey_frame(df, ...): derive durations, speed deviation, rule-based flags (long stop, slow, zero distance, high deviation).
    • time_category(hour): shared time bucketer.
  • anomaly
    • detect_isolation_forest(df, config=None, use_enhanced_features=False, feature_override=None): add anomaly scores/flags.
  • geo
    • haversine_distance(lat1, lon1, lat2, lon2): km distance.
    • assign_dbscan_clusters(df, lat_col, lon_col, label_col="cluster", config=None).
    • assign_kmeans_zones(df, lat_col, lon_col, label_col="zone", config=None).
  • location_zones
    • apply_location_risk_zones(df, location_sheets, lat_col="latitude_deg", lon_col="longitude_deg", radius_km=0.5, risk_col="risk_score", start_lat_col="start_lat", start_lon_col="start_lon", stop_lat_col="stop_lat", stop_lon_col="stop_lon", zone_score_mapping=None).
  • risk
    • vehicle_risk_profile(df, vehicle_id_col="vehicle_id", vehicle_name_col="vehicle_name", risk_col="risk_score", anomaly_flag_col="is_anomaly", anomaly_score_col="anomaly_score_normalized", distance_col="distance_km", ...).
    • route_risk_analysis(df, start_label_col="start_label", stop_label_col="stop_label", journey_id_col="journey_id", vehicle_id_col="vehicle_id", risk_col="risk_score", anomaly_flag_col="is_anomaly", distance_col=None, min_journeys=3).
  • temporal
    • temporal_breakdown(df, risk_col="risk_score", anomaly_flag_col="is_anomaly", journey_id_col="journey_id"): hourly/daily/time-category aggregates.
  • routes
    • parse_latlon_column(df, source_col, lat_col, lon_col).
    • dbscan_haversine(df, lat_col, lon_col, eps_km=5.0, min_samples=5).
    • add_route_pairs(df, origin_col, dest_col, route_col="route_id").
    • actor_route_performance(df, route_col, actor_col, id_col, success_flag_col, distance_col, ontime_flag_col=None, min_orders=5).
    • route_complexity_breakdown(df, distance_col, origin_zone_col, dest_zone_col, id_col, ontime_flag_col=None).
    • reallocation_recommendations(perf_df, route_col="route_id", actor_col="actor", success_col="success_rate_pct", total_orders_col="total_orders", min_orders=10, min_gap=15.0).
    • describe_clusters(labels): basic cluster stats.
  • viz (optional extra: install with pip install -e "./srx-lib-ml[viz]")
    • hero_metric(label, value, delta=None, help_text=None, cols=3, col_idx=0).
    • hero_card(label, value, subtext=None, help_text=None, background="#0f172a", text_color="#e2e8f0", cols=3, col_idx=0).
    • card_container(header, help_text=None, background="#0f172a", text_color="#e2e8f0", border_color="#1f2937", padding="16px", radius="12px", body=None): returns a body container for children.
    • badge(label, color="#2563eb", text_color="#ffffff", padding="4px 10px"): returns HTML string.
    • alert_box(message, tone="info"): info/success/warning/danger banner.
    • section_header(title, description=None, divider=True).
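
For reference, the great-circle distance that geo.haversine_distance and routes.dbscan_haversine rely on can be computed as follows. This is a standalone sketch of the standard haversine formula (mean Earth radius 6371 km), not the library's own code:

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

# One degree of latitude spans roughly 111.2 km.
print(round(haversine_km(0.0, 0.0, 1.0, 0.0), 1))
```

This also clarifies why dbscan_haversine takes eps_km in kilometres while assign_dbscan_clusters notes its DBSCAN operates in degrees: the two clusterers measure distance in different units.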

Release files (srx-lib-ml 0.1.2)

  • srx_lib_ml-0.1.2.tar.gz (source distribution, 16.6 kB; uploaded via twine/6.2.0, CPython/3.12.12)
    SHA256: 5aefb63eae89dc2690b3413501cd6d8089e604a7c64762e64763fe28b7f94811
  • srx_lib_ml-0.1.2-py3-none-any.whl (built distribution, Python 3, 17.9 kB; uploaded via twine/6.2.0, CPython/3.12.12)
    SHA256: 4fc483df95c220808e1e7798f44d7114411938bdc929af9dec3d20364a72f9e1