
Reusable analytics and ML utilities for transport risk detection and streaming dashboards.


srx-lib-ml

Reusable, production-grade pandas and scikit-learn utilities extracted from multiple SRX analytics apps. The library is domain-agnostic: pass your column names, not ours. Focus areas:

  • fast data loading and feature engineering for trip/order/event data
  • anomaly detection and clustering using robust defaults
  • risk profiling across vehicles/assets, routes, and temporal dimensions
  • optional Streamlit-friendly helpers for dashboards
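The anomaly-detection focus area can be sketched directly with scikit-learn's IsolationForest. This is an assumed approach with robust defaults, not the library's exact implementation; the column names `is_anomaly` and `anomaly_score_normalized` mirror the defaults listed in the module guide below.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest

def score_anomalies(df, feature_cols, contamination=0.05, random_state=42):
    """Fit an IsolationForest on feature_cols and attach an anomaly flag
    plus a 0..1 normalized score (higher = more anomalous)."""
    X = df[feature_cols].fillna(df[feature_cols].median())
    model = IsolationForest(contamination=contamination, random_state=random_state)
    out = df.copy()
    out["is_anomaly"] = model.fit_predict(X) == -1   # -1 marks outliers
    raw = -model.score_samples(X)                    # invert: higher = more anomalous
    out["anomaly_score_normalized"] = (raw - raw.min()) / (raw.max() - raw.min() + 1e-12)
    return out

# One implausible trip (400 km at 210 km/h) among ordinary ones
trips = pd.DataFrame({
    "distance_km": [12.0, 14.5, 11.8, 13.2, 400.0],
    "avg_speed_kmh": [45.0, 50.0, 48.0, 47.0, 210.0],
})
trips = score_anomalies(trips, ["distance_km", "avg_speed_kmh"])
```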

Quick start

pip install -e ./srx-lib-ml

import pandas as pd
from srx_lib_ml import features, anomaly, geo, risk, routes

# 1) Normalize your columns to a canonical schema
mapping = {
    "Start Time": "start_time",
    "Stop Time": "end_time",
    "Distance (Km)": "distance_km",
    "Avg Speed": "avg_speed_kmh",
    "StartLat": "start_lat",
    "StartLon": "start_lon",
    "StopLat": "stop_lat",
    "StopLon": "stop_lon",
}
df = features.standardize_columns(pd.read_csv("journeys.csv"), mapping)

# 2) Feature engineering + anomaly scoring
df = features.enrich_journey_frame(df)
df = anomaly.detect_isolation_forest(df)

# 3) Geo clustering (DBSCAN in degrees)
df = geo.assign_dbscan_clusters(df, lat_col="start_lat", lon_col="start_lon", label_col="start_cluster")

# 4) Route/vehicle rollups with your chosen ids
vehicle_profile = risk.vehicle_risk_profile(df, vehicle_id_col="VID", vehicle_name_col="VName", distance_col="distance_km")
route_profile = risk.route_risk_analysis(df, start_label_col="Start Location", stop_label_col="Stop Location", vehicle_id_col="VID", distance_col="distance_km")

# Procurement-style routing (haversine) with fully custom columns
orders = routes.parse_latlon_column(pd.read_csv("orders.csv"), source_col="Pickup Location", lat_col="pickup_lat", lon_col="pickup_lon")
# Parse the destination end too so dropoff_zone exists below ("Dropoff Location" is an illustrative column name)
orders = routes.parse_latlon_column(orders, source_col="Dropoff Location", lat_col="dropoff_lat", lon_col="dropoff_lon")
orders["pickup_zone"] = routes.dbscan_haversine(orders.dropna(subset=["pickup_lat", "pickup_lon"]), lat_col="pickup_lat", lon_col="pickup_lon")
orders["dropoff_zone"] = routes.dbscan_haversine(orders.dropna(subset=["dropoff_lat", "dropoff_lon"]), lat_col="dropoff_lat", lon_col="dropoff_lon")
orders = routes.add_route_pairs(orders, origin_col="pickup_zone", dest_col="dropoff_zone", route_col="route_id")
perf = routes.actor_route_performance(
    orders,
    route_col="route_id",
    actor_col="Partner",
    id_col="External Id",
    success_flag_col="Pickup Actual Time",
    distance_col="Distance (KM)",
)

The modules stay parameterized so they can be reused across transport, procurement, logistics, or other journey/order/event datasets: pass your own column names instead of recoding.
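Under the hood, this column-mapping pattern is plain pandas. A minimal sketch of what a standardize/ensure-columns helper can look like (a hypothetical implementation, not the library's source):

```python
import numpy as np
import pandas as pd

def standardize(df, mapping, required=()):
    """Rename caller-supplied columns to a canonical schema and
    add any missing required columns as all-NaN placeholders."""
    out = df.rename(columns=mapping)
    for col in required:
        if col not in out.columns:
            out[col] = np.nan
    return out

raw = pd.DataFrame({"Start Time": ["2024-01-01 08:00"], "Distance (Km)": [12.4]})
df = standardize(
    raw,
    {"Start Time": "start_time", "Distance (Km)": "distance_km"},
    required=("start_time", "distance_km", "avg_speed_kmh"),
)
# df now has start_time and distance_km renamed, plus an all-NaN avg_speed_kmh
```

Downstream code can then rely on the canonical names without caring what the upstream export called them.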

Module guide

  • features
    • standardize_columns(df, mapping): rename columns into a canonical schema.
    • ensure_columns(df, required): add missing columns as NaN.
    • enrich_journey_frame(df, ...): derive durations, speed deviation, rule-based flags (long stop, slow, zero distance, high deviation).
    • time_category(hour): shared time bucketer.
  • anomaly
    • detect_isolation_forest(df, config=None, use_enhanced_features=False, feature_override=None): add anomaly scores/flags.
  • geo
    • haversine_distance(lat1, lon1, lat2, lon2): km distance.
    • assign_dbscan_clusters(df, lat_col, lon_col, label_col="cluster", config=None).
    • assign_kmeans_zones(df, lat_col, lon_col, label_col="zone", config=None).
  • location_zones
    • apply_location_risk_zones(df, location_sheets, lat_col="latitude_deg", lon_col="longitude_deg", radius_km=0.5, risk_col="risk_score", start_lat_col="start_lat", start_lon_col="start_lon", stop_lat_col="stop_lat", stop_lon_col="stop_lon", zone_score_mapping=None).
  • risk
    • vehicle_risk_profile(df, vehicle_id_col="vehicle_id", vehicle_name_col="vehicle_name", risk_col="risk_score", anomaly_flag_col="is_anomaly", anomaly_score_col="anomaly_score_normalized", distance_col="distance_km", ...).
    • route_risk_analysis(df, start_label_col="start_label", stop_label_col="stop_label", journey_id_col="journey_id", vehicle_id_col="vehicle_id", risk_col="risk_score", anomaly_flag_col="is_anomaly", distance_col=None, min_journeys=3).
  • temporal
    • temporal_breakdown(df, risk_col="risk_score", anomaly_flag_col="is_anomaly", journey_id_col="journey_id"): hourly/daily/time-category aggregates.
  • routes
    • parse_latlon_column(df, source_col, lat_col, lon_col).
    • dbscan_haversine(df, lat_col, lon_col, eps_km=5.0, min_samples=5).
    • add_route_pairs(df, origin_col, dest_col, route_col="route_id").
    • actor_route_performance(df, route_col, actor_col, id_col, success_flag_col, distance_col, ontime_flag_col=None, min_orders=5).
    • route_complexity_breakdown(df, distance_col, origin_zone_col, dest_zone_col, id_col, ontime_flag_col=None).
    • reallocation_recommendations(perf_df, route_col="route_id", actor_col="actor", success_col="success_rate_pct", total_orders_col="total_orders", min_orders=10, min_gap=15.0).
    • describe_clusters(labels): basic cluster stats.
  • viz (install with the viz extra: pip install -e "./srx-lib-ml[viz]")
    • hero_metric(label, value, delta=None, help_text=None, cols=3, col_idx=0).
    • hero_card(label, value, subtext=None, help_text=None, background="#0f172a", text_color="#e2e8f0", cols=3, col_idx=0).
    • card_container(header, help_text=None, background="#0f172a", text_color="#e2e8f0", border_color="#1f2937", padding="16px", radius="12px", body=None): returns a body container for children.
    • badge(label, color="#2563eb", text_color="#ffffff", padding="4px 10px"): returns HTML string.
    • alert_box(message, tone="info"): info/success/warning/danger banner.
    • section_header(title, description=None, divider=True).
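The haversine and DBSCAN-over-haversine pieces can be approximated with scikit-learn directly. A sketch under assumed defaults (eps given in km and converted to radians on a 6371 km Earth, since sklearn's haversine metric works in radians); this illustrates the technique, not the library's exact internals:

```python
import numpy as np
from sklearn.cluster import DBSCAN

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))

def dbscan_haversine_labels(lats, lons, eps_km=5.0, min_samples=5):
    """Cluster points with DBSCAN on true great-circle distances.
    Coordinates and eps are converted to radians for the haversine metric."""
    coords = np.radians(np.column_stack([lats, lons]))
    return DBSCAN(eps=eps_km / EARTH_RADIUS_KM, min_samples=min_samples,
                  metric="haversine").fit_predict(coords)

# Usage: two tight 5-point groups far apart resolve into two clusters
lats = [0.0, 0.001, 0.002, 0.003, 0.004, 10.0, 10.001, 10.002, 10.003, 10.004]
lons = [0.0, 0.001, 0.002, 0.003, 0.004, 10.0, 10.001, 10.002, 10.003, 10.004]
labels = dbscan_haversine_labels(lats, lons)
```

Clustering in degrees (as geo.assign_dbscan_clusters does) is cheaper but distorts with latitude; the haversine variant trades speed for metrically honest clusters, which matters for km-denominated eps values.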
