
Reusable analytics and ML utilities for transport risk detection and streaming dashboards.

Project description

srx-lib-ml

Reusable, production-grade pandas and scikit-learn utilities extracted from multiple SRX analytics apps. The library is domain-agnostic: pass your column names, not ours. Focus areas:

  • fast data loading and feature engineering for trip/order/event data
  • anomaly detection and clustering using robust defaults
  • risk profiling across vehicles/assets, routes, and temporal dimensions
  • optional Streamlit-friendly helpers for dashboards

Quick start

pip install -e ./srx-lib-ml

import pandas as pd
from srx_lib_ml import features, anomaly, geo, risk, routes

# 1) Normalize your columns to a canonical schema
mapping = {
    "Start Time": "start_time",
    "Stop Time": "end_time",
    "Distance (Km)": "distance_km",
    "Avg Speed": "avg_speed_kmh",
    "StartLat": "start_lat",
    "StartLon": "start_lon",
    "StopLat": "stop_lat",
    "StopLon": "stop_lon",
}
df = features.standardize_columns(pd.read_csv("journeys.csv"), mapping)

# 2) Feature engineering + anomaly scoring
df = features.enrich_journey_frame(df)
df = anomaly.detect_isolation_forest(df)

# 3) Geo clustering (DBSCAN in degrees)
df = geo.assign_dbscan_clusters(df, lat_col="start_lat", lon_col="start_lon", label_col="start_cluster")

# 4) Route/vehicle rollups with your chosen ids
vehicle_profile = risk.vehicle_risk_profile(df, vehicle_id_col="VID", vehicle_name_col="VName", distance_col="distance_km")
route_profile = risk.route_risk_analysis(df, start_label_col="Start Location", stop_label_col="Stop Location", vehicle_id_col="VID", distance_col="distance_km")

# Procurement-style routing (haversine) with fully custom columns
orders = routes.parse_latlon_column(pd.read_csv("orders.csv"), source_col="Pickup Location", lat_col="pickup_lat", lon_col="pickup_lon")
orders["pickup_zone"] = routes.dbscan_haversine(orders.dropna(subset=["pickup_lat", "pickup_lon"]), lat_col="pickup_lat", lon_col="pickup_lon")
orders = routes.add_route_pairs(orders, origin_col="pickup_zone", dest_col="dropoff_zone", route_col="route_id")
perf = routes.actor_route_performance(
    orders,
    route_col="route_id",
    actor_col="Partner",
    id_col="External Id",
    success_flag_col="Pickup Actual Time",
    distance_col="Distance (KM)",
)
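For orientation, the per-actor rollup that actor_route_performance returns can be approximated with a plain groupby. This is an illustrative sketch, not the library's code, and the output column names (total_orders, success_rate_pct) are assumptions based on the reallocation_recommendations signature below:

```python
import pandas as pd

orders = pd.DataFrame({
    "route_id": ["A->B", "A->B", "A->B", "C->D"],
    "Partner": ["p1", "p1", "p2", "p1"],
    "Pickup Actual Time": ["2024-01-01", None, "2024-01-03", "2024-01-04"],
})

# Treat a populated success-flag column as a completed order,
# then compute order counts and success rates per route/actor pair.
orders["completed"] = orders["Pickup Actual Time"].notna()
perf = (
    orders.groupby(["route_id", "Partner"])
    .agg(
        total_orders=("completed", "size"),
        success_rate_pct=("completed", lambda s: 100.0 * s.mean()),
    )
    .reset_index()
)
```

The real function adds more (optional on-time flags, a min_orders floor), but the shape of the result is the same: one row per (route, actor) with volume and a success percentage.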

The modules stay parameterized so they can be reused across transport, procurement, logistics, or other journey/order/event datasets: pass your own column names to avoid recoding.

Module guide

  • features
    • standardize_columns(df, mapping): rename columns into a canonical schema.
    • ensure_columns(df, required): add missing columns as NaN.
    • enrich_journey_frame(df, ...): derive durations, speed deviation, rule-based flags (long stop, slow, zero distance, high deviation).
    • time_category(hour): shared time bucketer.
  • anomaly
    • detect_isolation_forest(df, config=None, use_enhanced_features=False, feature_override=None): add anomaly scores/flags.
  • geo
    • haversine_distance(lat1, lon1, lat2, lon2): km distance.
    • assign_dbscan_clusters(df, lat_col, lon_col, label_col="cluster", config=None).
    • assign_kmeans_zones(df, lat_col, lon_col, label_col="zone", config=None).
  • location_zones
    • apply_location_risk_zones(df, location_sheets, lat_col="latitude_deg", lon_col="longitude_deg", radius_km=0.5, risk_col="risk_score", start_lat_col="start_lat", start_lon_col="start_lon", stop_lat_col="stop_lat", stop_lon_col="stop_lon", zone_score_mapping=None).
  • risk
    • vehicle_risk_profile(df, vehicle_id_col="vehicle_id", vehicle_name_col="vehicle_name", risk_col="risk_score", anomaly_flag_col="is_anomaly", anomaly_score_col="anomaly_score_normalized", distance_col="distance_km", ...).
    • route_risk_analysis(df, start_label_col="start_label", stop_label_col="stop_label", journey_id_col="journey_id", vehicle_id_col="vehicle_id", risk_col="risk_score", anomaly_flag_col="is_anomaly", distance_col=None, min_journeys=3).
  • temporal
    • temporal_breakdown(df, risk_col="risk_score", anomaly_flag_col="is_anomaly", journey_id_col="journey_id"): hourly/daily/time-category aggregates.
  • routes
    • parse_latlon_column(df, source_col, lat_col, lon_col).
    • dbscan_haversine(df, lat_col, lon_col, eps_km=5.0, min_samples=5).
    • add_route_pairs(df, origin_col, dest_col, route_col="route_id").
    • actor_route_performance(df, route_col, actor_col, id_col, success_flag_col, distance_col, ontime_flag_col=None, min_orders=5).
    • route_complexity_breakdown(df, distance_col, origin_zone_col, dest_zone_col, id_col, ontime_flag_col=None).
    • reallocation_recommendations(perf_df, route_col="route_id", actor_col="actor", success_col="success_rate_pct", total_orders_col="total_orders", min_orders=10, min_gap=15.0).
    • describe_clusters(labels): basic cluster stats.
  • viz (install with the viz extra: pip install -e ./srx-lib-ml[viz])
    • hero_metric(label, value, delta=None, help_text=None, cols=3, col_idx=0).
    • hero_card(label, value, subtext=None, help_text=None, background="#0f172a", text_color="#e2e8f0", cols=3, col_idx=0).
    • card_container(header, help_text=None, background="#0f172a", text_color="#e2e8f0", border_color="#1f2937", padding="16px", radius="12px", body=None): returns a body container for children.
    • badge(label, color="#2563eb", text_color="#ffffff", padding="4px 10px"): returns HTML string.
    • alert_box(message, tone="info"): info/success/warning/danger banner.
    • section_header(title, description=None, divider=True).
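To make the geo/routes conventions concrete, here is a self-contained sketch of great-circle distance and kilometre-scale DBSCAN clustering, in the spirit of haversine_distance and dbscan_haversine. This is an assumed equivalent built on scikit-learn, not the library's own code; note that sklearn's haversine metric expects coordinates in radians, so eps_km is converted by dividing by the Earth's radius:

```python
import numpy as np
from sklearn.cluster import DBSCAN

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between points given in degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))

def dbscan_haversine_labels(lats, lons, eps_km=5.0, min_samples=5):
    """Cluster lat/lon points with DBSCAN using true great-circle distance.
    Returns an array of cluster labels (-1 marks noise)."""
    coords = np.radians(np.column_stack([lats, lons]))
    model = DBSCAN(eps=eps_km / EARTH_RADIUS_KM,
                   min_samples=min_samples, metric="haversine")
    return model.fit_predict(coords)
```

This is why eps_km is a sensible default unit in the routes API: clustering in raw degrees (as geo.assign_dbscan_clusters does) distorts longitude away from the equator, while the haversine variant keeps the radius physically meaningful.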

Download files

Download the file for your platform.

Source Distribution

srx_lib_ml-0.1.0.tar.gz (14.9 kB)

Uploaded Source

Built Distribution


srx_lib_ml-0.1.0-py3-none-any.whl (16.1 kB)

Uploaded Python 3

File details

Details for the file srx_lib_ml-0.1.0.tar.gz.

File metadata

  • Download URL: srx_lib_ml-0.1.0.tar.gz
  • Upload date:
  • Size: 14.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.12

File hashes

Hashes for srx_lib_ml-0.1.0.tar.gz

  • SHA256: 231be7ed3d4d7688f4d5d4212779b38a1f8b1dca0885811cba865f69e904606d
  • MD5: a2ae15ed1ffed2254e4c5e1f414f7e21
  • BLAKE2b-256: ccad261e7ae42c35341d1ce41524a1017270636e73d58703b7dbc7c6fba68db4


File details

Details for the file srx_lib_ml-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: srx_lib_ml-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 16.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.12

File hashes

Hashes for srx_lib_ml-0.1.0-py3-none-any.whl

  • SHA256: 24bba76e65f807975d0ae37691db37a2455759ba0ea2c62422247cc3eabb8693
  • MD5: e63c7cd05dbfb0ca8d71e6ec7466291d
  • BLAKE2b-256: 9480de8f38eb2540f39aa73faa9a1773ca3d34338182ef877b43f30ce4d2ea69

