
High-performance trajectory splitting and analysis, powered by Rust


trucktrack

High-performance trajectory splitting, generation, and partitioning, powered by Rust.

A Python package that implements logic similar to the MovingPandas trajectory splitters (ObservationGapSplitter, StopSplitter), with a Rust backend for speed. Data flows through Polars DataFrames: you can process entirely in Rust (parquet in, parquet out) or share DataFrames between Python and Rust zero-copy via pyo3-polars.
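The observation-gap rule can be sketched in plain Python. The idea is to walk each vehicle's time-ordered points and open a new segment whenever the time since the previous point exceeds the threshold. This is a minimal sketch, not trucktrack's Rust implementation. `split_by_gap` is a hypothetical helper that numbers segments per vehicle starting at 0, and it ignores `min_length` filtering.

```python
from datetime import datetime, timedelta


def split_by_gap(rows, gap):
    """Assign a segment id to each (vehicle_id, time) row.

    Hypothetical helper, not trucktrack's API. rows must be sorted by
    vehicle id, then time. A new segment starts at each new vehicle and
    whenever consecutive timestamps differ by more than gap.
    """
    segment_ids = []
    prev_id, prev_time, seg = None, None, 0
    for vehicle_id, t in rows:
        if vehicle_id != prev_id:
            seg = 0  # first point of a new vehicle
        elif t - prev_time > gap:
            seg += 1  # observation gap: open a new segment
        segment_ids.append(seg)
        prev_id, prev_time = vehicle_id, t
    return segment_ids


t0 = datetime(2026, 4, 8, 8, 0, 0)
rows = [
    ("a", t0),
    ("a", t0 + timedelta(seconds=30)),
    ("a", t0 + timedelta(minutes=5)),  # 4.5 min gap -> new segment
    ("a", t0 + timedelta(minutes=6)),
    ("b", t0),                         # new vehicle -> segment 0
]
print(split_by_gap(rows, timedelta(minutes=2)))  # [0, 0, 1, 1, 0]
```

The strict `>` comparison means a gap of exactly the threshold does not split; trucktrack's edge handling may differ.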

In addition to the Rust splitters, trucktrack ships two pure-Python subpackages:

  • trucktrack.generate — synthesize realistic truck GPS traces by routing through a Valhalla instance, interpolating along the route, layering parking maneuvers, adding GPS noise, and optionally injecting configurable data-quality errors (signal dropout, multipath, traffic jams, etc.).
  • trucktrack.partition — rewrite a flat parquet of trips as a Valhalla-tile-aligned, hive-partitioned dataset (tier=…/partition_id=…/) with rows sorted by a Hilbert-curve index for spatial locality.
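Sorting rows by a Hilbert-curve index keeps points that are close on the map close on disk. Below is a minimal sketch of the classic `xy2d` conversion. It assumes lat/lon are first quantized onto a 2^order grid covering the whole globe. The grid order, the quantization, and the helper names are illustrative assumptions, not trucktrack's exact scheme.

```python
def _rot(n, x, y, rx, ry):
    """Rotate/flip a quadrant so the sub-curve has the right orientation."""
    if ry == 0:
        if rx == 1:
            x = n - 1 - x
            y = n - 1 - y
        x, y = y, x
    return x, y


def hilbert_index(n, x, y):
    """Distance along a Hilbert curve filling an n x n grid (n a power of 2)."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        x, y = _rot(n, x, y, rx, ry)
        s //= 2
    return d


def lat_lon_to_hilbert(lat, lon, order=16):
    """Hypothetical helper: quantize (lat, lon) onto a 2**order global grid."""
    n = 1 << order
    x = min(int((lon + 180.0) / 360.0 * n), n - 1)
    y = min(int((lat + 90.0) / 180.0 * n), n - 1)
    return hilbert_index(n, x, y)


# Usage: order points for spatial locality before writing a partition.
points = [(45.5017, -73.5673), (43.6532, -79.3832), (44.5, -76.5)]
points.sort(key=lambda p: lat_lon_to_hilbert(*p))
```

Consecutive Hilbert indices always map to grid-adjacent cells, which is why this ordering preserves locality better than sorting by latitude or longitude alone.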

Install

pip install trucktrack

From source

# Requires Python 3.11+ and Rust stable
git clone https://github.com/twedl/trucktrack.git
cd trucktrack
python3 -m venv .venv && source .venv/bin/activate
pip install "maturin>=1.7,<2.0" polars pytest
maturin develop

Usage

Python API — splitters & I/O

from datetime import timedelta
import polars as pl
import trucktrack

df = pl.read_parquet("tracks.parquet")
# Expected columns: id, time, speed, heading, lat, lon

# Split at observation gaps > 2 minutes (column names are configurable)
result = trucktrack.split_by_observation_gap(
    df, timedelta(minutes=2), min_length=0
)
# Returns df with a segment_id column appended

# Split at detected stops (within 50m for at least 2 minutes)
result = trucktrack.split_by_stops(
    df,
    max_diameter=50.0,
    min_duration=timedelta(minutes=2),
    min_length=0,
)
# Returns the input rows with segment_id and is_stop columns appended.
# Rows in movement segments shorter than min_length are dropped;
# stop segments are always kept.

# Process entirely in Rust (parquet in, parquet out)
trucktrack.split_by_observation_gap_file(
    "input.parquet", "output.parquet", timedelta(minutes=2)
)
trucktrack.split_by_stops_file(
    "input.parquet",
    "output.parquet",
    max_diameter=50.0,
    min_duration=timedelta(minutes=2),
)

# Add speed_mps derived column (zero-copy via pyo3-polars)
result = trucktrack.process_dataframe_in_rust(df)
# …or do the whole thing as parquet → parquet inside Rust
n_rows = trucktrack.process_parquet_in_rust("input.parquet", "output.parquet")
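The stop rule in the example above (points staying within 50 m for at least 2 minutes) can be approximated in plain Python. Grow a window while every pair of points in it stays within `max_diameter` meters, then flag the window as a stop if it spans at least `min_duration`. This is a simplified single-vehicle sketch. `detect_stops` and `haversine_m` are hypothetical helpers, the window check is O(n²) in the worst case, and trucktrack's Rust sliding window (and its `min_length` filtering) may differ in detail.

```python
import math
from datetime import datetime, timedelta

EARTH_RADIUS_M = 6_371_000.0


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters on a spherical Earth."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def detect_stops(points, max_diameter, min_duration):
    """Hypothetical helper. points: (time, lat, lon) for one vehicle, sorted.

    Returns (segment_ids, is_stop). A stop is a run of points whose pairwise
    distances are all <= max_diameter meters and which spans >= min_duration.
    """
    n = len(points)
    is_stop = [False] * n
    i = 0
    while i < n:
        j = i
        # Grow the window while the next point stays within max_diameter of
        # every point already in it (so the window's diameter stays small).
        while j + 1 < n and all(
            haversine_m(points[k][1], points[k][2], points[j + 1][1], points[j + 1][2])
            <= max_diameter
            for k in range(i, j + 1)
        ):
            j += 1
        if points[j][0] - points[i][0] >= min_duration:
            for k in range(i, j + 1):
                is_stop[k] = True
            i = j + 1  # continue after the stop
        else:
            i += 1  # too short: slide the window start forward
    # Consecutive rows with the same flag share a segment id.
    segment_ids, seg = [], 0
    for k, flag in enumerate(is_stop):
        if k > 0 and flag != is_stop[k - 1]:
            seg += 1
        segment_ids.append(seg)
    return segment_ids, is_stop


t0 = datetime(2026, 4, 8, 8, 0)
points = [
    (t0, 44.0000, -77.0000),
    (t0 + timedelta(minutes=1), 44.0100, -77.0000),
    (t0 + timedelta(minutes=2), 44.0200, -77.0000),  # arrives
    (t0 + timedelta(minutes=3), 44.0201, -77.0000),  # ~11 m away
    (t0 + timedelta(minutes=4), 44.0200, -77.0001),  # ~8 m away
    (t0 + timedelta(minutes=5), 44.0300, -77.0000),  # departs
]
segment_ids, is_stop = detect_stops(points, 50.0, timedelta(minutes=2))
print(segment_ids, is_stop)  # [0, 0, 1, 1, 1, 2] [False, False, True, True, True, False]
```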

Python API — trucktrack.generate

from datetime import datetime, UTC
from trucktrack import TripConfig, generate_trace, traces_to_parquet

config = TripConfig(
    origin=(43.6532, -79.3832),       # (lat, lon)
    destination=(45.5017, -73.5673),
    departure_time=datetime.now(UTC),
    gps_noise_meters=3.0,
    seed=42,
    origin_maneuver="alley_dock",     # parking maneuver type
    destination_maneuver="alley_dock",
    valhalla_url="http://localhost:8002",
)

# Returns list[TracePoint] (lat, lon, speed_mph, heading, timestamp).
# Requires a running Valhalla instance at the configured URL.
points = generate_trace(config)

# Write one or many trips to a single parquet (columns: id, lat, lon,
# speed, heading, time — directly consumable by the splitters above).
traces_to_parquet([(points, config.trip_id)], "trip.parquet")
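A sketch of per-point GPS noise follows. It treats `gps_noise_meters` as the standard deviation of independent north and east Gaussian offsets and converts meters to degrees, with longitude scaled by cos(latitude). That interpretation of `gps_noise_meters`, the 111,320 m-per-degree approximation, and the helper name `add_gps_noise` are assumptions. This is not trucktrack's `noise.py`.

```python
import math
import random

METERS_PER_DEG_LAT = 111_320.0  # approximate length of one degree of latitude


def add_gps_noise(points, sigma_m, seed=None):
    """Hypothetical helper: jitter (lat, lon) points by Gaussian N/E offsets.

    sigma_m is assumed to be the per-axis standard deviation in meters.
    """
    rng = random.Random(seed)  # seeded for reproducible traces
    noisy = []
    for lat, lon in points:
        north_m = rng.gauss(0.0, sigma_m)
        east_m = rng.gauss(0.0, sigma_m)
        # A degree of longitude shrinks with cos(latitude).
        dlat = north_m / METERS_PER_DEG_LAT
        dlon = east_m / (METERS_PER_DEG_LAT * math.cos(math.radians(lat)))
        noisy.append((lat + dlat, lon + dlon))
    return noisy


route = [(43.6532, -79.3832), (43.6540, -79.3800)]
print(add_gps_noise(route, sigma_m=3.0, seed=42))
```

Seeding a private `random.Random` instance (rather than the global generator) keeps each trip reproducible, in the spirit of the `seed=42` option above.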

Error injection

Generate traces with realistic data-quality issues for testing downstream pipelines. Errors are configured per-trip via ErrorConfig and applied after trace synthesis:

from datetime import datetime, UTC
from trucktrack import TripConfig, generate_trace
from trucktrack.generate import ErrorConfig, default_error_profile

# Use the built-in profile: 19 error types at realistic probabilities
# (~1/1000 trips per type — tuned over 1000+ synthetic trips)
config = TripConfig(
    origin=(43.6532, -79.3832),
    destination=(45.5017, -73.5673),
    departure_time=datetime.now(UTC),
    valhalla_url="http://localhost:8002",
    errors=default_error_profile(),
)
points = generate_trace(config)

# Or pick specific errors:
config = TripConfig(
    ...,
    errors=[
        ErrorConfig("signal_dropout", probability=0.5, params={"gap_seconds": 30}),
        ErrorConfig("traffic_jam", probability=1.0),
    ],
)
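Per-trip error selection can be pictured as one Bernoulli draw per configured error. Each error fires with its `probability`, using a seeded generator so a trip's errors are reproducible. This is a sketch under that assumption. `ErrorSpec` is a hypothetical stand-in mirroring the `ErrorConfig(name, probability, params)` shape shown above, and `select_errors` is not trucktrack's API.

```python
import random
from dataclasses import dataclass, field


@dataclass
class ErrorSpec:
    """Hypothetical stand-in mirroring ErrorConfig(name, probability, params)."""
    name: str
    probability: float
    params: dict = field(default_factory=dict)


def select_errors(specs, seed):
    """Return the specs that fire for one trip (one draw per spec, assumed)."""
    rng = random.Random(seed)
    # rng.random() is in [0, 1), so probability=1.0 always fires
    # and probability=0.0 never does.
    return [s for s in specs if rng.random() < s.probability]


specs = [
    ErrorSpec("signal_dropout", 0.5, {"gap_seconds": 30}),
    ErrorSpec("traffic_jam", 1.0),
]
fired = select_errors(specs, seed=42)
print([s.name for s in fired])
```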

Available error types:

| Category | Error type | Description |
|---|---|---|
| GPS | signal_dropout | Remove points during signal loss |
| GPS | cold_start_drift | Initial position drift after gaps |
| GPS | multipath | Reflection-induced position spikes |
| GPS | frozen_fix | Position lock-up (repeated coordinates) |
| GPS | timestamp_glitch | Duplicate, jump-forward, or jump-backward timestamps |
| GPS | coordinate_corruption | Precision loss, lat/lon flip, or swap |
| GPS | speed_heading_desync | Speed/heading lag mismatch |
| GPS | jitter_at_rest | Brownian motion when stopped |
| Operational | privacy_shutoff | Entire trip segment removed |
| Operational | relay_driving | Multiple drivers with dwell compression |
| Operational | yard_dwell | Pre/post-trip parking |
| Operational | fuel_rest_stop | Mid-trip fuel/food break |
| Operational | weigh_station_stop | Commercial inspection with approach slowdown |
| Operational | bobtail_segment | Tractor-only faster speeds |
| Operational | off_route_detour | Temporary course deviation |
| Operational | loading_dwell | Cargo transfer dwell |
| Operational | traffic_jam | Highway congestion with optional full stops |
| Operational | device_power_cycle | GPS unit reboot with gap + drift |
| Operational | geofence_gap | Privacy-zone data removal |
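As an illustration of one injector, a `signal_dropout`-style error can be modeled as deleting every point inside a randomly placed time window of `gap_seconds` (the parameter used in the ErrorConfig example above). The window placement, the half-open interval, and the helper name `inject_signal_dropout` are assumptions. This is not the code in `gps_errors.py`.

```python
import random
from datetime import datetime, timedelta


def inject_signal_dropout(points, gap_seconds, seed=None):
    """Hypothetical injector: drop all points in one window of gap_seconds.

    points: list of (timestamp, lat, lon) sorted by timestamp.
    Returns (kept_points, window_start).
    """
    if not points:
        return [], None
    rng = random.Random(seed)
    # Start the outage at a randomly chosen observed timestamp (assumed).
    start = rng.choice(points)[0]
    end = start + timedelta(seconds=gap_seconds)
    kept = [p for p in points if not (start <= p[0] < end)]
    return kept, start


t0 = datetime(2026, 4, 8, 8, 0)
pts = [(t0 + timedelta(seconds=10 * i), 44.0, -77.0) for i in range(61)]
kept, start = inject_signal_dropout(pts, gap_seconds=30, seed=7)
print(len(pts) - len(kept), "points removed starting at", start)
```

A downstream gap splitter with a threshold shorter than `gap_seconds` should then cut the trace at the injected hole, which is a useful end-to-end check.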

Python API — trucktrack.partition

from pathlib import Path
import polars as pl
from trucktrack import (
    assign_partitions,
    partition_existing_parquet,
    write_partitions,
    write_trips_partitioned,
)

# Easiest path: rewrite a flat parquet (id, lat, lon, …) as a hive
# dataset rooted at output_dir. Returns {tier_name: partition_count}.
summary = partition_existing_parquet(
    Path("tracks.parquet"), Path("partitioned/")
)

# Or partition in-memory generated trips end to end
# (points is the list[TracePoint] returned by generate_trace):
summary = write_trips_partitioned(
    [(points, "trip-001")], Path("partitioned/")
)

# Single-call in-memory partitioning (adds tier, partition_id, hilbert_idx).
# points_df is a Polars DataFrame of trip points (id, lat, lon, …).
from trucktrack.partition import partition_points
partitioned_df = partition_points(points_df)

# Lower-level: classify trips yourself, then write.
metadata = pl.DataFrame({
    "id": ["trip-001"],
    "centroid_lat": [44.5],
    "centroid_lon": [-76.5],
    "bbox_diag_km": [540.0],
})
metadata = assign_partitions(metadata)  # adds tier, partition_id, hilbert_idx
write_partitions(metadata, points_df, Path("partitioned/"))
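For the lower-level path you need one metadata row per trip. A minimal sketch of computing the `centroid_lat`, `centroid_lon`, and `bbox_diag_km` columns from raw points follows. It assumes the centroid is the mean of the points and the diagonal is the great-circle distance between the bounding box's SW and NE corners. Both definitions, and the helper names, are assumptions rather than trucktrack's exact ones.

```python
import math
from collections import defaultdict

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers on a spherical Earth."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def trip_metadata(rows):
    """Hypothetical helper. rows: iterable of (id, lat, lon) -> list of dicts."""
    by_trip = defaultdict(list)
    for trip_id, lat, lon in rows:
        by_trip[trip_id].append((lat, lon))
    out = []
    for trip_id, pts in by_trip.items():
        lats = [p[0] for p in pts]
        lons = [p[1] for p in pts]
        out.append({
            "id": trip_id,
            "centroid_lat": sum(lats) / len(lats),  # mean of points (assumed)
            "centroid_lon": sum(lons) / len(lons),
            # SW-to-NE bounding-box diagonal; ignores antimeridian crossings
            "bbox_diag_km": haversine_km(min(lats), min(lons), max(lats), max(lons)),
        })
    return out


meta = trip_metadata([("trip-001", 44.0, -77.0), ("trip-001", 45.0, -76.0)])
print(meta)
```

The list of dicts can be turned into a DataFrame with `pl.DataFrame(meta)` before calling `assign_partitions`. Whether that function requires additional columns is not covered here.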

Tiers are assigned by trip bounding-box diagonal:

| Tier | Bbox diagonal | Tile bucket |
|---|---|---|
| local | < 100 km | Valhalla L1 (1° × 1°) |
| regional | 100-800 km | Valhalla L0 (4° × 4°) |
| longhaul | > 800 km | Coarse 8° × 8° super-region |
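The tier rules above can be sketched as a threshold lookup plus a row-major tile id on a global grid starting at (-90°, -180°). For the 1° and 4° grids, this follows Valhalla's `row * ncolumns + column` tile numbering. The boundary handling at exactly 100 km and 800 km, the use of the trip centroid for bucketing, and how trucktrack encodes `partition_id` are assumptions. All helper names are hypothetical.

```python
def classify_tier(bbox_diag_km):
    """Hypothetical helper: map a bbox diagonal to a tier (boundaries assumed)."""
    if bbox_diag_km < 100:
        return "local"
    if bbox_diag_km <= 800:
        return "regional"
    return "longhaul"


TILE_SIZE_DEG = {"local": 1.0, "regional": 4.0, "longhaul": 8.0}


def grid_tile_id(lat, lon, size_deg):
    """Row-major tile id on a global grid anchored at (-90, -180)."""
    ncols = int(360 / size_deg)
    nrows = int(180 / size_deg)
    # Clamp so lat=90 / lon=180 fall in the last row / column.
    row = min(int((lat + 90.0) // size_deg), nrows - 1)
    col = min(int((lon + 180.0) // size_deg), ncols - 1)
    return row * ncols + col


def assign_bucket(centroid_lat, centroid_lon, bbox_diag_km):
    """Return (tier, tile id) for one trip, bucketing by its centroid (assumed)."""
    tier = classify_tier(bbox_diag_km)
    return tier, grid_tile_id(centroid_lat, centroid_lon, TILE_SIZE_DEG[tier])


print(assign_bucket(44.5, -76.5, 540.0))  # the metadata row from the example above
```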

CLI

# Add derived columns (speed_mps), output CSV to stdout
trucktrack process tracks.parquet

# Split at observation gaps, write parquet
trucktrack split-gap tracks.parquet --gap 120 -o split.parquet

# Split at stops, output CSV
trucktrack split-stops tracks.parquet --diameter 50 --duration 120 \
    -o stops.csv --format csv

# Process / split commands support -o (file or '-' stdout), --format
# (csv/parquet), and --min-length for the splitters.
trucktrack split-gap tracks.parquet --gap 120 -o result.csv --format csv

# Synthesize a trip via Valhalla and write to parquet
trucktrack generate \
    --origin 43.6532,-79.3832 \
    --destination 45.5017,-73.5673 \
    --departure 2026-04-08T08:00:00 \
    --noise 3.0 --seed 42 \
    --valhalla-url http://localhost:8002 \
    -o trip.parquet

# Rewrite a flat parquet as a Valhalla-tile-aligned hive dataset
trucktrack partition tracks.parquet partitioned/
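The split subcommands map naturally onto argparse subparsers. Below is a sketch of how the `split-gap` and `split-stops` flags shown above could be declared. The defaults, the `dest` names, and the units are assumptions, not the actual `cli.py` definition. The units (seconds for `--gap` and `--duration`, meters for `--diameter`) are inferred from the Python examples.

```python
import argparse
from datetime import timedelta


def build_parser():
    """Hypothetical parser mirroring the split-gap / split-stops flags above."""
    parser = argparse.ArgumentParser(prog="trucktrack")
    sub = parser.add_subparsers(dest="command", required=True)

    def add_common(p):
        # Flags shared by the splitters; defaults are assumptions.
        p.add_argument("input")
        p.add_argument("-o", dest="output", default="-", help="file path, or '-' for stdout")
        p.add_argument("--format", choices=["csv", "parquet"], default=None)
        p.add_argument("--min-length", type=float, default=0)

    gap_cmd = sub.add_parser("split-gap")
    add_common(gap_cmd)
    gap_cmd.add_argument("--gap", type=float, required=True, help="seconds (assumed)")

    stops_cmd = sub.add_parser("split-stops")
    add_common(stops_cmd)
    stops_cmd.add_argument("--diameter", type=float, required=True, help="meters (assumed)")
    stops_cmd.add_argument("--duration", type=float, required=True, help="seconds (assumed)")
    return parser


args = build_parser().parse_args(
    ["split-gap", "tracks.parquet", "--gap", "120", "-o", "split.parquet"]
)
gap = timedelta(seconds=args.gap)  # same threshold as timedelta(minutes=2)
```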

How it works

| Path | Description |
|---|---|
| Pure Rust | split_by_observation_gap_file() / split_by_stops_file() read parquet, process, and write parquet entirely in Rust. No Python objects are created. |
| Python <-> Rust | split_by_observation_gap() / split_by_stops() share the Polars DataFrame with Rust via pyo3-polars (Arrow C Data Interface, zero-copy on column buffers). The Python polars, Rust polars, and pyo3-polars versions must be kept in sync. |
| Python only | read_parquet() / read_dataset() use polars directly in Python. The generate and partition subpackages are also pure Python on top of Polars / PyArrow. |

Project layout

src/
  lib.rs              # PyO3 module registration + splitter wrappers
  geo.rs              # Haversine distance (pure Rust math)
  partition.rs        # Tile classification, Hilbert indexing (Rust)
  transform.rs        # add_speed_mps, parquet helpers, error mapping
  splitters/
    gap.rs            # ObservationGapSplitter (Polars lazy expressions)
    stop.rs           # StopSplitter (Polars groupby + Rust sliding window)
python/trucktrack/
  __init__.py         # Public API re-exports
  io.py               # read_parquet, process_dataframe_in_rust, process_parquet_in_rust
  splitters.py        # split_by_observation_gap[_file], split_by_stops[_file]
  cli.py              # CLI entry point (argparse subcommands)
  _core.pyi           # Type stubs for the Rust extension
  generate/
    __init__.py       # Re-exports TripConfig, generate_trace, traces_to_*, ErrorConfig
    models.py         # TripConfig, TracePoint, RouteSegment, ErrorConfig dataclasses
    router.py         # Valhalla HTTP client
    interpolator.py   # Per-segment interpolation, bearing math
    speed_profile.py  # Per-edge speed sampling
    parking.py        # Origin/destination parking maneuvers
    noise.py          # Per-point GPS noise
    trace.py          # Orchestrator + traces_to_csv / traces_to_parquet
    random_trip.py    # Random origin/destination sampling helper
    gps_errors.py     # Physical GPS error injectors (8 types)
    operational_errors.py  # Operational pattern injectors (11 types)
  partition/
    __init__.py       # Re-exports assign_partitions, partition_points, write_*, etc.
    tiles.py          # Valhalla L0/L1 tile math, haversine
    classify.py       # Tier assignment + Hilbert-curve indexing (Polars)
    writer.py         # write_partitions, write_trips_partitioned,
                      #   partition_existing_parquet
  valhalla/
    __init__.py       # Re-exports route, map_match*, get_actor, DEFAULT_TRUCK_COSTING
    _actor.py         # pyvalhalla Actor singleton keyed by tile extract
    _parsing.py       # decode_polyline6, parse_valhalla_response -> RouteSegment
    routing.py        # route() — high-level truck routing
    map_matching.py   # map_match, map_match_ways, map_match_dataframe
  visualize/
    __init__.py       # Re-exports plot_trace, save_map
    _convert.py       # TracePoint/list -> Polars DataFrame adapters
    _map.py           # Folium map builder with per-segment coloring
data/
  example_tracks.parquet          # 10-row single vehicle example
  splitter_test_tracks.parquet    # 83-row, 3-vehicle test dataset
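The layout pairs a haversine helper (`geo.rs`) with `add_speed_mps` (`transform.rs`). One plausible way to derive a `speed_mps` column is the distance between consecutive fixes divided by the elapsed time, shown below as a plain-Python sketch. This is an assumption: trucktrack may instead convert its existing `speed` column. `speed_mps` here is a hypothetical helper.

```python
import math
from datetime import datetime, timedelta

EARTH_RADIUS_M = 6_371_000.0


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters on a spherical Earth."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def speed_mps(rows):
    """Hypothetical helper. rows: (id, time, lat, lon) sorted by id, then time.

    Returns one speed per row; the first point of each vehicle and
    non-positive time deltas yield None.
    """
    out = []
    prev = None
    for vid, t, lat, lon in rows:
        if prev is None or prev[0] != vid:
            out.append(None)
        else:
            dt = (t - prev[1]).total_seconds()
            out.append(haversine_m(prev[2], prev[3], lat, lon) / dt if dt > 0 else None)
        prev = (vid, t, lat, lon)
    return out


t0 = datetime(2026, 4, 8, 8, 0)
rows = [
    ("a", t0, 44.0, -77.0),
    ("a", t0 + timedelta(seconds=10), 44.001, -77.0),  # ~111 m in 10 s
    ("b", t0, 44.0, -77.0),
]
print(speed_mps(rows))
```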

End-to-end: trip splitting + map-matching

The examples/end_to_end/ scripts form a three-stage pipeline: generate synthetic GPS data (or skip to use your own), split and spatially partition it, then map-match each trip to extract OSM way IDs. Set DATA to the directory containing your tile extract and outputs. The final result is a hive-partitioned parquet dataset with columns id, date, and way_id (null for trips that fail matching).

DATA=/home/jovyan/bmp-datavol-1

# Stage 1: generate synthetic GPS traces (skip if you have real data)
VALHALLA_TILE_EXTRACT=$DATA/valhalla_tiles.tar \
OUTPUT_DIR=$DATA/raw N_TRUCKS=10 K_TRIPS=10 \
    uv run python examples/end_to_end/stage1_generate.py

# Stage 2: gap-split, stop-split, traffic-filter, spatially partition
INPUT_DIR=$DATA/raw OUTPUT_DIR=$DATA/atri-trips \
    uv run python examples/end_to_end/stage2_split_partition.py

# Stage 3: map-match each trip → id, date, way_id
VALHALLA_TILE_EXTRACT=$DATA/valhalla_tiles.tar \
INPUT_DIR=$DATA/atri-trips OUTPUT_DIR=$DATA/atri-match \
    uv run python examples/end_to_end/stage3_map_match.py

Dev workflow

| Task | Command |
|---|---|
| Build | maturin develop |
| Tests | pytest tests/ -v |
| Lint Python | ruff check python/ tests/ |
| Format Python | ruff format python/ tests/ |
| Lint Rust | cargo clippy --all-targets --all-features -- -D warnings |
| Format Rust | cargo fmt --all |
| Type-check | mypy python/trucktrack |
| Build wheel | maturin build --release |



Download files


Source Distribution

trucktrack-0.1.5.tar.gz (145.1 kB)

Tags: Source

Built Distributions


trucktrack-0.1.5-cp311-abi3-win_amd64.whl (14.2 MB)

Tags: CPython 3.11+, Windows x86-64

trucktrack-0.1.5-cp311-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (16.6 MB)

Tags: CPython 3.11+, manylinux: glibc 2.17+ x86-64

trucktrack-0.1.5-cp311-abi3-macosx_11_0_arm64.whl (14.1 MB)

Tags: CPython 3.11+, macOS 11.0+ ARM64

File details

Details for the file trucktrack-0.1.5.tar.gz.

File metadata

  • Download URL: trucktrack-0.1.5.tar.gz
  • Size: 145.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for trucktrack-0.1.5.tar.gz
Algorithm Hash digest
SHA256 66e0105227894d7f431212b4732329280da94e808f2a01c10ea1702ea75f1a42
MD5 8281c0926e5d92948da75d03c0b48769
BLAKE2b-256 cc67b7ef4e85cabd9dac15b2b48f30fa6721f87e0ea0d55c7064a038b6b1b4b2


Provenance

The following attestation bundles were made for trucktrack-0.1.5.tar.gz:

Publisher: publish.yml on twedl/trucktrack

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file trucktrack-0.1.5-cp311-abi3-win_amd64.whl.

File metadata

  • Download URL: trucktrack-0.1.5-cp311-abi3-win_amd64.whl
  • Size: 14.2 MB
  • Tags: CPython 3.11+, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for trucktrack-0.1.5-cp311-abi3-win_amd64.whl
Algorithm Hash digest
SHA256 43ed0a81e430a353667fd74a5cab8ab220e8cf1768377f55c001006a82562951
MD5 0142b05cb3526d66fa0f695db3729b3f
BLAKE2b-256 47ad93cd92a5d953a095ab405c7a9fcdaa11de353df98686a7dd349f0f0e033e


Provenance

The following attestation bundles were made for trucktrack-0.1.5-cp311-abi3-win_amd64.whl:

Publisher: publish.yml on twedl/trucktrack


File details

Details for the file trucktrack-0.1.5-cp311-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.


File hashes

Hashes for trucktrack-0.1.5-cp311-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 74f664f68d30ca5b91e84b013e1af2600dbb14ce23a94cbf4ded0dfc22a24fe9
MD5 1b7a2b7ecc97584ca2d1ea3507460c98
BLAKE2b-256 44026e5feab71ecafe2d29eb89383ed49d5e56b22b4cb56680078dec64e749f0


Provenance

The following attestation bundles were made for trucktrack-0.1.5-cp311-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:

Publisher: publish.yml on twedl/trucktrack


File details

Details for the file trucktrack-0.1.5-cp311-abi3-macosx_11_0_arm64.whl.


File hashes

Hashes for trucktrack-0.1.5-cp311-abi3-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 335c0b0cfdd64d54a238ef2dab6f947a678b2e85b4b8a07fe571e9ac0ec32a9e
MD5 b764237b9306afed56558041c7ffeb85
BLAKE2b-256 7c0479a4722fad2255b2d5d87d25751be44a82de44b567e1532fe76fb7efcfa0


Provenance

The following attestation bundles were made for trucktrack-0.1.5-cp311-abi3-macosx_11_0_arm64.whl:

Publisher: publish.yml on twedl/trucktrack

