High-performance trajectory splitting and analysis, powered by Rust
trucktrack
High-performance trajectory splitting, generation, and partitioning, powered by Rust.
A Python package implementing logic similar to
movingpandas trajectory splitters
(ObservationGapSplitter, StopSplitter) with a Rust backend for speed.
Data flows through Polars DataFrames, with the option
to process entirely in Rust (parquet in, parquet out) or share DataFrames
between Python and Rust zero-copy via pyo3-polars.
In addition to the Rust splitters, trucktrack ships two pure-Python subpackages:
- trucktrack.generate: synthesize realistic truck GPS traces by routing through a Valhalla instance, interpolating along the route, layering parking maneuvers, adding GPS noise, and optionally injecting configurable data-quality errors (signal dropout, multipath, traffic jams, etc.).
- trucktrack.partition: rewrite a flat parquet of trips as a Valhalla-tile-aligned, hive-partitioned dataset (tier=…/partition_id=…/) with rows sorted by a Hilbert-curve index for spatial locality.
Install
pip install trucktrack
From source
# Requires Python 3.11+ and Rust stable
git clone https://github.com/twedl/trucktrack.git
cd trucktrack
python3 -m venv .venv && source .venv/bin/activate
pip install "maturin>=1.7,<2.0" polars pytest
maturin develop
Usage
Python API — splitters & I/O
from datetime import timedelta
import polars as pl
import trucktrack
df = pl.read_parquet("tracks.parquet")
# Expected columns: id, time, speed, heading, lat, lon
# Split at observation gaps > 2 minutes (column names are configurable)
result = trucktrack.split_by_observation_gap(
df, timedelta(minutes=2), min_length=0
)
# Returns df with a segment_id column appended
# Split at detected stops (within 50m for at least 2 minutes)
result = trucktrack.split_by_stops(
df,
max_diameter=50.0,
min_duration=timedelta(minutes=2),
min_length=0,
)
# Returns all rows with segment_id and is_stop columns.
# Movement segments shorter than min_length are filtered;
# stop segments are always kept.
# Process entirely in Rust (parquet in, parquet out)
trucktrack.split_by_observation_gap_file(
"input.parquet", "output.parquet", timedelta(minutes=2)
)
trucktrack.split_by_stops_file(
"input.parquet",
"output.parquet",
max_diameter=50.0,
min_duration=timedelta(minutes=2),
)
# Add speed_mps derived column (zero-copy via pyo3-polars)
result = trucktrack.process_dataframe_in_rust(df)
# …or do the whole thing as parquet → parquet inside Rust
n_rows = trucktrack.process_parquet_in_rust("input.parquet", "output.parquet")
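To make the gap-splitting rule concrete, here is a minimal plain-Python sketch of the idea behind split_by_observation_gap: within each id, a new segment starts whenever two consecutive observations are more than the gap apart. It is an illustration only, not the Rust implementation; the function name assign_gap_segments and the choice to restart segment_id at 0 for every id are assumptions made for this example.

```python
from datetime import datetime, timedelta


def assign_gap_segments(rows, gap):
    """Append a segment_id to each row (hypothetical helper, not trucktrack API).

    rows: list of dicts with at least "id" and "time" keys.
    gap:  timedelta; a larger time difference between neighbours opens a new segment.
    """
    ordered = sorted(rows, key=lambda r: (r["id"], r["time"]))
    out = []
    prev = None
    segment = 0
    for row in ordered:
        if prev is None or row["id"] != prev["id"]:
            segment = 0  # assumption: numbering restarts for every id
        elif row["time"] - prev["time"] > gap:
            segment += 1  # observation gap exceeded: open a new segment
        out.append({**row, "segment_id": segment})
        prev = row
    return out


t0 = datetime(2026, 4, 8, 8, 0)
rows = [{"id": "a", "time": t0 + timedelta(minutes=m)} for m in (0, 1, 5, 6)]
segments = [r["segment_id"] for r in assign_gap_segments(rows, timedelta(minutes=2))]
print(segments)
```

The four-minute jump between the second and third observation exceeds the two-minute gap, so the last two rows fall into a second segment.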
Python API — trucktrack.generate
from datetime import datetime, UTC
from trucktrack import TripConfig, generate_trace, traces_to_parquet
config = TripConfig(
origin=(43.6532, -79.3832), # (lat, lon)
destination=(45.5017, -73.5673),
departure_time=datetime.now(UTC),
gps_noise_meters=3.0,
seed=42,
origin_maneuver="alley_dock", # parking maneuver type
destination_maneuver="alley_dock",
valhalla_url="http://localhost:8002",
)
# Returns list[TracePoint] (lat, lon, speed_mph, heading, timestamp).
# Requires a running Valhalla instance at the configured URL.
points = generate_trace(config)
# Write one or many trips to a single parquet (columns: id, lat, lon,
# speed, heading, time — directly consumable by the splitters above).
traces_to_parquet([(points, config.trip_id)], "trip.parquet")
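The gps_noise_meters and seed options suggest seeded, per-point positional noise. How trucktrack applies it internally is not documented here, so the following is only a sketch of one common approach: draw independent Gaussian offsets in metres and convert them to degrees. The function name add_gps_noise and the constant of roughly 111,320 m per degree of latitude are assumptions for this example.

```python
import math
import random

METERS_PER_DEG_LAT = 111_320.0  # approximate; good enough for a few metres of noise


def add_gps_noise(points, sigma_m, seed):
    """Return (lat, lon) pairs perturbed by Gaussian noise (hypothetical helper)."""
    rng = random.Random(seed)  # a seeded RNG makes the output reproducible
    noisy = []
    for lat, lon in points:
        north_m = rng.gauss(0.0, sigma_m)
        east_m = rng.gauss(0.0, sigma_m)
        dlat = north_m / METERS_PER_DEG_LAT
        # A degree of longitude shrinks with the cosine of the latitude.
        dlon = east_m / (METERS_PER_DEG_LAT * math.cos(math.radians(lat)))
        noisy.append((lat + dlat, lon + dlon))
    return noisy


route = [(43.6532, -79.3832), (45.5017, -73.5673)]
noisy_route = add_gps_noise(route, sigma_m=3.0, seed=42)
```

Calling the function twice with the same seed yields identical output, which is what makes a seeded trace reproducible in tests.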
Error injection
Generate traces with realistic data-quality issues for testing downstream
pipelines. Errors are configured per-trip via ErrorConfig and applied
after trace synthesis:
from trucktrack.generate import ErrorConfig, default_error_profile
# Use the built-in profile: 19 error types at realistic probabilities
# (~1/1000 trips per type — tuned over 1000+ synthetic trips)
config = TripConfig(
origin=(43.6532, -79.3832),
destination=(45.5017, -73.5673),
departure_time=datetime.now(UTC),
valhalla_url="http://localhost:8002",
errors=default_error_profile(),
)
points = generate_trace(config)
# Or pick specific errors:
config = TripConfig(
...,
errors=[
ErrorConfig("signal_dropout", probability=0.5, params={"gap_seconds": 30}),
ErrorConfig("traffic_jam", probability=1.0),
],
)
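As an illustration of what a probability-gated error such as signal_dropout might do, the sketch below removes every point that falls inside a randomly placed window of gap_seconds. This is not the library's injector: the function name, the (time, lat, lon) tuple layout, and the rule for choosing where the gap starts are all assumptions for this example.

```python
import random
from datetime import datetime, timedelta


def inject_signal_dropout(points, probability, gap_seconds, rng):
    """Drop the points inside one random time window (hypothetical helper).

    points: list of (time, lat, lon) tuples sorted by time.
    """
    # Skip the error entirely unless the random draw falls below `probability`.
    if len(points) < 3 or rng.random() >= probability:
        return list(points)
    # Assumption: the gap never starts at the very first point.
    start_idx = rng.randrange(1, len(points) - 1)
    gap_start = points[start_idx][0]
    gap_end = gap_start + timedelta(seconds=gap_seconds)
    return [p for p in points if not (gap_start <= p[0] < gap_end)]


t0 = datetime(2026, 4, 8, 8, 0)
trace = [(t0 + timedelta(seconds=10 * i), 43.65, -79.38) for i in range(20)]
with_gap = inject_signal_dropout(trace, probability=1.0, gap_seconds=30, rng=random.Random(42))
untouched = inject_signal_dropout(trace, probability=0.0, gap_seconds=30, rng=random.Random(42))
```

With points every 10 seconds and a 30-second window, the injected gap removes two or three points, which an observation-gap splitter with a short enough threshold would then turn into a segment boundary.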
Available error types:
| Category | Error type | Description |
|---|---|---|
| GPS | signal_dropout | Remove points during signal loss |
| GPS | cold_start_drift | Initial position drift after gaps |
| GPS | multipath | Reflection-induced position spikes |
| GPS | frozen_fix | Position lock-up (repeated coordinates) |
| GPS | timestamp_glitch | Duplicate, jump-forward, or jump-backward timestamps |
| GPS | coordinate_corruption | Precision loss, lat/lon flip, or swap |
| GPS | speed_heading_desync | Speed/heading lag mismatch |
| GPS | jitter_at_rest | Brownian motion when stopped |
| Operational | privacy_shutoff | Entire trip segment removed |
| Operational | relay_driving | Multiple drivers with dwell compression |
| Operational | yard_dwell | Pre/post-trip parking |
| Operational | fuel_rest_stop | Mid-trip fuel/food break |
| Operational | weigh_station_stop | Commercial inspection with approach slowdown |
| Operational | bobtail_segment | Tractor-only faster speeds |
| Operational | off_route_detour | Temporary course deviation |
| Operational | loading_dwell | Cargo transfer dwell |
| Operational | traffic_jam | Highway congestion with optional full stops |
| Operational | device_power_cycle | GPS unit reboot with gap + drift |
| Operational | geofence_gap | Privacy-zone data removal |
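Since these errors exist to exercise downstream pipelines, a downstream check might look for their symptoms. The sketch below flags two of them in plain Python: timestamps that do not strictly increase (as timestamp_glitch can produce) and runs of identical coordinates (as frozen_fix produces). Both helper names and the min_repeats threshold are hypothetical and not part of trucktrack.

```python
def find_timestamp_glitches(times):
    """Indices whose timestamp is not strictly later than the previous one."""
    return [i for i in range(1, len(times)) if times[i] <= times[i - 1]]


def find_frozen_fixes(coords, min_repeats=3):
    """(start, end) index pairs, inclusive, of runs of identical coordinates."""
    runs = []
    start = 0
    for i in range(1, len(coords) + 1):
        # Close the current run at the end of the list or when the value changes.
        if i == len(coords) or coords[i] != coords[start]:
            if i - start >= min_repeats:
                runs.append((start, i - 1))
            start = i
    return runs


glitches = find_timestamp_glitches([0, 10, 10, 5, 20])
frozen = find_frozen_fixes([(1.0, 1.0), (2.0, 2.0), (2.0, 2.0), (2.0, 2.0), (3.0, 3.0)])
print(glitches, frozen)
```

The example uses integers as timestamps for brevity; datetime values compare the same way.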
Python API — trucktrack.partition
from pathlib import Path
import polars as pl
from trucktrack import (
assign_partitions,
partition_existing_parquet,
write_partitions,
write_trips_partitioned,
)
# Easiest path: rewrite a flat parquet (id, lat, lon, …) as a hive
# dataset rooted at output_dir. Returns {tier_name: partition_count}.
summary = partition_existing_parquet(
Path("tracks.parquet"), Path("partitioned/")
)
# Or, partition in-memory generated trips end-to-end:
summary = write_trips_partitioned(
[(points, "trip-001")], Path("partitioned/")
)
# Single-call in-memory partitioning (adds tier, partition_id, hilbert_idx).
# points_df is assumed to be a Polars DataFrame of GPS points (id, lat, lon, …).
from trucktrack.partition import partition_points
partitioned_df = partition_points(points_df)
# Lower-level: classify trips yourself, then write.
metadata = pl.DataFrame({
"id": ["trip-001"],
"centroid_lat": [44.5],
"centroid_lon": [-76.5],
"bbox_diag_km": [540.0],
})
metadata = assign_partitions(metadata) # adds tier, partition_id, hilbert_idx
write_partitions(metadata, points_df, Path("partitioned/"))
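The hilbert_idx column exists so that rows close together on the map end up close together on disk. The following is a sketch of the textbook Hilbert-curve mapping from a grid cell to its position along the curve, plus one possible way to turn a lat/lon into a grid cell. The grid resolution and the lat/lon scaling are assumptions for this example; trucktrack's actual indexing may differ.

```python
def hilbert_index(order, x, y):
    """Position of cell (x, y) along a Hilbert curve on a 2**order square grid."""
    n = 1 << order
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve is oriented consistently.
        if ry == 0:
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s >>= 1
    return d


def latlon_to_cell(lat, lon, order):
    """Scale a lat/lon onto the grid (assumed mapping, for illustration only)."""
    n = 1 << order
    x = min(n - 1, int((lon + 180.0) / 360.0 * n))
    y = min(n - 1, int((lat + 90.0) / 180.0 * n))
    return x, y


# Walk a 4x4 grid in curve order.
cells = sorted((hilbert_index(2, x, y), x, y) for x in range(4) for y in range(4))
```

Sorting by this index visits every cell exactly once, and each step moves to a directly neighbouring cell, which is the locality property the partition writer relies on.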
Tiers are assigned by trip bounding-box diagonal:
| Tier | Bbox diagonal | Tile bucket |
|---|---|---|
| local | < 100 km | Valhalla L1 (1x1 deg) |
| regional | 100-800 km | Valhalla L0 (4x4 deg) |
| longhaul | > 800 km | Coarse 8x8 deg super-region |
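The tier table can be read as a small classification function. The sketch below applies the thresholds and then buckets a trip centroid into a tile using the row-major numbering described in Valhalla's tile documentation (rows counted from latitude -90, columns from longitude -180). Treating exactly 800 km as regional, reusing the same numbering for the 8x8 degree super-region, and equating the result with trucktrack's partition_id are all assumptions made for this example.

```python
TILE_SIZE_DEG = {"local": 1.0, "regional": 4.0, "longhaul": 8.0}


def tier_for(bbox_diag_km):
    """Map a trip's bounding-box diagonal (km) to a tier name."""
    if bbox_diag_km < 100.0:
        return "local"
    if bbox_diag_km <= 800.0:  # assumption: the upper boundary is inclusive
        return "regional"
    return "longhaul"


def tile_id(lat, lon, size_deg):
    """Row-major tile number for a square grid of size_deg degrees."""
    n_cols = int(360 / size_deg)
    row = int((lat + 90.0) / size_deg)
    col = int((lon + 180.0) / size_deg)
    return row * n_cols + col


# The metadata row from the example above: centroid (44.5, -76.5), 540 km diagonal.
tier = tier_for(540.0)
bucket = tile_id(44.5, -76.5, TILE_SIZE_DEG[tier])
print(tier, bucket)
```

For the 540 km trip this yields the regional tier and a 4x4 degree bucket; a trip under 100 km around the same centroid would instead be bucketed on the 1x1 degree grid.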
CLI
# Add derived columns (speed_mps), output CSV to stdout
trucktrack process tracks.parquet
# Split at observation gaps, write parquet
trucktrack split-gap tracks.parquet --gap 120 -o split.parquet
# Split at stops, output CSV
trucktrack split-stops tracks.parquet --diameter 50 --duration 120 \
-o stops.csv --format csv
# Process / split commands support -o (file or '-' stdout), --format
# (csv/parquet), and --min-length for the splitters.
trucktrack split-gap tracks.parquet --gap 120 -o result.csv --format csv
# Synthesize a trip via Valhalla and write to parquet
trucktrack generate \
--origin 43.6532,-79.3832 \
--destination 45.5017,-73.5673 \
--departure 2026-04-08T08:00:00 \
--noise 3.0 --seed 42 \
--valhalla-url http://localhost:8002 \
-o trip.parquet
# Rewrite a flat parquet as a Valhalla-tile-aligned hive dataset
trucktrack partition tracks.parquet partitioned/
How it works
| Path | Description |
|---|---|
| Pure Rust | split_by_observation_gap_file() / split_by_stops_file() read parquet, process, and write parquet entirely in Rust. No Python objects created. |
| Python <-> Rust | split_by_observation_gap() / split_by_stops() share the Polars DataFrame with Rust via pyo3-polars (Arrow C Data Interface, zero-copy on column buffers). The Python polars, Rust polars, and pyo3-polars versions must be kept in sync. |
| Python only | read_parquet() / read_dataset() use polars directly in Python. The generate and partition subpackages are also pure Python on top of Polars / PyArrow. |
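The stop criterion used by split_by_stops (points staying within max_diameter for at least min_duration) can be sketched in plain Python with a haversine distance and a growing window. This is a simplified illustration, not the Rust sliding window: it measures distance from the first point of the window rather than the true diameter of all points in it, and the names haversine_m and find_stops are hypothetical.

```python
from datetime import datetime, timedelta
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000.0


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/lon points."""
    p1, p2 = radians(lat1), radians(lat2)
    dphi = p2 - p1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(p1) * cos(p2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


def find_stops(points, max_diameter, min_duration):
    """Return inclusive (start, end) index pairs of detected stops.

    points: list of (time, lat, lon) tuples for one vehicle, sorted by time.
    """
    stops = []
    i, n = 0, len(points)
    while i < n:
        j = i + 1
        # Grow the window while points stay near the window's first point.
        while j < n and haversine_m(*points[i][1:], *points[j][1:]) <= max_diameter:
            j += 1
        if points[j - 1][0] - points[i][0] >= min_duration:
            stops.append((i, j - 1))
            i = j  # continue after the stop
        else:
            i += 1
    return stops


t0 = datetime(2026, 4, 8, 8, 0)
parked = [(t0 + timedelta(minutes=m), 43.65, -79.38) for m in range(4)]
moving = [(t0 + timedelta(minutes=4 + m), 43.66 + 0.01 * m, -79.38) for m in range(3)]
stops = find_stops(parked + moving, max_diameter=50.0, min_duration=timedelta(minutes=2))
print(stops)
```

The four parked points span three minutes within a few metres of each other, so they form one stop; the moving points are more than a kilometre apart and never qualify.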
Project layout
src/
lib.rs # PyO3 module registration + splitter wrappers
geo.rs # Haversine distance (pure Rust math)
partition.rs # Tile classification, Hilbert indexing (Rust)
transform.rs # add_speed_mps, parquet helpers, error mapping
splitters/
gap.rs # ObservationGapSplitter (Polars lazy expressions)
stop.rs # StopSplitter (Polars groupby + Rust sliding window)
python/trucktrack/
__init__.py # Public API re-exports
io.py # read_parquet, process_dataframe_in_rust, process_parquet_in_rust
splitters.py # split_by_observation_gap[_file], split_by_stops[_file]
cli.py # CLI entry point (argparse subcommands)
_core.pyi # Type stubs for the Rust extension
generate/
__init__.py # Re-exports TripConfig, generate_trace, traces_to_*, ErrorConfig
models.py # TripConfig, TracePoint, RouteSegment, ErrorConfig dataclasses
router.py # Valhalla HTTP client
interpolator.py # Per-segment interpolation, bearing math
speed_profile.py # Per-edge speed sampling
parking.py # Origin/destination parking maneuvers
noise.py # Per-point GPS noise
trace.py # Orchestrator + traces_to_csv / traces_to_parquet
random_trip.py # Random origin/destination sampling helper
gps_errors.py # Physical GPS error injectors (8 types)
operational_errors.py # Operational pattern injectors (11 types)
partition/
__init__.py # Re-exports assign_partitions, partition_points, write_*, etc.
tiles.py # Valhalla L0/L1 tile math, haversine
classify.py # Tier assignment + Hilbert-curve indexing (Polars)
writer.py # write_partitions, write_trips_partitioned,
# partition_existing_parquet
valhalla/
__init__.py # Re-exports route, map_match*, get_actor, DEFAULT_TRUCK_COSTING
_actor.py # pyvalhalla Actor singleton keyed by tile extract
_parsing.py # decode_polyline6, parse_valhalla_response -> RouteSegment
routing.py # route() — high-level truck routing
map_matching.py # map_match, map_match_ways, map_match_dataframe
visualize/
__init__.py # Re-exports plot_trace, save_map
_convert.py # TracePoint/list -> Polars DataFrame adapters
_map.py # Folium map builder with per-segment coloring
data/
example_tracks.parquet # 10-row single vehicle example
splitter_test_tracks.parquet # 83-row, 3-vehicle test dataset
End-to-end: trip splitting + map-matching
The examples/end_to_end/ scripts form a three-stage pipeline: generate
synthetic GPS data (or skip to use your own), split and spatially
partition it, then map-match each trip to extract OSM way IDs. Set DATA
to the directory containing your tile extract and outputs. The final
result is a hive-partitioned parquet dataset with columns id, date,
and way_id (null for trips that fail matching).
DATA=/home/jovyan/bmp-datavol-1
# Stage 1: generate synthetic GPS traces (skip if you have real data)
VALHALLA_TILE_EXTRACT=$DATA/valhalla_tiles.tar \
OUTPUT_DIR=$DATA/raw N_TRUCKS=10 K_TRIPS=10 \
uv run python examples/end_to_end/stage1_generate.py
# Stage 2: gap-split, stop-split, traffic-filter, spatially partition
INPUT_DIR=$DATA/raw OUTPUT_DIR=$DATA/atri-trips \
uv run python examples/end_to_end/stage2_split_partition.py
# Stage 3: map-match each trip → id, date, way_id
VALHALLA_TILE_EXTRACT=$DATA/valhalla_tiles.tar \
INPUT_DIR=$DATA/atri-trips OUTPUT_DIR=$DATA/atri-match \
uv run python examples/end_to_end/stage3_map_match.py
Dev workflow
| Task | Command |
|---|---|
| Build | maturin develop |
| Tests | pytest tests/ -v |
| Lint Python | ruff check python/ tests/ |
| Format Python | ruff format python/ tests/ |
| Lint Rust | cargo clippy --all-targets --all-features -- -D warnings |
| Format Rust | cargo fmt --all |
| Type-check | mypy python/trucktrack |
| Build wheel | maturin build --release |