Probabilistic SLAM trajectory format and scoring tool
Project description
smfeval: score the belief, not just the mean
A SLAM filter reports a pose and a covariance. APE/RPE check the pose. smfeval checks whether the covariance is honest.
Illustration built in notebooks/figure_overconfidence.py, not smfeval
output. smfeval emits the text verdict below; the figure shows what that
verdict means geometrically. FAST-LIO2 on Oxford Spires christ-church-03. The estimate (blue) tracks the reference (black) to 3 cm APE, which is an excellent APE,
but the filter's reported 90% region is millimetres wide. The
reference lands about 11x outside even the filter's 90% region (the figure marks
this as 37 sigma). The belief is wrong where the mean is right, and that
per-pose gap is what smfeval scores. (Data: Oxford Spires, CC BY-NC-SA 4.0.)
Try it now
The notebook reproduces the headline verdict on one Oxford Spires sequence end to end (install, fetch data, verdict, NEES-vs-reference plot).
Install
pip install smfeval
The only dependencies are NumPy and SciPy (Python 3.10+).
Score a filter
$ smfeval nees estimate.SQUARE reference.tum --ref-body-frame lidar
median NEES 1.04e3 (calibrated: 2.37)
covariance scale gap k = 441, ~21x too tight per axis
90% coverage: 0.000 (calibrated: 0.900)
FAST-LIO2 on Oxford Spires christ-church-03. See
exporters/fast_lio2/VALIDATION.md for the full reproduction.
No
.SQUAREfile? smfeval needs a covariance for every pose, not just the poses, but it does not need the SQUARE format. If your filter outputs covariances, pass a plain TUM file plus a--covsidecar. If it does not, Your filter doesn't write SQUARE yet? shows how to get them.
Under a calibrated belief the per-pose translation NEES has a known reference
median of 2.37 (NEES is the error measured in standard deviations, squared).
The scale gap k = median NEES / 2.37 is the factor by which the published
covariance is too tight, so each axis is off by about a factor 21.
Here the filter's 90% credible ellipsoid never contains the reference. smfeval score goes further. It localizes the regime that is wrong (bulk vs tail) and
emits structured diagnoses with recommendations.
No reference? Run two filters and score them against each other
$ smfeval pair a.SQUARE b.SQUARE
matched 3101 pose pairs, scored 3101 (join 1.00, median gap 0.0 ms)
propriety caveat: pairwise scores are strictly proper only under a
honest reference sigma and independent errors; both violations push
conservative, so NEES_pair lower-bounds miscalibration.
pairwise median NEES 56.1 (calibrated: 2.37)
pairwise scale gap k >= 23.7, >=4.87x too tight per axis (lower bound)
verdict: optimistic (ANEES 71.7 vs chi2 interval [2.91, 3.09])
An elevated pairwise NEES certifies overconfidence with no reference consulted. Filter A is aligned to filter B directly and the difference is scored under the summed covariances. Common-mode error and an understated reference covariance both push the statistic down, so the verdict is a lower bound on the miscalibration.
Your filter doesn't write SQUARE yet?
smfeval needs your filter's per-pose covariance, not just its poses, but you do
not have to adopt the SQUARE format to provide it. Two on-ramps are documented
in SQUARE_spec.md.
- Wide TUM. Standard TUM pose columns plus the 21 row-major lower-triangle entries of the 6x6 tangent covariance (29 columns total).
- Sidecar file. Plain TUM poses plus
--cov cov.txtwithtimestamp c11 c21 c22 ... c66rows.
smfeval nees est.tum ref.tum --cov est.cov --est-body-frame imu --ref-body-frame imu
Most filters compute a covariance internally and never publish it. For four
popular LiDAR-inertial filters the export already exists.
exporters/ carries the audited few-line diff that makes
FAST-LIO2, Faster-LIO, Point-LIO, and I2EKF-LO publish their belief, each
with its pinned upstream commit, a bag-to-SQUARE converter, and a validation run
on a named public sequence. Contributions follow the PR template, with
smfeval validate --strict as the mechanical gate.
The full report
smfeval score est.SQUARE ref.tum produces the complete analysis.
=== smfeval scoring report ===
Synchronization
Mode: nearest
Pairs matched: 309 / 310
Dropped: 1
Timestamp gap (ms): median 0.04, p95 7.57, p99 8.55
Sync risk (v·Δt / σ): median 0.0099, p95 1.6363, p99 1.7755
[warning] 91 pairs (29.4%) exceed risk 0.3
Alignment
Gauge (declared): se3
Mode applied: se3 (6 DoF)
Fitted Δxyz: (-27.8424, 24.9711, 5.6239) m
Fit residual (m): median 0.0094, p95 0.0262
6 DoF removed over 32 m of trajectory
Scores
Translation CRPS: mean 0.004 m [95% CI 0.003, 0.006] (n=309)
median 0.003, std 0.003, min 0.001, max 0.014
block length (Politis–White): 24.4
Energy score: mean 0.009 m [95% CI 0.006, 0.011] (n=309)
median 0.006, std 0.006, min 0.002, max 0.028
block length (Politis–White): 24.6
Log score (translation): mean -8.017 [95% CI -10.644, -5.103] (n=309)
median -10.892, std 7.064, min -13.701, max 19.477
block length (Politis–White): 24.3
Interval score: mean 0.057 [95% CI 0.021, 0.095] (n=309)
median 0.010, std 0.092, min 0.008, max 0.402
block length (Politis–White): 24.3
Calibration
PIT uniformity (KS): p = 0.000 [warning] possible miscalibration
90% Mahalanobis coverage: 55.0% (nominal 90.0%)
Translation z-score: mean 1.63, std 1.02
Diagnoses (attribution → action)
[warning] sync_risk
A competing confounder: timestamp-matching error shrinks short-window Σ_rel the same way local over-confidence does.
· 29.4% of pairs exceed sync risk 0.3
→ Re-score with --sync=interpolate_ref to separate sync from a genuine calibration fault before trusting short-horizon verdicts.
Recommendations
- 29.4% of pairs have sync risk > 0.3; consider cross-checking with --sync=interpolate_ref to confirm calibration findings.
- 6 DoF removed over 32 m of trajectory; post-alignment residuals are biased low. Consider --n_to_align to fit on a prefix and score on the remainder.
- Coverage below nominal combined with KS p < 0.05 — the filter is over-confident (claimed Σ too tight, reference falls outside the predicted intervals); widen process noise. Miscalibration is unlikely to be explained by sync error alone.
Point-LIO on Oxford Spires christ-church-03, reproduced from
tests/fixtures/regression/real_point_lio. The report is built from:
- synchronization and alignment diagnostics;
- translation proper scoring rules (CRPS, energy score, Gaussian log score with its exact calibration/sharpness split, interval score), each with a stationary-bootstrap confidence interval;
- PIT/coverage calibration and windowed relative-pose calibration
(
--rpe-window); - track-frame bias/variance attribution;
- structured failure-mode diagnoses with recommended actions.
Only translation is scored, not orientation: a proper score on SO(3) needs a belief density whose normaliser is intractable for the natural rotation families, so rotation is left to future work (see docs/metrics.rst).
smfeval score --json prints the structured report to stdout, and --json-out
writes it to a file. Both follow docs/report.schema.json.
Why several scores? Each proper rule touches a different part of the predictive
translation distribution (bulk shape, tails, a chosen coverage level), so
no single number suffices. docs/metrics.rst explains
every metric and how to read it. SQUARE_spec.md documents the format and
conventions.
Commands
| Verb | What it does |
|---|---|
smfeval nees est ref |
three-line calibration verdict (median NEES, scale gap k, coverage) |
smfeval pair a b |
no-reference pairwise verdict (lower bound on miscalibration) |
smfeval score est ref |
full scoring report (--json/--json-out for machines) |
smfeval validate file |
header/row sanity checks (--strict is the exporter gate) |
Development
uv sync && uv run pytest
Docs live under docs/ (make docs). The test suite includes
property-based invariants (hypothesis) and seeded Monte Carlo power tests of the
verdict machinery itself (see tests/test_power.py).
Citation
If you use smfeval, please cite the software. GitHub's Cite this repository
button reads CITATION.cff.
Rønning, O. smfeval: probabilistic SLAM trajectory scoring. 2026. https://github.com/svendbot/smfeval
A paper describing the methodology and the audit behind it is in preparation,
to be released with slam_benchmark.
Provenance
smfeval grew out of a systematic audit of uncertainty calibration in
LiDAR-inertial odometry.
slam_benchmark is the audit that
motivated this tool. The trajectory data used in fixtures and the notebook
derives from the
Oxford Spires Dataset
(CC BY-NC-SA 4.0; see the data license notes in those directories).
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file smfeval-0.4.0.tar.gz.
File metadata
- Download URL: smfeval-0.4.0.tar.gz
- Upload date:
- Size: 109.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
59c5ad2eed52e494992750f0a63db4ec9fcced6ffeeaf5cd16befc59b8c23996
|
|
| MD5 |
0a583bac81c7d6394f60c1192d8328a9
|
|
| BLAKE2b-256 |
a1fe3820d6afeb3b07f04d77e47d0cb888f204f396a033d556ba75d574e51c97
|
Provenance
The following attestation bundles were made for smfeval-0.4.0.tar.gz:
Publisher:
publish.yml on svendbot/smfeval
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
smfeval-0.4.0.tar.gz -
Subject digest:
59c5ad2eed52e494992750f0a63db4ec9fcced6ffeeaf5cd16befc59b8c23996 - Sigstore transparency entry: 1902156234
- Sigstore integration time:
-
Permalink:
svendbot/smfeval@2e546d7c4a66b8c06758163f5aa776840eb751d7 -
Branch / Tag:
refs/tags/v0.4.0 - Owner: https://github.com/svendbot
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish.yml@2e546d7c4a66b8c06758163f5aa776840eb751d7 -
Trigger Event:
push
-
Statement type:
File details
Details for the file smfeval-0.4.0-py3-none-any.whl.
File metadata
- Download URL: smfeval-0.4.0-py3-none-any.whl
- Upload date:
- Size: 83.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
486ec7e43810f0bb614dafc0f41c270d0213c6a7df6f9409146711a2b4568535
|
|
| MD5 |
ba73ea4ac7d9cc6a5b5da81ba919d2a5
|
|
| BLAKE2b-256 |
cef7520931f9df43bcbcbd79c005ff711297dbad527a550941c5dec96a68bc81
|
Provenance
The following attestation bundles were made for smfeval-0.4.0-py3-none-any.whl:
Publisher:
publish.yml on svendbot/smfeval
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
smfeval-0.4.0-py3-none-any.whl -
Subject digest:
486ec7e43810f0bb614dafc0f41c270d0213c6a7df6f9409146711a2b4568535 - Sigstore transparency entry: 1902156323
- Sigstore integration time:
-
Permalink:
svendbot/smfeval@2e546d7c4a66b8c06758163f5aa776840eb751d7 -
Branch / Tag:
refs/tags/v0.4.0 - Owner: https://github.com/svendbot
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish.yml@2e546d7c4a66b8c06758163f5aa776840eb751d7 -
Trigger Event:
push
-
Statement type: