
Validate and clean Two-Line Element (TLE) satellite-tracking files

Project description

lintle

A validator and cleaner for Two-Line Element (TLE) corpus files exported from space-track.org.

It audits a TLE file against the standardized TLE specification, repairs the systematic export defects, and emits a uniform, defect-free corpus that any SGP4 / orbital-mechanics library can ingest directly. Records it cannot safely repair are never silently mangled: they are quarantined into a per-file sidecar detailed enough to support a defect report to space-track.


What problem it solves

A TLE record is two fixed-width lines, each exactly 69 ASCII columns, with a mod-10 checksum in column 69. Bulk historical exports from space-track carry two systematic, era-specific defects:

  • Trailing \ artifact — almost every Line 1 has an extra \ byte appended before the newline.
  • Missing checksum digit — many records were exported without their column-69 checksum, leaving 68-column lines.

These appear independently and in combination, and a small fraction of records are genuinely corrupt (garbled columns, orphaned lines, wrong lengths). lintle distinguishes the safely-repairable from the genuinely-corrupt and treats each correctly.
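
To make the record format and both defects concrete, here is a minimal Python sketch. It is not lintle's tle.py, and `tle_checksum` and `line_passes` are hypothetical names. It applies the standard TLE checksum rule (over columns 1–68, each digit counts at face value, each minus sign counts as 1, every other character counts as 0, and the total is taken mod 10), but it checks only length, ASCII-ness and column 69, not the full column layout and semantic ranges. The sample record is the widely published ISS example TLE.

```python
# Hypothetical helpers for illustration; not lintle's actual API.

ISS_LINE1 = "1 25544U 98067A   08264.51782528 -.00002182  00000-0 -11606-4 0  2927"
ISS_LINE2 = "2 25544  51.6416 247.4627 0006703 130.5360 325.0288 15.72125391563537"


def tle_checksum(line: str) -> int:
    """Mod-10 checksum over columns 1-68 of a TLE line."""
    total = 0
    for ch in line[:68]:
        if ch in "0123456789":
            total += int(ch)  # digits count at face value
        elif ch == "-":
            total += 1  # minus signs count as 1
        # letters, spaces, '.' and '+' count as 0
    return total % 10


def line_passes(line: str) -> bool:
    """Length, ASCII and column-69 checksum only (no layout or range checks)."""
    return (
        len(line) == 69
        and line.isascii()
        and line[68] in "0123456789"
        and int(line[68]) == tle_checksum(line)
    )


# Defect 1: the trailing backslash artifact makes Line 1 70 columns long.
with_backslash = ISS_LINE1 + "\\"
# Defect 2: a missing checksum digit leaves a 68-column line.
without_checksum = ISS_LINE1[:68]

print(line_passes(ISS_LINE1), line_passes(ISS_LINE2))  # True True
print(line_passes(with_backslash))  # False
print(line_passes(with_backslash.rstrip("\\")))  # True
print(line_passes(without_checksum + str(tle_checksum(without_checksum))))  # True
```

Stripping the backslash yields a valid line that the original checksum still confirms. The 68-column case has no such cross-check, because the appended digit is computed from the very columns it is meant to verify.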

How it works

One validator, used two ways. A single module (tle.py) defines what a "perfect" TLE record is — column layout, semantic ranges, and the mod-10 checksum. The validate command reports defects against that definition; the clean command reuses the exact same validator and only emits records that pass it.

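The validated-transformation principle. The cleaner never applies a fix and hopes. It applies a candidate fix, re-runs full validation on the result, and commits the fix only if the record now passes. Consequently every line in the output is valid by construction, and wherever the original checksum survives, it independently confirms that a fix did not alter the record's content. Reconstructed checksums are the exception: they are computed rather than verified, which is why they are reported separately.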

Five defect classes, in decreasing order of how safely they can be repaired:

| Class | Examples | Action |
|---|---|---|
| Content-preserving | trailing `\`, CRLF, trailing whitespace | auto-fix (the checksum survives as an independent check) |
| Reconstructed-checksum | a record exported without its column-69 digit | recompute the checksum from intact columns 1–68 |
| Content-shifting | leading whitespace / BOM | trim, then re-validate; quarantine if it fails |
| Structural | blank / whitespace-only lines | drop, then resynchronise pairing |
| Corrupt | bad checksum, wrong length, orphan line, garbled columns | quarantine |

Streaming and parallel. Files are read in binary, line by line, in constant memory, so even a 3 GB file is never loaded into RAM. Records are paired by a prefix-driven state machine that resynchronises on every Line 1 (every line beginning with 1), so one missing line cannot cascade into mispaired records. Each input file is processed in its own worker process.

Requirements

  • Python 3.11+
  • uv for environment and dependency management

lintle itself has no runtime dependencies — it is pure standard library. sgp4 is a dev-only dependency, used as a test oracle.

Installation

uv sync

This creates the virtual environment and installs the dev dependencies. No build step is needed to run the tool.

Usage

The console script is lintle, with two subcommands:

# Audit only — report defects, write nothing
uv run lintle validate [paths...]

# Produce cleaned output + quarantine sidecars
uv run lintle clean [paths...]

python -m lintle ... is equivalent to uv run lintle ....

Arguments and options:

| Option | Default | Meaning |
|---|---|---|
| `paths` | `data/source` | Files or directories. A directory is globbed for `tle*.txt` (tool output `*.cleaned.txt` / `*.broken.txt` is excluded). |
| `--out-dir DIR` | `data/output` | Where `clean` writes its output. Created if absent. |
| `--jobs N` | CPU count | Number of files processed in parallel. Lower it if a slow disk causes I/O contention. |
| `--report text\|json` | `text` | Summary format. |
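
The directory expansion described for `paths` can be approximated with pathlib. This sketch assumes a non-recursive glob and sorted results; whether lintle recurses and in what order it visits files is not documented here, and `collect_inputs` is a hypothetical name.

```python
import tempfile
from pathlib import Path

TOOL_OUTPUT = (".cleaned.txt", ".broken.txt")


def collect_inputs(paths: list[str]) -> list[Path]:
    """Expand files and directories into input files, skipping tool output."""
    found: list[Path] = []
    for raw in paths:
        p = Path(raw)
        if p.is_dir():
            # Assumption: non-recursive glob, sorted for a stable order.
            found.extend(
                f
                for f in sorted(p.glob("tle*.txt"))
                if not f.name.endswith(TOOL_OUTPUT)
            )
        else:
            found.append(p)  # explicit files are taken as given
    return found


with tempfile.TemporaryDirectory() as tmp:
    for name in ("tle2022.txt", "tle2023.txt", "tle2022.cleaned.txt", "notes.txt"):
        (Path(tmp) / name).write_text("")
    print([f.name for f in collect_inputs([tmp])])  # ['tle2022.txt', 'tle2023.txt']
```

The suffix filter matters because `tle*.txt` also matches `tle2022.cleaned.txt`; without it, output left in a source directory would be re-read as input.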

Examples:

# Validate the whole corpus
uv run lintle validate data/source

# Clean one file
uv run lintle clean data/source/tle2022.txt --out-dir data/output

# Clean the corpus, capture a machine-readable summary
uv run lintle clean data/source --report json > run-summary.json

Exit codes:

| Code | Meaning |
|---|---|
| 0 | No records quarantined: the input was already clean, or every defect was repaired. |
| 1 | At least one record was quarantined. |
| 2 | Operational error: no input files, a disk-space shortfall, or a file that failed to process. |

Repairable defects (including the near-universal trailing \) do not raise the exit code above 0 — almost every raw file contains them.
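
The exit-code policy fits in a few lines. This is a hypothetical sketch: `FileResult` and `exit_code` are invented names, disk-space checks are not modelled, and giving code 2 precedence when quarantines and operational errors occur together is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class FileResult:
    """Hypothetical per-file outcome reported by a worker."""

    name: str
    quarantined: int = 0
    fixes: dict[str, int] = field(default_factory=dict)  # never affects the code
    failed: bool = False  # the file could not be processed


def exit_code(results: list[FileResult]) -> int:
    """0 = nothing quarantined, 1 = something quarantined, 2 = operational error."""
    if not results or any(r.failed for r in results):
        return 2  # no input files, or a file failed (assumed to take precedence)
    if any(r.quarantined for r in results):
        return 1
    return 0  # repaired defects alone, however many, still exit 0


print(exit_code([FileResult("tle2022.txt", fixes={"trailing-backslash": 5})]))  # 0
print(exit_code([FileResult("tle2022.txt", quarantined=3)]))  # 1
print(exit_code([]))  # 2
```

A CLI would hand the result to `sys.exit(...)`, so shell scripts can branch on "clean", "clean with quarantines" and "run failed".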

Output

A clean run lays out --out-dir like this:

<out-dir>/
├── cleaned/                tleYYYY.cleaned.txt   — one per input file
├── broken/                 tleYYYY.broken.txt    — one per input file
├── broken-noradids.ndjson  — corpus-wide list of quarantined NORAD IDs
└── report.md               — corpus-wide run report
  • cleaned/tleYYYY.cleaned.txt — standard 2-line TLE text, every record verified valid: 69 ASCII columns per line, \n-terminated, matching satellite catalog numbers, valid checksums. World-readable, ready for downstream ingestion.

  • broken/tleYYYY.broken.txt — the quarantine sidecar. Each entry records the source line number(s), a human-readable reason, and the offending line(s) copied byte-faithfully. The header carries totals, a timestamp, and the tool version — formatted to paste into a space-track defect report.

  • broken-noradids.ndjson — newline-delimited JSON, one {"noradId":N} object per line, listing every NORAD catalog number whose records were quarantined anywhere in the run, deduplicated and sorted ascending. Records whose line 1 is itself unreadable are omitted — there's no catalog number to recover. Intended for programmatic downstream consumers (e.g. a satellite catalog flagging archive gaps) that want the affected IDs without parsing broken/*.txt. The schema is deliberately minimal; future releases may extend each record with additional fields, which consumers can ignore safely. Empty file when nothing was quarantined.

  • report.md — a Markdown run report aggregating the whole run: corpus totals, the percentage cleaned and quarantined, corpus-wide fix counts, the defect-category breakdown, and a per-file table.
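
Because broken-noradids.ndjson holds one `{"noradId":N}` object per line, a downstream consumer needs only the standard library. The reader below is a sketch (the function name is hypothetical) that looks up only `noradId`, so it keeps working if future releases add fields. The sample content is made up, with `"extra"` standing in for such a future field.

```python
import json
import tempfile
from pathlib import Path


def load_quarantined_ids(path: Path) -> set[int]:
    """Collect NORAD IDs from broken-noradids.ndjson, ignoring other fields."""
    ids: set[int] = set()
    with path.open(encoding="utf-8") as f:
        for line in f:
            if line.strip():  # tolerate blank lines; an empty file yields set()
                ids.add(int(json.loads(line)["noradId"]))
    return ids


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "broken-noradids.ndjson"
    sample.write_text(
        '{"noradId":5}\n{"noradId":25544,"extra":"x"}\n', encoding="utf-8"
    )
    print(sorted(load_quarantined_ids(sample)))  # [5, 25544]
```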

A run summary is also printed per file to stdout (and as JSON with --report json):

tle2022.txt   8,412,067 records   8,412,064 clean   3 quarantined
  fixes:   trailing-backslash 8,412,064 | reconstructed-checksum 195,293
  rejects: checksum-mismatch 1 | orphan-line 1 | wrong-length 1

reconstructed-checksum is reported separately from content-preserving fixes: those records are format-conformant, but their checksums are computed, not independently verified.

validate writes nothing — it only prints the per-file summary and the locations of defective records to stdout.

Progress

A 30 GB run is not silent. Live progress is written to stderr as it goes — so it never pollutes the stdout summary or a --report json pipe:

processing 29 file(s) with 10 worker(s)...
  tle2004_7of8.txt: 5,000,000 records...
[3/29] tle2004_3of8.txt — 2,527,820 clean, 183 quarantined

A worker emits a record-count line every 1,000,000 records; the main process prints an [k/N] line as each file finishes.
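
Per-file parallelism with `[k/N]` completion lines can be sketched with `concurrent.futures.ProcessPoolExecutor` and `as_completed`. This is a hypothetical outline, not lintle's cli.py: `process_file` only counts lines as a stand-in for the real per-file work, and the message formats merely imitate the sample above. All progress goes to `sys.stderr`, leaving stdout free for the summary.

```python
import sys
from concurrent.futures import ProcessPoolExecutor, as_completed
from pathlib import Path

PROGRESS_EVERY = 1_000_000  # lines between worker progress messages


def process_file(path: Path) -> tuple[str, int]:
    """Stand-in worker: stream one file in binary and count its lines."""
    count = 0
    with path.open("rb") as f:
        for count, _line in enumerate(f, start=1):
            if count % PROGRESS_EVERY == 0:
                print(f"  {path.name}: {count:,} lines...", file=sys.stderr)
    return path.name, count


def run(paths: list[Path], jobs: int) -> dict[str, int]:
    """Process files in parallel, reporting [k/N] as each one finishes."""
    n = len(paths)
    print(f"processing {n} file(s) with {jobs} worker(s)...", file=sys.stderr)
    results: dict[str, int] = {}
    with ProcessPoolExecutor(max_workers=jobs) as pool:
        futures = [pool.submit(process_file, p) for p in paths]
        for k, future in enumerate(as_completed(futures), start=1):
            name, count = future.result()  # re-raises a worker's exception
            results[name] = count
            print(f"[{k}/{n}] {name}: {count:,} lines", file=sys.stderr)
    return results
```

Call `run(...)` from under an `if __name__ == "__main__":` guard: multiprocessing requires it when workers are started with the spawn or forkserver method, because each worker re-imports the main module.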

Results on the bundled corpus

A full run over the 29-file corpus (tle2004–tle2025, ~232 million records):

  • 99.96 % cleaned — 187.9 M trailing-\ artifacts stripped, 71.3 M missing checksums reconstructed
  • 0.044 % quarantined (103,228 records) as genuinely corrupt — every reject fell into an anticipated category; no unknown defect type surfaced

Development

uv sync                          # install dev dependencies
uv run pytest                    # run the test suite
uv run pytest --cov=lintle       # with a coverage report
uv run ruff check                # lint
uv run ruff format               # auto-format

The suite includes unit tests per module, an asymmetric cross-check against the trusted sgp4 parser (a known-good TLE must be accepted by both), and end-to-end integration tests (golden output, idempotence, re-validation).

Code quality is enforced with ruff (lint rule sets E, F, I, UP, B, SIM; 88-column lines) and coverage is measured with pytest-cov.

Project layout

src/lintle/
  tle.py        # core: defines a "perfect" TLE record (pure, no I/O)
  repair.py     # speculative, validated repairs
  pipeline.py   # streaming reader, prefix-driven pairing, per-file routing
  report.py     # quarantine sidecar + run-summary rendering
  cli.py        # argument parsing, parallelism, exit codes
tests/          # pytest suite
docs/superpowers/
  specs/        # the design specification
  plans/        # the implementation plan
  runs/         # corpus-run summaries

Further reading

The full design rationale — the defect model, the TLE column specification, the fix policy, and the architecture — is in docs/superpowers/specs/2026-05-21-tle-corpus-cleaner-design.md.



Download files


Source Distribution

lintle-0.2.0.tar.gz (135.6 kB)

Uploaded Source

Built Distribution


lintle-0.2.0-py3-none-any.whl (27.4 kB)

Uploaded Python 3

File details

Details for the file lintle-0.2.0.tar.gz.

File metadata

  • Download URL: lintle-0.2.0.tar.gz
  • Size: 135.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: uv/0.11.16 {"installer":{"name":"uv","version":"0.11.16","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for lintle-0.2.0.tar.gz
Algorithm Hash digest
SHA256 127a66ecdb578d0455f80298c666cc593320bf9fad3f0fcbf049decffe5fb46d
MD5 5cb92e2a53932d2cbe5f3f6be12527bc
BLAKE2b-256 766e8d5bf6efa8f32750e253ba9ff3f16fbfb7e15e7315d028eca9dbd05d9b37


File details

Details for the file lintle-0.2.0-py3-none-any.whl.

File metadata

  • Download URL: lintle-0.2.0-py3-none-any.whl
  • Size: 27.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: uv/0.11.16 {"installer":{"name":"uv","version":"0.11.16","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for lintle-0.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 d4a0dc6452f9a65c55543203efbdc56e8479d5bc0ed5b15833dcdcee5de548ff
MD5 58bcd717bac929f608b2ca5d448d24d0
BLAKE2b-256 14ac9d7191a7e39509d4ee41733175eec8061cb2ea374e81ba436045cfec9b45

