High-performance transit data parser with TXC to GTFS conversion

Transit Parser

High-performance Python+Rust library for parsing transit data formats with TXC to GTFS conversion.

Features

  • GTFS Static - Parse and write GTFS feeds (CSV-based)
  • TransXChange (TXC) - Parse UK XML transit format
  • TXC to GTFS - Convert TransXChange to GTFS
  • Generic CSV/JSON - Parse any CSV/JSON with schema inference

Installation

Prerequisites

  • Python 3.9+
  • Rust 1.75+ (with cargo)
  • uv (recommended) or pip

Development Setup

# Install uv if not already installed
curl -LsSf https://astral.sh/uv/install.sh | sh

# Enter the cloned repository directory
cd parser

# Create virtual environment and install in dev mode
uv venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Build and install with maturin
uv pip install maturin
maturin develop

# Or use pip directly
pip install maturin
maturin develop

Building for Release

maturin build --release

Usage

Parse GTFS Feed

from transit_parser import GtfsFeed

# From ZIP file
feed = GtfsFeed.from_zip("path/to/gtfs.zip")

# From directory
feed = GtfsFeed.from_path("path/to/gtfs/")

# Access data
print(f"Agencies: {len(feed.agencies)}")
print(f"Routes: {len(feed.routes)}")
print(f"Stops: {len(feed.stops)}")
print(f"Trips: {len(feed.trips)}")

# Write to ZIP
feed.to_zip("output.zip")
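A GTFS Static feed is a ZIP archive of CSV tables such as stops.txt and routes.txt. To cross-check what the parser reports, the same archive can be read with the standard library alone. The following is a minimal sketch, not how transit_parser works internally; `read_gtfs_table` and `build_sample_feed` are hypothetical helper names, and the sample rows are made up.

```python
import csv
import io
import zipfile


def read_gtfs_table(zip_source, table_name):
    """Return the rows of one GTFS table (e.g. "stops.txt") as a list of dicts.

    zip_source may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(zip_source) as archive:
        with archive.open(table_name) as raw:
            # utf-8-sig strips the byte-order mark that some feeds include.
            text = io.TextIOWrapper(raw, encoding="utf-8-sig", newline="")
            return list(csv.DictReader(text))


def build_sample_feed():
    """Build a tiny in-memory archive so the sketch runs without a real feed."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr(
            "stops.txt",
            "stop_id,stop_name,stop_lat,stop_lon\n"
            "S1,Central Station,51.5,-0.12\n"
            "S2,Market Square,51.51,-0.13\n",
        )
    buffer.seek(0)
    return buffer


rows = read_gtfs_table(build_sample_feed(), "stops.txt")
print(f"Stops: {len(rows)}")
```

Every value comes back as a string here; converting coordinates and identifiers to typed fields is left to the caller.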

Parse TransXChange

from transit_parser import TxcDocument

# From file
doc = TxcDocument.from_path("path/to/file.xml")

# From string (xml_string holds the TXC XML text)
doc = TxcDocument.from_string(xml_string)

# Inspect document
print(f"Schema version: {doc.schema_version}")
print(f"Operators: {doc.operator_count}")
print(f"Services: {doc.service_count}")
print(f"Vehicle journeys: {doc.vehicle_journey_count}")

Convert TXC to GTFS

from transit_parser import TxcDocument, TxcToGtfsConverter, ConversionOptions

# Parse TXC
doc = TxcDocument.from_path("input.xml")

# Configure conversion
options = ConversionOptions(
    include_shapes=True,
    region="england",  # For bank holiday handling
    calendar_start="2024-01-01",
    calendar_end="2024-12-31",
)

# Convert
converter = TxcToGtfsConverter(options)
result = converter.convert(doc)

# Check results
print(f"Converted {result.stats.trips_converted} trips")
print(f"Warnings: {len(result.warnings)}")

# Save GTFS
result.feed.to_zip("output.zip")

Batch Conversion

from pathlib import Path
from transit_parser import TxcDocument, TxcToGtfsConverter

# Parse multiple TXC files
docs = []
for xml_file in Path("txc_files/").glob("*.xml"):
    docs.append(TxcDocument.from_path(str(xml_file)))

# Convert all to single GTFS
converter = TxcToGtfsConverter()
result = converter.convert_batch(docs)
result.feed.to_zip("combined.zip")

Generic CSV Parsing

from transit_parser import CsvDocument

# Parse with automatic type inference
doc = CsvDocument.from_path("data.csv")

print(f"Columns: {doc.columns}")
print(f"Rows: {len(doc)}")

# Access rows as dicts
for row in doc.rows:
    print(row)
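The inference rules that CsvDocument applies are not described here. As a rough illustration of what column type inference can mean, the sketch below picks the narrowest of int, float, or str that fits every non-empty value in a column. It uses only the standard library; `infer_type` and `infer_schema` are hypothetical names, and the real library may behave differently.

```python
import csv
import io


def infer_type(values):
    """Pick the narrowest of int, float, or str that fits every non-empty value."""
    for caster in (int, float):
        try:
            for value in values:
                if value:  # Skip empty or missing cells.
                    caster(value)
            return caster
        except ValueError:
            continue  # Try the next, wider type.
    return str


def infer_schema(csv_text):
    """Map each column name to its inferred Python type."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    columns = rows[0].keys() if rows else []
    return {name: infer_type([row[name] for row in rows]) for name in columns}


sample = "stop_id,stop_lat,zone\nS1,51.5,1\nS2,51.51,2\n"
schema = infer_schema(sample)
print({name: kind.__name__ for name, kind in schema.items()})
# {'stop_id': 'str', 'stop_lat': 'float', 'zone': 'int'}
```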

JSON Parsing

from transit_parser import JsonDocument

# Parse JSON
doc = JsonDocument.from_path("data.json")

# Access root value
data = doc.root

# Use JSON pointer for nested access
value = doc.pointer("/data/items/0/name")
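For readers unfamiliar with JSON Pointer syntax (RFC 6901), the sketch below resolves a pointer against data parsed by the standard json module. It only illustrates how a string such as "/data/items/0/name" is interpreted; `resolve_pointer` is a hypothetical helper, not part of transit_parser.

```python
import json


def resolve_pointer(document, pointer):
    """Resolve an RFC 6901 JSON Pointer against parsed JSON data."""
    if pointer == "":
        return document  # The empty pointer refers to the whole document.
    if not pointer.startswith("/"):
        raise ValueError("a non-empty JSON Pointer must start with '/'")
    current = document
    for token in pointer[1:].split("/"):
        # Unescape "~1" before "~0" so that "~01" becomes "~1", not "/".
        token = token.replace("~1", "/").replace("~0", "~")
        if isinstance(current, list):
            current = current[int(token)]  # Array tokens are indices.
        else:
            current = current[token]  # Object tokens are keys.
    return current


data = json.loads('{"data": {"items": [{"name": "Route 1"}]}}')
print(resolve_pointer(data, "/data/items/0/name"))  # Route 1
```

A missing key or index raises KeyError or IndexError in this sketch; how doc.pointer reports a missing path is not specified above.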

Project Structure

parser/
├── pyproject.toml          # Python project config (maturin backend)
├── Cargo.toml              # Rust workspace root
├── rust/
│   ├── transit-core/       # Core data models and traits
│   ├── gtfs-parser/        # GTFS Static parser
│   ├── txc-parser/         # TransXChange parser
│   ├── txc-gtfs-adapter/   # TXC→GTFS conversion
│   ├── csv-parser/         # Generic CSV parser
│   ├── json-parser/        # Generic JSON parser
│   └── transit-bindings/   # PyO3 Python bindings
└── python/
    └── transit_parser/     # Python package

Performance

The Rust core achieves its performance through:

  • Streaming XML parsing - Process large TXC files without loading the entire DOM into memory
  • Zero-copy CSV parsing - Efficient GTFS file reading
  • Parallel processing - Batch conversion uses multiple cores
  • GIL release - Python can do other work during long operations
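To illustrate the streaming idea in the first bullet, the sketch below counts VehicleJourney elements with Python's xml.etree.ElementTree.iterparse, discarding each element as soon as it has been inspected. This is not the Rust implementation, only the same technique in the standard library. The sample document is made up, `count_vehicle_journeys` is a hypothetical helper, and the namespace URI is assumed to be the usual TransXChange default namespace.

```python
import io
import xml.etree.ElementTree as ET

# Assumption: the default namespace used by TransXChange documents.
TXC_NS = "{http://www.transxchange.org.uk/}"


def count_vehicle_journeys(source):
    """Count VehicleJourney elements while parsing incrementally."""
    count = 0
    for _event, element in ET.iterparse(source, events=("end",)):
        if element.tag == TXC_NS + "VehicleJourney":
            count += 1
        # clear() drops the element's children, attributes, and text once it
        # has been inspected; only an empty shell stays attached to its parent.
        element.clear()
    return count


sample = io.BytesIO(
    b'<TransXChange xmlns="http://www.transxchange.org.uk/" SchemaVersion="2.4">'
    b"<VehicleJourneys><VehicleJourney/><VehicleJourney/></VehicleJourneys>"
    b"</TransXChange>"
)
print(count_vehicle_journeys(sample))  # 2
```

Because each element is cleared after its end event, memory use stays far below that of a fully built tree, although the emptied shells still accumulate.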

License

MIT OR Apache-2.0
