CityGML parsing and STEP conversion toolkit extracted from Paper-CAD

gml2step

License: AGPL v3 | Python 3.10+

A Japanese version of this README is available here.

A standalone toolkit for parsing CityGML files and converting 3D building geometry to the STEP (ISO 10303-21) CAD format. Originally extracted from Paper-CAD.

Overview

gml2step reads CityGML 2.0 files — including large-scale datasets from Japan's PLATEAU project — and produces STEP files suitable for CAD/CAM/BIM workflows.

Key capabilities:

  • CityGML parsing with a streaming mode that processes one building at a time, so memory use stays roughly constant on large files
  • STEP conversion via OpenCASCADE with automatic LoD fallback (LoD3 -> LoD2 -> LoD1 -> LoD0)
  • 4 conversion methods: solid, sew, extrude, and auto (tries all in sequence)
  • 7-phase geometry pipeline with progressive auto-repair
  • PLATEAU data fetching via public APIs (MLIT Data Catalog + OSM Nominatim)
  • Footprint extraction for 2D analysis without requiring OCCT
  • CRS auto-detection with built-in support for all 19 Japan Plane Rectangular CS zones

Installation

Core (parsing, footprint extraction)

pip install gml2step

With PLATEAU integration

pip install "gml2step[plateau]"

With STEP conversion (requires OpenCASCADE)

STEP conversion depends on pythonocc-core, which is not reliably pip-installable on all platforms. Use conda or Docker:

# conda
conda install -c conda-forge pythonocc-core
pip install gml2step

# Docker (recommended for full conversion)
docker build -t gml2step .
docker run --rm -v "$(pwd)":/data gml2step convert /data/input.gml /data/output.step

Note: Parsing, streaming, and footprint extraction work without OCCT. Only the convert command requires it.
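
Because only conversion needs OCCT, a script can check for it before calling convert. The following is a minimal sketch, assuming pythonocc-core is installed under its usual top-level import name OCC; the helper name occt_available is hypothetical and not part of gml2step.

```python
import importlib.util


def occt_available() -> bool:
    """Return True if the pythonocc-core package (import name ``OCC``) can be found."""
    # find_spec only locates the package; it does not import the heavy OCCT bindings.
    return importlib.util.find_spec("OCC") is not None


if __name__ == "__main__":
    if occt_available():
        print("OCCT found: STEP conversion should be usable")
    else:
        print("OCCT not found: parsing and footprint extraction only")
```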

Quick Start

CLI

# Parse a CityGML file and print summary as JSON
gml2step parse ./input.gml

# Stream-parse buildings one at a time (constant memory)
gml2step stream-parse ./input.gml --limit 100

# Extract 2D footprints with height estimates
gml2step extract-footprints ./input.gml --output-json ./footprints.json

# Convert CityGML to STEP
gml2step convert ./input.gml ./output.step --method solid

Python API

from gml2step import parse, stream_parse, extract_footprints, convert

# Lightweight summary (no OCCT required)
summary = parse("input.gml")
print(summary["total_buildings"])
print(summary["detected_source_crs"])

# Stream buildings with constant memory usage
for building, xlink_index in stream_parse("input.gml", limit=10):
    bid = building.get("{http://www.opengis.net/gml}id")
    print(bid)

# Extract 2D footprints with height
footprints = extract_footprints("input.gml", limit=100)
for fp in footprints:
    print(fp.building_id, fp.height, len(fp.exterior))

# Full CityGML -> STEP conversion
ok, result = convert("input.gml", "output.step", method="auto")

CLI Reference

gml2step convert

gml2step convert INPUT_GML OUTPUT_STEP [OPTIONS]
| Option | Default | Description |
| --- | --- | --- |
| --limit N | None | Maximum number of buildings to convert |
| --method | solid | Conversion method: solid, sew, extrude, auto |
| --debug | False | Enable debug logging |
| --use-streaming / --no-use-streaming | True | Use streaming parser for lower memory usage |

gml2step parse

gml2step parse INPUT_GML [--limit N]

Outputs a JSON summary with detected CRS, building count, and building IDs.

gml2step stream-parse

gml2step stream-parse INPUT_GML [--limit N] [--building-id ID ...] [--filter-attribute gml:id]

Streams building IDs one per line using constant memory. Supports filtering by building ID.

gml2step extract-footprints

gml2step extract-footprints INPUT_GML [--output-json PATH] [--limit N] [--default-height 10.0]

Extracts 2D footprints with estimated building heights. Height is derived from measuredHeight, Z-coordinate range, or the specified default.
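
The fallback order for height can be pictured as below. This is a sketch of the idea only, not gml2step's implementation: the function name estimate_height and the "must be positive" checks are assumptions made for illustration.

```python
from typing import Optional, Sequence


def estimate_height(
    measured_height: Optional[float],
    z_values: Sequence[float],
    default_height: float = 10.0,
) -> float:
    """Pick a building height using the fallback order described above."""
    # 1. Prefer an explicit, positive measuredHeight attribute.
    if measured_height is not None and measured_height > 0:
        return measured_height
    # 2. Otherwise use the vertical extent of the geometry, if it has any.
    if z_values:
        z_range = max(z_values) - min(z_values)
        if z_range > 0:
            return z_range
    # 3. Fall back to the caller-supplied default.
    return default_height


print(estimate_height(23.5, [0.0, 20.0]))  # 23.5
print(estimate_height(None, [2.0, 14.0]))  # 12.0
print(estimate_height(None, []))           # 10.0
```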

Conversion Methods

| Method | Description |
| --- | --- |
| solid | Primary method. Extracts LoD surfaces, builds shells, validates solids, auto-repairs. Best for LoD2/LoD3. |
| sew | Collects WallSurface/RoofSurface/GroundSurface polygons, sews faces, and attempts to form a solid. |
| extrude | Extrudes LoD0 footprint to estimated height. Fallback for files with only 2D data. |
| auto | Tries solid -> sew -> extrude in sequence until one succeeds. |
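
The auto method is a first-success fallback chain. A rough sketch of that control flow follows; the three try_* functions are hypothetical stand-ins that return a placeholder string instead of real OCCT shapes, and the success conditions are invented for the example.

```python
from typing import Callable, Dict, Optional, Tuple


# Hypothetical stand-ins for the three strategies; each returns a shape or None.
def try_solid(building: dict) -> Optional[str]:
    return "solid-shape" if building.get("closed_shell") else None


def try_sew(building: dict) -> Optional[str]:
    return "sewn-shape" if building.get("surfaces") else None


def try_extrude(building: dict) -> Optional[str]:
    return "extruded-shape" if building.get("footprint") else None


# Insertion order of the dict defines the order in which methods are tried.
METHODS: Dict[str, Callable[[dict], Optional[str]]] = {
    "solid": try_solid,
    "sew": try_sew,
    "extrude": try_extrude,
}


def convert_auto(building: dict) -> Tuple[Optional[str], Optional[str]]:
    """Try each method in order and return (method_name, shape) for the first success."""
    for name, method in METHODS.items():
        shape = method(building)
        if shape is not None:
            return name, shape
    return None, None


print(convert_auto({"surfaces": ["wall", "roof"]}))           # ('sew', 'sewn-shape')
print(convert_auto({"footprint": [(0, 0), (1, 0), (1, 1)]}))  # ('extrude', 'extruded-shape')
```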

Processing Pipeline

The convert command processes each building through the following phases, numbered 0 to 7, with CRS detection as an intermediate step (1.5):

| Phase | Description |
| --- | --- |
| 0. Recentering | Translates coordinates near the origin for OCCT numerical stability |
| 1. LoD Selection | Selects the best available LoD (LoD3 -> LoD2 -> LoD1 fallback) |
| 1.5. CRS Detection | Auto-detects source CRS and reprojects if needed |
| 2. Geometry Extraction | Extracts faces using the selected conversion method |
| 3. Shell Construction | Builds OCCT shells from faces with multi-pass sewing |
| 4. Solid Validation | Validates geometry and constructs solids |
| 5. Auto-Repair | 4-level progressive repair: minimal -> standard -> aggressive -> ultra |
| 6. Part Merging | Fuses BuildingParts via Boolean union (with compound fallback) |
| 7. STEP Export | Writes AP214CD STEP file with millimeter units |

Precision Modes

The precision_mode parameter controls coordinate tolerance:

| Mode | Relative tolerance | Use case |
| --- | --- | --- |
| standard | 0.01% | General use |
| high | 0.001% | Detailed models |
| maximum | 0.0001% | High-precision CAD |
| ultra | 0.00001% | Maximum fidelity |
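
To get a feel for what these percentages mean in metres, the sketch below turns a relative tolerance into an absolute one. The README does not say which reference length the tolerance is relative to; the bounding-box diagonal used here is an assumption, and absolute_tolerance is a hypothetical helper.

```python
import math

# Relative tolerances from the table above, expressed as fractions.
RELATIVE_TOLERANCE = {
    "standard": 1e-4,  # 0.01%
    "high": 1e-5,      # 0.001%
    "maximum": 1e-6,   # 0.0001%
    "ultra": 1e-7,     # 0.00001%
}


def absolute_tolerance(bbox_min, bbox_max, precision_mode="standard"):
    """Scale the relative tolerance by the bounding-box diagonal (assumed reference length)."""
    diagonal = math.dist(bbox_min, bbox_max)
    return RELATIVE_TOLERANCE[precision_mode] * diagonal


# A building whose bounding box has a 50 m diagonal (30 m x 40 m footprint).
tol = absolute_tolerance((0.0, 0.0, 0.0), (30.0, 40.0, 0.0), "standard")
print(f"{tol:.4f} m")  # 0.0050 m
```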

Shape Fix Levels

The shape_fix_level parameter controls auto-repair aggressiveness. When repair fails at the specified level, it automatically escalates:

  1. minimal — ShapeFix_Solid only
  2. standard — + ShapeUpgrade_UnifySameDomain
  3. aggressive — + Rebuild with relaxed tolerance
  4. ultra — + ShapeFix_Shape (full repair)
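
The escalation behaviour amounts to walking the level list from the requested level upward. A sketch under that reading follows; repair_with_escalation and fake_repair are hypothetical names, and the repair callable stands in for the OCCT operations listed above.

```python
from typing import Callable, Optional, Tuple

LEVELS = ["minimal", "standard", "aggressive", "ultra"]


def repair_with_escalation(
    shape: str,
    repair: Callable[[str, str], Optional[str]],
    shape_fix_level: str = "minimal",
) -> Tuple[Optional[str], Optional[str]]:
    """Try the requested level first, then each more aggressive level in turn."""
    start = LEVELS.index(shape_fix_level)
    for level in LEVELS[start:]:
        fixed = repair(shape, level)
        if fixed is not None:
            return level, fixed
    return None, None


# Hypothetical repair function that only succeeds from "aggressive" upward.
def fake_repair(shape: str, level: str) -> Optional[str]:
    return f"{shape}@{level}" if level in ("aggressive", "ultra") else None


print(repair_with_escalation("bldg_1", fake_repair, "standard"))
# ('aggressive', 'bldg_1@aggressive')
```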

LoD Support

gml2step supports CityGML Level of Detail 0 through 3:

| LoD | Description | Surfaces supported |
| --- | --- | --- |
| LoD3 | Architectural detail models | lod3Solid, lod3MultiSurface, lod3Geometry |
| LoD2 | Standard building models (PLATEAU primary) | lod2Solid, lod2MultiSurface, lod2Geometry, boundedBy |
| LoD1 | Simple block models | lod1Solid |
| LoD0 | 2D footprints | lod0FootPrint, lod0RoofEdge, GroundSurface |

All 6 CityGML 2.0 boundary surface types are recognized: WallSurface, RoofSurface, GroundSurface, OuterCeilingSurface, OuterFloorSurface, ClosureSurface.
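
Selecting the best available LoD can be sketched with the standard library alone. The example below checks a building element for the geometry properties listed in the table, highest detail first. It is a simplified illustration (it ignores boundedBy and GroundSurface), and best_available_lod is a hypothetical helper rather than a gml2step function.

```python
import xml.etree.ElementTree as ET
from typing import Optional

BLDG_NS = "http://www.opengis.net/citygml/building/2.0"

# Geometry properties per LoD, highest detail first (taken from the table above).
LOD_TAGS = [
    ("LoD3", ["lod3Solid", "lod3MultiSurface", "lod3Geometry"]),
    ("LoD2", ["lod2Solid", "lod2MultiSurface", "lod2Geometry"]),
    ("LoD1", ["lod1Solid"]),
    ("LoD0", ["lod0FootPrint", "lod0RoofEdge"]),
]


def best_available_lod(building: ET.Element) -> Optional[str]:
    """Return the highest LoD for which the building carries a geometry property."""
    for lod, tags in LOD_TAGS:
        for tag in tags:
            if building.find(f".//{{{BLDG_NS}}}{tag}") is not None:
                return lod
    return None


sample = f"""
<bldg:Building xmlns:bldg="{BLDG_NS}">
  <bldg:lod0FootPrint/>
  <bldg:lod1Solid/>
</bldg:Building>
""".strip()

print(best_available_lod(ET.fromstring(sample)))  # LoD1
```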

Streaming Parser

For large CityGML files (common in PLATEAU datasets), gml2step provides a SAX-style streaming parser that processes one building at a time instead of loading the entire DOM tree into memory:

  • O(1 building) memory vs O(entire file) for DOM parsing
  • Two-tier XLink resolution cache (local per-building + global LRU)
  • Optional NumPy-accelerated coordinate parsing
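
One way to picture a two-tier XLink cache is a plain per-building dictionary in front of a bounded LRU shared across the file. The class below is an illustrative sketch only; its name, methods, and default capacity are assumptions and do not reflect gml2step's internal classes.

```python
from collections import OrderedDict
from typing import Dict, Optional


class TwoTierXLinkCache:
    """Per-building dict in front of a bounded, file-wide LRU cache."""

    def __init__(self, global_capacity: int = 1000) -> None:
        self.local: Dict[str, object] = {}
        self.global_lru: "OrderedDict[str, object]" = OrderedDict()
        self.global_capacity = global_capacity

    def put(self, gml_id: str, element: object) -> None:
        self.local[gml_id] = element
        self.global_lru[gml_id] = element
        self.global_lru.move_to_end(gml_id)
        if len(self.global_lru) > self.global_capacity:
            self.global_lru.popitem(last=False)  # evict least recently used

    def get(self, gml_id: str) -> Optional[object]:
        if gml_id in self.local:
            return self.local[gml_id]
        if gml_id in self.global_lru:
            self.global_lru.move_to_end(gml_id)  # mark as recently used
            return self.global_lru[gml_id]
        return None

    def next_building(self) -> None:
        """Drop the local tier; the global tier survives across buildings."""
        self.local.clear()


cache = TwoTierXLinkCache(global_capacity=2)
cache.put("poly_1", "<Polygon 1>")
cache.put("poly_2", "<Polygon 2>")
cache.next_building()
cache.put("poly_3", "<Polygon 3>")  # evicts poly_1 from the global tier
print(cache.get("poly_1"), cache.get("poly_2"), cache.get("poly_3"))
# None <Polygon 2> <Polygon 3>
```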

Note: The streaming parser has not been formally benchmarked. Memory savings and speedup depend heavily on file size and building complexity. The theoretical advantage is that memory usage stays roughly constant regardless of file size, while DOM parsing scales linearly with file size.

from gml2step import stream_parse

for building, xlinks in stream_parse("large_plateau_file.gml"):
    process(building)  # process() is a placeholder for your own per-building handler
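
For readers unfamiliar with the technique, the general pattern behind building-at-a-time parsing can be shown with the standard library's iterparse. This is not gml2step's parser, just a sketch of the idea; iter_buildings is a hypothetical helper and the sample document is invented.

```python
import io
import xml.etree.ElementTree as ET

BLDG = "{http://www.opengis.net/citygml/building/2.0}Building"
GML_ID = "{http://www.opengis.net/gml}id"


def iter_buildings(source, limit=None):
    """Yield the gml:id of each Building, freeing each element once it is handled."""
    count = 0
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == BLDG:
            yield elem.get(GML_ID)
            # Release the building's children. The emptied element stays attached
            # to its parent, so a production parser would detach it as well.
            elem.clear()
            count += 1
            if limit is not None and count >= limit:
                return


sample = b"""<core:CityModel
    xmlns:core="http://www.opengis.net/citygml/2.0"
    xmlns:bldg="http://www.opengis.net/citygml/building/2.0"
    xmlns:gml="http://www.opengis.net/gml">
  <core:cityObjectMember><bldg:Building gml:id="b1"/></core:cityObjectMember>
  <core:cityObjectMember><bldg:Building gml:id="b2"/></core:cityObjectMember>
  <core:cityObjectMember><bldg:Building gml:id="b3"/></core:cityObjectMember>
</core:CityModel>"""

print(list(iter_buildings(io.BytesIO(sample), limit=2)))  # ['b1', 'b2']
```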

CRS and Coordinate Handling

  • Auto-detection of source CRS from GML srsName attributes
  • All 19 Japan Plane Rectangular CS zones (EPSG:6669–6687) with automatic zone selection by latitude/longitude
  • Automatic reprojection from geographic CRS (WGS84, JGD2000, JGD2011) to an appropriate projected CRS
  • Coordinate recentering near the origin to prevent floating-point precision loss in OCCT
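
Recentering is a plain translation whose offset is remembered so results can be mapped back. A minimal sketch follows; the choice of the bounding-box centre in X/Y and the lowest Z as the offset is an assumption for illustration, and recenter is a hypothetical helper.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]


def recenter(points: List[Point]) -> Tuple[List[Point], Point]:
    """Shift points so the X/Y bounding-box centre and the lowest Z sit at the origin."""
    xs, ys, zs = zip(*points)
    offset = (
        (min(xs) + max(xs)) / 2,
        (min(ys) + max(ys)) / 2,
        min(zs),
    )
    shifted = [(x - offset[0], y - offset[1], z - offset[2]) for x, y, z in points]
    return shifted, offset


# Coordinates of the kind found in a projected CRS (metres, far from the origin).
pts = [(-5990.0, -35410.0, 3.0), (-5970.0, -35390.0, 3.0), (-5970.0, -35410.0, 15.0)]
shifted, offset = recenter(pts)
print(shifted)  # [(-10.0, -10.0, 0.0), (10.0, 10.0, 0.0), (10.0, -10.0, 12.0)]
print(offset)   # (-5980.0, -35400.0, 3.0)
```

Adding the offset back to each shifted point recovers the original coordinates.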

PLATEAU Integration

PLATEAU is a project by Japan's Ministry of Land, Infrastructure, Transport and Tourism (MLIT) that publishes open 3D city models of cities across Japan in CityGML format.

gml2step provides optional convenience functions for fetching PLATEAU data (pip install "gml2step[plateau]"). Under the hood, this is a thin wrapper around two public APIs:

  • PLATEAU Data Catalog API (operated by MLIT) — queried for CityGML file URLs by mesh code or municipality
  • Nominatim (OpenStreetMap) — used for geocoding Japanese addresses to latitude/longitude

There is no custom backend server. All requests go directly to these public endpoints.

What it does

  1. Address search: Takes a Japanese address (e.g., "東京都千代田区霞が関3-2-1"), geocodes it via Nominatim, converts the coordinates to a JIS X 0410 mesh code, fetches CityGML files covering that area from the PLATEAU API, then parses and ranks the buildings by distance/name similarity.
  2. Mesh code lookup: Given a mesh code, fetches CityGML file URLs from the PLATEAU API and downloads them.
  3. Building ID lookup: Given a specific building ID and mesh code, fetches and parses just the relevant 1km grid area.

Building Search

from gml2step.plateau.fetcher import search_buildings_by_address

buildings = search_buildings_by_address(
    "東京都千代田区霞が関3-2-1",
    ranking_mode="hybrid",  # "distance", "name", or "hybrid"
    limit=10,
)
for b in buildings:
    print(b.building_id, b.name, b.height, b.lod_level)

Mesh Code Utilities

PLATEAU data is organized by JIS X 0410 standard mesh codes. gml2step provides conversion functions for all 5 mesh levels:

from gml2step.plateau.mesh_utils import (
    latlon_to_mesh_1st,    # 80km grid (4-digit)
    latlon_to_mesh_2nd,    # 10km grid (6-digit)
    latlon_to_mesh_3rd,    # 1km grid (8-digit)
    latlon_to_mesh_half,   # 500m grid (9-digit)
    latlon_to_mesh_quarter # 250m grid (10-digit)
)

mesh = latlon_to_mesh_3rd(35.6812, 139.7671)  # Tokyo Station
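
For reference, the arithmetic behind a third-level mesh code is short enough to write out. The sketch below follows the JIS X 0410 subdivision (40' x 1 degree, then 8 x 8, then 10 x 10) and is independent of gml2step; mesh_3rd_sketch is a hypothetical name, and the library's own function may differ in return type or validation.

```python
def mesh_3rd_sketch(lat: float, lon: float) -> str:
    """Compute an 8-digit third-level (about 1 km) JIS X 0410 mesh code."""
    # 1st level: 40' of latitude by 1 degree of longitude.
    p, lat_rem = divmod(lat * 60.0, 40.0)  # lat_rem is in minutes
    u = int(lon) - 100
    lon_rem = (lon - int(lon)) * 60.0      # minutes inside the 1st-level cell
    # 2nd level: 5' by 7.5' (an 8 x 8 subdivision).
    q, lat_rem = divmod(lat_rem, 5.0)
    v, lon_rem = divmod(lon_rem, 7.5)
    # 3rd level: 30" by 45" (a 10 x 10 subdivision).
    r = int(lat_rem * 60.0 // 30.0)
    w = int(lon_rem * 60.0 // 45.0)
    return f"{int(p):02d}{u:02d}{int(q)}{int(v)}{r}{w}"


print(mesh_3rd_sketch(35.6812, 139.7671))  # 53394611 (Tokyo Station)
```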

Async API Client

import asyncio
from gml2step.plateau.api_client import fetch_plateau_datasets_by_mesh

# Fetch PLATEAU dataset URLs by mesh code
result = asyncio.run(fetch_plateau_datasets_by_mesh("53394525"))

Other Features

  • Geocoding via Nominatim (rate-limited to 1 req/sec per Nominatim policy), with Japan-specific validation and relevance scoring
  • Building ranking with 3 modes: distance, name similarity (Levenshtein + token matching), hybrid
  • JIS X 0410 mesh code conversion (1st through quarter mesh)
  • Neighboring mesh enumeration (3x3 grid) for boundary searches
  • Async batch resolution of mesh codes with concurrency control
  • Local CityGML caching (opt-in via CITYGML_CACHE_ENABLED / CITYGML_CACHE_DIR env vars)
  • Offline mesh-to-municipality mapping included as package data (avoids extra API calls)
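
The 1 request per second limit for Nominatim can be enforced with a small throttle placed in front of each request. The class below is a generic sketch, not the code gml2step ships; Throttle and its injectable clock and sleep parameters are assumptions chosen to keep the example testable.

```python
import time


class Throttle:
    """Block so that successive calls are at least `min_interval` seconds apart."""

    def __init__(self, min_interval: float = 1.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last_call = None

    def wait(self) -> float:
        """Sleep if needed and return the number of seconds waited."""
        now = self._clock()
        delay = 0.0
        if self._last_call is not None:
            delay = max(0.0, self.min_interval - (now - self._last_call))
            if delay > 0:
                self._sleep(delay)
        # Record when this call is considered to have happened.
        self._last_call = now + delay
        return delay


throttle = Throttle(min_interval=1.0)
for address in ["address A", "address B"]:
    throttle.wait()  # call this before each geocoding request
    print("would geocode:", address)
```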

Architecture

src/gml2step/
├── __init__.py              # Public API: convert, parse, stream_parse, extract_footprints
├── api.py                   # API implementation
├── cli.py                   # Typer CLI
├── coordinate_utils.py      # CRS utilities, Japan zone definitions
├── data/
│   └── mesh2_municipality.json  # Nationwide mesh-to-municipality mapping
├── citygml/
│   ├── core/                # Types, constants, CityGML namespaces
│   ├── parsers/             # Coordinate and polygon extraction
│   ├── streaming/           # SAX-style streaming parser, XLink cache, coordinate optimizer
│   ├── lod/                 # LoD0–LoD3 extraction strategies, footprint extractor
│   ├── geometry/            # OCCT geometry builders, shell/solid construction, auto-repair
│   ├── transforms/          # CRS detection, reprojection, recentering
│   ├── utils/               # XLink resolver, XML parser, logging
│   └── pipeline/            # Orchestrator (7-phase conversion pipeline)
└── plateau/                 # PLATEAU API client, geocoding, mesh utilities, building search

Development

git clone https://github.com/Soynyuu/gml2step.git
cd gml2step
pip install -e ".[dev,plateau]"
pytest

License

This project is licensed under the GNU Affero General Public License v3.0 (AGPL-3.0-or-later).

gml2step was originally developed as part of Paper-CAD and extracted as a standalone library. See NOTICE for full attribution.

Acknowledgments

  • Paper-CAD — The parent project from which gml2step was extracted
  • PLATEAU — Japan's national 3D city model project (MLIT)
  • OpenCASCADE / pythonocc-core — 3D CAD kernel for STEP conversion
  • pyproj — Coordinate reference system transformations
  • Mitou Junior — A program supporting creators aged 17 and under with original ideas and outstanding technical skills
