CityGML parsing and STEP conversion toolkit extracted from Paper-CAD
gml2step
A standalone toolkit for parsing CityGML files and converting 3D building geometry to the STEP (ISO 10303-21) CAD format. Originally extracted from Paper-CAD.
Table of Contents
- Overview
- Installation
- Quick Start
- CLI Reference
- Conversion Methods
- Processing Pipeline
- LoD Support
- Streaming Parser
- CRS and Coordinate Handling
- PLATEAU Integration
- Architecture
- Development
- License
Overview
gml2step reads CityGML 2.0 files — including large-scale datasets from Japan's PLATEAU project — and produces STEP files suitable for CAD/CAM/BIM workflows.
Key capabilities:
- CityGML parsing with streaming support for files of any size
- STEP conversion via OpenCASCADE with automatic LoD fallback (LoD3 -> LoD2 -> LoD1 -> LoD0)
- 4 conversion methods: solid, sew, extrude, and auto (tries all in sequence)
- 7-phase geometry pipeline with progressive auto-repair
- PLATEAU data fetching via public APIs (MLIT Data Catalog + OSM Nominatim)
- Footprint extraction for 2D analysis without requiring OCCT
- CRS auto-detection with built-in support for all 19 Japan Plane Rectangular CS zones
Installation
Core (parsing, footprint extraction)
pip install gml2step
With PLATEAU integration
pip install "gml2step[plateau]"
With STEP conversion (requires OpenCASCADE)
STEP conversion depends on pythonocc-core, which is not reliably pip-installable on all platforms. Use conda or Docker:
# conda
conda install -c conda-forge pythonocc-core
pip install gml2step
# Docker (recommended for full conversion)
docker build -t gml2step .
docker run --rm -v $(pwd):/data gml2step convert /data/input.gml /data/output.step
Note: Parsing, streaming, and footprint extraction work without OCCT. Only the convert command requires it.
Quick Start
CLI
# Parse a CityGML file and print summary as JSON
gml2step parse ./input.gml
# Stream-parse buildings one at a time (constant memory)
gml2step stream-parse ./input.gml --limit 100
# Extract 2D footprints with height estimates
gml2step extract-footprints ./input.gml --output-json ./footprints.json
# Convert CityGML to STEP
gml2step convert ./input.gml ./output.step --method solid
Python API
from gml2step import parse, stream_parse, extract_footprints, convert
# Lightweight summary (no OCCT required)
summary = parse("input.gml")
print(summary["total_buildings"])
print(summary["detected_source_crs"])
# Stream buildings with constant memory usage
for building, xlink_index in stream_parse("input.gml", limit=10):
    bid = building.get("{http://www.opengis.net/gml}id")
    print(bid)
# Extract 2D footprints with height
footprints = extract_footprints("input.gml", limit=100)
for fp in footprints:
    print(fp.building_id, fp.height, len(fp.exterior))
# Full CityGML -> STEP conversion
ok, result = convert("input.gml", "output.step", method="auto")
CLI Reference
gml2step convert
gml2step convert INPUT_GML OUTPUT_STEP [OPTIONS]
| Option | Default | Description |
|---|---|---|
| --limit N | None | Maximum number of buildings to convert |
| --method | solid | Conversion method: solid, sew, extrude, auto |
| --debug | False | Enable debug logging |
| --use-streaming / --no-use-streaming | True | Use streaming parser for lower memory usage |
gml2step parse
gml2step parse INPUT_GML [--limit N]
Outputs a JSON summary with detected CRS, building count, and building IDs.
gml2step stream-parse
gml2step stream-parse INPUT_GML [--limit N] [--building-id ID ...] [--filter-attribute gml:id]
Streams building IDs one per line using constant memory. Supports filtering by building ID.
gml2step extract-footprints
gml2step extract-footprints INPUT_GML [--output-json PATH] [--limit N] [--default-height 10.0]
Extracts 2D footprints with estimated building heights. Height is derived from measuredHeight, Z-coordinate range, or the specified default.
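The height fallback described above can be pictured with a small sketch. This is not the library's implementation: the function name `estimate_height`, its parameters, and the positivity checks are assumptions made for illustration. Only the order (measuredHeight, then Z-coordinate range, then the default) comes from the description.

```python
from typing import Optional, Sequence


def estimate_height(measured_height: Optional[float],
                    z_values: Sequence[float],
                    default_height: float = 10.0) -> float:
    """Illustrative fallback: measuredHeight -> Z range -> default."""
    # 1. Prefer an explicit measuredHeight (assumed valid only if positive).
    if measured_height is not None and measured_height > 0:
        return float(measured_height)
    # 2. Otherwise use the vertical extent of the geometry, if it has one.
    if z_values:
        z_range = max(z_values) - min(z_values)
        if z_range > 0:
            return float(z_range)
    # 3. Fall back to the caller-supplied default (cf. --default-height).
    return float(default_height)


print(estimate_height(12.5, [0.0, 3.0]))    # measuredHeight wins
print(estimate_height(None, [4.0, 19.0]))   # Z range is used
print(estimate_height(None, []))            # default is used
```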
Conversion Methods
| Method | Description |
|---|---|
| solid | Primary method. Extracts LoD surfaces, builds shells, validates solids, auto-repairs. Best for LoD2/LoD3. |
| sew | Collects WallSurface/RoofSurface/GroundSurface polygons, sews faces, and attempts to form a solid. |
| extrude | Extrudes LoD0 footprint to estimated height. Fallback for files with only 2D data. |
| auto | Tries solid -> sew -> extrude in sequence until one succeeds. |
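As a rough illustration of the auto strategy in the table above, the sketch below tries a sequence of methods and stops at the first one that yields a result. The names `convert_auto`, `methods`, and the stand-in converter functions are hypothetical and exist only for this example; how gml2step actually detects failure is not documented here.

```python
from typing import Callable, Dict, Optional, Tuple

Converter = Callable[[dict], Optional[str]]


def convert_auto(building: dict,
                 methods: Dict[str, Converter],
                 order: Tuple[str, ...] = ("solid", "sew", "extrude"),
                 ) -> Tuple[Optional[str], Optional[str]]:
    """Try each method in order; return (method_name, shape) for the first success."""
    for name in order:
        try:
            shape = methods[name](building)
        except Exception:
            # Assumption: an exception counts as "this method failed".
            continue
        if shape is not None:
            return name, shape
    return None, None


# Stand-in converters (hypothetical): solid raises, sew gives up, extrude works.
def fake_solid(building: dict) -> Optional[str]:
    raise ValueError("no closed shell")


def fake_sew(building: dict) -> Optional[str]:
    return None


def fake_extrude(building: dict) -> Optional[str]:
    return "prism"


result = convert_auto({}, {"solid": fake_solid, "sew": fake_sew, "extrude": fake_extrude})
print(result)
```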
Processing Pipeline
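The convert command processes each building through 7 main phases, preceded by a recentering step (phase 0) and with CRS detection as an intermediate step (phase 1.5):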
| Phase | Description |
|---|---|
| 0. Recentering | Translates coordinates near the origin for OCCT numerical stability |
| 1. LoD Selection | Selects the best available LoD (LoD3 -> LoD2 -> LoD1 fallback) |
| 1.5. CRS Detection | Auto-detects source CRS and reprojects if needed |
| 2. Geometry Extraction | Extracts faces using the selected conversion method |
| 3. Shell Construction | Builds OCCT shells from faces with multi-pass sewing |
| 4. Solid Validation | Validates geometry and constructs solids |
| 5. Auto-Repair | 4-level progressive repair: minimal -> standard -> aggressive -> ultra |
| 6. Part Merging | Fuses BuildingParts via Boolean union (with compound fallback) |
| 7. STEP Export | Writes AP214CD STEP file with millimeter units |
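Phase 0 is easy to picture with a small sketch. Projected coordinates are often tens of kilometres from the CRS origin, so shifting a building to sit around (0, 0, 0) keeps the numbers small. The function below is only an illustration: the name `recenter` and the choice of the bounding-box centre as the offset are assumptions, not the library's documented behaviour.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]


def recenter(points: List[Point]) -> Tuple[List[Point], Point]:
    """Shift points so their bounding-box centre sits at the origin.

    Returns the shifted points and the offset that was subtracted, so the
    original coordinates can be restored later by adding it back.
    """
    if not points:
        return [], (0.0, 0.0, 0.0)
    xs, ys, zs = zip(*points)
    offset = (
        (min(xs) + max(xs)) / 2.0,
        (min(ys) + max(ys)) / 2.0,
        (min(zs) + max(zs)) / 2.0,
    )
    shifted = [(x - offset[0], y - offset[1], z - offset[2]) for x, y, z in points]
    return shifted, offset


pts = [(-6000.0, -35000.0, 10.0), (-5990.0, -34980.0, 40.0)]
shifted, offset = recenter(pts)
print(offset)
print(shifted)
```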
Precision Modes
The precision_mode parameter controls coordinate tolerance:
| Mode | Relative tolerance | Use case |
|---|---|---|
| standard | 0.01% | General use |
| high | 0.001% | Detailed models |
| maximum | 0.0001% | High-precision CAD |
| ultra | 0.00001% | Maximum fidelity |
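To get a feel for these percentages, the sketch below turns them into absolute tolerances. Which reference length the library scales by is not stated here, so the example simply assumes the building's overall extent in metres. `absolute_tolerance` and `extent_m` are hypothetical names.

```python
# Relative tolerances from the table, written as fractions (0.01% = 1e-4).
RELATIVE_TOLERANCE = {
    "standard": 1e-4,
    "high": 1e-5,
    "maximum": 1e-6,
    "ultra": 1e-7,
}


def absolute_tolerance(mode: str, extent_m: float) -> float:
    """Absolute tolerance in metres, assuming the tolerance scales with the extent."""
    return RELATIVE_TOLERANCE[mode] * extent_m


# For a building roughly 50 m across:
for mode in RELATIVE_TOLERANCE:
    print(mode, absolute_tolerance(mode, 50.0))
```

Under that assumption, standard mode corresponds to about 5 mm on a 50 m building, and each stricter mode tightens it by a factor of ten.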
Shape Fix Levels
The shape_fix_level parameter controls auto-repair aggressiveness. When repair fails at the specified level, it automatically escalates:
- minimal — ShapeFix_Solid only
- standard — + ShapeUpgrade_UnifySameDomain
- aggressive — + Rebuild with relaxed tolerance
- ultra — + ShapeFix_Shape (full repair)
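The escalation behaviour can be sketched as a loop over the levels starting at the requested one. The repair functions here are stand-ins, and `repair_with_escalation` is a hypothetical name. The real repairs call OCCT and are not reproduced.

```python
from typing import Callable, Dict, Optional, Tuple

LEVELS = ("minimal", "standard", "aggressive", "ultra")


def repair_with_escalation(shape: str,
                           start_level: str,
                           repair_fns: Dict[str, Callable[[str], Optional[str]]],
                           ) -> Tuple[Optional[str], Optional[str]]:
    """Try the requested level first, then each more aggressive level in turn."""
    start = LEVELS.index(start_level)
    for level in LEVELS[start:]:
        fixed = repair_fns[level](shape)
        if fixed is not None:  # assumption: None signals a failed repair
            return level, fixed
    return None, None


# Stand-in repairs (hypothetical): only "aggressive" and above succeed.
fns = {
    "minimal": lambda s: None,
    "standard": lambda s: None,
    "aggressive": lambda s: s + "+rebuilt",
    "ultra": lambda s: s + "+full",
}
print(repair_with_escalation("shell", "minimal", fns))
print(repair_with_escalation("shell", "ultra", fns))
```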
LoD Support
gml2step supports CityGML Level of Detail 0 through 3:
| LoD | Description | Surfaces supported |
|---|---|---|
| LoD3 | Architectural detail models | lod3Solid, lod3MultiSurface, lod3Geometry |
| LoD2 | Standard building models (PLATEAU primary) | lod2Solid, lod2MultiSurface, lod2Geometry, boundedBy |
| LoD1 | Simple block models | lod1Solid |
| LoD0 | 2D footprints | lod0FootPrint, lod0RoofEdge, GroundSurface |
All 6 CityGML 2.0 boundary surface types are recognized: WallSurface, RoofSurface, GroundSurface, OuterCeilingSurface, OuterFloorSurface, ClosureSurface.
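A minimal sketch of highest-LoD-first selection, using the property names from the table above. The function `select_lod` and the idea of passing in a set of property names found on a building are assumptions for illustration. The library's own selection logic may check more than the mere presence of an element.

```python
from typing import Optional, Set, Tuple

# Property names per LoD, in priority order, taken from the table above.
LOD_PROPERTIES = [
    ("LoD3", ("lod3Solid", "lod3MultiSurface", "lod3Geometry")),
    ("LoD2", ("lod2Solid", "lod2MultiSurface", "lod2Geometry", "boundedBy")),
    ("LoD1", ("lod1Solid",)),
    ("LoD0", ("lod0FootPrint", "lod0RoofEdge", "GroundSurface")),
]


def select_lod(present: Set[str]) -> Optional[Tuple[str, str]]:
    """Return (lod, property) for the most detailed geometry that is present."""
    for lod, props in LOD_PROPERTIES:
        for prop in props:
            if prop in present:
                return lod, prop
    return None


print(select_lod({"lod1Solid", "lod0FootPrint"}))
print(select_lod({"boundedBy", "lod1Solid"}))
print(select_lod(set()))
```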
Streaming Parser
For large CityGML files (common in PLATEAU datasets), gml2step provides a SAX-style streaming parser that processes one building at a time instead of loading the entire DOM tree into memory:
- O(1 building) memory vs O(entire file) for DOM parsing
- Two-tier XLink resolution cache (local per-building + global LRU)
- Optional NumPy-accelerated coordinate parsing
Note: The streaming parser has not been formally benchmarked. Memory savings and speedup depend heavily on file size and building complexity. The theoretical advantage is that memory usage stays roughly constant regardless of file size, while DOM parsing scales linearly with file size.
from gml2step import stream_parse
for building, xlinks in stream_parse("large_plateau_file.gml"):
    process(building)  # process() stands for your own per-building handler
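To show the general idea behind one-building-at-a-time parsing, here is a standard-library sketch using `xml.etree.ElementTree.iterparse`. It is a swapped-in technique for illustration, not gml2step's parser. `iter_building_ids` and the inline sample document are made up for this example, and XLink resolution is left out entirely.

```python
import io
import xml.etree.ElementTree as ET

BLDG_NS = "http://www.opengis.net/citygml/building/2.0"
GML_NS = "http://www.opengis.net/gml"


def iter_building_ids(source, limit=None):
    """Yield gml:id of each bldg:Building as soon as its end tag is parsed."""
    count = 0
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == f"{{{BLDG_NS}}}Building":
            yield elem.get(f"{{{GML_NS}}}id")
            # Drop the building's children and attributes once it is handled.
            # Emptied elements stay attached to the root, so memory is
            # roughly, not strictly, constant in this simple version.
            elem.clear()
            count += 1
            if limit is not None and count >= limit:
                break


SAMPLE = b"""<?xml version="1.0"?>
<core:CityModel xmlns:core="http://www.opengis.net/citygml/2.0"
    xmlns:bldg="http://www.opengis.net/citygml/building/2.0"
    xmlns:gml="http://www.opengis.net/gml">
  <core:cityObjectMember><bldg:Building gml:id="bldg_A"/></core:cityObjectMember>
  <core:cityObjectMember><bldg:Building gml:id="bldg_B"/></core:cityObjectMember>
  <core:cityObjectMember><bldg:Building gml:id="bldg_C"/></core:cityObjectMember>
</core:CityModel>"""

ids = list(iter_building_ids(io.BytesIO(SAMPLE), limit=2))
print(ids)
```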
CRS and Coordinate Handling
- Auto-detection of source CRS from GML srsName attributes
- All 19 Japan Plane Rectangular CS zones (EPSG:6669–6687) with automatic zone selection by latitude/longitude
- Automatic reprojection from geographic CRS (WGS84, JGD2000, JGD2011) to an appropriate projected CRS
- Coordinate recentering near the origin to prevent floating-point precision loss in OCCT
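As an illustration of srsName-based detection, the sketch below pulls an EPSG code out of a few common srsName spellings and maps the 6669–6687 range to a zone number. The function names and the regular expression are assumptions for this example. The library's own detection may handle more forms.

```python
import re
from typing import Optional


def epsg_from_srs_name(srs_name: str) -> Optional[int]:
    """Extract the trailing EPSG code from URN, URL, or short srsName forms."""
    match = re.search(r"EPSG[:/].*?(\d+)$", srs_name.strip(), re.IGNORECASE)
    return int(match.group(1)) if match else None


def japan_plane_zone(epsg: int) -> Optional[int]:
    """Map EPSG:6669-6687 to Japan Plane Rectangular CS zone 1-19, else None."""
    return epsg - 6668 if 6669 <= epsg <= 6687 else None


for name in (
    "EPSG:6677",
    "urn:ogc:def:crs:EPSG::6677",
    "http://www.opengis.net/def/crs/EPSG/0/6697",
):
    code = epsg_from_srs_name(name)
    print(name, code, japan_plane_zone(code))
```

A code outside the 6669–6687 range (such as 6697 in the last line) is not a plane rectangular zone, which is the case where reprojection to a projected CRS would come into play.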
PLATEAU Integration
PLATEAU is a project by Japan's Ministry of Land, Infrastructure, Transport and Tourism (MLIT) that provides open 3D city models for the entire country in CityGML format.
gml2step provides optional convenience functions for fetching PLATEAU data (pip install "gml2step[plateau]"). Under the hood, this is a thin wrapper around two public APIs:
- PLATEAU Data Catalog API (operated by MLIT) — queried for CityGML file URLs by mesh code or municipality
- Nominatim (OpenStreetMap) — used for geocoding Japanese addresses to latitude/longitude
There is no custom backend server. All requests go directly to these public endpoints.
What it does
- Address search: Takes a Japanese address (e.g., "東京都千代田区霞が関3-2-1"), geocodes it via Nominatim, converts the coordinates to a JIS X 0410 mesh code, fetches CityGML files covering that area from the PLATEAU API, then parses and ranks the buildings by distance/name similarity.
- Mesh code lookup: Given a mesh code, fetches CityGML file URLs from the PLATEAU API and downloads them.
- Building ID lookup: Given a specific building ID and mesh code, fetches and parses just the relevant 1km grid area.
Building Search
from gml2step.plateau.fetcher import search_buildings_by_address
buildings = search_buildings_by_address(
    "東京都千代田区霞が関3-2-1",
    ranking_mode="hybrid",  # "distance", "name", or "hybrid"
    limit=10,
)
for b in buildings:
    print(b.building_id, b.name, b.height, b.lod_level)
Mesh Code Utilities
PLATEAU data is organized by JIS X 0410 standard mesh codes. gml2step provides conversion functions for all 5 mesh levels:
from gml2step.plateau.mesh_utils import (
    latlon_to_mesh_1st,      # 80km grid (4-digit)
    latlon_to_mesh_2nd,      # 10km grid (6-digit)
    latlon_to_mesh_3rd,      # 1km grid (8-digit)
    latlon_to_mesh_half,     # 500m grid (9-digit)
    latlon_to_mesh_quarter,  # 250m grid (10-digit)
)
mesh = latlon_to_mesh_3rd(35.6812, 139.7671) # Tokyo Station
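For readers unfamiliar with JIS X 0410, the 8-digit 3rd-level code can be derived by hand: the 1st mesh uses 40-minute latitude bands and 1-degree longitude bands, the 2nd mesh splits each cell 8 by 8, and the 3rd mesh splits again 10 by 10. The sketch below follows that arithmetic. `sketch_mesh_3rd` is a hypothetical stand-alone function, not the library's `latlon_to_mesh_3rd`.

```python
import math


def sketch_mesh_3rd(lat: float, lon: float) -> str:
    """Compute the 8-digit 3rd-level (about 1 km) mesh code for a point in Japan."""
    # 1st mesh: 40' latitude bands, 1-degree longitude bands offset by 100.
    p, lat_rem = divmod(lat * 60.0, 40.0)          # remainder in minutes
    u = math.floor(lon) - 100
    lon_rem = (lon - math.floor(lon)) * 60.0       # remainder in minutes
    # 2nd mesh: 8 x 8 subdivision (5' latitude, 7.5' longitude).
    q, lat_rem = divmod(lat_rem, 5.0)
    v, lon_rem = divmod(lon_rem, 7.5)
    # 3rd mesh: 10 x 10 subdivision (30" latitude, 45" longitude).
    r = math.floor(lat_rem * 60.0 / 30.0)
    w = math.floor(lon_rem * 60.0 / 45.0)
    return f"{int(p):02d}{u:02d}{int(q)}{int(v)}{r}{w}"


print(sketch_mesh_3rd(35.6812, 139.7671))  # Tokyo Station
```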
Async API Client
import asyncio
from gml2step.plateau.api_client import fetch_plateau_datasets_by_mesh
# Fetch PLATEAU dataset URLs by mesh code
result = asyncio.run(fetch_plateau_datasets_by_mesh("53394525"))
Other Features
- Geocoding via Nominatim (rate-limited to 1 req/sec per Nominatim policy), with Japan-specific validation and relevance scoring
- Building ranking with 3 modes: distance, name similarity (Levenshtein + token matching), hybrid
- JIS X 0410 mesh code conversion (1st through quarter mesh)
- Neighboring mesh enumeration (3x3 grid) for boundary searches
- Async batch resolution of mesh codes with concurrency control
- Local CityGML caching (opt-in via CITYGML_CACHE_ENABLED/CITYGML_CACHE_DIR env vars)
- Offline mesh-to-municipality mapping included as package data (avoids extra API calls)
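The three ranking modes can be pictured as a blend of a proximity score and a name-similarity score. The sketch below is an assumption-laden illustration: the weights, the 500 m cut-off, and all function names are invented for the example, and the token matching mentioned above is left out.

```python
import math


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres on a sphere of radius 6,371 km."""
    radius = 6371000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * radius * math.asin(math.sqrt(a))


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def name_similarity(a: str, b: str) -> float:
    """1.0 for identical strings, 0.0 for completely different ones."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


def hybrid_score(distance_m: float, similarity: float,
                 max_distance_m: float = 500.0, name_weight: float = 0.5) -> float:
    """Blend proximity and name similarity (weights are illustrative assumptions)."""
    proximity = max(0.0, 1.0 - distance_m / max_distance_m)
    return name_weight * similarity + (1.0 - name_weight) * proximity


# Hypothetical candidates: (name, distance in metres from the query point).
candidates = [("Annex B", 20.0), ("City Hall", 120.0), ("City Hall Garage", 60.0)]
ranked = sorted(
    candidates,
    key=lambda c: hybrid_score(c[1], name_similarity("City Hall", c[0])),
    reverse=True,
)
print(ranked[0][0])
```

Setting `name_weight` to 0 or 1 reduces the blend to a pure distance or pure name ranking, which is one simple way to think about the three modes.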
Architecture
src/gml2step/
├── __init__.py # Public API: convert, parse, stream_parse, extract_footprints
├── api.py # API implementation
├── cli.py # Typer CLI
├── coordinate_utils.py # CRS utilities, Japan zone definitions
├── data/
│ └── mesh2_municipality.json # Nationwide mesh-to-municipality mapping
├── citygml/
│ ├── core/ # Types, constants, CityGML namespaces
│ ├── parsers/ # Coordinate and polygon extraction
│ ├── streaming/ # SAX-style streaming parser, XLink cache, coordinate optimizer
│ ├── lod/ # LoD0–LoD3 extraction strategies, footprint extractor
│ ├── geometry/ # OCCT geometry builders, shell/solid construction, auto-repair
│ ├── transforms/ # CRS detection, reprojection, recentering
│ ├── utils/ # XLink resolver, XML parser, logging
│ └── pipeline/ # Orchestrator (7-phase conversion pipeline)
└── plateau/ # PLATEAU API client, geocoding, mesh utilities, building search
Development
git clone https://github.com/Soynyuu/gml2step.git
cd gml2step
pip install -e ".[dev,plateau]"
pytest
License
This project is licensed under the GNU Affero General Public License v3.0 (AGPL-3.0-or-later).
gml2step was originally developed as part of Paper-CAD and extracted as a standalone library. See NOTICE for full attribution.
Acknowledgments
- Paper-CAD — The parent project from which gml2step was extracted
- PLATEAU — Japan's national 3D city model project (MLIT)
- OpenCASCADE / pythonocc-core — 3D CAD kernel for STEP conversion
- pyproj — Coordinate reference system transformations