IGN LiDAR HD Dataset Processing Library for Building LOD Classification
A comprehensive Python library for processing IGN (Institut National de l'Information Géographique et Forestière) LiDAR HD data into machine learning-ready datasets for Building Level of Detail (LOD) classification tasks.
What's New in v1.6.5
- Artefact-Free Geometric Features - Comprehensive audit validates radius-based search eliminates LIDAR scan artefacts
- Full Documentation Suite - Complete audit reports, guides, and validation tests
- Radius Parameter Support - CLI and pipeline configuration for manual control or auto-estimation
- Production Validated - All features mathematically independent, no cross-contamination detected
- Scientific Accuracy - ~10-15% slower but eliminates "dash line" artefacts completely
Artefact Audit | Radius Guide | Full Report
Previous Updates (v1.6.4)
- Enhanced Documentation - Updated README with embedded YouTube player for better video experience
- Improved Presentation - Better visual integration of demo content
- Minor Updates - Documentation improvements and refinements
Previous Updates (v1.6.2)
- Critical GPU Feature Fix - Corrected eigenvalue normalization in GPU implementation (now matches CPU)
- Robust Feature Computation - Added degenerate case filtering and outlier-resistant curvature
- Enhanced Quality - Radius search support and comprehensive validation suite
- Breaking Change: GPU feature values changed for users of GPU acceleration
Analysis & Fixes | Implementation
Previous Updates (v1.6.0)
- Enhanced Data Augmentation - Augmentation now happens during ENRICH phase (before feature computation) for better feature-geometry consistency
- RGB CloudCompare Fix - Perfect RGB color display with corrected 16-bit scaling
- 40% Processing Trade-off - Slightly longer processing time but significantly better training data quality
- Improved Documentation - Comprehensive guides and examples for all features
Full Release Notes | Migration Guide
Video Demo
Project Overview
This library transforms raw IGN LiDAR HD point clouds into structured datasets ready for machine learning applications. Built specifically for building classification tasks, it handles the complete pipeline from data acquisition to training-ready patches.
Processing Workflow
flowchart TD
A[IGN LiDAR HD Data] --> B[Download Tiles]
B --> C[Enrich with Features]
C --> D[Create Training Patches]
D --> E[ML-Ready Dataset]
B --> B1[Smart Skip Detection]
C --> C1[GPU/CPU Processing]
C --> C2[Geometric Features]
C --> C3[Data Augmentation]
D --> D1[LOD Classification]
style A fill:#e1f5fe
style E fill:#e8f5e8
style B1 fill:#fff3e0
style C1 fill:#fff3e0
style C3 fill:#fff3e0
Project Stats:
- 14 core modules - Comprehensive processing toolkit
- 10 example scripts - From basic usage to advanced workflows
- Comprehensive test suite - Ensuring reliability and performance
- 50+ curated tiles - Covering diverse French territories
- GPU & CPU support - Flexible computation backends
- Smart resumability - Never reprocess existing data
Quick Start
Installation
# Standard installation (CPU only)
pip install ign-lidar-hd
# With RGB augmentation support
pip install "ign-lidar-hd[rgb]"
# With all non-GPU features
pip install "ign-lidar-hd[all]"
# GPU acceleration (requires NVIDIA GPU + CUDA)
# Install base package first, then add GPU support separately:
pip install ign-lidar-hd
pip install cupy-cuda11x # For CUDA 11.x
# OR
pip install cupy-cuda12x # For CUDA 12.x
# Advanced GPU with RAPIDS cuML (best performance, conda recommended)
pip install ign-lidar-hd
pip install cupy-cuda12x # Match your CUDA version
conda install -c rapidsai -c conda-forge -c nvidia cuml
# Or via pip (may require more configuration):
# pip install cuml-cu11 # For CUDA 11.x
# pip install cuml-cu12 # For CUDA 12.x
GPU Requirements (optional):
- NVIDIA GPU with CUDA support
- CUDA Toolkit 11.0 or higher
- CuPy package matching your CUDA version
- Optional: RAPIDS cuML for advanced GPU-accelerated algorithms
- Expected speedup: 5-6x faster than CPU (CuPy), up to 10x with RAPIDS
GPU Documentation:
- Complete GPU Guide - Full documentation
- Quick Start - Get started in 30 seconds
- Performance Benchmarks - Expected speedups
Basic Usage
from ign_lidar import LiDARProcessor
# Initialize processor
processor = LiDARProcessor(lod_level="LOD2")
# Process a single tile
patches = processor.process_tile("data.laz", "output/")
# Process multiple files
patches = processor.process_directory("data/", "output/", num_workers=4)
Command Line Interface
# Download tiles
ign-lidar-hd download --bbox -2.0,47.0,-1.0,48.0 --output tiles/ --max-tiles 10
# Enrich LAZ files with geometric features, using 4 parallel workers
ign-lidar-hd enrich --input-dir tiles/ --output enriched/ --num-workers 4
# Same enrichment with the default worker count
ign-lidar-hd enrich --input-dir tiles/ --output enriched/
# Enrich with GPU acceleration (requires CuPy)
# Automatically falls back to CPU if GPU unavailable
ign-lidar-hd enrich --input-dir tiles/ --output enriched/ --use-gpu
# Enrich with RGB augmentation from IGN orthophotos
ign-lidar-hd enrich --input-dir tiles/ --output enriched/ --add-rgb --rgb-cache-dir cache/
# Create training patches
ign-lidar-hd patch --input-dir enriched/ --output patches/ --lod-level LOD2
# Run complete workflow with YAML configuration
ign-lidar-hd pipeline config.yaml
Pipeline Configuration (Recommended)
Use YAML configuration files for reproducible workflows:
# Create example configuration
ign-lidar-hd pipeline my_config.yaml --create-example full
# Edit configuration (my_config.yaml)
# Then run complete pipeline
ign-lidar-hd pipeline my_config.yaml
Example YAML configuration:
global:
  num_workers: 4
download:
  bbox: "2.3, 48.8, 2.4, 48.9"
  output: "data/raw"
  max_tiles: 10
enrich:
  input_dir: "data/raw"
  output: "data/enriched"
  mode: "building"
  add_rgb: true
  rgb_cache_dir: "cache/orthophotos"
  use_gpu: true
patch:
  input_dir: "data/enriched"
  output: "data/patches"
  lod_level: "LOD2"
  num_points: 16384
  augment: true
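The configuration above chains three stages through their directories: each stage reads from the directory the previous one writes to. As a minimal sketch in plain Python, the same configuration is mirrored below as a dictionary, the stages present are listed in execution order, and the directory chain is checked. The helpers `planned_stages` and `check_stage_chain` are hypothetical names for illustration and are not part of the ign-lidar-hd API; the library's own validation may work differently.

```python
# Hypothetical helpers for illustration; not part of the ign-lidar-hd API.
STAGE_ORDER = ("download", "enrich", "patch")

# The example configuration above, written as a plain dictionary.
config = {
    "global": {"num_workers": 4},
    "download": {"bbox": "2.3, 48.8, 2.4, 48.9", "output": "data/raw", "max_tiles": 10},
    "enrich": {"input_dir": "data/raw", "output": "data/enriched", "use_gpu": True},
    "patch": {"input_dir": "data/enriched", "output": "data/patches", "lod_level": "LOD2"},
}


def planned_stages(cfg):
    """Return the stages present in the config, in execution order."""
    return [name for name in STAGE_ORDER if name in cfg]


def check_stage_chain(cfg):
    """List mismatches between one stage's output and the next stage's input_dir."""
    problems = []
    stages = planned_stages(cfg)
    for previous, current in zip(stages, stages[1:]):
        produced = cfg[previous].get("output")
        consumed = cfg[current].get("input_dir")
        if produced != consumed:
            problems.append(
                f"{current}.input_dir={consumed!r} does not match "
                f"{previous}.output={produced!r}"
            )
    return problems


print(planned_stages(config))
print(check_stage_chain(config))
```

A configuration that contains only an `enrich` section would yield a single planned stage, which is the sense in which a YAML file can describe a partial workflow.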
Benefits:
- Reproducible - Version control your workflows
- Declarative - Define what you want, not how
- Flexible - Run only the stages you need
- Shareable - Easy team collaboration
Key Features
Core Processing Capabilities
- LiDAR-only processing: Pure geometric analysis without RGB dependencies
- RGB augmentation: Optional color enrichment from IGN BD ORTHO® orthophotos
- Multi-level classification: Support for LOD2 (15 classes) and LOD3 (30+ classes)
- Rich feature extraction: Surface normals, curvature, planarity, verticality, local density
- Architectural style inference: Automatic building style classification
- Patch-based processing: Configurable 150 m × 150 m patches with overlap control
Performance & Optimization
- GPU acceleration: CUDA-accelerated feature computation with automatic CPU fallback
- Parallel processing: Multi-worker support with automatic CPU core detection
- Memory optimization: Chunked processing for large datasets
- Smart skip detection: Automatically skip existing files and resume interrupted workflows
- Batch operations: Process hundreds of tiles efficiently
- Improved augmentation: Features computed on augmented geometry for consistency
Workflow Automation
- Pipeline configuration: YAML-based declarative workflows for reproducibility
- Integrated downloader: IGN WFS tile discovery and batch downloading
- Format flexibility: Choose between LAZ 1.4 (full features) or QGIS-compatible output
- Enhanced augmentation: Geometric transformations applied before feature computation for better data quality
- Unified CLI: Single ign-lidar-hd command with intuitive subcommands
- Idempotent operations: Safe to restart - never reprocesses existing data
Geographic Intelligence
- Strategic locations: Pre-configured urban, coastal, and rural area processing
- Bounding box filtering: Spatial subsetting for targeted analysis
- Coordinate system handling: Automatic Lambert93 to WGS84 transformations
- Tile management: Curated collection of 50+ test tiles across France
Library Architecture
Component Architecture
graph TB
subgraph "Core Processing"
P[processor.py<br/>Main Engine]
F[features.py<br/>Feature Extraction]
GPU[features_gpu.py<br/>GPU Acceleration]
end
subgraph "Data Management"
D[downloader.py<br/>IGN WFS Integration]
TL[tile_list.py<br/>Tile Management]
SL[strategic_locations.py<br/>Geographic Zones]
MD[metadata.py<br/>Dataset Metadata]
end
subgraph "Classification & Styles"
C[classes.py<br/>LOD2/LOD3 Schemas]
AS[architectural_styles.py<br/>Style Inference]
end
subgraph "Integration & Config"
CLI[cli.py<br/>Command Interface]
CFG[config.py<br/>Configuration]
QGIS[qgis_converter.py<br/>QGIS Compatibility]
U[utils.py<br/>Core Utilities]
end
CLI --> P
CLI --> D
P --> F
P --> GPU
P --> C
F --> AS
D --> TL
D --> SL
P --> MD
style P fill:#e3f2fd
style F fill:#e8f5e8
style D fill:#fff3e0
style CLI fill:#f3e5f5
Module Responsibilities
| Module | Purpose | Key Features |
|---|---|---|
| processor.py | Main processing engine | Patch creation, LOD classification, workflow orchestration |
| downloader.py | IGN WFS integration | Tile discovery, batch download, smart skip detection |
| features.py | Feature extraction | Normals, curvature, geometric properties |
| features_gpu.py | GPU acceleration | CUDA-optimized feature computation |
| classes.py | Classification schemas | LOD2/LOD3 building taxonomies |
| architectural_styles.py | Style inference | Building architecture classification |
Example Workflows
examples/
├── basic_usage.py                  # Getting started
├── example_urban_simple.py         # Urban processing
├── parallel_processing_example.py  # Performance
├── full_workflow_example.py        # End-to-end pipeline
├── multistyle_processing.py        # Architecture analysis
├── pytorch_dataloader.py           # ML integration
├── pipeline_example.py             # YAML pipeline usage
├── enrich_with_rgb.py              # RGB augmentation
└── workflows/                      # Production pipelines
config_examples/
├── pipeline_full.yaml              # Complete workflow
├── pipeline_enrich.yaml            # Enrich-only
└── pipeline_patch.yaml             # Patch-only
CLI Commands
The package provides a unified ign-lidar-hd command with four subcommands:
CLI Workflow Chain
sequenceDiagram
participant User
participant CLI as ign-lidar-hd
participant D as Downloader
participant E as Enricher
participant P as Processor
User->>CLI: download --bbox ...
CLI->>D: Initialize downloader
D->>D: Fetch available tiles
D->>D: Smart skip check
D-->>CLI: Downloaded tiles
CLI-->>User: Tiles ready
User->>CLI: enrich --input-dir ...
CLI->>E: Initialize enricher
E->>E: Compute geometric features
E->>E: Optional RGB augmentation
E->>E: GPU/CPU processing
E-->>CLI: Enriched LAZ files
CLI-->>User: Features computed
User->>CLI: patch --input-dir ...
CLI->>P: Initialize processor
P->>P: Create training patches
P->>P: Apply augmentations
P-->>CLI: ML-ready dataset
CLI-->>User: Dataset ready
Note over User,CLI: Or use pipeline command
User->>CLI: pipeline config.yaml
CLI->>CLI: Load YAML config
CLI->>D: Execute download stage
CLI->>E: Execute enrich stage
CLI->>P: Execute patch stage
CLI-->>User: Complete workflow
Pipeline Command (Recommended)
Execute complete workflows using YAML configuration:
# Create example configuration
ign-lidar-hd pipeline my_config.yaml --create-example full
# Run configured pipeline
ign-lidar-hd pipeline my_config.yaml
See Pipeline Configuration Guide for detailed examples.
Download Command
Download LiDAR tiles from IGN:
ign-lidar-hd download \
--bbox lon_min,lat_min,lon_max,lat_max \
--output tiles/ \
--max-tiles 50
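The `--bbox` value is a single string of four comma-separated numbers in the order lon_min, lat_min, lon_max, lat_max. A small sketch of parsing and sanity-checking such a string before passing it to the CLI is shown below. `parse_bbox` is a hypothetical helper, and the longitude/latitude range checks assume WGS84 degrees, as the Batch Download section labels its bounding boxes.

```python
def parse_bbox(text):
    """Parse 'lon_min,lat_min,lon_max,lat_max' into a tuple of four floats."""
    parts = [p.strip() for p in text.split(",")]
    if len(parts) != 4:
        raise ValueError(f"expected 4 comma-separated values, got {len(parts)}")
    lon_min, lat_min, lon_max, lat_max = (float(p) for p in parts)
    # Assumption: coordinates are WGS84 degrees.
    if not (-180.0 <= lon_min < lon_max <= 180.0):
        raise ValueError("longitudes must satisfy -180 <= lon_min < lon_max <= 180")
    if not (-90.0 <= lat_min < lat_max <= 90.0):
        raise ValueError("latitudes must satisfy -90 <= lat_min < lat_max <= 90")
    return lon_min, lat_min, lon_max, lat_max


print(parse_bbox("-2.0,47.0,-1.0,48.0"))
```

Swapping the minimum and maximum values, a common mistake, raises a `ValueError` instead of silently requesting an empty area.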
Enrich Command
Enrich LAZ files with geometric features and optional RGB:
# CPU version (automatically skips existing enriched files)
ign-lidar-hd enrich \
--input-dir tiles/ \
--output enriched/ \
--num-workers 4 \
--k-neighbors 10
# With radius parameter (eliminates LIDAR scan artefacts)
ign-lidar-hd enrich \
--input-dir tiles/ \
--output enriched/ \
--mode building \
--radius 1.5 # Manual radius in meters (or omit for auto-estimate)
# With RGB augmentation from IGN orthophotos
ign-lidar-hd enrich \
--input-dir tiles/ \
--output enriched/ \
--add-rgb \
--rgb-cache-dir cache/orthophotos
# Force re-enrichment (ignore existing files)
ign-lidar-hd enrich \
--input-dir tiles/ \
--output enriched/ \
--force
# GPU version (requires CUDA)
ign-lidar-hd enrich \
--input-dir tiles/ \
--output enriched/ \
--use-gpu
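The skip-unless-`--force` behaviour mentioned in the command comments above can be pictured with a short sketch. It assumes, purely for illustration, that an input tile counts as done when a file of the same name already exists in the output directory; the detection logic in ign-lidar-hd itself may check more than that. `files_to_process` is a hypothetical helper.

```python
from pathlib import Path
import tempfile


def files_to_process(input_dir, output_dir, force=False):
    """Return input .laz files that still need work (all of them when force=True)."""
    inputs = sorted(Path(input_dir).glob("*.laz"))
    if force:
        return inputs
    # Assumption: an output file with the same name means the tile is done.
    return [p for p in inputs if not (Path(output_dir) / p.name).exists()]


# Small demonstration with empty placeholder files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    tiles, enriched = Path(tmp, "tiles"), Path(tmp, "enriched")
    tiles.mkdir()
    enriched.mkdir()
    for name in ("a.laz", "b.laz", "c.laz"):
        (tiles / name).touch()
    (enriched / "b.laz").touch()  # pretend tile b was enriched in an earlier run

    pending = [p.name for p in files_to_process(tiles, enriched)]
    forced = [p.name for p in files_to_process(tiles, enriched, force=True)]

print(pending)
print(forced)
```

Because the check only looks at what already exists on disk, rerunning the same command after an interruption picks up where the previous run stopped.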
Smart Skip: By default, the enrich command skips files that have already been enriched, making it safe to resume interrupted operations.
Artefact-Free Features: Use the --radius parameter for scientifically accurate geometric features. Auto-estimation (default) eliminates LIDAR scan line artefacts. See Radius Parameter Guide for details.
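Why a radius helps can be seen on a toy scan pattern. In the sketch below, points are dense along each scan line (0.05 m) while the lines are 0.5 m apart; these spacings are chosen for illustration only. The 10 nearest neighbours of a point then all lie on its own scan line, so the neighbourhood is close to one-dimensional, whereas a 1.5 m radius reaches several lines. This is a brute-force, pure-Python illustration and not the library's neighbour search.

```python
import math

# Toy scan pattern: 7 scan lines 0.5 m apart, points every 0.05 m along each line.
POINTS_PER_LINE = 81
points = [(0.05 * i, 0.5 * j) for j in range(7) for i in range(POINTS_PER_LINE)]
query = points[3 * POINTS_PER_LINE + 40]  # a point in the middle of the pattern


def distance(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])


def knn(points, query, k):
    """k nearest neighbours by brute force (the query point itself is excluded)."""
    others = [p for p in points if p != query]
    return sorted(others, key=lambda p: distance(p, query))[:k]


def within_radius(points, query, radius):
    """All neighbours within `radius` metres (the query point itself is excluded)."""
    return [p for p in points if p != query and distance(p, query) <= radius]


def scan_lines(neighbours):
    """Distinct scan lines (y values) represented in a neighbourhood."""
    return sorted({round(y, 3) for _, y in neighbours})


knn_lines = scan_lines(knn(points, query, k=10))
radius_lines = scan_lines(within_radius(points, query, radius=1.5))
print("k=10 neighbours span scan lines:", knn_lines)
print("1.5 m radius spans scan lines:", radius_lines)
```

A neighbourhood confined to one scan line gives a poorly conditioned estimate of the local surface, which is one plausible source of the line-shaped artefacts the radius search is meant to remove.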
Patch Command
Create training patches from enriched LAZ files:
# Automatically skips tiles with existing patches
ign-lidar-hd patch \
--input-dir enriched/ \
--output patches/ \
--lod-level LOD2 \
--patch-size 150.0 \
--num-workers 4 \
--num-augmentations 3
# Force reprocessing (ignore existing patches)
ign-lidar-hd patch \
--input-dir enriched/ \
--output patches/ \
--force
Smart Skip: The patch command automatically detects existing patches and skips reprocessing, allowing you to resume interrupted batch jobs.
Configuration
LOD Levels
- LOD2: Simplified building models (15 classes)
- LOD3: Detailed building models (30 classes)
Processing Options
from ign_lidar import LiDARProcessor
processor = LiDARProcessor(
    lod_level="LOD2",              # LOD2 or LOD3
    augment=True,                  # Enable enhanced augmentation
    num_augmentations=3,           # Augmentations per tile (not per patch!)
    patch_size=150.0,              # Patch size in meters
    patch_overlap=0.1,             # 10% overlap
    bbox=[xmin, ymin, xmax, ymax]  # Spatial filter (placeholder names; supply your own bounds)
)
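As a rough picture of how `patch_size` and `patch_overlap` interact: with 150 m patches and 10% overlap, consecutive patch origins are 135 m apart. The sketch below lays out patch origins along one axis of a tile. The 1000 m tile width and the handling of the last patch are assumptions made for illustration, and `patch_origins` is a hypothetical helper; the library's own tiling may differ in detail.

```python
def patch_origins(tile_size, patch_size, overlap):
    """Origins (in metres) of patches along one axis, stepping by patch_size * (1 - overlap)."""
    stride = patch_size * (1.0 - overlap)
    origins = []
    x = 0.0
    while x + patch_size <= tile_size:
        origins.append(round(x, 6))
        x += stride
    # Assumption: add a final patch flush with the tile edge so no strip is left uncovered.
    last = tile_size - patch_size
    if origins and origins[-1] < last:
        origins.append(round(last, 6))
    return origins


origins = patch_origins(tile_size=1000.0, patch_size=150.0, overlap=0.1)
print(origins)
print(len(origins) ** 2, "patches for a square tile")
```

Larger overlap values shrink the stride and increase the patch count, which trades extra processing time for more training samples near patch borders.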
v1.6.0: Data augmentation now happens during the ENRICH phase (before feature computation) instead of the PATCH phase. This ensures geometric features (normals, curvature, planarity) are computed on augmented geometry for better feature-geometry consistency and improved model training quality. See AUGMENTATION_IMPROVEMENT.md for details.
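The reason for augmenting before feature computation can be shown with a single wall triangle. Rotating the points changes the surface normal, so a normal computed before the rotation no longer describes the augmented geometry. The sketch uses a rotation about the vertical axis, a common point-cloud augmentation that is assumed here rather than taken from the library, and plain Python vector math.

```python
import math


def rotate_z(p, angle):
    """Rotate point p = (x, y, z) about the vertical (Z) axis by `angle` radians."""
    x, y, z = p
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y, z)


def triangle_normal(a, b, c):
    """Unit normal of triangle (a, b, c) via the cross product of two edges."""
    u = [b[i] - a[i] for i in range(3)]
    v = [c[i] - a[i] for i in range(3)]
    n = (
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    )
    length = math.sqrt(sum(k * k for k in n))
    return tuple(k / length for k in n)


# A wall patch: three points in the vertical plane x = 0.
wall = [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
angle = math.radians(30)

stale_normal = triangle_normal(*wall)            # computed before augmentation
augmented = [rotate_z(p, angle) for p in wall]
fresh_normal = triangle_normal(*augmented)       # computed on the augmented geometry

print("stale:", [round(k, 3) for k in stale_normal])
print("fresh:", [round(k, 3) for k in fresh_normal])
```

The fresh normal equals the stale normal rotated by the same angle. A pipeline that augments after feature computation would either have to transform every orientation-dependent feature in the same way or ship features that disagree with the coordinates.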
Output Format
Data Structure Overview
graph TB
subgraph "Raw Input"
LAZ[LAZ Point Cloud<br/>XYZ + Intensity<br/>Classification]
end
subgraph "Enriched Data"
ELAZ[Enhanced LAZ<br/>+ 30 Features<br/>+ Building Labels]
end
subgraph "ML Dataset"
NPZ[NPZ Patches<br/>16K points each<br/>Ready for Training]
end
subgraph "NPZ Contents"
COORD[Coordinates<br/>X, Y, Z]
GEOM[Geometric Features<br/>Normals, Curvature]
SEMANTIC[Semantic Features<br/>Planarity, Verticality]
META[Metadata<br/>Intensity, Return#]
LABELS[Building Labels<br/>LOD2/LOD3 Classes]
end
LAZ --> ELAZ
ELAZ --> NPZ
NPZ --> COORD
NPZ --> GEOM
NPZ --> SEMANTIC
NPZ --> META
NPZ --> LABELS
style LAZ fill:#ffebee
style ELAZ fill:#e3f2fd
style NPZ fill:#e8f5e8
NPZ File Structure
Each patch is saved as an NPZ file containing:
{
'points': np.ndarray, # [N, 3] XYZ coordinates
'normals': np.ndarray, # [N, 3] surface normals
'curvature': np.ndarray, # [N] principal curvature
'intensity': np.ndarray, # [N] normalized intensity
'return_number': np.ndarray, # [N] return number
'height': np.ndarray, # [N] height above ground
'planarity': np.ndarray, # [N] planarity measure
'verticality': np.ndarray, # [N] verticality measure
'horizontality': np.ndarray, # [N] horizontality measure
'density': np.ndarray, # [N] local point density
'labels': np.ndarray, # [N] building class labels
}
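A short sketch of reading one patch with NumPy and stacking the per-point arrays into a single feature matrix for training follows. The patch is synthetic (random values written to an in-memory buffer) so the example runs without any data. The key names follow the structure listed above, while the choice and order of the stacked columns is an assumption and not a layout defined by the library.

```python
import io

import numpy as np

rng = np.random.default_rng(0)
n = 1024  # small synthetic patch; real patches are described as 16,384 points

# Write a synthetic patch using the keys listed above.
buffer = io.BytesIO()
np.savez_compressed(
    buffer,
    points=rng.random((n, 3), dtype=np.float32),
    normals=rng.random((n, 3), dtype=np.float32),
    curvature=rng.random(n, dtype=np.float32),
    intensity=rng.random(n, dtype=np.float32),
    return_number=np.ones(n, dtype=np.float32),
    height=rng.random(n, dtype=np.float32),
    planarity=rng.random(n, dtype=np.float32),
    verticality=rng.random(n, dtype=np.float32),
    horizontality=rng.random(n, dtype=np.float32),
    density=rng.random(n, dtype=np.float32),
    labels=rng.integers(0, 15, size=n).astype(np.uint8),
)
buffer.seek(0)

# Read it back and build an [N, C] feature matrix plus a label vector.
scalar_keys = ["curvature", "intensity", "return_number", "height",
               "planarity", "verticality", "horizontality", "density"]
with np.load(buffer) as patch:
    columns = [patch["points"], patch["normals"]]
    columns += [patch[key][:, None] for key in scalar_keys]  # [N] -> [N, 1]
    features = np.concatenate(columns, axis=1)
    labels = patch["labels"]

print(features.shape, labels.shape)
```

With a real patch on disk, the `buffer` argument to `np.load` would simply be replaced by the path of the `.npz` file.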
Data Dimensions
| Component | Shape | Data Type | Description |
|---|---|---|---|
| points | [N, 3] | float32 | 3D coordinates (X, Y, Z) |
| normals | [N, 3] | float32 | Surface normal vectors |
| features | [N, 27] | float32 | Geometric feature matrix |
| labels | [N] | uint8 | Building component classes |
| metadata | [4] | object | Patch info (bbox, tile_id) |
Typical patch: 16,384 points, ~2.5 MB compressed, ~8 MB in memory
Batch Download
from ign_lidar import IGNLiDARDownloader
# Initialize downloader
downloader = IGNLiDARDownloader("downloads/")
# Download tiles by bounding box (WGS84)
tiles = downloader.download_by_bbox(
bbox=(-2.0, 47.0, -1.0, 48.0), # West France
max_tiles=10
)
# Download specific tiles
tile_names = ["LHD_FXX_0186_6834_PTS_C_LAMB93_IGN69"]
downloader.download_tiles(tile_names)
Examples
Urban Processing
# High-detail urban processing
processor = LiDARProcessor(lod_level="LOD3", num_augmentations=5)
patches = processor.process_tile("urban_area.laz", "output/urban/")
Rural Processing
# Simplified rural processing
processor = LiDARProcessor(lod_level="LOD2", num_augmentations=2)
patches = processor.process_tile("rural_area.laz", "output/rural/")
Batch Processing
from ign_lidar import LiDARProcessor, get_tiles_by_environment
# Reuse one processor for every tile (settings here are illustrative)
processor = LiDARProcessor(lod_level="LOD2")
# Get coastal tiles
coastal_tiles = get_tiles_by_environment("coastal")
# Process all coastal areas
for tile_info in coastal_tiles:
    patches = processor.process_tile(
        f"data/{tile_info['tile_name']}.laz",
        f"output/coastal/{tile_info['tile_name']}/"
    )
Development
Setup Development Environment
git clone https://github.com/your-username/ign-lidar-hd-downloader
cd ign-lidar-hd-downloader
pip install -e ".[dev]"
Run Tests
pytest tests/
Code Formatting
black ign_lidar/
flake8 ign_lidar/
Documentation & Resources
Complete Documentation Hub
For comprehensive documentation, see the Documentation Hub:
- User Guides - Quick start guides, QGIS integration, troubleshooting
- Features - Smart skip detection, format preferences, workflow optimization
- Technical Reference - Memory optimization, performance tuning
- Archive - Bug fixes history, release notes, migration guides
Essential Quick Links
- Quick Reference Card - Fast reference for all commands
- Smart Skip Features - Resume workflows efficiently
- QGIS Integration - GIS compatibility guide
- Memory Optimization - Performance tuning
- Output Formats - LAZ 1.4 vs QGIS formats
Examples & Workflows
- Basic Usage - Simple processing examples
- Urban Processing - City-specific workflows
- Parallel Processing - Multi-worker optimization
- Full Workflow - End-to-end pipeline
- Pipeline Configuration - YAML-based workflows
- RGB Augmentation - Orthophoto integration
- PyTorch Integration - ML training setup
Coming Soon: Interactive Documentation
We're working on a comprehensive Docusaurus documentation site that will include:
- Multi-language support (English & French)
- Full-text search
- Mobile-responsive design
- Interactive tutorials
- Auto-generated API reference
- Live code examples
See the Docusaurus Plan for details.
API Reference
Core Classes
- LiDARProcessor: Main processing engine
- IGNLiDARDownloader: Batch download functionality
- LOD2_CLASSES, LOD3_CLASSES: Classification taxonomies
Utility Functions
- compute_normals(): Surface normal computation
- compute_curvature(): Principal curvature calculation
- extract_geometric_features(): Comprehensive feature extraction
- get_tiles_by_environment(): Filter tiles by environment type
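For orientation, the kind of quantity these functions produce can be sketched from the eigenvalues of a neighbourhood's covariance matrix. The definitions below (planarity as (λ2 − λ3)/λ1 with λ1 ≥ λ2 ≥ λ3, the normal as the eigenvector of the smallest eigenvalue, verticality as 1 − |n_z|) follow one common convention in point-cloud analysis and are an assumption here; the library's exact formulas and normalization may differ. `local_features` is a hypothetical name and not one of the functions listed above.

```python
import numpy as np


def local_features(neighbourhood):
    """Normal, planarity and verticality of an [M, 3] neighbourhood via PCA."""
    centred = neighbourhood - neighbourhood.mean(axis=0)
    covariance = centred.T @ centred / len(neighbourhood)
    eigenvalues, eigenvectors = np.linalg.eigh(covariance)  # ascending order
    l3, l2, l1 = eigenvalues                                # so l1 >= l2 >= l3
    normal = eigenvectors[:, 0]                             # direction of least variance
    planarity = (l2 - l3) / l1 if l1 > 0 else 0.0           # guard degenerate neighbourhoods
    verticality = 1.0 - abs(normal[2])                      # 0 for a flat roof, 1 for a wall
    return normal, planarity, verticality


rng = np.random.default_rng(0)
xy = rng.uniform(-1.0, 1.0, size=(200, 2))

roof = np.column_stack([xy, np.full(200, 5.0)])  # horizontal plane z = 5
wall = np.column_stack([np.zeros(200), xy])      # vertical plane x = 0

for name, cloud in (("roof", roof), ("wall", wall)):
    normal, planarity, verticality = local_features(cloud)
    print(name, round(float(planarity), 3), round(float(verticality), 3))
```

The roof and the wall are both planar, so their planarity values are similar; it is the verticality that separates them, which is why the two features are useful together for building classification.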
Requirements
- Python 3.8+
- NumPy >= 1.21.0
- laspy >= 2.3.0
- scikit-learn >= 1.0.0
- tqdm >= 4.60.0
- requests >= 2.25.0
- PyYAML >= 6.0 (for pipeline configuration)
- Pillow >= 9.0.0 (for RGB augmentation)
License
MIT License - see LICENSE file for details.
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
Support
For issues and questions, please use the GitHub Issues page.