IGN LiDAR HD Dataset Processing Library for Building LOD Classification
IGN LiDAR HD Processing Library
Version 1.7.5 | Documentation
A comprehensive Python library for processing IGN LiDAR HD data into machine learning-ready datasets for Building Level of Detail (LOD) classification tasks.
Quick Start • Features • Documentation • Examples • API Reference
Table of Contents
- Overview
- What's New
- Quick Start
- Key Features
- Usage Guide
- Library Architecture
- Output Format
- Examples
- Documentation
- Development
- Requirements
- License & Support
Overview
This library transforms raw IGN (Institut National de l'Information Géographique et Forestière) LiDAR HD point clouds into structured datasets ready for machine learning applications. Built specifically for building classification tasks, it handles the complete pipeline from data acquisition to training-ready patches.
Processing Workflow
flowchart TD
A[IGN LiDAR HD Data] --> B[Download Tiles]
B --> C[Enrich with Features]
C --> D[Create Training Patches]
D --> E[ML-Ready Dataset]
B --> B1[Smart Skip Detection]
C --> C1[GPU/CPU Processing]
C --> C2[Geometric Features]
C --> C3[Data Augmentation]
C --> C4[RGB + Infrared NIR]
C --> C5[NDVI Calculation]
D --> D1[LOD Classification]
style A fill:#e1f5fe
style E fill:#e8f5e8
style B1 fill:#fff3e0
style C1 fill:#fff3e0
style C3 fill:#fff3e0
style C4 fill:#c8e6c9
style C5 fill:#c8e6c9
Project Stats
- 14 core modules - Comprehensive processing toolkit
- 10 example scripts - From basic usage to advanced workflows
- Comprehensive test suite - Ensuring reliability and performance
- 50+ curated tiles - Covering diverse French territories
- GPU & CPU support - Flexible computation backends
- Smart resumability - Never reprocess existing data
What's New
Version 1.7.5 - Performance Breakthrough
Highlights:
- 100-200x faster feature computation through vectorized operations
- 100% GPU utilization - Fixed efficiency bottlenecks
- 50-60% memory reduction - Per-chunk architecture for all modes
- Real-world impact: 18M points in ~64 seconds (GPU+cuML) vs 14+ minutes before
- Intelligent auto-scaling - Adaptive parameters based on hardware
Performance Modes:
- CPU-only: 60 min/tile (baseline) - 1.8GB RAM
- Hybrid GPU (CuPy): 7-10 min/tile (6-8x speedup) - 2.8GB VRAM
- Full GPU (RAPIDS cuML): 3-5 min/tile (12-20x speedup) - 3.4GB VRAM
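The speedup figures follow directly from the per-tile times quoted for each mode. The sketch below redoes that arithmetic and extends it to a batch estimate; the function names and the 50-tile batch are illustrative assumptions, and it assumes tiles are processed one after another.

```python
# Per-tile times quoted above, as (fastest, slowest) minutes.
# The CPU-only mode is the 60-minute baseline.
MODES = {
    "cpu": (60.0, 60.0),
    "hybrid_gpu": (7.0, 10.0),
    "full_gpu": (3.0, 5.0),
}


def speedup_range(mode, baseline="cpu"):
    """Return (lowest, highest) speedup of `mode` relative to `baseline`."""
    base_fast, base_slow = MODES[baseline]
    fast, slow = MODES[mode]
    return base_fast / slow, base_slow / fast


def batch_hours(mode, num_tiles):
    """Return (best, worst) hours for `num_tiles` processed sequentially."""
    fast, slow = MODES[mode]
    return num_tiles * fast / 60.0, num_tiles * slow / 60.0


for mode in MODES:
    low, high = speedup_range(mode)
    best, worst = batch_hours(mode, num_tiles=50)  # hypothetical batch size
    print(f"{mode}: {low:.1f}-{high:.1f}x, 50 tiles in {best:.1f}-{worst:.1f} h")
```

Actual throughput depends on tile density and hardware, so these numbers are best read as rough planning figures.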
Full Release Notes | Changelog
Recent Features
- v1.7.4: GPU acceleration with RAPIDS cuML, WSL2 support
- v1.7.3: Infrared (NIR) augmentation for NDVI calculation
- v1.7.1: Auto-parameter analysis for optimal processing
- v1.7.0: Artifact mitigation preprocessing (60-80% reduction)
- v1.6.0: Enhanced data augmentation during enrich phase
Quick Start
Installation
Standard Installation (CPU Only)
pip install ign-lidar-hd
ign-lidar-hd --version # Verify installation
GPU Acceleration (Optional - 6-20x Speedup)
For optimal performance, install with GPU support:
# Quick install using provided script
./install_cuml.sh
# Or manual installation
# Prerequisites: NVIDIA GPU (4GB+ VRAM), CUDA 12.0+, Miniconda/Anaconda
conda create -n ign_gpu python=3.12 -y
conda activate ign_gpu
conda install -c rapidsai -c conda-forge -c nvidia cuml=24.10 cupy cuda-version=12.5 -y
pip install ign-lidar-hd
# Verify GPU setup
python scripts/verify_gpu_setup.py
Quick Example
# 1. Download sample data
ign-lidar-hd download --bbox 2.3,48.8,2.4,48.9 --output data/ --max-tiles 5
# 2. Enrich with features (GPU accelerated if available)
ign-lidar-hd enrich --input-dir data/ --output enriched/ --mode full --use-gpu
# 3. Create training patches
ign-lidar-hd patch --input-dir enriched/ --output patches/ --lod-level LOD2
Python API:
from ign_lidar import LiDARProcessor
# Initialize processor
processor = LiDARProcessor(lod_level="LOD2")
# Process a single tile
patches = processor.process_tile("data.laz", "output/")
# Process multiple files
patches = processor.process_directory("data/", "output/", num_workers=4)
Key Features
Core Processing
- Pure LiDAR processing - Geometric analysis without RGB dependencies
- RGB & Infrared augmentation - Optional color and Near-Infrared (NIR) from IGN orthophotos
- NDVI-ready datasets - Automatic vegetation index calculation (RGB + NIR)
- Multi-level classification - LOD2 (15 classes) and LOD3 (30+ classes) support
- Rich features - Surface normals, curvature, planarity, verticality, local density
- Architectural styles - Automatic building style inference
- Preprocessing - Artifact mitigation (60-80% scan line reduction)
- Auto-parameters - Intelligent tile analysis for optimal processing
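The NDVI mentioned in the list above is conventionally defined as (NIR - Red) / (NIR + Red). A minimal sketch of that formula on per-point arrays follows; `compute_ndvi` is a hypothetical helper written for illustration, not a function confirmed to exist in the library, and the handling of zero denominators is an assumption.

```python
import numpy as np


def compute_ndvi(nir, red, eps=1e-8):
    """Per-point NDVI = (NIR - Red) / (NIR + Red), with values in [-1, 1]."""
    nir = np.asarray(nir, dtype=np.float32)
    red = np.asarray(red, dtype=np.float32)
    denom = nir + red
    # NDVI is undefined where both channels are zero; return 0 there.
    # np.maximum keeps the division itself free of divide-by-zero warnings.
    ndvi = np.where(denom > eps, (nir - red) / np.maximum(denom, eps), 0.0)
    return ndvi.astype(np.float32)


# Three points: vegetation-like, built-surface-like, and no signal.
ndvi = compute_ndvi(nir=[200, 50, 0], red=[50, 200, 0])
```

High positive values usually indicate vegetation, which can help separate trees from roofs when both channels have been added to the point cloud.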
Performance & Optimization
- GPU acceleration - CUDA-accelerated with RAPIDS cuML (6-20x speedup)
- Parallel processing - Multi-worker support with CPU core detection
- Memory optimization - Per-chunk architecture, 50-60% memory reduction
- Smart skip detection - Resume interrupted workflows automatically
- Batch operations - Process hundreds of tiles efficiently
- Scalable - Tested up to 1B+ points
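Smart skip detection can be pictured as a check for existing outputs before any work starts. The sketch below shows one way to do that with the standard library only; `pending_tiles` and the "same file name in the output directory" convention are assumptions for illustration, and the library's own skip logic may use different criteria.

```python
from pathlib import Path


def pending_tiles(input_dir, output_dir, suffix=".laz"):
    """List input tiles that have no non-empty output of the same name yet."""
    input_dir, output_dir = Path(input_dir), Path(output_dir)
    todo = []
    for src in sorted(input_dir.glob(f"*{suffix}")):
        dst = output_dir / src.name
        # A missing or zero-byte output counts as "not done",
        # which covers writes that were interrupted early.
        if not dst.exists() or dst.stat().st_size == 0:
            todo.append(src)
    return todo
```

Calling `pending_tiles("data/", "enriched/")` before a run returns only the tiles that still need processing, so a restarted job picks up where it stopped.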
Workflow Automation
- Pipeline configuration - YAML-based declarative workflows
- Integrated downloader - IGN WFS tile discovery and batch downloading
- Format flexibility - LAZ 1.4 (full features) or QGIS-compatible output
- Unified CLI - Single ign-lidar-hd command with intuitive subcommands
- Idempotent operations - Safe to restart, never reprocesses existing data
Geographic Intelligence
- Strategic locations - Pre-configured urban, coastal, and rural areas
- Bounding box filtering - Spatial subsetting for targeted analysis
- Coordinate handling - Automatic Lambert93 ↔ WGS84 transformations
- Tile management - Curated collection of 50+ test tiles across France
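Bounding box filtering amounts to a boolean mask over the X and Y columns of the point array. The sketch below is a generic NumPy version; `filter_by_bbox` is a hypothetical name, and it assumes the bounding box is expressed in the same coordinate system as the points (Lambert93 for the LAZ tiles).

```python
import numpy as np


def filter_by_bbox(points, bbox):
    """Keep points whose X/Y fall inside bbox = (xmin, ymin, xmax, ymax)."""
    xmin, ymin, xmax, ymax = bbox
    x, y = points[:, 0], points[:, 1]
    mask = (x >= xmin) & (x <= xmax) & (y >= ymin) & (y <= ymax)
    # Return the mask too, so per-point attributes can be filtered the same way.
    return points[mask], mask


points = np.array([[0.0, 0.0, 1.0], [5.0, 5.0, 2.0], [11.0, 5.0, 3.0]])
inside, mask = filter_by_bbox(points, bbox=(0.0, 0.0, 10.0, 10.0))
```

A bounding box given in WGS84 degrees would need to be reprojected before it is compared with Lambert93 coordinates.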
Usage Guide
Command Line Interface
The library provides a unified ign-lidar-hd command with four main subcommands:
1. Download Command
Download LiDAR tiles from IGN:
# Download by bounding box
ign-lidar-hd download --bbox 2.3,48.8,2.4,48.9 --output data/ --max-tiles 10
# Download specific tiles
ign-lidar-hd download --tiles tile1.laz tile2.laz --output data/
2. Enrich Command
Enrich LAZ files with geometric features:
# Basic enrichment
ign-lidar-hd enrich --input-dir data/ --output enriched/ --mode full
# GPU-accelerated enrichment
ign-lidar-hd enrich --input-dir data/ --output enriched/ --use-gpu
# Full-featured enrichment (recommended)
ign-lidar-hd enrich \
  --input-dir data/ \
  --output enriched/ \
  --mode full \
  --use-gpu \
  --auto-params \
  --preprocess \
  --add-rgb --rgb-cache-dir cache/rgb \
  --add-infrared --infrared-cache-dir cache/infrared
# Custom preprocessing
ign-lidar-hd enrich \
  --input-dir data/ \
  --output enriched/ \
  --preprocess \
  --sor-k 15 --sor-std 2.5 \
  --ror-radius 1.0 --ror-neighbors 4 \
  --voxel-size 0.5
Preprocessing Options:
- --preprocess - Enable artifact mitigation
- --sor-k - Statistical outlier removal: number of neighbors (default: 12)
- --sor-std - SOR: std deviation multiplier (default: 2.0)
- --ror-radius - Radius outlier removal: search radius in meters (default: 1.0)
- --ror-neighbors - ROR: minimum neighbors required (default: 4)
- --voxel-size - Voxel downsampling size in meters (optional)
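To make the two SOR parameters concrete, here is a minimal sketch of the textbook statistical outlier removal rule: a point is dropped when its mean distance to its k nearest neighbors exceeds the global mean plus a multiple of the standard deviation. This is a brute-force NumPy illustration under that assumption, not the library's implementation, and `statistical_outlier_mask` is a hypothetical name.

```python
import numpy as np


def statistical_outlier_mask(points, k=12, std_multiplier=2.0):
    """Return a boolean mask that is True for points kept by the filter."""
    points = np.asarray(points, dtype=np.float64)
    # Brute-force pairwise distances: O(N^2) memory, suitable for a demo only.
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # After sorting each row, column 0 is the point itself (distance 0).
    knn = np.sort(dist, axis=1)[:, 1:k + 1]
    mean_dist = knn.mean(axis=1)
    threshold = mean_dist.mean() + std_multiplier * mean_dist.std()
    return mean_dist <= threshold


# A compact cluster plus one far-away point that should be rejected.
rng = np.random.default_rng(0)
cloud = np.vstack([rng.normal(size=(200, 3)), [[50.0, 50.0, 50.0]]])
keep = statistical_outlier_mask(cloud, k=12, std_multiplier=2.0)
cleaned = cloud[keep]
```

Raising the neighbor count smooths the distance estimate, while raising the multiplier makes the filter more permissive. Real tiles need a spatial index such as a KD-tree instead of the dense distance matrix used here.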
Augmentation Options:
- --add-rgb - Add RGB colors from IGN orthophotos
- --add-infrared - Add NIR values from IGN IRC orthophotos
- --augment - Enable geometric augmentation (disabled by default)
- --num-augmentations - Number of augmented versions (default: 3)
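The options above do not spell out which transforms geometric augmentation applies. As an assumption for illustration, the sketch below uses two transforms that are common for airborne point clouds: a random rotation about the vertical axis and small Gaussian jitter. `augment_patch` and `jitter_sigma` are hypothetical names.

```python
import numpy as np


def augment_patch(points, rng, jitter_sigma=0.01):
    """Rotate XYZ points about the vertical axis and add small Gaussian noise."""
    angle = rng.uniform(0.0, 2.0 * np.pi)
    c, s = np.cos(angle), np.sin(angle)
    rot_z = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    rotated = points @ rot_z.T  # heights (Z) are unchanged by this rotation
    return rotated + rng.normal(scale=jitter_sigma, size=points.shape)


rng = np.random.default_rng(42)
patch = rng.uniform(-75.0, 75.0, size=(1024, 3))
# Three augmented versions, mirroring the default of --num-augmentations.
versions = [augment_patch(patch, rng) for _ in range(3)]
```

Rotating only about Z keeps building heights and verticality meaningful, which is why it is a frequent choice for this kind of data.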
3. Patch Command
Create training patches from enriched files:
# Create patches
ign-lidar-hd patch \
  --input-dir enriched/ \
  --output patches/ \
  --lod-level LOD2 \
  --patch-size 150.0 \
  --num-workers 4
4. Pipeline Command (Recommended)
Execute complete workflows using YAML configuration:
# Create example configuration
ign-lidar-hd pipeline config.yaml --create-example full
# Run configured pipeline
ign-lidar-hd pipeline config.yaml
Example YAML Configuration:
global:
  num_workers: 4
download:
  bbox: "2.3, 48.8, 2.4, 48.9"
  output: "data/raw"
  max_tiles: 10
enrich:
  input_dir: "data/raw"
  output: "data/enriched"
  mode: "full"
  use_gpu: true
  auto_params: true
  preprocess: true
  add_rgb: true
  add_infrared: true
  rgb_cache_dir: "cache/rgb"
  infrared_cache_dir: "cache/infrared"
patch:
  input_dir: "data/enriched"
  output: "data/patches"
  lod_level: "LOD2"
  num_points: 16384
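In the configuration above, each stage reads from the directory the previous stage writes to. A quick sanity check of that chaining can catch path typos before a long run. The sketch below works on a plain dict, as a YAML loader would return it; `check_stage_chaining` and `parse_bbox` are hypothetical helpers, and the library itself may not require stages to chain this way.

```python
# A trimmed copy of the example configuration as a plain dict.
config = {
    "global": {"num_workers": 4},
    "download": {"bbox": "2.3, 48.8, 2.4, 48.9", "output": "data/raw", "max_tiles": 10},
    "enrich": {"input_dir": "data/raw", "output": "data/enriched", "mode": "full"},
    "patch": {"input_dir": "data/enriched", "output": "data/patches", "lod_level": "LOD2"},
}

STAGE_ORDER = ["download", "enrich", "patch"]


def check_stage_chaining(cfg):
    """List stages whose input_dir differs from the previous stage's output."""
    problems = []
    stages = [name for name in STAGE_ORDER if name in cfg]
    for prev, cur in zip(stages, stages[1:]):
        expected = cfg[prev].get("output")
        actual = cfg[cur].get("input_dir")
        if expected != actual:
            problems.append(
                f"{cur}.input_dir is {actual!r}, but {prev}.output is {expected!r}"
            )
    return problems


def parse_bbox(text):
    """Parse 'xmin, ymin, xmax, ymax' into four floats and check the ordering."""
    values = tuple(float(v) for v in text.split(","))
    if len(values) != 4:
        raise ValueError("bbox needs four comma-separated numbers")
    xmin, ymin, xmax, ymax = values
    if xmin >= xmax or ymin >= ymax:
        raise ValueError("bbox must be ordered xmin, ymin, xmax, ymax")
    return values


problems = check_stage_chaining(config)
bbox = parse_bbox(config["download"]["bbox"])
```

An empty `problems` list means the three stages hand data to each other as expected.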
Python API
Basic Usage
from ign_lidar import LiDARProcessor
# Initialize processor
processor = LiDARProcessor(
    lod_level="LOD2",
    patch_size=150.0,
    patch_overlap=0.1
)
# Process single tile
patches = processor.process_tile("input.laz", "output/")
# Process directory
patches = processor.process_directory("input_dir/", "output_dir/", num_workers=4)
Batch Download
from ign_lidar import IGNLiDARDownloader
# Initialize downloader
downloader = IGNLiDARDownloader("downloads/")
# Download by bounding box (WGS84)
tiles = downloader.download_by_bbox(
    bbox=(-2.0, 47.0, -1.0, 48.0),
    max_tiles=10
)
# Download specific tiles
tile_names = ["LHD_FXX_0186_6834_PTS_C_LAMB93_IGN69"]
downloader.download_tiles(tile_names)
Configuration
# LOD Levels
processor = LiDARProcessor(lod_level="LOD2") # 15 classes
processor = LiDARProcessor(lod_level="LOD3") # 30+ classes
# Processing Options
processor = LiDARProcessor(
    lod_level="LOD2",
    patch_size=150.0,               # Patch size in meters
    patch_overlap=0.1,              # 10% overlap
    bbox=[xmin, ymin, xmax, ymax]   # Spatial filter
)
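To see how patch_size and patch_overlap interact, the sketch below computes patch origins along one axis of a tile. It assumes a 1 km by 1 km tile, that overlap is a fraction of the patch size (as the "10% overlap" comment suggests), and that patches which would extend past the tile edge are simply not created. The library's own edge handling is not described here and may differ; `patch_origins` is a hypothetical helper.

```python
import math


def patch_origins(extent, patch_size=150.0, overlap=0.1):
    """Lower bounds along one axis for patches that fit entirely inside `extent`."""
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must be in [0, 1)")
    if patch_size > extent:
        return []
    # With 10% overlap, consecutive patches start 90% of a patch apart.
    stride = patch_size * (1.0 - overlap)
    count = math.floor((extent - patch_size) / stride) + 1
    return [i * stride for i in range(count)]


origins = patch_origins(extent=1000.0)   # assumed tile width in meters
patches_per_tile = len(origins) ** 2     # same grid along X and Y
```

Under these assumptions a 150 m patch with 10% overlap advances 135 m per step, giving 7 positions per axis and 49 candidate patches per tile.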
Library Architecture
Component Architecture
graph TB
subgraph "Core Processing"
P[processor.py<br/>Main Engine]
F[features.py<br/>Feature Extraction]
GPU[features_gpu.py<br/>GPU Acceleration]
end
subgraph "Data Management"
D[downloader.py<br/>IGN WFS Integration]
TL[tile_list.py<br/>Tile Management]
SL[strategic_locations.py<br/>Geographic Zones]
MD[metadata.py<br/>Dataset Metadata]
end
subgraph "Classification & Styles"
C[classes.py<br/>LOD2/LOD3 Schemas]
AS[architectural_styles.py<br/>Style Inference]
end
subgraph "Integration & Config"
CLI[cli.py<br/>Command Interface]
CFG[config.py<br/>Configuration]
QGIS[qgis_converter.py<br/>QGIS Compatibility]
U[utils.py<br/>Core Utilities]
end
CLI --> P
CLI --> D
P --> F
P --> GPU
P --> C
F --> AS
D --> TL
D --> SL
P --> MD
style P fill:#e3f2fd
style F fill:#e8f5e8
style D fill:#fff3e0
style CLI fill:#f3e5f5
Module Responsibilities
| Module | Purpose | Key Features |
|---|---|---|
| processor.py | Main processing engine | Patch creation, LOD classification, workflow orchestration |
| downloader.py | IGN WFS integration | Tile discovery, batch download, smart skip detection |
| features.py | Feature extraction | Normals, curvature, geometric properties |
| features_gpu.py | GPU acceleration | CUDA-optimized feature computation |
| classes.py | Classification schemas | LOD2/LOD3 building taxonomies |
| architectural_styles.py | Style inference | Building architecture classification |
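The geometric properties produced by feature extraction are commonly derived from the eigen-decomposition of each point's neighborhood covariance. The sketch below uses widely cited definitions (normal as the direction of least variance, planarity as (λ2 - λ3) / λ1, curvature as surface variation, verticality as 1 - |n_z|). These are assumptions for illustration: the formulas in features.py may differ, and `local_geometry` is a hypothetical name.

```python
import numpy as np


def local_geometry(neighborhood):
    """Eigenvalue-based descriptors for one neighborhood of XYZ points."""
    pts = np.asarray(neighborhood, dtype=np.float64)
    cov = np.cov(pts.T)                      # 3x3 covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    l3, l2, l1 = np.maximum(eigvals, 0.0)    # so that l1 >= l2 >= l3 >= 0
    normal = eigvecs[:, 0]                   # direction of least variance
    # Degenerate neighborhoods (all points identical) would give l1 == 0.
    return {
        "normal": normal,
        "planarity": (l2 - l3) / l1,
        "curvature": l3 / (l1 + l2 + l3),
        "verticality": 1.0 - abs(normal[2]),
    }


rng = np.random.default_rng(0)
# A flat roof: spread in X and Y, almost no spread in Z.
roof = np.column_stack([rng.uniform(0, 10, 500), rng.uniform(0, 10, 500),
                        rng.normal(0, 0.01, 500)])
# A wall: spread in Y and Z, almost no spread in X.
wall = np.column_stack([rng.normal(0, 0.01, 500), rng.uniform(0, 10, 500),
                        rng.uniform(0, 10, 500)])
roof_features = local_geometry(roof)
wall_features = local_geometry(wall)
```

In this toy case the roof yields a near-vertical normal (verticality close to 0) and the wall a horizontal one (verticality close to 1), the kind of contrast that helps separate building components.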
Example Workflows
examples/
├── basic_usage.py                  # Getting started
├── example_urban_simple.py         # Urban processing
├── parallel_processing_example.py  # Performance optimization
├── full_workflow_example.py        # End-to-end pipeline
├── multistyle_processing.py        # Architecture analysis
├── pytorch_dataloader.py           # ML integration
├── pipeline_example.py             # YAML pipeline usage
├── enrich_with_rgb.py              # RGB augmentation
├── demo_infrared_augmentation.py   # Infrared augmentation
└── workflows/                      # Production pipelines
config_examples/
├── pipeline_full.yaml              # Complete workflow
├── pipeline_enrich.yaml            # Enrich-only
└── pipeline_patch.yaml             # Patch-only
Output Format
Data Structure
graph TB
subgraph "Raw Input"
LAZ[LAZ Point Cloud<br/>XYZ + Intensity<br/>Classification]
end
subgraph "Enriched Data"
ELAZ[Enhanced LAZ<br/>+ 30 Features<br/>+ Building Labels]
end
subgraph "ML Dataset"
NPZ[NPZ Patches<br/>16K points each<br/>Ready for Training]
end
subgraph "NPZ Contents"
COORD[Coordinates<br/>X, Y, Z]
GEOM[Geometric Features<br/>Normals, Curvature]
SEMANTIC[Semantic Features<br/>Planarity, Verticality]
META[Metadata<br/>Intensity, Return#]
LABELS[Building Labels<br/>LOD2/LOD3 Classes]
end
LAZ --> ELAZ
ELAZ --> NPZ
NPZ --> COORD
NPZ --> GEOM
NPZ --> SEMANTIC
NPZ --> META
NPZ --> LABELS
style LAZ fill:#ffebee
style ELAZ fill:#e3f2fd
style NPZ fill:#e8f5e8
NPZ File Structure
Each patch is saved as an NPZ file containing:
{
    'points': np.ndarray,         # [N, 3] XYZ coordinates
    'normals': np.ndarray,        # [N, 3] surface normals
    'curvature': np.ndarray,      # [N] principal curvature
    'intensity': np.ndarray,      # [N] normalized intensity
    'return_number': np.ndarray,  # [N] return number
    'height': np.ndarray,         # [N] height above ground
    'planarity': np.ndarray,      # [N] planarity measure
    'verticality': np.ndarray,    # [N] verticality measure
    'horizontality': np.ndarray,  # [N] horizontality measure
    'density': np.ndarray,        # [N] local point density
    'labels': np.ndarray,         # [N] building class labels
    # Optional (with augmentation):
    'red': np.ndarray,            # [N] RGB red channel
    'green': np.ndarray,          # [N] RGB green channel
    'blue': np.ndarray,           # [N] RGB blue channel
    'infrared': np.ndarray,       # [N] NIR values
}
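A patch with this layout can be checked with plain NumPy before training. The sketch below writes a small synthetic patch that uses a subset of the keys listed above, then reloads it and verifies that every array has the same number of points. `check_patch` is a hypothetical helper and the synthetic values are random placeholders, not real data.

```python
import os
import tempfile

import numpy as np


def check_patch(path, expected_keys=("points", "normals", "labels")):
    """Load an NPZ patch and confirm all arrays share the same point count."""
    with np.load(path) as data:
        missing = [key for key in expected_keys if key not in data.files]
        if missing:
            raise KeyError(f"missing arrays: {missing}")
        n_points = data["points"].shape[0]
        bad = [key for key in data.files if data[key].shape[0] != n_points]
        if bad:
            raise ValueError(f"arrays with a different length: {bad}")
        return n_points, sorted(data.files)


rng = np.random.default_rng(0)
n = 16384
synthetic = {
    "points": rng.uniform(0, 150, size=(n, 3)).astype(np.float32),
    "normals": rng.normal(size=(n, 3)).astype(np.float32),
    "curvature": rng.uniform(size=n).astype(np.float32),
    "labels": rng.integers(0, 15, size=n).astype(np.uint8),
}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "patch_0000.npz")  # hypothetical file name
    np.savez_compressed(path, **synthetic)
    n_loaded, keys = check_patch(path)
```

Running the same check over a directory of real patches is a cheap way to catch truncated or mismatched files early.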
Data Dimensions
| Component | Shape | Data Type | Description |
|---|---|---|---|
| points | [N, 3] | float32 | 3D coordinates (X, Y, Z) |
| normals | [N, 3] | float32 | Surface normal vectors |
| features | [N, 27+] | float32 | Geometric feature matrix |
| labels | [N] | uint8 | Building component classes |
| metadata | [4] | object | Patch info (bbox, tile_id) |
Typical patch: 16,384 points, ~2.5MB compressed, ~8MB in memory
Examples
Urban Processing
# High-detail urban processing
from ign_lidar import LiDARProcessor
processor = LiDARProcessor(lod_level="LOD3", num_augmentations=5)
patches = processor.process_tile("urban_area.laz", "output/urban/")
Rural Processing
# Simplified rural processing
processor = LiDARProcessor(lod_level="LOD2", num_augmentations=2)
patches = processor.process_tile("rural_area.laz", "output/rural/")
Batch Processing
from ign_lidar import LiDARProcessor, WORKING_TILES, get_tiles_by_environment
# Processor reused for every tile below
processor = LiDARProcessor(lod_level="LOD2")
# Get coastal tiles
coastal_tiles = get_tiles_by_environment("coastal")
# Process all coastal areas
for tile_info in coastal_tiles:
    patches = processor.process_tile(
        f"data/{tile_info['tile_name']}.laz",
        f"output/coastal/{tile_info['tile_name']}/"
    )
PyTorch Integration
from torch.utils.data import Dataset, DataLoader
import numpy as np
import glob

class LiDARPatchDataset(Dataset):
    def __init__(self, patch_dir):
        # Sort so the file order is reproducible across runs
        self.patch_files = sorted(glob.glob(f"{patch_dir}/**/*.npz", recursive=True))

    def __len__(self):
        return len(self.patch_files)

    def __getitem__(self, idx):
        data = np.load(self.patch_files[idx])
        points = data['points']
        # Stack per-point features into a single [N, 5] matrix
        features = np.concatenate([
            data['normals'],
            data['curvature'][:, None],
            data['intensity'][:, None]
        ], axis=1)
        labels = data['labels']
        return points, features, labels

# Create dataloader
dataset = LiDARPatchDataset("patches/")
dataloader = DataLoader(dataset, batch_size=32, shuffle=True)
Documentation & Resources
Complete Documentation Hub
For comprehensive documentation, see the Documentation Hub:
- User Guides - Quick start guides, QGIS integration, troubleshooting
- Features - Smart skip detection, format preferences, workflow optimization
- Technical Reference - Memory optimization, performance tuning
- Archive - Bug fixes history, release notes, migration guides
Essential Quick Links
- Quick Reference Card - Fast reference for all commands
- Smart Skip Features - Resume workflows efficiently
- QGIS Integration - GIS compatibility guide
- Memory Optimization - Performance tuning
- Output Formats - LAZ 1.4 vs QGIS formats
Examples & Workflows
- Basic Usage - Simple processing examples
- Urban Processing - City-specific workflows
- Parallel Processing - Multi-worker optimization
- Full Workflow - End-to-end pipeline
- Pipeline Configuration - YAML-based workflows
- RGB Augmentation - Orthophoto integration
- PyTorch Integration - ML training setup
Development
Setup Development Environment
git clone https://github.com/sducournau/IGN_LIDAR_HD_DATASET
cd IGN_LIDAR_HD_DATASET
pip install -e ".[dev]"
Run Tests
pytest tests/
Code Formatting
black ign_lidar/
flake8 ign_lidar/
Requirements
- Python 3.8+
- NumPy >= 1.21.0
- laspy >= 2.3.0
- scikit-learn >= 1.0.0
- tqdm >= 4.60.0
- requests >= 2.25.0
- PyYAML >= 6.0 (for pipeline configuration)
- Pillow >= 9.0.0 (for RGB augmentation)
Optional (for GPU acceleration):
- CUDA >= 12.0
- CuPy >= 12.0.0
- RAPIDS cuML >= 24.10 (recommended for best performance)
API Reference
Core Classes
- LiDARProcessor: Main processing engine for tile and directory processing
- IGNLiDARDownloader: Batch download functionality from IGN WFS service
- LOD2_CLASSES, LOD3_CLASSES: Classification taxonomies
Utility Functions
- compute_normals(): Surface normal computation
- compute_curvature(): Principal curvature calculation
- extract_geometric_features(): Comprehensive feature extraction
- get_tiles_by_environment(): Filter tiles by environment type
License & Support
License
MIT License - see LICENSE file for details.
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
Support
For issues and questions, please use the GitHub Issues page.
Made with ❤️ for the LiDAR and Machine Learning communities