
IGN LiDAR HD Dataset Processing Library for Building LOD Classification


IGN LiDAR HD Processing Library


Version 2.1.2


A comprehensive Python library for processing IGN LiDAR HD data into machine learning-ready datasets for Building Level of Detail (LOD) classification tasks.




📊 Overview

This library transforms raw IGN (Institut National de l'Information Géographique et Forestière) LiDAR HD point clouds into structured datasets ready for machine learning applications. Built specifically for building classification tasks, it handles the complete pipeline from data acquisition to training-ready patches.


🔄 Processing Workflow

flowchart TD
    A[IGN LiDAR HD Data] --> B[Download Tiles]
    B --> C[Enrich with Features]
    C --> D[Create Training Patches]
    D --> E[ML-Ready Dataset]

    B --> B1[Smart Skip Detection]
    C --> C1[GPU/CPU Processing]
    C --> C2[Geometric Features]
    C --> C3[Data Augmentation]
    C --> C4[RGB + Infrared NIR]
    C --> C5[NDVI Calculation]
    D --> D1[LOD Classification]

    style A fill:#e1f5fe
    style E fill:#e8f5e8
    style B1 fill:#fff3e0
    style C1 fill:#fff3e0
    style C3 fill:#fff3e0
    style C4 fill:#c8e6c9
    style C5 fill:#c8e6c9

📈 Project Stats

  • 🏗️ 14 core modules - Comprehensive processing toolkit
  • 📝 10 example scripts - From basic usage to complex workflows
  • 🧪 Comprehensive test suite - Ensuring reliability and performance
  • 🌍 50+ curated tiles - Covering diverse French territories
  • ⚡ GPU & CPU support - Flexible computation backends
  • 🔄 Smart resumability - Never reprocess existing data

✨ What's New (v2.1.1)

  • Bug Fixes: Fixed planarity feature computation formula and preprocessing stitching for boundary features
  • Improved Validation: Enhanced feature validation and artifact detection at tile boundaries
  • Code Quality: Repository cleanup and better code organization
  • Documentation: Updated documentation and improved examples

Previous Release (v2.1.0):

  • Feature Validation: Automatic detection of geometric feature artifacts at tile boundaries
  • French Documentation: Complete French i18n structure (73 files)
  • Hybrid Model Support: Optimized LOD3 hybrid model training configurations
  • Enhanced Documentation: Training commands reference and workflow guides

See CHANGELOG.md for full details and previous releases.


🚀 Quick Start

Installation

Standard Installation (CPU Only)

pip install ign-lidar-hd
ign-lidar-hd --version  # Verify installation

GPU Acceleration (Optional - 6-20x Speedup)

For optimal performance, install with GPU support:

# Quick install using provided script
./install_cuml.sh

# Or manual installation
# Prerequisites: NVIDIA GPU (4GB+ VRAM), CUDA 12.0+, Miniconda/Anaconda
conda create -n ign_gpu python=3.12 -y
conda activate ign_gpu
conda install -c rapidsai -c conda-forge -c nvidia cuml=24.10 cupy cuda-version=12.5 -y
pip install ign-lidar-hd

# Verify GPU setup
python scripts/verify_gpu_setup.py


Quick Example

# 1. Download sample data
ign-lidar-hd download --bbox 2.3,48.8,2.4,48.9 --output data/ --max-tiles 5

# 2. Enrich with features (GPU accelerated if available)
ign-lidar-hd enrich --input-dir data/ --output enriched/ --mode full --use-gpu

# 3. Create training patches
ign-lidar-hd patch --input-dir enriched/ --output patches/ --lod-level LOD2

Python API:

from ign_lidar import LiDARProcessor

# Initialize processor
processor = LiDARProcessor(lod_level="LOD2")

# Process a single tile
patches = processor.process_tile("data.laz", "output/")

# Process multiple files
patches = processor.process_directory("data/", "output/", num_workers=4)

📋 Key Features

🏗️ Core Processing

  • Pure LiDAR processing - Geometric analysis without RGB dependencies
  • RGB & Infrared augmentation - Optional color and Near-Infrared (NIR) from IGN orthophotos
  • NDVI-ready datasets - Automatic vegetation index calculation from the red and NIR channels
  • Multi-level classification - LOD2 (15 classes) and LOD3 (30+ classes) support
  • Rich features - Surface normals, curvature, planarity, verticality, local density
  • Architectural styles - Automatic building style inference
  • Preprocessing - Artifact mitigation (60-80% scan line reduction)
  • Auto-parameters - Intelligent tile analysis for optimal processing

⚡ Performance & Optimization

  • GPU acceleration - CUDA-accelerated with RAPIDS cuML (6-20x speedup)
  • Parallel processing - Multi-worker support with CPU core detection
  • Memory optimization - Per-chunk architecture, 50-60% memory reduction
  • Smart skip detection - Resume interrupted workflows automatically
  • Batch operations - Process hundreds of tiles efficiently
  • Scalable - Tested up to 1B+ points

🔧 Workflow Automation

  • Pipeline configuration - YAML-based declarative workflows
  • Integrated downloader - IGN WFS tile discovery and batch downloading
  • Format flexibility - LAZ 1.4 (full features) or QGIS-compatible output
  • CLI - Single ign-lidar-hd command with intuitive subcommands
  • Idempotent operations - Safe to restart, never reprocesses existing data
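
One simple way to get the restart-safe behavior described above is to skip any input whose output file already exists. The sketch below illustrates that idea only; the helper name `pending_tiles` and the rule "same file name, non-empty output" are assumptions, and the library's actual skip detection may use different criteria.

```python
import tempfile
from pathlib import Path
from typing import List

def pending_tiles(input_dir: Path, output_dir: Path) -> List[Path]:
    """Return input tiles that have no non-empty output file of the same name."""
    todo = []
    for tile in sorted(input_dir.glob("*.laz")):
        target = output_dir / tile.name
        if target.exists() and target.stat().st_size > 0:
            continue  # already processed on a previous run
        todo.append(tile)
    return todo

# Usage with stub files: tile "a" was processed earlier, tile "b" was not.
with tempfile.TemporaryDirectory() as tmp:
    raw, enriched = Path(tmp, "raw"), Path(tmp, "enriched")
    raw.mkdir()
    enriched.mkdir()
    for name in ("a.laz", "b.laz"):
        (raw / name).write_bytes(b"stub")
    (enriched / "a.laz").write_bytes(b"stub")
    remaining = [p.name for p in pending_tiles(raw, enriched)]
```

Checking for a non-empty file is a cheap guard against outputs left truncated by an interrupted run; a stricter check could also try to open the file.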

๐ŸŒ Geographic Intelligence

  • Strategic locations - Pre-configured urban, coastal, and rural areas
  • Bounding box filtering - Spatial subsetting for targeted analysis
  • Coordinate handling - Automatic Lambert93 โ†” WGS84 transformations
  • Tile management - Curated collection of 50+ test tiles across France

📖 Usage Guide

Command Line Interface

The library provides an ign-lidar-hd command with five main subcommands:

1. Download Command

Download LiDAR tiles from IGN:

# Download by bounding box
ign-lidar-hd download --bbox 2.3,48.8,2.4,48.9 --output data/ --max-tiles 10

# Download specific tiles
ign-lidar-hd download --tiles tile1.laz tile2.laz --output data/
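
The `--bbox` value is a comma-separated string of four numbers. The sketch below shows how such a string could be parsed and sanity-checked before it is sent to a download call. The helper `parse_bbox` is hypothetical, and the order `min_lon,min_lat,max_lon,max_lat` in WGS84 degrees is an assumption based on the example values used in this guide.

```python
from typing import Tuple

def parse_bbox(text: str) -> Tuple[float, float, float, float]:
    """Parse 'min_lon,min_lat,max_lon,max_lat' into a tuple of floats."""
    parts = [float(p) for p in text.split(",")]
    if len(parts) != 4:
        raise ValueError("expected four comma-separated numbers")
    min_lon, min_lat, max_lon, max_lat = parts
    if not (-180.0 <= min_lon < max_lon <= 180.0):
        raise ValueError("longitudes must satisfy -180 <= min < max <= 180")
    if not (-90.0 <= min_lat < max_lat <= 90.0):
        raise ValueError("latitudes must satisfy -90 <= min < max <= 90")
    return min_lon, min_lat, max_lon, max_lat

# Usage: the bounding box from the download example above.
bbox = parse_bbox("2.3,48.8,2.4,48.9")
```

Rejecting a swapped or malformed box early avoids a request that would silently match no tiles.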

2. Enrich Command

Enrich LAZ files with geometric features:

# Basic enrichment
ign-lidar-hd enrich --input-dir data/ --output enriched/ --mode full

# GPU-accelerated enrichment
ign-lidar-hd enrich --input-dir data/ --output enriched/ --use-gpu

# Full-featured enrichment (recommended)
ign-lidar-hd enrich \
  --input-dir data/ \
  --output enriched/ \
  --mode full \
  --use-gpu \
  --auto-params \
  --preprocess \
  --add-rgb --rgb-cache-dir cache/rgb \
  --add-infrared --infrared-cache-dir cache/infrared

# Custom preprocessing
ign-lidar-hd enrich \
  --input-dir data/ \
  --output enriched/ \
  --preprocess \
  --sor-k 15 --sor-std 2.5 \
  --ror-radius 1.0 --ror-neighbors 4 \
  --voxel-size 0.5

Preprocessing Options:

  • --preprocess - Enable artifact mitigation
  • --sor-k - Statistical outlier removal: number of neighbors (default: 12)
  • --sor-std - SOR: std deviation multiplier (default: 2.0)
  • --ror-radius - Radius outlier removal: search radius in meters (default: 1.0)
  • --ror-neighbors - ROR: minimum neighbors required (default: 4)
  • --voxel-size - Voxel downsampling size in meters (optional)
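
To make the `--sor-k` and `--sor-std` parameters concrete, here is a minimal, brute-force sketch of statistical outlier removal in plain Python. It is meant to show what the two parameters control, not how the library implements the filter; the function name is hypothetical and a real implementation would use a spatial index rather than an all-pairs search.

```python
import math
import statistics
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]

def statistical_outlier_removal(
    points: Sequence[Point], k: int = 12, std_multiplier: float = 2.0
) -> List[Point]:
    """Drop points whose mean distance to their k nearest neighbors is unusually large."""
    mean_dists = []
    for i, p in enumerate(points):
        # Brute-force neighbor search: fine for a demo, too slow for real tiles.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        nearest = dists[:k]
        mean_dists.append(sum(nearest) / len(nearest))
    mu = statistics.fmean(mean_dists)
    sigma = statistics.pstdev(mean_dists)
    threshold = mu + std_multiplier * sigma
    return [p for p, d in zip(points, mean_dists) if d <= threshold]

# Usage: a flat 5 x 5 grid plus one stray point far above it.
grid = [(float(x), float(y), 0.0) for x in range(5) for y in range(5)]
outlier = (50.0, 50.0, 50.0)
cleaned = statistical_outlier_removal(grid + [outlier])
```

A larger `k` smooths the neighborhood estimate, while a larger multiplier makes the filter more tolerant of sparse regions.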

Augmentation Options:

  • --add-rgb - Add RGB colors from IGN orthophotos
  • --add-infrared - Add NIR values from IGN IRC orthophotos
  • --augment - Enable geometric augmentation (disabled by default)
  • --num-augmentations - Number of augmented versions (default: 3)

3. Patch Command

Create training patches from enriched files:

# Create patches
ign-lidar-hd patch \
  --input-dir enriched/ \
  --output patches/ \
  --lod-level LOD2 \
  --patch-size 150.0 \
  --num-workers 4

4. Verify Command

Verify features in enriched LAZ files:

# Verify a single file
ign-lidar-hd verify --input enriched/file.laz

# Verify all files in a directory
ign-lidar-hd verify --input-dir enriched/

# Quick check with sample display
ign-lidar-hd verify --input enriched/file.laz --show-samples

# Batch verification (first 10 files)
ign-lidar-hd verify --input-dir enriched/ --max-files 10

# Quiet mode (summary only)
ign-lidar-hd verify --input-dir enriched/ --quiet

Verification Features:

  • ✅ RGB values (presence, ranges, diversity)
  • ✅ NIR/infrared values
  • ✅ Geometric features (linearity, planarity, sphericity, anisotropy, roughness)
  • ✅ Value range validation [0, 1]
  • ✅ Anomaly detection (default values, out-of-range)
  • ✅ Statistical distributions and sample point display
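
The range and anomaly checks above can be pictured with a small stand-alone sketch. The function name `validate_unit_range`, the report layout, and the three rules it applies are assumptions for illustration; they are not taken from the `verify` command's source.

```python
import math
from typing import Dict, List, Sequence

def validate_unit_range(features: Dict[str, Sequence[float]]) -> Dict[str, List[str]]:
    """Return {feature_name: [problem, ...]} for features that fail basic checks."""
    report = {}
    for name, values in features.items():
        problems = []
        if any(not math.isfinite(v) for v in values):
            problems.append("non-finite values")
        finite = [v for v in values if math.isfinite(v)]
        if any(v < 0.0 or v > 1.0 for v in finite):
            problems.append("values outside [0, 1]")
        if finite and min(finite) == max(finite):
            problems.append("constant value (possible default fill)")
        if problems:
            report[name] = problems
    return report

# Usage: one healthy feature, one out of range, one stuck at a default.
sample = {
    "planarity": [0.91, 0.88, 0.12],
    "linearity": [0.05, 1.40, 0.30],
    "roughness": [0.0, 0.0, 0.0],
}
report = validate_unit_range(sample)
```

Here `report` flags `linearity` and `roughness` and leaves `planarity` out, since only failing features are listed.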

5. Pipeline Command (Recommended)

Execute complete workflows using YAML configuration:

# Create example configuration
ign-lidar-hd pipeline config.yaml --create-example full

# Run configured pipeline
ign-lidar-hd pipeline config.yaml

Example YAML Configuration:

global:
  num_workers: 4

download:
  bbox: "2.3, 48.8, 2.4, 48.9"
  output: "data/raw"
  max_tiles: 10

enrich:
  input_dir: "data/raw"
  output: "data/enriched"
  mode: "full"
  use_gpu: true
  auto_params: true
  preprocess: true
  add_rgb: true
  add_infrared: true
  rgb_cache_dir: "cache/rgb"
  infrared_cache_dir: "cache/infrared"

patch:
  input_dir: "data/enriched"
  output: "data/patches"
  lod_level: "LOD2"
  num_points: 16384
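
In a multi-stage configuration like the one above, each stage should read from the directory the previous stage wrote to. The sketch below checks that property on a plain dictionary that mirrors the YAML structure (for example the result of `yaml.safe_load`). The helper `check_stage_chaining` is hypothetical and is not part of the library.

```python
from typing import Dict, List

def check_stage_chaining(config: Dict[str, dict]) -> List[str]:
    """Report stages whose input_dir differs from the previous stage's output."""
    order = ["download", "enrich", "patch"]
    stages = [name for name in order if name in config]
    issues = []
    for prev, cur in zip(stages, stages[1:]):
        produced = config[prev].get("output")
        consumed = config[cur].get("input_dir")
        if produced != consumed:
            issues.append(
                f"{cur}.input_dir={consumed!r} but {prev}.output={produced!r}"
            )
    return issues

# Usage: a dictionary with the same keys as the example configuration.
config = {
    "global": {"num_workers": 4},
    "download": {"bbox": "2.3, 48.8, 2.4, 48.9", "output": "data/raw", "max_tiles": 10},
    "enrich": {"input_dir": "data/raw", "output": "data/enriched", "mode": "full"},
    "patch": {"input_dir": "data/enriched", "output": "data/patches", "lod_level": "LOD2"},
}
issues = check_stage_chaining(config)
```

For the example configuration the list of issues is empty, because every `input_dir` matches the preceding `output`.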

Python API

Basic Usage

from ign_lidar import LiDARProcessor

# Initialize processor
processor = LiDARProcessor(
    lod_level="LOD2",
    patch_size=150.0,
    patch_overlap=0.1
)

# Process single tile
patches = processor.process_tile("input.laz", "output/")

# Process directory
patches = processor.process_directory("input_dir/", "output_dir/", num_workers=4)

Batch Download

from ign_lidar import IGNLiDARDownloader

# Initialize downloader
downloader = IGNLiDARDownloader("downloads/")

# Download by bounding box (WGS84)
tiles = downloader.download_by_bbox(
    bbox=(-2.0, 47.0, -1.0, 48.0),
    max_tiles=10
)

# Download specific tiles
tile_names = ["LHD_FXX_0186_6834_PTS_C_LAMB93_IGN69"]
downloader.download_tiles(tile_names)
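
Tile names such as the one above pack several fields into a single identifier. The sketch below splits a name into its parts with a regular expression. The field meanings are assumptions inferred from this one example name (a region code, two four-digit numbers taken to be Lambert93 kilometre coordinates, a projection, and a vertical datum), and `parse_tile_name` is a hypothetical helper, not a library function.

```python
import re
from typing import NamedTuple

class TileName(NamedTuple):
    region: str          # e.g. "FXX" (assumed to be a region code)
    x_km: int            # assumed: easting in kilometres
    y_km: int            # assumed: northing in kilometres
    projection: str      # e.g. "LAMB93"
    vertical_datum: str  # e.g. "IGN69"

_TILE_PATTERN = re.compile(
    r"^LHD_(?P<region>[A-Z0-9]+)_(?P<x>\d{4})_(?P<y>\d{4})"
    r"_PTS_[A-Z]_(?P<proj>[A-Z0-9]+)_(?P<vert>[A-Z0-9]+)$"
)

def parse_tile_name(name: str) -> TileName:
    """Split a tile identifier into its fields, or raise ValueError."""
    match = _TILE_PATTERN.match(name)
    if match is None:
        raise ValueError(f"unrecognized tile name: {name!r}")
    return TileName(
        match["region"], int(match["x"]), int(match["y"]),
        match["proj"], match["vert"],
    )

# Usage: the tile name from the download example above.
tile = parse_tile_name("LHD_FXX_0186_6834_PTS_C_LAMB93_IGN69")
```

Parsing names up front makes it easy to reject typos before starting a batch download.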

Configuration

# LOD Levels
processor = LiDARProcessor(lod_level="LOD2")  # 15 classes
processor = LiDARProcessor(lod_level="LOD3")  # 30+ classes

# Processing Options
processor = LiDARProcessor(
    lod_level="LOD2",
    patch_size=150.0,          # Patch size in meters
    patch_overlap=0.1,         # 10% overlap
    bbox=[xmin, ymin, xmax, ymax]  # Spatial filter
)
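
To see how `patch_size` and `patch_overlap` interact, the sketch below lays out patch origins over a square tile. It assumes a 1000 m tile, a stride of `patch_size * (1 - patch_overlap)`, and a final patch placed flush with the tile edge; the library's actual tiling strategy is not documented here, and `patch_origins` is a hypothetical helper.

```python
from typing import List, Tuple

def patch_origins(
    tile_size: float, patch_size: float, overlap: float
) -> List[Tuple[float, float]]:
    """Lower-left corners of square patches covering a square tile."""
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must be in [0, 1)")
    stride = patch_size * (1.0 - overlap)
    starts = []
    pos = 0.0
    while pos + patch_size < tile_size:
        starts.append(pos)
        pos += stride
    # Last patch sits flush with the tile edge so no strip is left uncovered.
    starts.append(max(tile_size - patch_size, 0.0))
    return [(x, y) for x in starts for y in starts]

# Usage: 150 m patches with 10% overlap give a 135 m stride.
origins = patch_origins(tile_size=1000.0, patch_size=150.0, overlap=0.1)
```

Under these assumptions there are 8 start positions per axis (0, 135, ..., 810, 850), so the tile yields 64 candidate patches before any are discarded for being empty.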

๐Ÿ—๏ธ Library Architecture

Component Architecture

graph TB
    subgraph "Core Processing"
        P[processor.py<br/>🔧 Main Engine]
        F[features.py<br/>⚡ Feature Extraction]
        GPU[features_gpu.py<br/>🖥️ GPU Acceleration]
    end

    subgraph "Data Management"
        D[downloader.py<br/>📥 IGN WFS Integration]
        TL[tile_list.py<br/>📂 Tile Management]
        SL[strategic_locations.py<br/>🗺️ Geographic Zones]
        MD[metadata.py<br/>📊 Dataset Metadata]
    end

    subgraph "Classification & Styles"
        C[classes.py<br/>🏢 LOD2/LOD3 Schemas]
        AS[architectural_styles.py<br/>🎨 Style Inference]
    end

    subgraph "Integration & Config"
        CLI[cli.py<br/>🖱️ Command Interface]
        CFG[config.py<br/>⚙️ Configuration]
        QGIS[qgis_converter.py<br/>🔄 QGIS Compatibility]
        U[utils.py<br/>🛠️ Core Utilities]
    end

    CLI --> P
    CLI --> D
    P --> F
    P --> GPU
    P --> C
    F --> AS
    D --> TL
    D --> SL
    P --> MD

    style P fill:#e3f2fd
    style F fill:#e8f5e8
    style D fill:#fff3e0
    style CLI fill:#f3e5f5

Module Responsibilities

| Module | Purpose | Key Features |
| --- | --- | --- |
| 🔧 processor.py | Main processing engine | Patch creation, LOD classification, workflow orchestration |
| 📥 downloader.py | IGN WFS integration | Tile discovery, batch download, smart skip detection |
| ⚡ features.py | Feature extraction | Normals, curvature, geometric properties |
| 🖥️ features_gpu.py | GPU acceleration | CUDA-optimized feature computation |
| 🏢 classes.py | Classification schemas | LOD2/LOD3 building taxonomies |
| 🎨 architectural_styles.py | Style inference | Building architecture classification |

Example Workflows

examples/
├── 🚀 basic_usage.py                      # Getting started
├── 🏙️ example_urban_simple.py            # Urban processing
├── ⚡ parallel_processing_example.py       # Performance optimization
├── 🔄 full_workflow_example.py            # End-to-end pipeline
├── 🎨 multistyle_processing.py            # Architecture analysis
├── 🧠 pytorch_dataloader.py               # ML integration
├── 🆕 pipeline_example.py                 # YAML pipeline usage
├── 🆕 enrich_with_rgb.py                  # RGB augmentation
├── 🆕 demo_infrared_augmentation.py       # Infrared augmentation
└── workflows/                             # Production pipelines

config_examples/
├── 🆕 pipeline_full.yaml                  # Complete workflow
├── 🆕 pipeline_enrich.yaml                # Enrich-only
└── 🆕 pipeline_patch.yaml                 # Patch-only

📦 Output Format

Data Structure

graph TB
    subgraph "Raw Input"
        LAZ[LAZ Point Cloud<br/>XYZ + Intensity<br/>Classification]
    end

    subgraph "Enriched Data"
        ELAZ[Enriched LAZ<br/>+ 30 Features<br/>+ Building Labels]
    end

    subgraph "ML Dataset"
        NPZ[NPZ Patches<br/>16K points each<br/>Ready for Training]
    end

    subgraph "NPZ Contents"
        COORD[Coordinates<br/>X, Y, Z]
        GEOM[Geometric Features<br/>Normals, Curvature]
        SEMANTIC[Semantic Features<br/>Planarity, Verticality]
        META[Metadata<br/>Intensity, Return#]
        LABELS[Building Labels<br/>LOD2/LOD3 Classes]
    end

    LAZ --> ELAZ
    ELAZ --> NPZ
    NPZ --> COORD
    NPZ --> GEOM
    NPZ --> SEMANTIC
    NPZ --> META
    NPZ --> LABELS

    style LAZ fill:#ffebee
    style ELAZ fill:#e3f2fd
    style NPZ fill:#e8f5e8

NPZ File Structure

Each patch is saved as an NPZ file containing:

{
    'points': np.ndarray,          # [N, 3] XYZ coordinates
    'normals': np.ndarray,         # [N, 3] surface normals
    'curvature': np.ndarray,       # [N] principal curvature
    'intensity': np.ndarray,       # [N] normalized intensity
    'return_number': np.ndarray,   # [N] return number
    'height': np.ndarray,          # [N] height above ground
    'planarity': np.ndarray,       # [N] planarity measure
    'verticality': np.ndarray,     # [N] verticality measure
    'horizontality': np.ndarray,   # [N] horizontality measure
    'density': np.ndarray,         # [N] local point density
    'labels': np.ndarray,          # [N] building class labels
    # Optional (with augmentation):
    'red': np.ndarray,             # [N] RGB red channel
    'green': np.ndarray,           # [N] RGB green channel
    'blue': np.ndarray,            # [N] RGB blue channel
    'infrared': np.ndarray,        # [N] NIR values
}
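
As an illustration of how the keys above might be consumed, the sketch below writes a tiny synthetic patch to memory, reads it back, stacks a few per-point features into one matrix, and computes NDVI as `(NIR - red) / (NIR + red)` when the optional channels are present. The helper names `ndvi` and `load_patch`, the choice of stacked features, and the synthetic values are assumptions made for this example; real patches come from the `patch` command.

```python
import io
import numpy as np

def ndvi(red: np.ndarray, nir: np.ndarray) -> np.ndarray:
    """Normalized Difference Vegetation Index, 0 where red + NIR is 0."""
    red = red.astype(np.float32)
    nir = nir.astype(np.float32)
    denom = nir + red
    out = np.zeros_like(denom)
    np.divide(nir - red, denom, out=out, where=denom > 0)
    return out

def load_patch(source):
    """Return (points, features, labels, ndvi_or_None) from one NPZ patch."""
    with np.load(source) as data:
        points = data["points"]
        # column_stack turns the 1-D arrays into columns: result is [N, 6].
        features = np.column_stack([
            data["normals"], data["curvature"],
            data["planarity"], data["verticality"],
        ])
        labels = data["labels"]
        has_nir = {"red", "infrared"} <= set(data.files)
        index = ndvi(data["red"], data["infrared"]) if has_nir else None
    return points, features, labels, index

# Usage: build a small synthetic patch in memory instead of reading from disk.
n = 8
rng = np.random.default_rng(0)
buffer = io.BytesIO()
np.savez_compressed(
    buffer,
    points=rng.random((n, 3), dtype=np.float32),
    normals=np.tile(np.array([0, 0, 1], dtype=np.float32), (n, 1)),
    curvature=np.zeros(n, dtype=np.float32),
    planarity=np.ones(n, dtype=np.float32),
    verticality=np.zeros(n, dtype=np.float32),
    labels=np.zeros(n, dtype=np.uint8),
    red=np.full(n, 50, dtype=np.uint8),
    infrared=np.full(n, 150, dtype=np.uint8),
)
buffer.seek(0)
points, features, labels, index = load_patch(buffer)
```

Converting the channels to float before the subtraction matters: with unsigned integer inputs, `nir - red` would wrap around whenever red exceeds NIR.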

Data Dimensions

| Component | Shape | Data Type | Description |
| --- | --- | --- | --- |
| points | [N, 3] | float32 | 3D coordinates (X, Y, Z) |
| normals | [N, 3] | float32 | Surface normal vectors |
| features | [N, 27+] | float32 | Geometric feature matrix |
| labels | [N] | uint8 | Building component classes |
| metadata | [4] | object | Patch info (bbox, tile_id) |

📦 Typical patch: 16,384 points, ~2.5 MB compressed, ~8 MB in memory


๐Ÿ“ Examples

Urban Processing

# High-detail urban processing
from ign_lidar import LiDARProcessor

processor = LiDARProcessor(lod_level="LOD3", num_augmentations=5)
patches = processor.process_tile("urban_area.laz", "output/urban/")

Rural Processing

# Simplified rural processing
processor = LiDARProcessor(lod_level="LOD2", num_augmentations=2)
patches = processor.process_tile("rural_area.laz", "output/rural/")

Batch Processing

from ign_lidar import LiDARProcessor, WORKING_TILES, get_tiles_by_environment

# Processor used for every tile in the batch
processor = LiDARProcessor(lod_level="LOD2")

# Get coastal tiles
coastal_tiles = get_tiles_by_environment("coastal")

# Process all coastal areas
for tile_info in coastal_tiles:
    patches = processor.process_tile(
        f"data/{tile_info['tile_name']}.laz",
        f"output/coastal/{tile_info['tile_name']}/"
    )

PyTorch Integration

from torch.utils.data import Dataset, DataLoader
import numpy as np
import glob

class LiDARPatchDataset(Dataset):
    def __init__(self, patch_dir):
        self.patch_files = glob.glob(f"{patch_dir}/**/*.npz", recursive=True)

    def __len__(self):
        return len(self.patch_files)

    def __getitem__(self, idx):
        data = np.load(self.patch_files[idx])
        points = data['points']
        features = np.concatenate([
            data['normals'],
            data['curvature'][:, None],
            data['intensity'][:, None]
        ], axis=1)
        labels = data['labels']
        return points, features, labels

# Create dataloader
dataset = LiDARPatchDataset("patches/")
dataloader = DataLoader(dataset, batch_size=32, shuffle=True)

📚 Documentation & Resources

For comprehensive documentation, see the Documentation Hub.


๐Ÿ› ๏ธ Development

Setup Development Environment

git clone https://github.com/sducournau/IGN_LIDAR_HD_DATASET
cd IGN_LIDAR_HD_DATASET
pip install -e ".[dev]"

Run Tests

pytest tests/

Code Formatting

black ign_lidar/
flake8 ign_lidar/

🔗 Requirements

  • Python 3.8+
  • NumPy >= 1.21.0
  • laspy >= 2.3.0
  • scikit-learn >= 1.0.0
  • tqdm >= 4.60.0
  • requests >= 2.25.0
  • PyYAML >= 6.0 (for pipeline configuration)
  • Pillow >= 9.0.0 (for RGB augmentation)

Optional (for GPU acceleration):

  • CUDA >= 12.0
  • CuPy >= 12.0.0
  • RAPIDS cuML >= 24.10 (recommended for best performance)

📚 API Reference

Core Classes

  • LiDARProcessor: Main processing engine for tile and directory processing
  • IGNLiDARDownloader: Batch download functionality from IGN WFS service
  • LOD2_CLASSES, LOD3_CLASSES: Classification taxonomies

Utility Functions

  • compute_normals(): Surface normal computation
  • compute_curvature(): Principal curvature calculation
  • extract_geometric_features(): Comprehensive feature extraction
  • get_tiles_by_environment(): Filter tiles by environment type

📄 License & Support

License

MIT License - see LICENSE file for details.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Support

For issues and questions, please use the GitHub Issues page.


Made with ❤️ for the LiDAR and Machine Learning communities
