IGN LiDAR HD Dataset Processing Library for Building LOD Classification

IGN LiDAR HD Processing Library

Version 1.7.5

A comprehensive Python library for processing IGN LiDAR HD data into machine learning-ready datasets for Building Level of Detail (LOD) classification tasks.

Overview

This library transforms raw IGN (Institut National de l'Information Géographique et Forestière) LiDAR HD point clouds into structured datasets ready for machine learning applications. Built specifically for building classification tasks, it handles the complete pipeline from data acquisition to training-ready patches.

Processing Workflow

flowchart TD
    A[IGN LiDAR HD Data] --> B[Download Tiles]
    B --> C[Enrich with Features]
    C --> D[Create Training Patches]
    D --> E[ML-Ready Dataset]

    B --> B1[Smart Skip Detection]
    C --> C1[GPU/CPU Processing]
    C --> C2[Geometric Features]
    C --> C3[Data Augmentation]
    C --> C4[RGB + Infrared NIR]
    C --> C5[NDVI Calculation]
    D --> D1[LOD Classification]

    style A fill:#e1f5fe
    style E fill:#e8f5e8
    style B1 fill:#fff3e0
    style C1 fill:#fff3e0
    style C3 fill:#fff3e0
    style C4 fill:#c8e6c9
    style C5 fill:#c8e6c9

Project Stats

  • 14 core modules - Comprehensive processing toolkit
  • 10 example scripts - From basic usage to advanced workflows
  • Comprehensive test suite - Ensuring reliability and performance
  • 50+ curated tiles - Covering diverse French territories
  • GPU & CPU support - Flexible computation backends
  • Smart resumability - Never reprocess existing data

What's New

Version 1.7.5 - Performance Breakthrough

Highlights:

  • 100-200x faster feature computation through vectorized operations
  • 100% GPU utilization - Fixed efficiency bottlenecks
  • 50-60% memory reduction - Per-chunk architecture for all modes
  • Real-world impact: 18M points in ~64 seconds (GPU+cuML) vs 14+ minutes before
  • Intelligent auto-scaling - Adaptive parameters based on hardware

Performance Modes:

  • CPU-only: 60 min/tile (baseline) - 1.8GB RAM
  • Hybrid GPU (CuPy): 7-10 min/tile (6-8x speedup) - 2.8GB VRAM
  • Full GPU (RAPIDS cuML): 3-5 min/tile (12-20x speedup) - 3.4GB VRAM
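
The speedup figures in the list above follow from the per-tile timings. A quick check of that arithmetic, using only the numbers quoted in this section (the helper name `speedup_range` is made up for illustration):

```python
# Per-tile timings quoted above, in minutes.
CPU_BASELINE_MIN = 60
MODES = {
    "Hybrid GPU (CuPy)": (7, 10),
    "Full GPU (RAPIDS cuML)": (3, 5),
}


def speedup_range(baseline, fastest, slowest):
    """Return (low, high) speedup relative to the baseline timing."""
    return baseline / slowest, baseline / fastest


for name, (fastest, slowest) in MODES.items():
    low, high = speedup_range(CPU_BASELINE_MIN, fastest, slowest)
    print(f"{name}: {low:.1f}x to {high:.1f}x")

# Throughput implied by "18M points in ~64 seconds".
points_per_second = 18_000_000 / 64
print(f"~{points_per_second:,.0f} points/s")
```

The hybrid range works out to roughly 6x to 8.6x, which the list rounds to 6-8x.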

Recent Features

  • v1.7.4: GPU acceleration with RAPIDS cuML, WSL2 support
  • v1.7.3: Infrared (NIR) augmentation for NDVI calculation
  • v1.7.1: Auto-parameter analysis for optimal processing
  • v1.7.0: Artifact mitigation preprocessing (60-80% reduction)
  • v1.6.0: Enhanced data augmentation during enrich phase

Quick Start

Installation

Standard Installation (CPU Only)

pip install ign-lidar-hd
ign-lidar-hd --version  # Verify installation

GPU Acceleration (Optional - 6-20x Speedup)

For optimal performance, install with GPU support:

# Quick install using provided script
./install_cuml.sh

# Or manual installation
# Prerequisites: NVIDIA GPU (4GB+ VRAM), CUDA 12.0+, Miniconda/Anaconda
conda create -n ign_gpu python=3.12 -y
conda activate ign_gpu
conda install -c rapidsai -c conda-forge -c nvidia cuml=24.10 cupy cuda-version=12.5 -y
pip install ign-lidar-hd

# Verify GPU setup
python scripts/verify_gpu_setup.py
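
Before running a GPU-enabled command, it can help to see which optional packages are importable. The sketch below is not the project's `verify_gpu_setup.py`; it only checks that the `cupy` and `cuml` modules are installed (not that a CUDA device actually works), and the mode labels it returns are hypothetical names chosen for this example.

```python
import importlib.util


def detect_gpu_backends():
    """Report which optional GPU packages can be found in this environment."""
    return {
        name: importlib.util.find_spec(name) is not None
        for name in ("cupy", "cuml")
    }


def pick_mode(backends):
    """Map available packages to a mode label (labels are illustrative only)."""
    if backends.get("cuml") and backends.get("cupy"):
        return "full-gpu"
    if backends.get("cupy"):
        return "hybrid-gpu"
    return "cpu"


backends = detect_gpu_backends()
print(backends, "->", pick_mode(backends))
```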

Quick Example

# 1. Download sample data
ign-lidar-hd download --bbox 2.3,48.8,2.4,48.9 --output data/ --max-tiles 5

# 2. Enrich with features (GPU accelerated if available)
ign-lidar-hd enrich --input-dir data/ --output enriched/ --mode full --use-gpu

# 3. Create training patches
ign-lidar-hd patch --input-dir enriched/ --output patches/ --lod-level LOD2

Python API:

from ign_lidar import LiDARProcessor

# Initialize processor
processor = LiDARProcessor(lod_level="LOD2")

# Process a single tile
patches = processor.process_tile("data.laz", "output/")

# Process multiple files
patches = processor.process_directory("data/", "output/", num_workers=4)

Key Features

Core Processing

  • Pure LiDAR processing - Geometric analysis without RGB dependencies
  • RGB & Infrared augmentation - Optional color and Near-Infrared (NIR) from IGN orthophotos
  • NDVI-ready datasets - Automatic vegetation index calculation (RGB + NIR)
  • Multi-level classification - LOD2 (15 classes) and LOD3 (30+ classes) support
  • Rich features - Surface normals, curvature, planarity, verticality, local density
  • Architectural styles - Automatic building style inference
  • Preprocessing - Artifact mitigation (60-80% scan line reduction)
  • Auto-parameters - Intelligent tile analysis for optimal processing
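
To make the "rich features" item concrete, here is a minimal sketch of how normals, planarity, and verticality are commonly derived from the covariance of a local neighborhood. The formulas used (planarity = (λ2 − λ3) / λ1, verticality = 1 − |n_z|) are widespread definitions from the point cloud literature; the source does not state which formulas the library uses, so treat this as an illustration rather than its implementation. The function name `local_shape_features` is hypothetical.

```python
import numpy as np


def local_shape_features(neighborhood):
    """Compute normal, planarity and verticality for one (k, 3) neighborhood."""
    centered = neighborhood - neighborhood.mean(axis=0)
    cov = centered.T @ centered / len(neighborhood)
    # eigh returns eigenvalues in ascending order for symmetric matrices.
    eigvals, eigvecs = np.linalg.eigh(cov)
    l3, l2, l1 = eigvals  # so that l1 >= l2 >= l3
    normal = eigvecs[:, 0]  # direction of least variance
    planarity = (l2 - l3) / l1 if l1 > 0 else 0.0
    verticality = 1.0 - abs(normal[2])
    return normal, planarity, verticality


# A flat, horizontal 5 x 5 grid of points (a roof-like surface).
xs, ys = np.meshgrid(np.arange(5.0), np.arange(5.0))
roof = np.column_stack([xs.ravel(), ys.ravel(), np.zeros(25)])
normal, planarity, verticality = local_shape_features(roof)
print(normal, planarity, verticality)
```

For this horizontal plane the planarity is close to 1 and the verticality close to 0; a wall-like neighborhood would give a verticality close to 1.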

Performance & Optimization

  • GPU acceleration - CUDA-accelerated with RAPIDS cuML (6-20x speedup)
  • Parallel processing - Multi-worker support with CPU core detection
  • Memory optimization - Per-chunk architecture, 50-60% memory reduction
  • Smart skip detection - Resume interrupted workflows automatically
  • Batch operations - Process hundreds of tiles efficiently
  • Scalable - Tested up to 1B+ points

Workflow Automation

  • Pipeline configuration - YAML-based declarative workflows
  • Integrated downloader - IGN WFS tile discovery and batch downloading
  • Format flexibility - LAZ 1.4 (full features) or QGIS-compatible output
  • Unified CLI - Single ign-lidar-hd command with intuitive subcommands
  • Idempotent operations - Safe to restart, never reprocesses existing data
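
The skip-detection and idempotency items describe a familiar pattern: before processing a tile, check whether its output already exists. The sketch below shows one simple way to do that with the standard library. It assumes one output file per input with the same stem, which may not match how the library tracks completed work; `needs_processing` and `pending_tiles` are hypothetical helpers.

```python
from pathlib import Path
import tempfile


def needs_processing(input_path, output_dir, suffix=".laz"):
    """Return True when no non-empty output exists yet for this input file."""
    target = Path(output_dir) / (Path(input_path).stem + suffix)
    return not (target.exists() and target.stat().st_size > 0)


def pending_tiles(input_dir, output_dir):
    """List input tiles that still have to be processed, in a stable order."""
    return [p for p in sorted(Path(input_dir).glob("*.laz"))
            if needs_processing(p, output_dir)]


# Demonstration with throwaway directories and dummy files.
with tempfile.TemporaryDirectory() as tmp:
    raw, done = Path(tmp, "raw"), Path(tmp, "enriched")
    raw.mkdir()
    done.mkdir()
    for name in ("tile_a.laz", "tile_b.laz"):
        (raw / name).write_bytes(b"dummy")
    (done / "tile_a.laz").write_bytes(b"dummy")  # pretend tile_a is finished
    remaining = [p.name for p in pending_tiles(raw, done)]
    print(remaining)
```

Checking for a non-empty file also catches zero-byte outputs left behind by an interrupted run.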

๐ŸŒ Geographic Intelligence

  • Strategic locations - Pre-configured urban, coastal, and rural areas
  • Bounding box filtering - Spatial subsetting for targeted analysis
  • Coordinate handling - Automatic Lambert93 ↔ WGS84 transformations
  • Tile management - Curated collection of 50+ test tiles across France

Usage Guide

Command Line Interface

The library provides a unified ign-lidar-hd command with four main subcommands:

1. Download Command

Download LiDAR tiles from IGN:

# Download by bounding box
ign-lidar-hd download --bbox 2.3,48.8,2.4,48.9 --output data/ --max-tiles 10

# Download specific tiles
ign-lidar-hd download --tiles tile1.laz tile2.laz --output data/

2. Enrich Command

Enrich LAZ files with geometric features:

# Basic enrichment
ign-lidar-hd enrich --input-dir data/ --output enriched/ --mode full

# GPU-accelerated enrichment
ign-lidar-hd enrich --input-dir data/ --output enriched/ --use-gpu

# Full-featured enrichment (recommended)
ign-lidar-hd enrich \
  --input-dir data/ \
  --output enriched/ \
  --mode full \
  --use-gpu \
  --auto-params \
  --preprocess \
  --add-rgb --rgb-cache-dir cache/rgb \
  --add-infrared --infrared-cache-dir cache/infrared

# Custom preprocessing
ign-lidar-hd enrich \
  --input-dir data/ \
  --output enriched/ \
  --preprocess \
  --sor-k 15 --sor-std 2.5 \
  --ror-radius 1.0 --ror-neighbors 4 \
  --voxel-size 0.5

Preprocessing Options:

  • --preprocess - Enable artifact mitigation
  • --sor-k - Statistical outlier removal: number of neighbors (default: 12)
  • --sor-std - SOR: std deviation multiplier (default: 2.0)
  • --ror-radius - Radius outlier removal: search radius in meters (default: 1.0)
  • --ror-neighbors - ROR: minimum neighbors required (default: 4)
  • --voxel-size - Voxel downsampling size in meters (optional)
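
For readers unfamiliar with statistical outlier removal (SOR), the sketch below shows the general technique that the `--sor-k` and `--sor-std` options parameterize: compute each point's mean distance to its k nearest neighbors, then drop points whose mean distance exceeds the global mean plus a multiple of the standard deviation. This is a brute-force NumPy illustration using the documented defaults (k = 12, multiplier = 2.0), not the library's implementation; `statistical_outlier_mask` is a hypothetical name.

```python
import numpy as np


def statistical_outlier_mask(points, k=12, std_multiplier=2.0):
    """Return a boolean mask that is True for points kept by SOR."""
    # Brute-force pairwise distances: fine for a demo, too slow for real tiles.
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    # Skip column 0 after sorting: it is each point's distance to itself.
    knn_mean = np.sort(dists, axis=1)[:, 1:k + 1].mean(axis=1)
    threshold = knn_mean.mean() + std_multiplier * knn_mean.std()
    return knn_mean <= threshold


rng = np.random.default_rng(0)
cloud = rng.uniform(0.0, 10.0, size=(200, 3))
cloud = np.vstack([cloud, [[100.0, 100.0, 100.0]]])  # one obvious outlier
mask = statistical_outlier_mask(cloud)
print(f"kept {mask.sum()} of {len(cloud)} points")
```

A larger `std_multiplier` keeps more points; a larger `k` smooths the estimate at the cost of more neighbor lookups.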

Augmentation Options:

  • --add-rgb - Add RGB colors from IGN orthophotos
  • --add-infrared - Add NIR values from IGN IRC orthophotos
  • --augment - Enable geometric augmentation (disabled by default)
  • --num-augmentations - Number of augmented versions (default: 3)

3. Patch Command

Create training patches from enriched files:

# Create patches
ign-lidar-hd patch \
  --input-dir enriched/ \
  --output patches/ \
  --lod-level LOD2 \
  --patch-size 150.0 \
  --num-workers 4

4. Pipeline Command (Recommended)

Execute complete workflows using YAML configuration:

# Create example configuration
ign-lidar-hd pipeline config.yaml --create-example full

# Run configured pipeline
ign-lidar-hd pipeline config.yaml

Example YAML Configuration:

global:
  num_workers: 4

download:
  bbox: "2.3, 48.8, 2.4, 48.9"
  output: "data/raw"
  max_tiles: 10

enrich:
  input_dir: "data/raw"
  output: "data/enriched"
  mode: "full"
  use_gpu: true
  auto_params: true
  preprocess: true
  add_rgb: true
  add_infrared: true
  rgb_cache_dir: "cache/rgb"
  infrared_cache_dir: "cache/infrared"

patch:
  input_dir: "data/enriched"
  output: "data/patches"
  lod_level: "LOD2"
  num_points: 16384
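
The keys in the `enrich` section mirror the CLI flags shown earlier, with underscores in place of hyphens. The sketch below makes that correspondence explicit by turning a config section into an argument list. It is only an illustration of the naming pattern: the `pipeline` command may not work this way internally, and `section_to_args` is a hypothetical helper.

```python
def section_to_args(subcommand, options):
    """Translate one config section into an ign-lidar-hd argument list."""
    args = ["ign-lidar-hd", subcommand]
    for key, value in options.items():
        flag = "--" + key.replace("_", "-")
        if value is True:
            args.append(flag)            # boolean switch, e.g. --use-gpu
        elif value is False or value is None:
            continue                     # disabled options are simply omitted
        else:
            args.extend([flag, str(value)])
    return args


enrich = {
    "input_dir": "data/raw",
    "output": "data/enriched",
    "mode": "full",
    "use_gpu": True,
    "auto_params": True,
    "preprocess": True,
    "add_rgb": True,
    "rgb_cache_dir": "cache/rgb",
}
print(" ".join(section_to_args("enrich", enrich)))
```

The resulting list could be passed to `subprocess.run` to drive the CLI from a script.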

Python API

Basic Usage

from ign_lidar import LiDARProcessor

# Initialize processor
processor = LiDARProcessor(
    lod_level="LOD2",
    patch_size=150.0,
    patch_overlap=0.1
)

# Process single tile
patches = processor.process_tile("input.laz", "output/")

# Process directory
patches = processor.process_directory("input_dir/", "output_dir/", num_workers=4)

Batch Download

from ign_lidar import IGNLiDARDownloader

# Initialize downloader
downloader = IGNLiDARDownloader("downloads/")

# Download by bounding box (WGS84)
tiles = downloader.download_by_bbox(
    bbox=(-2.0, 47.0, -1.0, 48.0),
    max_tiles=10
)

# Download specific tiles
tile_names = ["LHD_FXX_0186_6834_PTS_C_LAMB93_IGN69"]
downloader.download_tiles(tile_names)
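
Tile names such as the one above pack several fields into one string. The sketch below splits a name into its parts with a regular expression. The pattern is inferred from the single example name in this section, and reading the two four-digit fields as grid indices is an assumption, not something the source states. `TileId` and `parse_tile_name` are hypothetical and not part of `ign_lidar`.

```python
import re
from typing import NamedTuple

TILE_NAME = re.compile(
    r"^LHD_(?P<zone>[A-Z]+)_(?P<x>\d{4})_(?P<y>\d{4})_PTS_(?P<kind>[A-Z])_"
    r"(?P<crs>[A-Z0-9]+)_(?P<vertical>[A-Z0-9]+)$"
)


class TileId(NamedTuple):
    zone: str
    x_index: int   # assumed to be a grid index, see the note above
    y_index: int
    crs: str
    vertical: str


def parse_tile_name(name):
    """Split a tile name into its underscore-separated fields."""
    match = TILE_NAME.match(name)
    if match is None:
        raise ValueError(f"unrecognized tile name: {name!r}")
    return TileId(match["zone"], int(match["x"]), int(match["y"]),
                  match["crs"], match["vertical"])


tile = parse_tile_name("LHD_FXX_0186_6834_PTS_C_LAMB93_IGN69")
print(tile)
```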

Configuration

# LOD Levels
processor = LiDARProcessor(lod_level="LOD2")  # 15 classes
processor = LiDARProcessor(lod_level="LOD3")  # 30+ classes

# Processing Options
# xmin, ymin, xmax, ymax are placeholders: define them for your area of interest
processor = LiDARProcessor(
    lod_level="LOD2",
    patch_size=150.0,          # Patch size in meters
    patch_overlap=0.1,         # 10% overlap
    bbox=[xmin, ymin, xmax, ymax]  # Spatial filter
)
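
To see what `patch_size` and `patch_overlap` imply for patch spacing, the sketch below computes patch start offsets along one axis. It assumes a 1000 m tile edge and that overlap is applied as a fraction of the patch size; neither is confirmed by the source, and the library may treat tile borders differently. `patch_origins` is a hypothetical helper.

```python
def patch_origins(extent, patch_size=150.0, patch_overlap=0.1):
    """Return patch start offsets along one axis of a tile of given extent (m)."""
    stride = patch_size * (1.0 - patch_overlap)
    origins = []
    start = 0.0
    while start + patch_size <= extent:
        origins.append(start)
        start += stride
    return stride, origins


# extent=1000.0 is an assumed tile edge length in meters.
stride, origins = patch_origins(extent=1000.0)
print(f"stride: {stride} m, patches per axis: {len(origins)}")
```

With these settings consecutive patches start 135 m apart, so neighbors share a 15 m band.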

๐Ÿ—๏ธ Library Architecture

Component Architecture

graph TB
    subgraph "Core Processing"
        P[processor.py<br/>Main Engine]
        F[features.py<br/>Feature Extraction]
        GPU[features_gpu.py<br/>GPU Acceleration]
    end

    subgraph "Data Management"
        D[downloader.py<br/>IGN WFS Integration]
        TL[tile_list.py<br/>Tile Management]
        SL[strategic_locations.py<br/>Geographic Zones]
        MD[metadata.py<br/>Dataset Metadata]
    end

    subgraph "Classification & Styles"
        C[classes.py<br/>LOD2/LOD3 Schemas]
        AS[architectural_styles.py<br/>Style Inference]
    end

    subgraph "Integration & Config"
        CLI[cli.py<br/>Command Interface]
        CFG[config.py<br/>Configuration]
        QGIS[qgis_converter.py<br/>QGIS Compatibility]
        U[utils.py<br/>Core Utilities]
    end

    CLI --> P
    CLI --> D
    P --> F
    P --> GPU
    P --> C
    F --> AS
    D --> TL
    D --> SL
    P --> MD

    style P fill:#e3f2fd
    style F fill:#e8f5e8
    style D fill:#fff3e0
    style CLI fill:#f3e5f5

Module Responsibilities

Module | Purpose | Key Features
processor.py | Main processing engine | Patch creation, LOD classification, workflow orchestration
downloader.py | IGN WFS integration | Tile discovery, batch download, smart skip detection
features.py | Feature extraction | Normals, curvature, geometric properties
features_gpu.py | GPU acceleration | CUDA-optimized feature computation
classes.py | Classification schemas | LOD2/LOD3 building taxonomies
architectural_styles.py | Style inference | Building architecture classification

Example Workflows

examples/
├── basic_usage.py                      # Getting started
├── example_urban_simple.py             # Urban processing
├── parallel_processing_example.py      # Performance optimization
├── full_workflow_example.py            # End-to-end pipeline
├── multistyle_processing.py            # Architecture analysis
├── pytorch_dataloader.py               # ML integration
├── pipeline_example.py                 # YAML pipeline usage
├── enrich_with_rgb.py                  # RGB augmentation
├── demo_infrared_augmentation.py       # Infrared augmentation
└── workflows/                          # Production pipelines

config_examples/
├── pipeline_full.yaml                  # Complete workflow
├── pipeline_enrich.yaml                # Enrich-only
└── pipeline_patch.yaml                 # Patch-only

Output Format

Data Structure

graph TB
    subgraph "Raw Input"
        LAZ[LAZ Point Cloud<br/>XYZ + Intensity<br/>Classification]
    end

    subgraph "Enriched Data"
        ELAZ[Enhanced LAZ<br/>+ 30 Features<br/>+ Building Labels]
    end

    subgraph "ML Dataset"
        NPZ[NPZ Patches<br/>16K points each<br/>Ready for Training]
    end

    subgraph "NPZ Contents"
        COORD[Coordinates<br/>X, Y, Z]
        GEOM[Geometric Features<br/>Normals, Curvature]
        SEMANTIC[Semantic Features<br/>Planarity, Verticality]
        META[Metadata<br/>Intensity, Return#]
        LABELS[Building Labels<br/>LOD2/LOD3 Classes]
    end

    LAZ --> ELAZ
    ELAZ --> NPZ
    NPZ --> COORD
    NPZ --> GEOM
    NPZ --> SEMANTIC
    NPZ --> META
    NPZ --> LABELS

    style LAZ fill:#ffebee
    style ELAZ fill:#e3f2fd
    style NPZ fill:#e8f5e8

NPZ File Structure

Each patch is saved as an NPZ file containing:

{
    'points': np.ndarray,          # [N, 3] XYZ coordinates
    'normals': np.ndarray,         # [N, 3] surface normals
    'curvature': np.ndarray,       # [N] principal curvature
    'intensity': np.ndarray,       # [N] normalized intensity
    'return_number': np.ndarray,   # [N] return number
    'height': np.ndarray,          # [N] height above ground
    'planarity': np.ndarray,       # [N] planarity measure
    'verticality': np.ndarray,     # [N] verticality measure
    'horizontality': np.ndarray,   # [N] horizontality measure
    'density': np.ndarray,         # [N] local point density
    'labels': np.ndarray,          # [N] building class labels
    # Optional (with augmentation):
    'red': np.ndarray,             # [N] RGB red channel
    'green': np.ndarray,           # [N] RGB green channel
    'blue': np.ndarray,            # [N] RGB blue channel
    'infrared': np.ndarray,        # [N] NIR values
}
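
As a hedged illustration of working with this structure, the sketch below writes a tiny synthetic patch to an in-memory NPZ archive, reads it back, and computes NDVI from the optional `infrared` and `red` arrays using the standard formula NDVI = (NIR − Red) / (NIR + Red). The key names come from the listing above; the sample values are invented, and the source does not say whether the library stores these channels as 0-255 values or normalized floats (the formula works either way as long as both bands share a scale).

```python
import io
import numpy as np

n = 4  # tiny patch for illustration; real patches hold 16,384 points
patch = {
    "points": np.zeros((n, 3), dtype=np.float32),
    "labels": np.zeros(n, dtype=np.uint8),
    "red": np.array([50, 100, 200, 0], dtype=np.float32),
    "infrared": np.array([150, 100, 50, 0], dtype=np.float32),
}

# Round-trip through an in-memory NPZ archive instead of a file on disk.
buffer = io.BytesIO()
np.savez_compressed(buffer, **patch)
buffer.seek(0)
data = np.load(buffer)


def ndvi(nir, red):
    """NDVI = (NIR - Red) / (NIR + Red), with 0 where both bands are 0."""
    nir = nir.astype(np.float32)
    red = red.astype(np.float32)
    total = nir + red
    return np.divide(nir - red, total, out=np.zeros_like(total), where=total > 0)


# RGB and infrared are optional, so check for the keys before using them.
if "infrared" in data.files and "red" in data.files:
    print(ndvi(data["infrared"], data["red"]))
```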

Data Dimensions

Component | Shape | Data Type | Description
points | [N, 3] | float32 | 3D coordinates (X, Y, Z)
normals | [N, 3] | float32 | Surface normal vectors
features | [N, 27+] | float32 | Geometric feature matrix
labels | [N] | uint8 | Building component classes
metadata | [4] | object | Patch info (bbox, tile_id)

Typical patch: 16,384 points, ~2.5MB compressed, ~8MB in memory


๐Ÿ“ Examples

Urban Processing

# High-detail urban processing
from ign_lidar import LiDARProcessor

processor = LiDARProcessor(lod_level="LOD3", num_augmentations=5)
patches = processor.process_tile("urban_area.laz", "output/urban/")

Rural Processing

# Simplified rural processing
processor = LiDARProcessor(lod_level="LOD2", num_augmentations=2)
patches = processor.process_tile("rural_area.laz", "output/rural/")

Batch Processing

from ign_lidar import LiDARProcessor, WORKING_TILES, get_tiles_by_environment

# Reuse one processor for every tile
processor = LiDARProcessor(lod_level="LOD2")

# Get coastal tiles
coastal_tiles = get_tiles_by_environment("coastal")

# Process all coastal areas
for tile_info in coastal_tiles:
    patches = processor.process_tile(
        f"data/{tile_info['tile_name']}.laz",
        f"output/coastal/{tile_info['tile_name']}/"
    )

PyTorch Integration

from torch.utils.data import Dataset, DataLoader
import numpy as np
import glob

class LiDARPatchDataset(Dataset):
    def __init__(self, patch_dir):
        self.patch_files = glob.glob(f"{patch_dir}/**/*.npz", recursive=True)

    def __len__(self):
        return len(self.patch_files)

    def __getitem__(self, idx):
        data = np.load(self.patch_files[idx])
        points = data['points']
        features = np.concatenate([
            data['normals'],
            data['curvature'][:, None],
            data['intensity'][:, None]
        ], axis=1)
        labels = data['labels']
        return points, features, labels

# Create dataloader
dataset = LiDARPatchDataset("patches/")
dataloader = DataLoader(dataset, batch_size=32, shuffle=True)

๐Ÿ› ๏ธ Development

Setup Development Environment

git clone https://github.com/sducournau/IGN_LIDAR_HD_DATASET
cd IGN_LIDAR_HD_DATASET
pip install -e ".[dev]"

Run Tests

pytest tests/

Code Formatting

black ign_lidar/
flake8 ign_lidar/

Requirements

  • Python 3.8+
  • NumPy >= 1.21.0
  • laspy >= 2.3.0
  • scikit-learn >= 1.0.0
  • tqdm >= 4.60.0
  • requests >= 2.25.0
  • PyYAML >= 6.0 (for pipeline configuration)
  • Pillow >= 9.0.0 (for RGB augmentation)

Optional (for GPU acceleration):

  • CUDA >= 12.0
  • CuPy >= 12.0.0
  • RAPIDS cuML >= 24.10 (recommended for best performance)

API Reference

Core Classes

  • LiDARProcessor: Main processing engine for tile and directory processing
  • IGNLiDARDownloader: Batch download functionality from IGN WFS service
  • LOD2_CLASSES, LOD3_CLASSES: Classification taxonomies

Utility Functions

  • compute_normals(): Surface normal computation
  • compute_curvature(): Principal curvature calculation
  • extract_geometric_features(): Comprehensive feature extraction
  • get_tiles_by_environment(): Filter tiles by environment type

License & Support

License

MIT License - see LICENSE file for details.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Support

For issues and questions, please use the GitHub Issues page.


Made with ❤️ for the LiDAR and Machine Learning communities
