
Raster mask vectorization with topology-preserving simplification


Vectorizer Pro


A Python CLI tool to vectorize raster mask files into polygon shapefiles with topology-preserving simplification.

Features

  • Convert raster masks (int8/16/32 class IDs) to vector polygons
  • Pure Python topology-preserving Visvalingam-Whyatt (TPVW) simplification
  • No GEOS dependency for simplification: the simplifier is self-contained
  • Support for large images (30000x30000+)
  • 4-connectivity polygonization
  • Output formats: Shapefile (.shp), GeoPackage (.gpkg), GeoJSON (.geojson)
  • Preserve class ID attributes
  • CRS preservation from input raster
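
To illustrate what 4-connectivity means for polygonization, here is a minimal pure-Python sketch that labels connected regions of equal class ID. It is not the tool's implementation; the function name `label_regions` and its `connectivity` parameter are hypothetical and exist only for this example.

```python
from collections import deque


def label_regions(grid, connectivity=4):
    """Label connected regions of equal class ID (illustrative helper).

    Returns (labels, count), where labels has the same shape as grid.
    """
    if connectivity == 4:
        # Only edge-sharing neighbours count as connected.
        steps = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    else:
        # 8-connectivity also joins pixels that touch at a corner.
        steps = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                 if (dr, dc) != (0, 0)]
    rows, cols = len(grid), len(grid[0])
    labels = [[-1] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if labels[r][c] != -1:
                continue
            # Breadth-first flood fill from each unlabelled pixel.
            labels[r][c] = count
            queue = deque([(r, c)])
            while queue:
                cr, cc = queue.popleft()
                for dr, dc in steps:
                    nr, nc = cr + dr, cc + dc
                    if (0 <= nr < rows and 0 <= nc < cols
                            and labels[nr][nc] == -1
                            and grid[nr][nc] == grid[r][c]):
                        labels[nr][nc] = count
                        queue.append((nr, nc))
            count += 1
    return labels, count


# Two classes arranged diagonally, like a checkerboard.
mask = [
    [1, 0],
    [0, 1],
]
_, n4 = label_regions(mask, connectivity=4)
_, n8 = label_regions(mask, connectivity=8)
```

With 4-connectivity the diagonal pixels stay separate regions, so polygons of the same class never touch only at a corner.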

Installation

pip install vectorizer-pro

Or install from source:

git clone https://github.com/CVEO/vectorizer-pro.git
cd vectorizer-pro
pip install -e .

Usage

Basic Usage

vectorizer-pro input.tif output.shp

With Options

# Specify nodata value to exclude
vectorizer-pro input.tif output.shp --nodata 0

# Remove small regions (merge regions smaller than 100 pixels)
vectorizer-pro input.tif output.shp --min-area 100

# Set simplification tolerance
vectorizer-pro input.tif output.shp --tolerance 0.1

# Output as GeoPackage
vectorizer-pro input.tif output.gpkg --format gpkg

# Simplify only internal edges (preserve boundary)
vectorizer-pro input.tif output.shp --no-simplify-boundary

Command Line Options

Option                                        Description
--nodata INT                                  Nodata value to exclude from vectorization
--min-area FLOAT                              Minimum polygon area threshold. Smaller polygons will be merged into their largest adjacent neighbor
--tolerance FLOAT                             Simplification tolerance (default: half pixel size)
--format, -f                                  Output format: shp, gpkg, or geojson (default: shp)
--simplify-boundary/--no-simplify-boundary    Simplify exterior boundaries (default: yes)
--detect-nodata                               Print nodata value and exit
--list-classes                                List unique class IDs and exit
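
The --min-area rule (merge a small polygon into its largest adjacent neighbor) can be sketched on a plain adjacency graph. This is a simplified illustration, not the tool's code: the function `merge_small_regions`, the dictionary inputs, and the smallest-first processing order are all assumptions made for the example.

```python
def merge_small_regions(areas, neighbors, min_area):
    """Merge regions below min_area into their largest adjacent neighbor.

    areas:     {region_id: area}
    neighbors: {region_id: set of adjacent region_ids}
    Returns (new_areas, merged_into). Hypothetical helper for illustration.
    """
    areas = dict(areas)
    neighbors = {k: set(v) for k, v in neighbors.items()}
    merged_into = {}
    while True:
        # Isolated small regions have nothing to merge into and are kept.
        small = [r for r, a in areas.items()
                 if a < min_area and neighbors[r]]
        if not small:
            break
        region = min(small, key=areas.__getitem__)  # assumed order: smallest first
        target = max(neighbors[region], key=areas.__getitem__)
        areas[target] += areas.pop(region)
        # Rewire adjacency so former neighbors now border the target.
        for n in neighbors.pop(region):
            neighbors[n].discard(region)
            if n != target:
                neighbors[n].add(target)
                neighbors[target].add(n)
        merged_into[region] = target
    return areas, merged_into


areas = {"A": 500, "B": 40, "C": 120}
neighbors = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B"}}
new_areas, merged_into = merge_small_regions(areas, neighbors, min_area=100)
```

Here region B (area 40) is absorbed by A, its largest neighbor, while C stays because it already exceeds the threshold. How the merged region's class ID is chosen is not modelled in this sketch.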

Python Package Usage

from vectorizer_pro import vectorize, VectorizeResult

# Simple usage - writes to file
result = vectorize("input.tif", "output.shp", nodata=0)

# Remove small regions in Python API
result = vectorize("input.tif", "output.shp", nodata=0, min_area=100)

# Get geometries without writing
result = vectorize("input.tif", nodata=0, output_path=None)
polygons = result.polygons
class_ids = result.class_ids
crs = result.crs
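
As a follow-up to the in-memory call above, the sketch below groups returned geometries by class ID. It assumes that `result.polygons` and `result.class_ids` are parallel sequences of equal length, which the text above does not state explicitly; `group_by_class` is a hypothetical helper, and the strings stand in for real geometry objects.

```python
from collections import defaultdict


def group_by_class(polygons, class_ids):
    """Group geometries by class ID (assumes the two sequences are parallel)."""
    if len(polygons) != len(class_ids):
        raise ValueError("polygons and class_ids must have the same length")
    grouped = defaultdict(list)
    for polygon, class_id in zip(polygons, class_ids):
        grouped[class_id].append(polygon)
    return dict(grouped)


# Stand-in values; with the real package these would be
# result.polygons and result.class_ids.
polygons = ["poly_a", "poly_b", "poly_c"]
class_ids = [1, 2, 1]
grouped = group_by_class(polygons, class_ids)
```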

Examples

Quick Start

# Check nodata value
vectorizer-pro sample/top_potsdam_2_13.tif --detect-nodata

# List class IDs
vectorizer-pro sample/top_potsdam_2_13.tif --list-classes

# Vectorize excluding class 0
vectorizer-pro sample/top_potsdam_2_13.tif output.shp --nodata 0

Advanced Usage

# Higher tolerance for more aggressive simplification
vectorizer-pro input.tif output.shp --nodata 0 --tolerance 0.5

# Remove small regions before simplification
vectorizer-pro input.tif output.shp --nodata 0 --min-area 50 --tolerance 0.1

# Preserve exact boundary shape
vectorizer-pro input.tif output.shp --nodata 0 --no-simplify-boundary

# GeoPackage output with custom tolerance
vectorizer-pro input.tif output.gpkg --format gpkg --tolerance 0.05

Requirements

  • Python >= 3.10
  • rasterio
  • shapely >= 2.1
  • click
  • fiona
  • numpy

References

Projects

Algorithms

  • GDAL Polygonize - Two-arm chain edge tracing algorithm for 4-connectivity raster vectorization

  • Visvalingam-Whyatt - Area-based vertex removal simplification: repeatedly removes the vertex whose triangle with its two neighbors has the smallest area. On its own it does not guarantee consistent shared boundaries in a polygonal coverage

  • TPVW (Topology-Preserving Visvalingam-Whyatt) - Extension of the VW algorithm that ensures shared edges between adjacent polygons are simplified identically, preventing gaps and overlaps
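
The core Visvalingam-Whyatt step described above can be sketched in a few lines. This is a plain O(n^2) version for an open polyline with fixed endpoints, not the package's TPVW code; `visvalingam_whyatt` and its `area_threshold` parameter are names chosen for this example and do not necessarily correspond to the --tolerance option.

```python
def triangle_area(a, b, c):
    """Area of the triangle formed by three (x, y) points."""
    return abs((b[0] - a[0]) * (c[1] - a[1])
               - (c[0] - a[0]) * (b[1] - a[1])) / 2.0


def visvalingam_whyatt(points, area_threshold):
    """Simplify an open polyline, keeping both endpoints fixed.

    Repeatedly drops the interior vertex with the smallest triangle area
    until every remaining interior vertex has area >= area_threshold.
    Written for clarity; production code would use a priority queue.
    """
    pts = list(points)
    while len(pts) > 2:
        areas = [triangle_area(pts[i - 1], pts[i], pts[i + 1])
                 for i in range(1, len(pts) - 1)]
        smallest = min(range(len(areas)), key=areas.__getitem__)
        if areas[smallest] >= area_threshold:
            break
        del pts[smallest + 1]  # areas[i] belongs to pts[i + 1]
    return pts


# A staircase edge of the kind a raster boundary produces.
stairs = [(0, 0), (1, 0), (1, 1), (2, 1), (2, 2), (3, 2), (3, 3)]
simplified = visvalingam_whyatt(stairs, area_threshold=0.6)
```

One common way to keep a coverage free of gaps and overlaps is to split boundaries into arcs at the points where three or more polygons meet, simplify each shared arc once, and reuse the result for both neighbors. Whether Vectorizer Pro does exactly this is not stated in this document.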

Sample Data

  • sample/top_potsdam_2_13.tif - Semantic labeling result generated by an AI model on the ISPRS Potsdam 2D Semantic Labeling Contest benchmark dataset. Used as a demonstration of vectorizing large raster masks.

  • sample/small.tif - A smaller sample for quick testing.

The original Potsdam aerial imagery and ground truth are from the ISPRS benchmark: https://www.isprs.org/

Authors

Wuhan University CVEO Team (武汉大学CVEO课题组)

Website: https://www.whu-cveo.com/

License

MIT License - see LICENSE for details.
