Raster mask vectorization with topology-preserving simplification
Vectorizer Pro
A Python CLI tool to vectorize raster mask files into polygon shapefiles with topology-preserving simplification.
Features
- Convert raster masks (int8/16/32 class IDs) to vector polygons
- Pure Python topology-preserving Visvalingam-Whyatt (TPVW) simplification
- No GEOS dependency for simplification — self-contained pure Python implementation
- Support for large images (30000x30000+)
- 4-connectivity polygonization
- Output formats: Shapefile (.shp), GeoPackage (.gpkg), GeoJSON (.geojson)
- Preserve class ID attributes
- CRS preservation from input raster
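To make the 4-connectivity feature concrete, here is a minimal sketch (not the tool's implementation) that counts connected regions in a tiny class mask. Under 4-connectivity, pixels that touch only at a corner end up in separate regions, while 8-connectivity would join them. `count_regions` and its `diagonal` flag are hypothetical names used only for this illustration.

```python
from collections import deque

# Hypothetical helper for illustration only; not part of vectorizer-pro.
def count_regions(mask, diagonal=False):
    """Count connected regions of equal class ID in a 2D list of ints."""
    steps = [(1, 0), (-1, 0), (0, 1), (0, -1)]
    if diagonal:  # 8-connectivity also links corner-touching pixels
        steps += [(1, 1), (1, -1), (-1, 1), (-1, -1)]
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    regions = 0
    for r in range(rows):
        for c in range(cols):
            if seen[r][c]:
                continue
            # Start a new region and flood-fill it breadth-first
            regions += 1
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for dy, dx in steps:
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx]
                            and mask[ny][nx] == mask[y][x]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
    return regions

mask = [
    [1, 2],
    [2, 1],
]
print(count_regions(mask))                 # 4-connectivity
print(count_regions(mask, diagonal=True))  # 8-connectivity
```

For this checkerboard mask the 4-connectivity count is 4 (every pixel is its own region), while the 8-connectivity count is 2, because the diagonal pixels of each class join up.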
Installation
pip install vectorizer-pro
Or install from source in editable mode:
git clone https://github.com/CVEO/vectorizer-pro.git
cd vectorizer-pro
pip install -e .
Usage
Basic Usage
vectorizer-pro input.tif output.shp
With Options
# Specify nodata value to exclude
vectorizer-pro input.tif output.shp --nodata 0
# Remove small regions (merge regions smaller than 100 pixels)
vectorizer-pro input.tif output.shp --min-area 100
# Set simplification tolerance
vectorizer-pro input.tif output.shp --tolerance 0.1
# Output as GeoPackage
vectorizer-pro input.tif output.gpkg --format gpkg
# Simplify only internal edges (preserve boundary)
vectorizer-pro input.tif output.shp --no-simplify-boundary
Command Line Options
| Option | Description |
|---|---|
| --nodata INT | Nodata value to exclude from vectorization |
| --min-area FLOAT | Minimum polygon area threshold. Smaller polygons will be merged into their largest adjacent neighbor |
| --tolerance FLOAT | Simplification tolerance (default: half pixel size) |
| --format, -f | Output format: shp, gpkg, or geojson (default: shp) |
| --simplify-boundary/--no-simplify-boundary | Simplify exterior boundaries (default: yes) |
| --detect-nodata | Print nodata value and exit |
| --list-classes | List unique class IDs and exit |
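The `--min-area` merge rule can be pictured with a small sketch. It assumes that "largest adjacent neighbor" means the neighboring region with the largest area; the description above does not say whether another measure (such as the longest shared border) is used instead. `pick_merge_target`, `areas`, and `neighbors` are hypothetical names, not part of the tool's API.

```python
# Hypothetical sketch; names and data layout are assumptions, not the tool's API.
def pick_merge_target(region, areas, neighbors, min_area):
    """Return the neighbor a small region would merge into, or None.

    areas:     dict mapping region id -> area
    neighbors: dict mapping region id -> set of adjacent region ids
    """
    if areas[region] >= min_area:
        return None  # large enough, keep as is
    candidates = neighbors.get(region, set())
    if not candidates:
        return None  # isolated region, nothing to merge into
    # Assumption: "largest" is measured by area; ties go to the smallest id
    return max(candidates, key=lambda n: (areas[n], -n))

areas = {1: 40.0, 2: 900.0, 3: 2500.0}
neighbors = {1: {2, 3}, 2: {1, 3}, 3: {1, 2}}
print(pick_merge_target(1, areas, neighbors, min_area=100))  # region 1 is below the threshold
print(pick_merge_target(2, areas, neighbors, min_area=100))  # region 2 is kept
```

Here region 1 (area 40) falls below the threshold and would be merged into region 3, its largest neighbor, while region 2 is left alone and the function returns `None`.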
Python Package Usage
from vectorizer_pro import vectorize, VectorizeResult
# Simple usage - writes to file
result = vectorize("input.tif", "output.shp", nodata=0)
# Remove small regions in Python API
result = vectorize("input.tif", "output.shp", nodata=0, min_area=100)
# Get geometries without writing
result = vectorize("input.tif", nodata=0, output_path=None)
polygons = result.polygons
class_ids = result.class_ids
crs = result.crs
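A common next step is to group the returned geometries by class. The sketch below assumes that `result.polygons` and `result.class_ids` are parallel sequences (entry `i` of one belongs to entry `i` of the other), which the snippet above suggests but does not state. `group_by_class` is a hypothetical helper, and plain strings stand in for geometries so the example runs without the package installed.

```python
from collections import defaultdict

# Hypothetical helper; assumes polygons[i] belongs to class_ids[i].
def group_by_class(polygons, class_ids):
    """Return a dict mapping each class ID to the list of its polygons."""
    if len(polygons) != len(class_ids):
        raise ValueError("polygons and class_ids must have the same length")
    groups = defaultdict(list)
    for polygon, class_id in zip(polygons, class_ids):
        groups[int(class_id)].append(polygon)
    return dict(groups)

# Stand-in values; with the real package these would come from
# result.polygons and result.class_ids.
polygons = ["poly_a", "poly_b", "poly_c"]
class_ids = [1, 2, 1]
for class_id, members in sorted(group_by_class(polygons, class_ids).items()):
    print(class_id, len(members))
```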
Examples
Quick Start
# Check nodata value
vectorizer-pro sample/top_potsdam_2_13.tif --detect-nodata
# List class IDs
vectorizer-pro sample/top_potsdam_2_13.tif --list-classes
# Vectorize excluding class 0
vectorizer-pro sample/top_potsdam_2_13.tif output.shp --nodata 0
Advanced Usage
# High simplification for smoother polygons
vectorizer-pro input.tif output.shp --nodata 0 --tolerance 0.5
# Remove small regions before simplification
vectorizer-pro input.tif output.shp --nodata 0 --min-area 50 --tolerance 0.1
# Preserve exact boundary shape
vectorizer-pro input.tif output.shp --nodata 0 --no-simplify-boundary
# GeoPackage output with custom tolerance
vectorizer-pro input.tif output.gpkg --format gpkg --tolerance 0.05
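When choosing a `--tolerance` value it helps to know the default it replaces. The sketch below computes a half-pixel tolerance under two assumptions that the option description does not confirm: that the tolerance is expressed in the raster's CRS units, and that "pixel size" means the smaller of the pixel width and height. `half_pixel_tolerance` is a hypothetical helper; with rasterio, the pixel dimensions are available from an open dataset as `src.res`.

```python
# Hypothetical helper; the tool's exact default rule is not documented here.
def half_pixel_tolerance(pixel_width, pixel_height):
    """Half of the smaller pixel dimension, in the raster's CRS units."""
    if pixel_width <= 0 or pixel_height <= 0:
        raise ValueError("pixel dimensions must be positive")
    return min(pixel_width, pixel_height) / 2.0

# Example: square pixels of 0.05 map units
print(half_pixel_tolerance(0.05, 0.05))
```

Under these assumptions, a raster with 0.05-unit pixels has a default of 0.025, so the `--tolerance 0.5` example above would simplify far more aggressively than the default.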
Requirements
- Python >= 3.10
- rasterio
- shapely >= 2.1
- click
- fiona
- numpy
References
Projects
- GDAL - Raster I/O and Polygonize algorithm. https://gdal.org/
- Shapely - Python geometry operations. https://shapely.readthedocs.io/
- GEOS - C/C++ geometry engine (reference implementation for the TPVW algorithm). https://libgeos.org/
- JTS (Java Topology Suite) - Java topology processing. https://github.com/locationtech/jts
Algorithms
- GDAL Polygonize - Two-arm chain edge tracing algorithm for 4-connectivity raster vectorization
- Visvalingam-Whyatt (VW) - Area-based vertex removal simplification, used here as the basis for topology-preserving simplification of polygonal coverages
- TPVW (Topology-Preserving Visvalingam-Whyatt) - Extension of the VW algorithm that ensures shared edges between adjacent polygons are simplified identically, preventing gaps and overlaps
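The core idea behind the last two entries can be sketched in a few lines. This is an illustrative, unoptimized version of Visvalingam-Whyatt on an open arc, not the vectorizer-pro or GEOS implementation: it repeatedly drops the interior vertex whose triangle with its two neighbors has the smallest area, until every remaining triangle reaches the threshold. The names `simplify_arc` and `area_tolerance` are assumptions, and how the tool maps `--tolerance` to an area threshold is not covered here. The usage part shows the shared-edge idea: the border between two polygons is simplified once and reused by both.

```python
# Illustrative sketch only; not the vectorizer-pro or GEOS implementation.
def triangle_area(a, b, c):
    """Area of the triangle spanned by three (x, y) points."""
    return abs((b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])) / 2.0

def simplify_arc(points, area_tolerance):
    """Visvalingam-Whyatt on an open arc; the two endpoints are never removed."""
    pts = list(points)
    while len(pts) > 2:
        # Effective area of every interior vertex
        areas = [triangle_area(pts[i - 1], pts[i], pts[i + 1])
                 for i in range(1, len(pts) - 1)]
        smallest = min(areas)
        if smallest >= area_tolerance:
            break
        del pts[areas.index(smallest) + 1]  # drop the least significant vertex
    return pts

# Shared border between two adjacent polygons, stored once as an arc.
shared = [(0, 0), (1, 0.05), (2, 0), (3, 1), (4, 0)]
simplified = simplify_arc(shared, area_tolerance=0.1)

# Both polygons reuse the same simplified arc (one of them reversed),
# so their common border stays identical after simplification.
left_ring = simplified + [(4, -2), (0, -2), (0, 0)]
right_ring = simplified[::-1] + [(0, 2), (4, 2), (4, 0)]
print(simplified)
```

In this example only the nearly collinear vertex `(1, 0.05)` is removed, since its triangle area (0.05) is the only one below the threshold. A complete topology-preserving implementation would additionally need to reject removals that make edges cross; that check is omitted from this sketch.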
Sample Data
- sample/top_potsdam_2_13.tif - Semantic labeling result generated by an AI model on the ISPRS Potsdam 2D Semantic Labeling Contest benchmark dataset. Used as a demonstration of vectorizing large raster masks.
- sample/small.tif - A smaller sample for quick testing.
The original Potsdam aerial imagery and ground truth are from the ISPRS benchmark: https://www.isprs.org/
Authors
Wuhan University CVEO Team (武汉大学CVEO课题组)
Website: https://www.whu-cveo.com/
License
MIT License - see LICENSE for details.