GeoQA: A Python package for geospatial data quality assessment and interactive profiling
GeoQA
Geospatial Data Quality Assessment & Interactive Profiling
Profile any geodataset with a single line of code
What is GeoQA?
GeoQA is a Python package for automated quality assessment and interactive profiling of geospatial vector data. Think of it as ydata-profiling (formerly pandas-profiling), but purpose-built for geodata.
GeoQA lets you:
- Profile any vector dataset (Shapefile, GeoJSON, GeoPackage, etc.) with one line of code
- Validate geometry quality (invalid, empty, duplicate, mixed types)
- Analyze attribute completeness, statistics, and distributions
- Visualize data on interactive maps with quality issue highlighting
- Generate self-contained HTML quality reports
- Automate QA/QC workflows via CLI or Python API
✨ Key Features
| Feature | Description |
|---|---|
| One-liner Profiling | `geoqa.profile("data.shp")` for an instant dataset overview |
| Geometry Validation | OGC-compliant validity checks, empty/null detection, duplicate finding |
| Attribute Profiling | Data types, null analysis, unique values, descriptive statistics |
| 🗺️ Interactive Maps | Folium-based maps with issue highlighting and quality coloring |
| HTML Reports | Beautiful, self-contained quality reports with charts and tables |
| ⚡ CLI Interface | `geoqa profile data.shp` for terminal access to all features |
| 🔧 Auto-fix | Repair invalid geometries with `profile.geometry_results` |
| Spatial Analysis | CRS info, extent, area/length statistics, centroid computation |
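The Spatial Analysis row lists extent, area/length statistics, and centroid computation. The sketch below shows the planar formulas behind those measurements for a single polygon ring, using only the standard library. It is not GeoQA's code: the function names are hypothetical, and it assumes a closed 2D ring in a projected CRS (for example metres), since areas and lengths computed directly on EPSG:4326 degrees are not meaningful.

```python
import math
from typing import List, Tuple

# Hypothetical helpers, not part of the GeoQA API.
# A ring is a closed list of (x, y) tuples: the first point equals the last.
Ring = List[Tuple[float, float]]


def ring_bounds(ring: Ring) -> Tuple[float, float, float, float]:
    """Extent as (minx, miny, maxx, maxy)."""
    xs = [x for x, _ in ring]
    ys = [y for _, y in ring]
    return min(xs), min(ys), max(xs), max(ys)


def ring_signed_area(ring: Ring) -> float:
    """Shoelace formula: positive for counter-clockwise rings, negative for clockwise."""
    return 0.5 * sum(x0 * y1 - x1 * y0 for (x0, y0), (x1, y1) in zip(ring, ring[1:]))


def ring_perimeter(ring: Ring) -> float:
    """Sum of Euclidean segment lengths."""
    return sum(math.dist(p, q) for p, q in zip(ring, ring[1:]))


def ring_centroid(ring: Ring) -> Tuple[float, float]:
    """Area-weighted centroid of the polygon enclosed by the ring."""
    area = ring_signed_area(ring)
    if area == 0:
        raise ValueError("degenerate ring has zero area")
    cx = cy = 0.0
    for (x0, y0), (x1, y1) in zip(ring, ring[1:]):
        cross = x0 * y1 - x1 * y0
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    return cx / (6 * area), cy / (6 * area)


if __name__ == "__main__":
    square = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0), (0.0, 0.0)]
    print(ring_bounds(square))            # (0.0, 0.0, 10.0, 10.0)
    print(abs(ring_signed_area(square)))  # 100.0
    print(ring_perimeter(square))         # 40.0
    print(ring_centroid(square))          # (5.0, 5.0)
```

GeoPandas exposes the same quantities as `GeoSeries.total_bounds`, `.area`, `.length`, and `.centroid`; this sketch ignores holes and multipart geometries.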
📦 Installation
pip
```bash
pip install geoqa
```
From source (development)
```bash
git clone https://github.com/geoqa/geoqa.git
cd geoqa
pip install -e ".[dev]"
```
Dependencies
GeoQA requires Python 3.9+ and depends on:
- geopandas: geospatial data manipulation
- shapely: geometry operations and validation
- folium: interactive map visualization
- matplotlib: static charts
- pandas / numpy: data analysis
- jinja2: report template rendering
- click: CLI framework
- rich: terminal formatting
Quick Start
Python API
```python
import geoqa

# Profile a dataset with one line
profile = geoqa.profile("buildings.shp")

# View summary
profile.summary()
# Output:
# ╭───────────────────────────╮
# │ GeoQA Profile: buildings  │
# ╰───────────────────────────╯
# ┌───────────────┬───────────┐
# │ Property      │ Value     │
# ├───────────────┼───────────┤
# │ Features      │ 12,456    │
# │ Columns       │ 8         │
# │ Geometry Type │ Polygon   │
# │ CRS           │ EPSG:4326 │
# │ Quality Score │ 94.2/100  │
# └───────────────┴───────────┘

# Interactive map with issue highlighting
profile.show_map()

# Quality check details
checks = profile.quality_checks()
print(checks)

# Generate HTML report
profile.to_html("quality_report.html")

# Attribute statistics
profile.attribute_stats()

# Geometry measurements
profile.geometry_stats()
```
From a GeoDataFrame
```python
import geopandas as gpd
import geoqa

gdf = gpd.read_file("roads.geojson")
profile = geoqa.profile(gdf, name="City Roads")
profile.summary()
```
CLI
```bash
# Profile a dataset
geoqa profile data.shp

# Generate HTML report
geoqa report data.shp --output report.html

# Run quality checks only
geoqa check data.geojson

# Show interactive map
geoqa show data.gpkg --output map.html
```
Quality Score
GeoQA computes an overall quality score (0-100) based on:
| Component | Weight | Description |
|---|---|---|
| Geometry Validity | 40% | Percentage of valid geometries (OGC compliance) |
| Attribute Completeness | 30% | Percentage of non-null attribute values |
| CRS Defined | 15% | Whether a coordinate reference system is set |
| No Empty Geometries | 15% | Percentage of non-empty geometries |
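Read literally, the table describes a weighted sum. The sketch below assumes exactly that: each component is expressed as a percentage, the CRS component is scored as 100 or 0, and the four weights are applied and summed. The function name and signature are hypothetical; GeoQA's own implementation may differ, for example in how it counts null values or rounds.

```python
from typing import Dict, Tuple

# Weights taken from the Quality Score table above.
WEIGHTS: Dict[str, float] = {
    "geometry_validity": 0.40,
    "attribute_completeness": 0.30,
    "crs_defined": 0.15,
    "non_empty": 0.15,
}


def _pct(part: int, whole: int) -> float:
    """Percentage, treating an empty denominator as fully compliant (an assumption)."""
    return 100.0 * part / whole if whole else 100.0


def quality_score(
    n_features: int,
    n_valid: int,
    n_empty: int,
    non_null_cells: int,
    total_cells: int,
    has_crs: bool,
) -> Tuple[float, Dict[str, float]]:
    """Hypothetical 0-100 score as a weighted sum of component percentages."""
    components = {
        "geometry_validity": _pct(n_valid, n_features),
        "attribute_completeness": _pct(non_null_cells, total_cells),
        "crs_defined": 100.0 if has_crs else 0.0,
        "non_empty": _pct(n_features - n_empty, n_features),
    }
    score = sum(WEIGHTS[name] * value for name, value in components.items())
    return round(score, 1), components


if __name__ == "__main__":
    # 1,000 features, 950 valid, none empty, 90% of attribute cells filled, CRS set.
    score, parts = quality_score(1000, 950, 0, 900, 1000, True)
    print(score)  # 95.0
```

With those inputs the geometry term contributes 0.40 * 95 = 38 points and the completeness term 0.30 * 90 = 27, so an otherwise clean dataset still loses 5 points.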
🗺️ Interactive Visualization
GeoQA creates interactive Folium maps with:
- Auto-reprojection to WGS84 for web display
- Quality highlighting: invalid geometries in red, valid in blue
- Interactive tooltips with attribute data
- Multiple basemaps: OpenStreetMap, CartoDB Light/Dark
- Layer controls for toggling valid/issue features
```python
# Basic map
profile.show_map()

# Quality-colored map
from geoqa.visualization import MapVisualizer

viz = MapVisualizer(profile.gdf, name="My Data")
quality_map = viz.create_quality_map(profile.geometry_results)
```
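The quality coloring described above (red for invalid, blue for valid, with separately toggleable layers) can be sketched with plain GeoJSON dictionaries. This is not `MapVisualizer`'s implementation: it assumes each feature carries a boolean `is_valid` property (a hypothetical name), splits the features into two FeatureCollections, and provides a style function of the shape that `folium.GeoJson(..., style_function=...)` expects, namely a callable that takes a feature and returns a style dict.

```python
from typing import Any, Dict, Tuple

Feature = Dict[str, Any]
FeatureCollection = Dict[str, Any]

VALID_COLOR = "blue"
ISSUE_COLOR = "red"


def _is_valid(feature: Feature) -> bool:
    # Assumed property name; features without it are treated as valid.
    return bool((feature.get("properties") or {}).get("is_valid", True))


def split_by_validity(fc: FeatureCollection) -> Tuple[FeatureCollection, FeatureCollection]:
    """Return (valid, issues) FeatureCollections so each can become its own map layer."""
    valid, issues = [], []
    for feature in fc.get("features", []):
        (valid if _is_valid(feature) else issues).append(feature)
    return (
        {"type": "FeatureCollection", "features": valid},
        {"type": "FeatureCollection", "features": issues},
    )


def quality_style(feature: Feature) -> Dict[str, Any]:
    """Leaflet path style: blue for valid features, red for issues."""
    color = VALID_COLOR if _is_valid(feature) else ISSUE_COLOR
    return {"color": color, "fillColor": color, "weight": 2, "fillOpacity": 0.4}


# With folium installed, the two layers could be added roughly like this:
#   m = folium.Map(location=[0, 0], zoom_start=2)
#   folium.GeoJson(valid_fc, name="Valid", style_function=quality_style).add_to(m)
#   folium.GeoJson(issue_fc, name="Issues", style_function=quality_style).add_to(m)
#   folium.LayerControl().add_to(m)
```

Folium and Leaflet expect longitude/latitude coordinates, which is why reprojecting to WGS84 comes before any of this.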
HTML Reports
Generate comprehensive, self-contained HTML reports:
```python
profile.to_html("report.html")
```
Reports include:
- Quality score badge with color coding
- Dataset overview cards (features, columns, geometry type, CRS)
- Quality checks table with pass/fail/warn indicators
- Spatial extent information
- Attribute completeness with visual progress bars
- Numeric column statistics
- Geometry type distribution
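Two of the report sections, attribute completeness and numeric column statistics, reduce to simple per-column arithmetic. The sketch below computes them for one column given as a Python list. It treats both `None` and float NaN as missing (pandas uses NaN for missing numbers) and reports numeric statistics only when every non-null value is a number. The function and its output keys are hypothetical and do not reflect the format returned by GeoQA's `attribute_stats()`.

```python
import math
import statistics
from typing import Any, Dict, List


def _is_null(value: Any) -> bool:
    return value is None or (isinstance(value, float) and math.isnan(value))


def _is_number(value: Any) -> bool:
    # bool is a subclass of int, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def profile_column(values: List[Any]) -> Dict[str, Any]:
    """Completeness plus basic descriptive statistics for one attribute column."""
    present = [v for v in values if not _is_null(v)]
    info: Dict[str, Any] = {
        "count": len(values),
        "nulls": len(values) - len(present),
        # Completeness = share of non-null values (what a progress bar would display).
        "completeness_pct": 100.0 * len(present) / len(values) if values else 100.0,
        "unique": len(set(present)),
    }
    if present and all(_is_number(v) for v in present):
        info.update(
            min=min(present),
            max=max(present),
            mean=statistics.fmean(present),
            median=statistics.median(present),
        )
    return info


if __name__ == "__main__":
    print(profile_column([12.5, 8.0, None, 15.0, float("nan")]))
```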
🧪 Quality Checks
| Check | Severity | Description |
|---|---|---|
| Geometry Validity | 🔴 High | OGC Simple Features compliance |
| Empty Geometries | 🟡 Medium | Geometries with no coordinates |
| Duplicate Geometries | 🟡 Medium | Identical geometry pairs (WKB comparison) |
| CRS Defined | 🔴 High | Coordinate reference system presence |
| Attribute Completeness | Varies | Null/missing value analysis |
| Mixed Geometry Types | 🟢 Low | Multiple geometry types in one layer |
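The Duplicate Geometries check compares geometries by their WKB (Well-Known Binary) encoding: two features count as duplicates when their bytes are identical. The standard-library sketch below illustrates that idea for 2D GeoJSON-style Points, LineStrings, and Polygons, alongside the Empty Geometries and Mixed Geometry Types checks. It is a hypothetical illustration, not GeoQA's code; in practice Shapely (`geometry.wkb`) or GeoPandas (`GeoSeries.to_wkb()`) would produce the bytes.

```python
import struct
from collections import defaultdict
from typing import Any, Dict, List, Optional

Geometry = Dict[str, Any]  # GeoJSON-style: {"type": ..., "coordinates": ...}

# OGC WKB geometry type codes for the 2D types this sketch supports.
WKB_TYPES = {"Point": 1, "LineString": 2, "Polygon": 3}


def is_empty(geom: Optional[Geometry]) -> bool:
    """Null or coordinate-less geometries count as empty."""
    return geom is None or not geom.get("coordinates")


def _point_seq(coords) -> bytes:
    # uint32 point count followed by x, y doubles.
    return struct.pack("<I", len(coords)) + b"".join(struct.pack("<dd", x, y) for x, y in coords)


def to_wkb(geom: Geometry) -> bytes:
    """Little-endian (NDR) WKB for a non-empty 2D Point, LineString or Polygon."""
    gtype, coords = geom["type"], geom["coordinates"]
    header = struct.pack("<BI", 1, WKB_TYPES[gtype])  # byte-order flag 1 = little-endian
    if gtype == "Point":
        return header + struct.pack("<dd", *coords)
    if gtype == "LineString":
        return header + _point_seq(coords)
    # Polygon: uint32 ring count, then each ring as a point sequence.
    return header + struct.pack("<I", len(coords)) + b"".join(_point_seq(r) for r in coords)


def find_duplicates(geoms: List[Optional[Geometry]]) -> List[List[int]]:
    """Groups of indices whose WKB bytes are identical."""
    groups: Dict[bytes, List[int]] = defaultdict(list)
    for i, geom in enumerate(geoms):
        if not is_empty(geom):
            groups[to_wkb(geom)].append(i)
    return [idx for idx in groups.values() if len(idx) > 1]


def geometry_types(geoms: List[Optional[Geometry]]) -> List[str]:
    """More than one entry means the layer mixes geometry types."""
    return sorted({g["type"] for g in geoms if not is_empty(g)})
```

Because the comparison is byte-exact, the same polygon stored with a different starting vertex or ring orientation is not flagged as a duplicate; catching those cases needs normalization or a topological equality test.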
Supported Formats
GeoQA supports all vector formats readable by GeoPandas/Fiona:
- Shapefile (`.shp`)
- GeoJSON (`.geojson`, `.json`)
- GeoPackage (`.gpkg`)
- KML (`.kml`)
- GML (`.gml`)
- CSV with geometry (`.csv`)
- File Geodatabase (`.gdb`)
- And many more via GDAL/OGR drivers
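Because reading is delegated to GeoPandas and GDAL/OGR, format support comes down to which OGR driver recognizes a file. The lookup below maps the extensions listed above to the GDAL vector driver short names that typically handle them. It is an illustrative sketch with a hypothetical function name; GeoPandas normally detects the driver automatically. A File Geodatabase is a directory, which GDAL's open-source `OpenFileGDB` driver can read.

```python
from pathlib import Path
from typing import Optional

# Extension -> GDAL/OGR vector driver short name (illustrative, not exhaustive).
DRIVERS = {
    ".shp": "ESRI Shapefile",
    ".geojson": "GeoJSON",
    ".json": "GeoJSON",
    ".gpkg": "GPKG",
    ".kml": "KML",
    ".gml": "GML",
    ".csv": "CSV",
    ".gdb": "OpenFileGDB",  # a .gdb "file" is actually a directory
}


def guess_driver(path: str) -> Optional[str]:
    """Return the likely driver for a path, or None if the extension is not listed."""
    return DRIVERS.get(Path(path).suffix.lower())
```

To see which drivers your own installation supports, `fiona.supported_drivers` lists the drivers available to Fiona.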
🏗️ Architecture
```
geoqa/
├── core.py           # GeoProfile class: main entry point
├── geometry.py       # Geometry validation & quality checks
├── attributes.py     # Attribute profiling & statistics
├── spatial.py        # CRS, extent, area/length analysis
├── visualization.py  # Folium-based interactive maps
├── report.py         # HTML report generation (Jinja2)
├── cli.py            # Click-based CLI interface
└── utils.py          # Utility functions
```
🤝 Contributing
Contributions are welcome! Please see CONTRIBUTING.md for guidelines.
```bash
# Clone the repository
git clone https://github.com/geoqa/geoqa.git
cd geoqa

# Install in development mode
pip install -e ".[dev]"

# Run tests
pytest

# Format code
black geoqa/ tests/
isort geoqa/ tests/
```
License
This project is licensed under the MIT License.
Acknowledgments
GeoQA is inspired by the development methodology and open-source philosophy of Dr. Qiusheng Wu and the opengeos community. Key inspirations include:
- leafmap: one-liner philosophy for geospatial analysis
- geemap: interactive mapping patterns
- ydata-profiling: data profiling concept
Citation
If you find GeoQA useful in your work, please consider citing:
```bibtex
@software{geoqa2026,
  title   = {GeoQA: A Python Package for Geospatial Data Quality Assessment},
  year    = {2026},
  url     = {https://github.com/geoqa/geoqa},
  license = {MIT}
}
```
Made with ❤️ for the geospatial community
Download files
File details
Details for the file geoqa-0.1.0.tar.gz.
File metadata
- Download URL: geoqa-0.1.0.tar.gz
- Upload date:
- Size: 38.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `ba8ffed29c2837d650fa53db832bd56dce07e39db11cfafa5c16dbad50bfa91f` |
| MD5 | `24865607246f158f17e4dab1d2b837a9` |
| BLAKE2b-256 | `ff51d16e7a96433f5a01c1416d9fbd38a5176fc72dc3cbf9370b68a920aba54a` |
File details
Details for the file geoqa-0.1.0-py3-none-any.whl.
File metadata
- Download URL: geoqa-0.1.0-py3-none-any.whl
- Upload date:
- Size: 33.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `e32f620635ce6b83a967320bb099ab7bdc5728145d473c1a07c6c33c9d48d7a4` |
| MD5 | `926bcf56388c4046eb45ae3e7194aa1f` |
| BLAKE2b-256 | `de69740a8887d5dade53a42e532a7c5b1b9eb9b7386a488c00f5299554a49690` |