GeoQA: A Python package for geospatial data quality assessment and interactive profiling

GeoQA

Geospatial Data Quality Assessment & Interactive Profiling

Profile any geodataset with a single line of code

What is GeoQA?

GeoQA is a Python package for automated quality assessment and interactive profiling of geospatial vector data. Think of it as ydata-profiling (formerly pandas-profiling) but purpose-built for geodata.

GeoQA lets you:

  • Profile any vector dataset (Shapefile, GeoJSON, GeoPackage, etc.) with one line of code
  • Validate geometry quality (invalid, empty, duplicate, mixed types)
  • Analyze attribute completeness, statistics, and distributions
  • Visualize data on interactive maps with quality issue highlighting
  • Generate self-contained HTML quality reports
  • Automate QA/QC workflows via CLI or Python API

Key Features

| Feature | Description |
| --- | --- |
| One-liner Profiling | geoqa.profile("data.shp") for an instant dataset overview |
| Geometry Validation | OGC-compliant validity checks, empty/null detection, duplicate finding |
| Attribute Profiling | Data types, null analysis, unique values, descriptive statistics |
| Interactive Maps | Folium-based maps with issue highlighting and quality coloring |
| HTML Reports | Self-contained quality reports with charts and tables |
| CLI Interface | geoqa profile data.shp for terminal access to all features |
| Auto-fix | Repair invalid geometries with profile.geometry_results |
| Spatial Analysis | CRS info, extent, area/length statistics, centroid computation |

Installation

pip

pip install geoqa

From source (development)

git clone https://github.com/geoqa/geoqa.git
cd geoqa
pip install -e ".[dev]"

Dependencies

GeoQA requires Python 3.9+ and depends on:

  • geopandas: Geospatial data manipulation
  • shapely: Geometry operations and validation
  • folium: Interactive map visualization
  • matplotlib: Static charts
  • pandas / numpy: Data analysis
  • jinja2: Report template rendering
  • click: CLI framework
  • rich: Terminal formatting
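
To confirm that the dependencies listed above are importable in the current environment, a small standard-library check along these lines may help. This is only a sketch: the helper names `missing_dependencies` and `python_is_supported` are hypothetical and not part of GeoQA.

```python
import importlib.util
import sys

# Import names of the runtime dependencies listed above.
REQUIRED_MODULES = [
    "geopandas", "shapely", "folium", "matplotlib",
    "pandas", "numpy", "jinja2", "click", "rich",
]


def missing_dependencies(modules=REQUIRED_MODULES):
    """Return the modules from `modules` that cannot be found in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


def python_is_supported(version_info=sys.version_info):
    """Check the stated Python 3.9+ requirement."""
    return tuple(version_info[:2]) >= (3, 9)


if __name__ == "__main__":
    print("Python OK:", python_is_supported())
    print("Missing:", missing_dependencies() or "none")
```

An empty "Missing" list only shows that the modules can be located; it does not check their versions.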

Quick Start

Python API

import geoqa

# Profile a dataset with one line
profile = geoqa.profile("buildings.shp")

# View summary
profile.summary()
# Output:
# ╭──────────────────────────────────────────╮
# │  GeoQA Profile: buildings                │
# ╰──────────────────────────────────────────╯
# ┌─────────────────┬──────────────┐
# │ Property        │ Value        │
# ├─────────────────┼──────────────┤
# │ Features        │ 12,456       │
# │ Columns         │ 8            │
# │ Geometry Type   │ Polygon      │
# │ CRS             │ EPSG:4326    │
# │ Quality Score   │ 94.2/100     │
# └─────────────────┴──────────────┘

# Interactive map with issue highlighting
profile.show_map()

# Quality check details
checks = profile.quality_checks()
print(checks)

# Generate HTML report
profile.to_html("quality_report.html")

# Attribute statistics
profile.attribute_stats()

# Geometry measurements
profile.geometry_stats()

From a GeoDataFrame

import geopandas as gpd
import geoqa

gdf = gpd.read_file("roads.geojson")
profile = geoqa.profile(gdf, name="City Roads")
profile.summary()

CLI

# Profile a dataset
geoqa profile data.shp

# Generate HTML report
geoqa report data.shp --output report.html

# Run quality checks only
geoqa check data.geojson

# Show interactive map
geoqa show data.gpkg --output map.html
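
The CLI commands above lend themselves to batch use. The following sketch, which is not part of GeoQA, builds one `geoqa report` command per dataset in a folder and optionally runs it with `subprocess`. It relies only on the `geoqa report <file> --output <html>` form shown above; the helper names, the report naming scheme, and the extension filter are illustrative assumptions.

```python
import subprocess
from pathlib import Path

# Extensions to pick up; an illustrative subset of the supported formats.
VECTOR_EXTENSIONS = {".shp", ".geojson", ".gpkg"}


def build_report_command(dataset, out_dir):
    """Build the argument list for `geoqa report <dataset> --output <html>`."""
    dataset = Path(dataset)
    report = Path(out_dir) / f"{dataset.stem}_report.html"
    return ["geoqa", "report", str(dataset), "--output", str(report)]


def report_all(data_dir, out_dir, dry_run=True):
    """Build (and, if dry_run is False, run) one report command per dataset."""
    commands = [
        build_report_command(path, out_dir)
        for path in sorted(Path(data_dir).iterdir())
        if path.suffix.lower() in VECTOR_EXTENSIONS
    ]
    if not dry_run:
        Path(out_dir).mkdir(parents=True, exist_ok=True)
        for cmd in commands:
            # check=False: keep going if one dataset fails to process.
            subprocess.run(cmd, check=False)
    return commands
```

With `dry_run=True` (the default here) the function only returns the commands, which makes it easy to inspect them before anything is executed.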

Quality Score

GeoQA computes an overall quality score (0-100) based on:

| Component | Weight | Description |
| --- | --- | --- |
| Geometry Validity | 40% | Percentage of valid geometries (OGC compliance) |
| Attribute Completeness | 30% | Percentage of non-null attribute values |
| CRS Defined | 15% | Whether a coordinate reference system is set |
| No Empty Geometries | 15% | Percentage of non-empty geometries |
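
As a rough illustration of how these weights could combine, the sketch below computes a weighted sum of the four components. It assumes a plain linear combination with the CRS component counted as all-or-nothing; GeoQA's actual formula may differ, and the function name `quality_score` is hypothetical.

```python
# Weights from the table above (they sum to 1.0).
WEIGHTS = {
    "geometry_validity": 0.40,
    "attribute_completeness": 0.30,
    "crs_defined": 0.15,
    "no_empty_geometries": 0.15,
}


def quality_score(valid_pct, complete_pct, has_crs, non_empty_pct):
    """Combine the four components into a 0-100 score.

    Percentages are on a 0-100 scale; has_crs counts as 100 or 0.
    """
    components = {
        "geometry_validity": valid_pct,
        "attribute_completeness": complete_pct,
        "crs_defined": 100.0 if has_crs else 0.0,
        "no_empty_geometries": non_empty_pct,
    }
    score = sum(WEIGHTS[name] * value for name, value in components.items())
    return round(score, 1)


# Hypothetical inputs: 98% valid, 85% complete, CRS set, no empty geometries.
print(quality_score(98.0, 85.0, True, 100.0))  # 94.7
```

Under this assumption, a dataset that is otherwise perfect but has no CRS would score 85.0.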

Interactive Visualization

GeoQA creates interactive folium maps with:

  • Auto-reprojection to WGS84 for web display
  • Quality highlighting: invalid geometries in red, valid in blue
  • Interactive tooltips with attribute data
  • Multiple basemaps: OpenStreetMap, CartoDB Light/Dark
  • Layer controls for toggling valid/issue features

# Basic map
profile.show_map()

# Quality-colored map
from geoqa.visualization import MapVisualizer
viz = MapVisualizer(profile.gdf, name="My Data")
quality_map = viz.create_quality_map(profile.geometry_results)
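
The red/blue highlighting described above can be approximated with a style callback of the kind folium's `GeoJson` layer accepts. This is a sketch only: it assumes each GeoJSON feature carries a boolean `is_valid` property, which is an illustrative convention and not necessarily how `create_quality_map` works internally. The color values are arbitrary choices.

```python
# Colors follow the convention described above: invalid in red, valid in blue.
INVALID_COLOR = "#d62728"
VALID_COLOR = "#1f77b4"


def quality_style(feature):
    """Return a path style dict for one GeoJSON feature.

    Assumes a boolean `is_valid` property (hypothetical); features
    without it are treated as valid.
    """
    props = feature.get("properties") or {}
    color = VALID_COLOR if props.get("is_valid", True) else INVALID_COLOR
    return {"color": color, "fillColor": color, "weight": 2, "fillOpacity": 0.4}


# With folium installed, the function can be passed as a style callback:
#   folium.GeoJson(geojson_dict, style_function=quality_style).add_to(m)

bad = {"type": "Feature", "properties": {"is_valid": False}, "geometry": None}
print(quality_style(bad)["color"])  # #d62728
```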

HTML Reports

Generate comprehensive, self-contained HTML reports:

profile.to_html("report.html")

Reports include:

  • Quality score badge with color coding
  • Dataset overview cards (features, columns, geometry type, CRS)
  • Quality checks table with pass/fail/warn indicators
  • Spatial extent information
  • Attribute completeness with visual progress bars
  • Numeric column statistics
  • Geometry type distribution
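
To make the "attribute completeness with progress bars" item concrete, here is a standard-library sketch that computes the share of non-null values per column and renders it as a text bar. It is a stand-in for illustration, not GeoQA's report code; the function names and the sample rows are made up.

```python
def completeness(records):
    """Return {column: percent of non-null values} for a list of dict rows."""
    columns = sorted({key for row in records for key in row})
    total = len(records)
    result = {}
    for col in columns:
        filled = sum(1 for row in records if row.get(col) is not None)
        result[col] = 100.0 * filled / total if total else 0.0
    return result


def progress_bar(percent, width=20):
    """Render a percentage as a fixed-width text bar."""
    filled = round(width * percent / 100)
    return "#" * filled + "-" * (width - filled)


# Made-up attribute rows with some missing values.
rows = [
    {"name": "A", "height": 12.0},
    {"name": "B", "height": None},
    {"name": None, "height": 7.5},
    {"name": "D", "height": None},
]
for column, pct in completeness(rows).items():
    print(f"{column:<8} {progress_bar(pct)} {pct:5.1f}%")
```

On a pandas or GeoPandas frame, the same percentages can be obtained with `df.notna().mean() * 100`.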

Quality Checks

| Check | Severity | Description |
| --- | --- | --- |
| Geometry Validity | 🔴 High | OGC Simple Features compliance |
| Empty Geometries | 🟡 Medium | Geometries with no coordinates |
| Duplicate Geometries | 🟡 Medium | Identical geometry pairs (WKB comparison) |
| CRS Defined | 🔴 High | Coordinate reference system presence |
| Attribute Completeness | Varies | Null/missing value analysis |
| Mixed Geometry Types | 🟢 Low | Multiple geometry types in one layer |
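
The geometry checks in this table can be sketched directly with shapely, one of the listed dependencies. The example below is an approximation under stated assumptions, not GeoQA's implementation: the sample geometries are made up, duplicates are found by grouping on WKB bytes as the table suggests, and `make_valid` (shapely 1.8 or later) is shown as one possible repair.

```python
from collections import Counter

from shapely.geometry import Point, Polygon
from shapely.validation import explain_validity, make_valid

geometries = [
    Polygon([(0, 0), (2, 0), (2, 2), (0, 2)]),  # valid square
    Polygon([(0, 0), (2, 2), (2, 0), (0, 2)]),  # self-intersecting "bowtie"
    Polygon(),                                  # empty geometry
    Point(1, 1),
    Point(1, 1),                                # duplicate of the previous point
]

# Empty geometries have no coordinates.
empty = [i for i, g in enumerate(geometries) if g.is_empty]

# Validity: shapely's is_valid applies the OGC validity rules.
invalid = [i for i, g in enumerate(geometries)
           if not g.is_empty and not g.is_valid]

# Duplicates: group by WKB bytes; any group larger than one is a duplicate set.
wkb_counts = Counter(g.wkb for g in geometries if not g.is_empty)
duplicates = [i for i, g in enumerate(geometries)
              if not g.is_empty and wkb_counts[g.wkb] > 1]

# Mixed types: more than one geometry type among non-empty features.
geom_types = {g.geom_type for g in geometries if not g.is_empty}

print("invalid:", invalid, [explain_validity(geometries[i]) for i in invalid])
print("empty:", empty)
print("duplicates:", duplicates)
print("mixed types:", len(geom_types) > 1)

# One possible repair for the invalid polygon.
repaired = make_valid(geometries[1])
print("repaired is valid:", repaired.is_valid)
```

Byte-level WKB comparison only catches exact duplicates; geometries that differ in vertex order or starting point would not match.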

Supported Formats

GeoQA supports all vector formats readable by GeoPandas/Fiona:

  • Shapefile (.shp)
  • GeoJSON (.geojson, .json)
  • GeoPackage (.gpkg)
  • KML (.kml)
  • GML (.gml)
  • CSV with geometry (.csv)
  • File Geodatabase (.gdb)
  • And many more via GDAL/OGR drivers

Architecture

geoqa/
├── core.py           # GeoProfile class: main entry point
├── geometry.py       # Geometry validation & quality checks
├── attributes.py     # Attribute profiling & statistics
├── spatial.py        # CRS, extent, area/length analysis
├── visualization.py  # Folium-based interactive maps
├── report.py         # HTML report generation (Jinja2)
├── cli.py            # Click-based CLI interface
└── utils.py          # Utility functions

Contributing

Contributions are welcome! Please see CONTRIBUTING.md for guidelines.

# Clone the repository
git clone https://github.com/geoqa/geoqa.git
cd geoqa

# Install in development mode
pip install -e ".[dev]"

# Run tests
pytest

# Format code
black geoqa/ tests/
isort geoqa/ tests/

License

This project is licensed under the MIT License.

Acknowledgments

GeoQA is inspired by the development methodology and open-source philosophy of Dr. Qiusheng Wu and the opengeos community. Key inspirations include:

  • leafmap: One-liner philosophy for geospatial analysis
  • geemap: Interactive mapping patterns
  • ydata-profiling: Data profiling concept

Citation

If you find GeoQA useful in your work, please consider citing:

@software{geoqa2026,
  title = {GeoQA: A Python Package for Geospatial Data Quality Assessment},
  year = {2026},
  url = {https://github.com/geoqa/geoqa},
  license = {MIT}
}

Made with ❤️ for the geospatial community
