A package to extract geospatial extent from files and directories

These details have been verified by PyPI

Maintainers

eftyk nuest sbastiangarzon

Project description

geoextent

Python package

Python library for extracting geospatial and temporal extents from files and directories.

Key Capabilities:

Extract spatial extents (bounding boxes, convex hulls) and temporal extents
Support for 10+ file formats (GeoJSON, CSV, Shapefile, GeoTIFF, GeoPackage, GPX, GML, KML, FlatGeobuf, Esri File Geodatabase, LAS/LAZ point clouds) plus world files
Plain-text inputs via spaCy named entity recognition + place and time-period gazetteers; recognises calendar dates, decade/century envelopes, ranges, and named geological periods (ICS GTS2020)
Journal article landing-page support for OJS (with the ojsGeo plugin), Janeway (with janeway_geometadata), Pensoft, and GeoScienceWorld; reads JSON-LD spatialCoverage, Dublin Core DC.SpatialCoverage (GeoJSON / WKT), DC.box, ISO 19139 EX_GeographicBoundingBox, and ICBM / geo.position centroids — and lifts the article DOI out of the HTML head so --ext-metadata works on plain article URLs (docs)
Direct integration with 35 research repositories (Zenodo, PANGAEA, OSF, Figshare, 4TU.ResearchData, Dryad, GFZ, RADAR, Arctic Data Center, DataONE, B2SHARE, MDI-DE, GDI-DE, NFDI4Earth, SEANOE, GeoScienceWorld, UKCEH, GBIF, DEIMS-SDR, HALO DB, GitHub, GitLab, Software Heritage, Dataverse [Harvard, DataverseNL, DataverseNO, UNC, UVA, Recherche Data Gouv, ioerDATA, heiDATA, Edmond], Pensoft, TU Dresden Opara, Senckenberg, BGR, BAW, Mendeley Data), Wikidata, any STAC catalog, and any CKAN instance (e.g. data.gov.uk, GovData.de, data.gov.au, data.gov.ie)
Process single files, directories, or multiple repositories in one call
Command-line interface and Python API
Export as GeoJSON, WKT, or WKB
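To make the difference between the two spatial extent types concrete, here is a minimal pure-Python sketch that computes a bounding box and a convex hull for a handful of points. It does not use geoextent; the function names and the sample points are illustrative assumptions, and the hull uses Andrew's monotone chain, which is not necessarily the algorithm geoextent or GDAL applies internally.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y), e.g. (longitude, latitude)


def bounding_box(points: List[Point]) -> Tuple[float, float, float, float]:
    """Return (min_x, min_y, max_x, max_y) for a non-empty list of points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs), max(ys))


def _cross(o: Point, a: Point, b: Point) -> float:
    """Z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: List[Point]) -> List[Point]:
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:
        # Drop the last vertex while it would create a clockwise or straight turn.
        while len(lower) >= 2 and _cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and _cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other one.
    return lower[:-1] + upper[:-1]


# Hypothetical sample points, not taken from any geoextent test data.
points = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0), (2.0, 1.0), (2.0, 5.0)]
bbox = bounding_box(points)
hull = convex_hull(points)
```

The bounding box covers the full rectangle from (0, 0) to (4, 5), while the hull follows the five outer points and leaves out the interior point (2, 1), which is why a hull is usually the tighter and smaller geometry.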

📖 Full Documentation | 📦 PyPI | 🚀 Quick Start | 📓 EarthCube 2021 Article

Installation

pip install geoextent

Requirements: Python 3.10+ and GDAL 3.11.x

See the installation guide for system dependencies and Docker setup.

Quick Start

Command Line

# Extract from a file
geoextent -b -t tests/testdata/geojson/muenster_ring_zeit.geojson

# Extract from research repository
python -m geoextent -b -t https://doi.org/10.5281/zenodo.4593540

# Extract merged bbox from multiple local files
geoextent -b -t tests/testdata/geojson/muenster_ring_zeit.geojson tests/testdata/csv/cities_NL.csv

# Extract from multiple repositories (returns merged geometry)
python -m geoextent -b 10.5281/zenodo.123 10.25532/OPARA-456

# Extract convex hull from multiple Wikidata items and open in geojson.io.
# --convex-hull keeps the GeoJSON payload under the 150 KB URL-fragment limit
# of the geojsonio wrapper; the anonymous-gist fallback for larger payloads
# is no longer reachable since GitHub requires auth for gist creation.
# See the text-extraction guide for details.
python -m geoextent -b --convex-hull --geojsonio Q64 Q35 Q60786916

# Parallel extraction from a directory (auto-detect CPU cores)
geoextent -p -b -t path/to/geodata_directory

# Parallel extraction with 4 workers
geoextent -p 4 -b -t path/to/geodata_directory

# Extract place names from free text — spaCy NER + Nominatim by default,
# no API key required. Install the optional extra and English model once:
#   pip install geoextent[nlp] && python -m spacy download en_core_web_sm
geoextent -b --text "Field campaigns in Berlin and Paris"
echo "Workshops in Tokyo and London" | geoextent -b -
geoextent -b notes.md

# Keep the highest-ranked gazetteer match instead of dropping ambiguous names
geoextent -b --ner-ambiguity top --text "Field campaigns in Berlin and Paris"

# Administrative boundaries: Nominatim returns the polygon of areal features,
# so a state name resolves to its bounding polygon rather than a centroid.
geoextent -b --ner-ambiguity top --text "Field campaign in Saxony"
# Force the centroid instead with --place-geometry point
geoextent -b --ner-ambiguity top --place-geometry point --text "Field campaign in Saxony"

# Extract a temporal extent from text — calendar dates, decades, centuries,
# ranges, and named geological time periods (ICS GTS2020 bundled gazetteer)
geoextent -t --text "Monitoring ran between 2010 and 2015"
# → "tbox": ["2010-01-01", "2015-12-31"]
geoextent -t --text "Sediment cores from the Holocene"
# → "tbox": ["-9750-01-01", "1950-01-01"]  (signed ISO 8601: years before 1 BCE
#    are prefixed with `-`; deep-time periods like the Mesozoic produce
#    long-year strings such as "-251900050-01-01")
geoextent -b -t --text "Pleistocene cores near Berlin re-surveyed on 2024-05-12"

# Show the source text with matched place names and periods highlighted
geoextent -b -t --annotate brackets \
  --text "Sediment cores in Berlin span the Holocene; resurvey on 2024-05-12"
# → ...JSON...
# → ---annotated source (brackets)---
# → Sediment cores in [[Berlin|place]] span the [[Holocene|period]]; resurvey on [[2024-05-12|date]]

# Disable text extraction (e.g. when processing directories of structured
# data and you don't want README.md to be NER-ed)
geoextent -b -t --text-method none path/to/data_dir
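
The signed, long-year date strings shown in the temporal examples above cannot be loaded into Python's datetime.date, whose minimum year is 1. The following sketch shows one way a consumer might parse such tbox values; parse_signed_date and span_in_years are hypothetical helpers, not part of geoextent, and the sketch assumes the strings always follow the sign-year-month-day layout seen in the example output.

```python
import re
from typing import NamedTuple, Sequence

# Optional sign, at least four year digits, then -MM-DD.
_SIGNED_DATE = re.compile(r"^([+-]?)(\d{4,})-(\d{2})-(\d{2})$")


class SignedDate(NamedTuple):
    year: int
    month: int
    day: int


def parse_signed_date(value: str) -> SignedDate:
    """Parse strings such as '2015-12-31' or '-251900050-01-01'."""
    match = _SIGNED_DATE.match(value)
    if match is None:
        raise ValueError(f"not a signed ISO 8601 date: {value!r}")
    sign, year, month, day = match.groups()
    signed_year = -int(year) if sign == "-" else int(year)
    return SignedDate(signed_year, int(month), int(day))


def span_in_years(tbox: Sequence[str]) -> int:
    """Difference between the end year and the start year of a [start, end] pair."""
    start, end = (parse_signed_date(v) for v in tbox)
    return end.year - start.year


holocene = ["-9750-01-01", "1950-01-01"]  # tbox from the Holocene example above
holocene_span = span_in_years(holocene)
```

For the Holocene tbox this gives a span of 11700 years. Month and day are kept as plain integers and are not validated against a calendar.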

For each matched place / date / period, geoextent also emits standoff char_start / char_end offsets into the (NFC-normalised) source so external tools can highlight matches independently:

from geoextent.lib import extent
result = extent.from_text("Sediment cores in Berlin span the Holocene.",
                          bbox=True, tbox=True,
                          ner_ambiguity="top")
src = result["source_text"]
for rec in result["place_names"] + result["date_entities"]:
    s, e = rec["char_start"], rec["char_end"]
    print(f"{rec.get('kind', 'place'):6} {src[s:e]!r} → {rec.get('gazetteer_url') or rec.get('start')}")

See the text-extraction guide for examples and gotchas, or the highlighting guide for the offset contract and a JS/Java re-encoding recipe.
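
As a rough illustration of how the standoff offsets can be consumed, the sketch below rebuilds the bracket-style annotation from a list of records and converts a Python offset into the UTF-16 code-unit offset that JavaScript and Java strings use. Both functions are hypothetical helpers, not geoextent API; the records are hand-written stand-ins for real output, and the sketch assumes offsets are Python code-point indices with an exclusive end, as the slicing in the snippet above suggests.

```python
from typing import Dict, List


def annotate_brackets(source: str, records: List[Dict]) -> str:
    """Wrap each [char_start, char_end) span as [[text|kind]]."""
    out: List[str] = []
    cursor = 0
    for rec in sorted(records, key=lambda r: r["char_start"]):
        start, end = rec["char_start"], rec["char_end"]
        if start < cursor:
            continue  # skip spans that overlap one already emitted
        out.append(source[cursor:start])
        # Records without a "kind" are treated as places, as in the snippet above.
        out.append(f"[[{source[start:end]}|{rec.get('kind', 'place')}]]")
        cursor = end
    out.append(source[cursor:])
    return "".join(out)


def to_utf16_offset(source: str, char_offset: int) -> int:
    """Convert a code-point offset into a UTF-16 code-unit offset."""
    return len(source[:char_offset].encode("utf-16-le")) // 2


source = "Sediment cores in Berlin span the Holocene."
# Hand-written records for illustration only.
records = [
    {"char_start": 18, "char_end": 24},
    {"kind": "period", "char_start": 34, "char_end": 42},
]
annotated = annotate_brackets(source, records)

# A character outside the Basic Multilingual Plane occupies two UTF-16 units,
# so offsets after it shift by one in JavaScript or Java.
shifted = to_utf16_offset("\U0001F30D Berlin", 2)
```

Here annotated reads "Sediment cores in [[Berlin|place]] span the [[Holocene|period]]." and shifted is 3 rather than 2. The highlighting guide remains the authoritative description of the offset contract.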

See the CLI guide for all options.

Python API

import geoextent.lib.extent as geoextent

# From file
result = geoextent.fromFile('data.geojson', bbox=True, tbox=True)

# From directory
result = geoextent.fromDirectory('data/', bbox=True, tbox=True)

# From directory with parallel extraction (0 = auto-detect CPU cores)
result = geoextent.fromDirectory('data/', bbox=True, tbox=True, workers=0)

# From repository (single or multiple)
result = geoextent.fromRemote('10.5281/zenodo.4593540', bbox=True)

identifiers = ['10.5281/zenodo.4593540', '10.25532/OPARA-581']
result = geoextent.fromRemote(identifiers, bbox=True)
print(result['bbox'])  # Merged bounding box covering all resources

See the API documentation and examples.
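
A merged bounding box is conceptually the smallest box that contains every input box. The sketch below shows that idea in plain Python; merge_bboxes is a hypothetical helper, the [min_x, min_y, max_x, max_y] ordering is an assumption that may not match the coordinate order geoextent returns, and boxes that cross the antimeridian are not handled.

```python
from typing import Iterable, List, Sequence


def merge_bboxes(bboxes: Iterable[Sequence[float]]) -> List[float]:
    """Return the smallest [min_x, min_y, max_x, max_y] covering all inputs."""
    boxes = list(bboxes)
    if not boxes:
        raise ValueError("at least one bounding box is required")
    return [
        min(b[0] for b in boxes),
        min(b[1] for b in boxes),
        max(b[2] for b in boxes),
        max(b[3] for b in boxes),
    ]


# Made-up extents for two resources, for illustration only.
resource_a = [7.47, 51.84, 7.77, 52.06]
resource_b = [3.36, 50.75, 7.23, 53.55]
merged = merge_bboxes([resource_a, resource_b])
```

With these inputs merged is [3.36, 50.75, 7.77, 53.55]: each minimum comes from whichever box reaches furthest west or south, each maximum from whichever reaches furthest east or north.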

What Can I Do With geoextent?

Extract Spatial Extents - Get bounding boxes or convex hulls from geospatial files
Process Research Data - Extract extents from Zenodo, Figshare, Dryad, PANGAEA, OSF, DataONE, SEANOE, UKCEH, GBIF, DEIMS-SDR, NFDI4Earth, GitHub, GitLab, any STAC catalog, and more
Batch Processing - Process directories or multiple repositories in one call
Add Location Context - Automatic placename lookup for your data
Flexible Output - Export as GeoJSON, WKT, or WKB for use in other tools
Interactive Visualization - Open extracted extents in geojson.io with one command
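
To show what the GeoJSON and WKT output formats look like for a simple extent, the following sketch serialises a bounding box by hand. It is not geoextent's exporter: the helper names are hypothetical, the input order [min_x, min_y, max_x, max_y] is an assumption, and WKB is left out because it is a binary format.

```python
import json
from typing import Dict, List, Sequence, Tuple


def bbox_to_ring(bbox: Sequence[float]) -> List[Tuple[float, float]]:
    """Closed, counter-clockwise ring of corner points for a bounding box."""
    min_x, min_y, max_x, max_y = bbox
    return [
        (min_x, min_y),
        (max_x, min_y),
        (max_x, max_y),
        (min_x, max_y),
        (min_x, min_y),  # repeat the first point to close the ring
    ]


def bbox_to_geojson(bbox: Sequence[float]) -> Dict:
    """GeoJSON Feature with a Polygon geometry for the bounding box."""
    return {
        "type": "Feature",
        "properties": {},
        "geometry": {
            "type": "Polygon",
            "coordinates": [[list(pt) for pt in bbox_to_ring(bbox)]],
        },
    }


def bbox_to_wkt(bbox: Sequence[float]) -> str:
    """WKT POLYGON text for the bounding box."""
    coords = ", ".join(f"{x!r} {y!r}" for x, y in bbox_to_ring(bbox))
    return f"POLYGON (({coords}))"


example_bbox = [0.0, 1.0, 2.0, 3.0]  # made-up extent
geojson_text = json.dumps(bbox_to_geojson(example_bbox))
wkt_text = bbox_to_wkt(example_bbox)
```

Both outputs describe the same rectangle; GeoJSON nests the ring in a Feature object that tools such as geojson.io can display directly, while WKT is a compact single-line string.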

Documentation

Quick Start Guide - Get started in minutes
Installation Guide - System dependencies, Docker setup
Examples - Common usage patterns with code
CLI Reference - Command-line options
Python API - Function signatures and parameters
Core Features - Essential features for everyday use
Advanced Features - Specialized options
Content Providers - Repository integration details
Supported Formats - File format details
Development Guide - Contributing and testing

Development

This project was developed as part of the DFG-funded research project Opening Reproducible Research (o2r, https://o2r.info).

# Install dev, test, and docs dependencies (the quotes stop shells such as zsh from expanding the brackets)
pip install -e ".[dev,test,docs]"

# Run tests (parallel execution enabled by default with -n auto)
pytest

# Run tests with specific number of workers
pytest -n 4

# Disable parallel execution for debugging
pytest -n 0

# Format code
black geoextent/ tests/

# Install the pre-commit Git hooks so the configured checks run on each commit
pre-commit install

See the development guide for detailed instructions.

Showcase Notebooks

Interactive Jupyter notebooks demonstrating geoextent are available in the showcase/ directory:

NFDI4Earth Knowledge Hub × geoextent — Queries the NFDI4Earth Knowledge Hub SPARQL endpoint to map NFDI4Earth-labelled and harvested repositories to geoextent providers, analyses dataset spatial/temporal metadata coverage, and demonstrates live extraction with geoextent.fromRemote().
Exploring Research Data Repositories with geoextent — EarthCube 2021 case study analysing Zenodo records.

To run the notebooks:

cd showcase
pip install -r requirements.txt
pip install -e ..  # install geoextent from local checkout
jupyter lab

Contributing

Contributions are welcome! Please use the issue tracker to report bugs or suggest features, and submit pull requests for code or documentation improvements.

Citation

If you use geoextent in your research, please cite:

Nüst, Daniel; Garzón, Sebastian and Qamaz, Yousef. (2021, May 11). o2r-project/geoextent (Version v0.7.1). Zenodo. https://doi.org/10.5281/zenodo.3925693

License

This software is published under the MIT license. See the LICENSE file for details.

This documentation is published under a Creative Commons CC0 1.0 Universal License.

Bundled third-party material

geoextent/lib/data/periods.json — the named-time-period gazetteer used by the text/NER source. Derived from the International Chronostratigraphic Chart (ICS / IUGS, GTS2020 vocabulary), distributed by CGI-IUGS at https://github.com/CGI-IUGS/timescale-data and dedicated to the public domain under CC0-1.0 (https://creativecommons.org/publicdomain/zero/1.0/). The file embeds the upstream commit SHA, build timestamp, and full attribution string in its metadata block; run geoextent --list-periods to read it.
The DOI regex and helper functions in geoextent/lib/helpfunctions.py are derived from idutils (© 2015-2018 CERN; © 2018 Alan Rubin) under BSD-3-Clause, as noted inline.

Release history

0.13.0 (this version): May 15, 2026
0.12.0: Feb 13, 2026
0.7.1: May 14, 2021
0.3.0: Jul 1, 2020
0.2.0: Apr 26, 2020
0.1.0: Mar 25, 2020
0.0.1: Aug 16, 2019

Download files

Source Distribution

geoextent-0.13.0.tar.gz (17.3 MB)

Uploaded May 15, 2026 Source

Built Distribution

geoextent-0.13.0-py3-none-any.whl (314.0 kB)

Uploaded May 15, 2026 Python 3

File details

Details for the file geoextent-0.13.0.tar.gz.

File metadata

Download URL: geoextent-0.13.0.tar.gz
Upload date: May 15, 2026
Size: 17.3 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.10.12

File hashes

Hashes for geoextent-0.13.0.tar.gz
Algorithm	Hash digest
SHA256	`1063b368de0928503b42998c6b9edfd18dca512872c2290b4fca9d7f468e1974`
MD5	`27d66f3c98232892b31e5b0ae74ad266`
BLAKE2b-256	`99b94f4c50fd569c62e04028024690c6d3c7766948c2d7297bfe378253d938e4`

File details

Details for the file geoextent-0.13.0-py3-none-any.whl.

File metadata

Download URL: geoextent-0.13.0-py3-none-any.whl
Upload date: May 15, 2026
Size: 314.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.10.12

File hashes

Hashes for geoextent-0.13.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`a0b0e94c996e2d0c7b8d91f875a379163b7a406fb83a0d530f29c324a615919a`
MD5	`ca6210e58778139f8c31d3d718b7c23d`
BLAKE2b-256	`f9e2e999f8b8f8e22997f7a59c00a0288d6700db3f5dd54681888c8dfb7aba94`
