A geoparsing library for English texts

Geoparser

Geoparser is a Python library for geoparsing English texts. It leverages spaCy for toponym recognition and fine-tuned SentenceTransformer models for toponym resolution.

Installation

Install Geoparser using pip:

pip install geoparser

Download Required Data

After installation, you need to download the necessary data files for Geoparser to function properly:

python -m geoparser download

This command downloads the required resources and stores them in the user-specific data directory:

  • Windows: C:\Users\<Username>\AppData\Local\geoparser\
  • macOS: ~/Library/Application Support/geoparser/
  • Linux: ~/.local/share/geoparser/

Please ensure you have adequate disk space available, as the total size of these files is approximately 2.3 GB.
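
Because the download is large, it can help to confirm there is room for it before running the command. The following is a minimal sketch using only the Python standard library. The helper names default_data_dir and has_free_space are hypothetical and not part of Geoparser, and the paths simply mirror the list above rather than asking the library where it stores its files.

```python
import shutil
import sys
from pathlib import Path
from typing import Optional

# Approximate download size stated above (2.3 GB), expressed in bytes.
REQUIRED_BYTES = int(2.3 * 1024**3)


def default_data_dir(platform: str = sys.platform, home: Optional[Path] = None) -> Path:
    """Return the documented data directory for a platform (hypothetical helper)."""
    home = Path.home() if home is None else home
    if platform.startswith("win"):
        return home / "AppData" / "Local" / "geoparser"
    if platform == "darwin":
        return home / "Library" / "Application Support" / "geoparser"
    # Assumption: every other platform is treated like Linux.
    return home / ".local" / "share" / "geoparser"


def has_free_space(path: Path, required: int = REQUIRED_BYTES) -> bool:
    """Check free space on the volume that holds the nearest existing parent of path."""
    probe = path
    while not probe.exists() and probe != probe.parent:
        probe = probe.parent
    return shutil.disk_usage(probe).free >= required
```

Calling has_free_space(default_data_dir()) returns True when the target volume has at least roughly 2.3 GB free.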

Usage

Instantiating the Geoparser

To use Geoparser, you need to instantiate an object of the Geoparser class. You can specify which spaCy and transformer model to use, optimizing either for accuracy or speed. By default, the library uses accuracy-optimized models:

from geoparser import Geoparser

geo = Geoparser()

For faster performance, you can opt for the smaller models:

geo = Geoparser(spacy_model='en_core_web_sm', transformer_model='dguzh/geo-all-MiniLM-L6-v2')

You can mix and match these models depending on your specific needs. Note that the transformer models dguzh/geo-all-distilroberta-v1 and dguzh/geo-all-MiniLM-L6-v2 are preliminary versions. Future updates aim to refine these models to improve the accuracy of toponym disambiguation.
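
One way to mix models is to override only one of the two keyword arguments. The sketch below builds the keyword arguments separately so the choice is explicit. The helper model_options is hypothetical, and it assumes that any argument left out falls back to the library's accuracy-optimized default, as the no-argument call above suggests.

```python
def model_options(small_spacy: bool = False, small_transformer: bool = False) -> dict:
    """Build keyword arguments for Geoparser(), overriding only what is requested.

    Assumption: keys that are left out fall back to the library defaults.
    """
    options = {}
    if small_spacy:
        options["spacy_model"] = "en_core_web_sm"
    if small_transformer:
        options["transformer_model"] = "dguzh/geo-all-MiniLM-L6-v2"
    return options
```

With Geoparser installed, Geoparser(**model_options(small_spacy=True)) would pair the smaller spaCy model with the default transformer model, trading some recognition accuracy for speed while leaving resolution unchanged.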

Parsing Texts

Geoparser is optimized for parsing large collections of texts at once. Pass a list of strings to the parse method:

docs = geo.parse(["Sample text 1", "Sample text 2", "Sample text 3"])

The parse method returns a list of Document objects, where each Document contains a list of Toponym objects. Each Toponym that is successfully geocoded will have a corresponding Location object with detailed geographical data.
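
Since not every toponym is necessarily geocoded, it can be useful to see how many were resolved in each document. The sketch below walks the returned structure using only doc.toponyms and toponym.location. The function resolution_summary is a hypothetical helper, not part of the library, and it assumes that an unresolved toponym has a falsy location.

```python
def resolution_summary(docs):
    """Return one (total_toponyms, resolved_toponyms) tuple per document."""
    summary = []
    for doc in docs:
        toponyms = list(doc.toponyms)
        # Assumption: a toponym that could not be geocoded has a falsy location.
        resolved = sum(1 for toponym in toponyms if toponym.location)
        summary.append((len(toponyms), resolved))
    return summary
```

Passing the list returned by geo.parse to this function gives a quick per-document overview of recognition versus resolution.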

Location Attributes

Each Location object has the following attributes:

  • geonameid: The unique identifier for the place in the GeoNames database.
  • name: The name of the geographical location.
  • admin2_geonameid: The GeoNames identifier for the second-level administrative division.
  • admin2_name: The name of the second-level administrative division.
  • admin1_geonameid: The GeoNames identifier for the first-level administrative division.
  • admin1_name: The name of the first-level administrative division.
  • country_geonameid: The GeoNames identifier for the country.
  • country_name: The name of the country.
  • feature_name: The type of geographical feature (e.g., mountain, lake).
  • latitude: The latitude of the location.
  • longitude: The longitude of the location.
  • elevation: The elevation of the location in meters.
  • population: The population of the location.
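
These attributes map naturally onto a GeoJSON Feature, which is convenient for plotting results on a map. The following is a sketch rather than library functionality: location_to_feature and _field are hypothetical helpers, and because the exact access style of Location is an assumption here, the code tries attribute access and falls back to key lookup for dict-like objects.

```python
def _field(location, key):
    """Read a field by key for dict-like objects, otherwise by attribute."""
    if isinstance(location, dict):
        return location.get(key)
    return getattr(location, key, None)


def location_to_feature(location) -> dict:
    """Convert a Location into a GeoJSON Feature with a Point geometry."""
    return {
        "type": "Feature",
        "geometry": {
            "type": "Point",
            # GeoJSON orders coordinates as [longitude, latitude].
            "coordinates": [_field(location, "longitude"), _field(location, "latitude")],
        },
        "properties": {
            "geonameid": _field(location, "geonameid"),
            "name": _field(location, "name"),
            "country_name": _field(location, "country_name"),
            "feature_name": _field(location, "feature_name"),
        },
    }
```

The resulting dictionary can be serialized with json.dumps and collected into a FeatureCollection for use in mapping tools.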

Example

Here's an example showing how the library might be used:

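from geoparser import Geoparser

# Instantiate once with the default models and reuse the object.
geo = Geoparser()

text = "Zurich is a city rich in history."
docs = geo.parse([text])  # parse expects a list of strings

for doc in docs:
    for toponym in doc.toponyms:
        # Only toponyms that were successfully geocoded have a location.
        if toponym.location:
            print(toponym, "->", toponym.location)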
