
A mini Python package for geotagging text and retrieving location info.


Installation

pip install geomentions

Usage

Below is a quick example of how to use geomentions:

from geomentions import GeoMentions

# Instantiate GeoMentions. standardize_names=True (the default) merges spelling variants into a single entity, so "Munich" and "München" are counted as the same city.
gm = GeoMentions(standardize_names=True)

text = "Munich is in Germany can be translated to german as München ist in Deutschland. Another city that is mentioned here is New York."
result = gm.fit(text)

# Basic summary
print(result)
# GeoMentionsResult(cities=3, countries=2)
# These totals count every city and country mention, including duplicates

# City mentions
print(result.city_mentions)
# [CityMention(name='Munich', count=2, country_code='DE', population=1260391, coordinates=[48.13743, 11.57549]),
# CityMention(name='New York City', count=1, country_code='US', population=8804190, coordinates=[40.71427, -74.00597])]

# Country mentions
print(result.country_mentions)
# [CityMention(name='Federal Republic of Germany', count=2, country_code='DE', population=82927922, coordinates=[51.5, 10.5])]

# All country counts (implicit and explicit mentions):
print(result.country_counts)
# {'DE': {'total_count': 4, 'implicit_count': 2, 'explicit_count': 2},
# 'US': {'total_count': 1, 'implicit_count': 1, 'explicit_count': 0}}

# Filter city results by country code
print(result.filter_cities(country_code='US'))
# [CityMention(name='New York City', count=1, country_code='US', population=8804190, coordinates=[40.71427, -74.00597])]

# Filter city mentions by minimum population
print(result.filter_cities(min_population=3_000_000))
# [CityMention(name='New York City', count=1, country_code='US', population=8804190, coordinates=[40.71427, -74.00597])]

# Filter city mentions by max population
print(result.filter_cities(max_population=3_000_000))
# [CityMention(name='Munich', count=2, country_code='DE', population=1260391, coordinates=[48.13743, 11.57549])]

# Extract data fields from a matched entity
print(result.city_mentions[0].coordinates)
# [48.13743, 11.57549]

# Convert result to a dictionary
print(result.to_dict())
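The country_counts output above separates explicit mentions (the country itself is named) from implicit ones (a city in that country is named). The following is a minimal sketch of that bookkeeping, not the package's actual implementation; the function name summarize_country_counts and the (country_code, count) tuples are assumptions made for illustration. It reproduces the numbers from the example above.

```python
from collections import defaultdict


def summarize_country_counts(city_mentions, country_mentions):
    """Combine city and country mention counts into one summary per country code.

    Both arguments are lists of (country_code, count) tuples. This input
    shape is hypothetical and chosen only to keep the sketch short.
    """
    summary = defaultdict(
        lambda: {"total_count": 0, "implicit_count": 0, "explicit_count": 0}
    )
    # A city mention counts implicitly towards the country it belongs to.
    for country_code, count in city_mentions:
        summary[country_code]["implicit_count"] += count
        summary[country_code]["total_count"] += count
    # A country mention counts explicitly.
    for country_code, count in country_mentions:
        summary[country_code]["explicit_count"] += count
        summary[country_code]["total_count"] += count
    return dict(summary)


# Munich (2 mentions, DE) and New York City (1 mention, US)
city_mentions = [("DE", 2), ("US", 1)]
# Germany / Deutschland (2 mentions, DE)
country_mentions = [("DE", 2)]

country_counts = summarize_country_counts(city_mentions, country_mentions)
print(country_counts)
```

Under these assumptions, DE ends up with a total of 4 (2 implicit, 2 explicit) and US with a total of 1 (1 implicit, 0 explicit), matching the output shown in the usage example.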

Features

  • City & Country Detection: Identify city and country mentions in text (including bigrams).
  • Population & Coordinates: Retrieve population, country code, coordinates, and time zone for mentioned entities.
  • Summaries by Country: Automatically count mentions per country, both explicit (the country is named) and implicit (one of its cities is named).
  • Filtering: Filter mentions by minimum or maximum population, or by country code.
  • Multi-Language: City and country entities are detected in many languages and almost all spellings.
  • Lightweight and fast: No external dependencies.
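
To illustrate what bigram detection and name standardization could look like, here is a minimal sketch that scans a text for one-word and two-word names in a small lookup table. It is not the package's actual algorithm: the toy index, the function count_mentions, and the greedy "try the bigram first" strategy are all assumptions made for this example.

```python
import re
from collections import Counter

# Toy index: lowercase spelling -> canonical name (illustrative entries only).
TOY_INDEX = {
    "munich": "Munich",
    "münchen": "Munich",
    "new york": "New York City",
    "york": "York",
}


def count_mentions(text, index):
    """Count canonical names in text, preferring two-word matches over one-word ones."""
    tokens = re.findall(r"\w+", text.lower())
    counts = Counter()
    i = 0
    while i < len(tokens):
        bigram = " ".join(tokens[i:i + 2])
        if i + 1 < len(tokens) and bigram in index:
            # Two-word name found: consume both tokens so "york" is not counted again.
            counts[index[bigram]] += 1
            i += 2
        elif tokens[i] in index:
            counts[index[tokens[i]]] += 1
            i += 1
        else:
            i += 1
    return dict(counts)


counts = count_mentions("Munich, or München, is smaller than New York.", TOY_INDEX)
print(counts)
# {'Munich': 2, 'New York City': 1}
```

Mapping every spelling to one canonical name is what lets "Munich" and "München" share a single count in this sketch, and checking the bigram first keeps "New York" from also being counted as "York".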

Language Support

  • The package draws on the languages covered by the GeoNames database.
  • The GeoNames alternate names table covers roughly 600 languages. This implementation may support slightly fewer, but still several hundred.

from geomentions import GeoMentions

gm = GeoMentions(standardize_names=True)

text = "Берлин is the cyrillic spelling for Berlin"
print(gm.fit(text).city_mentions)
# [CityMention(name='Berlin', count=2, country_code='DE', population=3426354, coordinates=[52.52437, 13.41053])]

text = "கம்பளை is the spelling for the city Gampola in Sri Lanka"
print(gm.fit(text).city_mentions)
# [CityMention(name='Gampola', count=2, country_code='LK', population=24283, coordinates=[7.1643, 80.5696]),
# CityMention(name='Lanka', count=1, country_code='IN', population=36805, coordinates=[25.92907, 92.94856])]

text = "자르브뤼켄 is the spelling for Saarbrücken in Germany"
print(gm.fit(text).city_mentions)
# [CityMention(name='Saarbrücken', count=2, country_code='DE', population=179349, coordinates=[49.23262, 7.00982])]
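
In the Gampola example above, "Lanka" is also matched as a town in India, which is probably not the intended reading of "Sri Lanka". Restricting results to an expected country code is one way to drop such matches. The package's own filter_cities(country_code=...) from the Usage section does this on a real result; the sketch below only mirrors the idea with a stand-in dataclass, so CityHit and keep_country are hypothetical names, not part of geomentions.

```python
from dataclasses import dataclass


@dataclass
class CityHit:
    """Stand-in for a city mention; field names follow the output shown above."""
    name: str
    count: int
    country_code: str
    population: int


# Values copied from the Gampola example output.
hits = [
    CityHit("Gampola", 2, "LK", 24283),
    CityHit("Lanka", 1, "IN", 36805),
]


def keep_country(hits, country_code):
    """Return only the hits that belong to the given country code."""
    return [hit for hit in hits if hit.country_code == country_code]


filtered = keep_country(hits, "LK")
print([hit.name for hit in filtered])
# ['Gampola']
```

A population threshold would not help in this particular case, because the unintended match has the larger population of the two.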

Underlying Data

  • Source: GeoNames.org database
  • License: Creative Commons Attribution 4.0 License
  • Index creation: Data is preprocessed to make the package lightweight and fast. The index creation script can be accessed here.

City Index:

  • All cities from the GeoNames database with more than 1,000 inhabitants
  • 107,444 unique cities supported
  • 671,784 entries in index (multiple spellings and languages per unique city)

Country Index:

  • All countries from the GeoNames database
  • 193 unique countries supported
  • 32,010 entries in index (multiple spellings and languages per unique country)
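
As a quick back-of-the-envelope check, the figures above imply how many spellings the index holds per place on average. The snippet only divides the numbers quoted in this section; the variable names are made up for the example.

```python
# Figures quoted in the index description above.
city_entries, unique_cities = 671_784, 107_444
country_entries, unique_countries = 32_010, 193

spellings_per_city = city_entries / unique_cities
spellings_per_country = country_entries / unique_countries

print(f"{spellings_per_city:.1f} spellings per city on average")
print(f"{spellings_per_country:.1f} spellings per country on average")
```

This works out to roughly 6 spellings per city and roughly 166 per country.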

Access the index

from geomentions import GeoMentions

gm = GeoMentions()
city_index = gm.city_index
country_index = gm.country_index

Contributing

Contributions to GeoMentions are highly welcome! If you want to contribute, either:

  • Report Issues:
    • If you find a bug or have a suggestion, please open an issue in the GitHub issue tracker.

or

  • Implement features/bugfixes yourself:

    1. Fork & Branch:
      Fork the repository and create a new branch for your changes (e.g., feature/new-feature or bugfix/issue-number).

    2. Make Changes & Test:
      Implement your improvements. If applicable, add tests to cover your changes and update the documentation as needed. Make sure all the tests in tests/test_geomentions.py pass.

    3. Submit a Pull Request:
      Push your branch to your fork and open a pull request against the main repository. Provide a brief description of your changes.

Thank you for helping to improve GeoMentions!

Changelog

[0.0.1] - 2025-02-23

  • Initial release.

License

MIT License

Copyright (c) 2025 Malte Genschow

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
