Extract countries, regions and cities from a URL or text
Extract place names from a URL or text, and add context to those names – for example distinguishing between a country, region or city.
Grab the package using pip (this will take a few minutes):
pip install geograpy
Geograpy uses NLTK for entity recognition, so you’ll also need to download the models we’re using. Fortunately there’s a command that’ll take care of this for you.
Import the module, give some text or a URL, and presto.
import geograpy

url = 'http://www.bbc.com/news/world-europe-26919928'
places = geograpy.get_place_context(url=url)
Now you have access to information about all the places mentioned in the linked article.
Note that the other list can be useful for shorter texts, to pull out information like street names and points of interest, but at the moment it is a bit messy when scanning longer texts that contain adjectival forms of proper nouns (like “Russian” instead of “Russia”).
In addition to listing the names of discovered places, you’ll also get some information about the relationships between places.
While a text might mention many places, it’s probably focused on one or two, so Geograpy also breaks down countries, regions and cities by number of mentions.
Each of these returns a list of tuples. The first item in the tuple is the place name and the second item is the number of mentions. For example:
[('Russian Federation', 14), (u'Ukraine', 11), (u'Lithuania', 1)]
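Because each entry is a plain (name, count) tuple, ordinary Python is enough to rank or filter the results. The sketch below works on the sample list above rather than a live call; the variable names and the threshold are illustrative assumptions, not part of Geograpy.

```python
# Sample data copied from the example above; in practice this list would
# come from Geograpy's per-country mention counts.
country_mentions = [('Russian Federation', 14), ('Ukraine', 11), ('Lithuania', 1)]

# The place with the highest mention count is a reasonable guess at the
# main focus of the text.
top_place, top_count = max(country_mentions, key=lambda pair: pair[1])

# Keep only places mentioned more than once (the threshold is arbitrary).
frequent = [name for name, count in country_mentions if count > 1]

print(top_place, top_count)
print(frequent)
```

Filtering out single mentions like this is one simple way to separate the places a text is about from the places it only refers to in passing.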
You can of course use each of Geograpy’s modules on their own. For example:
from geograpy import extraction

e = extraction.Extractor(url='http://www.bbc.com/news/world-europe-26919928')
e.find_entities()

# You can now access all of the places found by the Extractor
print(e.places)
Place context is handled in the places module. For example:
from geograpy import places

pc = places.PlaceContext(['Cleveland', 'Ohio', 'United States'])

pc.set_countries()
print(pc.countries)  # ['United States']

pc.set_regions()
print(pc.regions)  # ['Ohio']

pc.set_cities()
print(pc.cities)  # ['Cleveland']

print(pc.address_strings)  # ['Cleveland, Ohio, United States']
And of course all of the other information shown above (country_regions etc) is available after the corresponding set_ method is called.
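Since each attribute is only populated once its set_ method has run, it can help to wrap the calls in a small helper. The following is a minimal sketch: collect_places and StubContext are hypothetical names, and the stub merely stands in for places.PlaceContext so the snippet runs without Geograpy installed. It assumes only the set_countries, set_regions and set_cities methods and the attributes used in the example above.

```python
def collect_places(pc):
    """Run the set_ methods, then return the populated lists as a dict."""
    pc.set_countries()
    pc.set_regions()
    pc.set_cities()
    return {
        'countries': pc.countries,
        'regions': pc.regions,
        'cities': pc.cities,
    }


class StubContext(object):
    """Hypothetical stand-in for places.PlaceContext with hard-coded results."""

    def set_countries(self):
        self.countries = ['United States']

    def set_regions(self):
        self.regions = ['Ohio']

    def set_cities(self):
        self.cities = ['Cleveland']


result = collect_places(StubContext())
print(result['cities'])
```

With Geograpy installed, passing places.PlaceContext(['Cleveland', 'Ohio', 'United States']) in place of the stub should give the same kind of result, provided the methods behave as in the example above.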
Geograpy uses the following excellent libraries:
Geograpy uses the following data sources:
Hat tip to Chris Albon for the name.