Extract countries, regions and cities from a URL or text
Geograpy3 is a fork of Geograpy2, which is itself a fork of geograpy. It inherits most of Geograpy2 but solves several problems, such as support for UTF-8, place names with multiple words, and confusion over homonyms. Unlike Geograpy2, Geograpy3 is compatible with Python 3.
Geograpy3
Extract place names from a URL or text, and add context to those names -- for example distinguishing between a country, region or city.
Install & Setup
Grab the package using pip
(this will take a few minutes)
pip install geograpy3
Geograpy3 uses NLTK for entity recognition, so you'll also need to download the models we're using. Fortunately there's a command that'll take care of this for you.
geograpy-nltk
Basic Usage
Import the module, give some text or a URL, and presto.
import geograpy
url = 'http://www.bbc.com/news/world-europe-26919928'
places = geograpy.get_place_context(url=url)
Now you have access to information about all the places mentioned in the linked article.
- places.countries contains a list of country names
- places.regions contains a list of region names
- places.cities contains a list of city names
- places.other lists everything that wasn't clearly a country, region or city
Note that the other list might be useful for shorter texts, to pull out information like street names, points of interest, etc., but at the moment it is a bit messy when scanning longer texts that contain adjectival forms of proper nouns (like "Russian" instead of "Russia").
But Wait, There's More
In addition to listing the names of discovered places, you'll also get some information about the relationships between places.
- places.country_regions: regions broken down by country
- places.country_cities: cities broken down by country
- places.address_strings: city, region, country strings useful for geocoding
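Since the address strings are meant for geocoding, it can help to split them back into separate fields before handing them to a geocoder. The sketch below is only an illustration: split_address is a hypothetical helper, not part of Geograpy, and it assumes every string follows the "city, region, country" format shown in this README.

```python
def split_address(address):
    """Split a 'city, region, country' string into its three parts.

    Hypothetical helper, not part of Geograpy. Assumes the string
    contains at least two commas; otherwise unpacking raises ValueError.
    """
    # Split from the right so a comma inside the city name stays with the city.
    city, region, country = (part.strip() for part in address.rsplit(',', 2))
    return {'city': city, 'region': region, 'country': country}


# Sample value in the format this README shows for address_strings.
address_strings = ['Cleveland, Ohio, United States']

records = [split_address(a) for a in address_strings]
print(records)
```

Each record can then be passed to whichever geocoding service you use, field by field or as the original string.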
Last But Not Least
While a text might mention many places, it's probably focused on one or two, so Geograpy also breaks down countries, regions and cities by number of mentions.
places.country_mentions
places.region_mentions
places.city_mentions
Each of these returns a list of tuples. The first item in the tuple is the place name and the second item is the number of mentions. For example:
[('Russian Federation', 14), ('Ukraine', 11), ('Lithuania', 1)]
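To make the shape of these lists concrete, here is a minimal sketch of how (name, count) tuples like the ones above can be produced and ranked with the standard library. It is not necessarily how Geograpy counts mentions internally, and found_names is made-up sample data.

```python
from collections import Counter

# Hypothetical place names in the order they were found in a text.
found_names = [
    'Ukraine', 'Russian Federation', 'Ukraine',
    'Lithuania', 'Ukraine', 'Russian Federation',
]

# most_common() returns (name, count) tuples, highest count first.
mentions = Counter(found_names).most_common()

# The first tuple is the place the text mentions most often.
top_place, top_count = mentions[0]

print(mentions)
print(top_place, top_count)
```

With a list in this shape, the place a text is probably focused on is simply its first element.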
If You're Really Serious
You can of course use each of Geograpy's modules on their own. For example:
from geograpy import extraction
e = extraction.Extractor(url='http://www.bbc.com/news/world-europe-26919928')
e.find_entities()
# You can now access all of the places found by the Extractor
print(e.places)
Place context is handled in the places module. For example:
from geograpy import places
pc = places.PlaceContext(['Cleveland', 'Ohio', 'United States'])
pc.set_countries()
print(pc.countries)  # ['United States']
pc.set_regions()
print(pc.regions)  # ['Ohio']
pc.set_cities()
print(pc.cities)  # ['Cleveland']
print(pc.address_strings)  # ['Cleveland, Ohio, United States']
And of course all of the other information shown above (country_regions etc.) is available after the corresponding set_ method is called.
Credits
Geograpy uses the following excellent libraries:
- NLTK for entity recognition
- newspaper for text extraction from HTML
- jellyfish for fuzzy text match
- pycountry for country/region lookups
Geograpy uses the following data sources:
- GeoLite2 for city lookups
- ISO3166ErrorDictionary for common country misspellings via Sara-Jayne Terp
Hat tip to Chris Albon for the name.