Extract countries, regions and cities from a URL or text

Project description

Geograpy3 is a fork of Geograpy2, which is itself a fork of geograpy. It inherits most of Geograpy2 but solves several problems, such as support for UTF-8, place names with multiple words, and confusion over homonyms. Unlike Geograpy2, Geograpy3 is also compatible with Python 3.

Geograpy3

Extract place names from a URL or text, and add context to those names -- for example distinguishing between a country, region or city.

Install & Setup

Grab the package using pip (this will take a few minutes)

pip install geograpy3

Geograpy3 uses NLTK for entity recognition, so you'll also need to download the models we're using. Fortunately there's a command that'll take care of this for you.

geograpy-nltk

Basic Usage

Import the module, give some text or a URL, and presto.

import geograpy
url = 'http://www.bbc.com/news/world-europe-26919928'
places = geograpy.get_place_context(url=url)

Now you have access to information about all the places mentioned in the linked article.

  • places.countries contains a list of country names
  • places.regions contains a list of region names
  • places.cities contains a list of city names
  • places.other lists everything that wasn't clearly a country, region or city

Note that the other list can be useful for shorter texts, to pull out information like street names and points of interest. At the moment it is a bit messy when scanning longer texts that contain adjectival forms of proper nouns (like "Russian" instead of "Russia").
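As a rough illustration of working with these four attributes, the sketch below copies them into a plain dictionary. It uses a stand-in object built with types.SimpleNamespace rather than a real result from get_place_context, so the sample values and the helper name summarize_places are assumptions made for illustration only.

```python
from types import SimpleNamespace

# Attribute names taken from the list above.
PLACE_ATTRIBUTES = ("countries", "regions", "cities", "other")


def summarize_places(places):
    """Copy the four place lists from a place-context-like object into a dict."""
    # getattr with a default keeps the helper usable if an attribute is missing.
    return {name: list(getattr(places, name, [])) for name in PLACE_ATTRIBUTES}


# Hypothetical stand-in for the object returned by geograpy.get_place_context.
sample = SimpleNamespace(
    countries=["United States"],
    regions=["Ohio"],
    cities=["Cleveland"],
    other=["Main Street"],
)

summary = summarize_places(sample)
for kind, names in summary.items():
    print(f"{kind}: {', '.join(names)}")
```

With a real result, the same helper could be called on the object returned by get_place_context, assuming the attributes behave like the lists described above.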

But Wait, There's More

In addition to listing the names of discovered places, you'll also get some information about the relationships between places.

  • places.country_regions contains regions broken down by country
  • places.country_cities contains cities broken down by country
  • places.address_strings contains city, region, country strings useful for geocoding
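The text does not spell out the exact shape of these attributes. The sketch below assumes that country_cities behaves like a mapping from a country name to a list of city names, and shows how such a mapping could be flattened into simple "city, country" query strings. The sample data and the helper name city_country_pairs are hypothetical.

```python
def city_country_pairs(country_cities):
    """Flatten a {country: [cities]} mapping into sorted (city, country) pairs."""
    pairs = [
        (city, country)
        for country, cities in country_cities.items()
        for city in cities
    ]
    return sorted(pairs)


# Hypothetical data shaped like the assumed country_cities mapping.
sample_country_cities = {
    "United States": ["Cleveland", "Akron"],
    "Ukraine": ["Kyiv"],
}

pairs = city_country_pairs(sample_country_cities)

# Build query strings that could be handed to a geocoder of your choice.
queries = [f"{city}, {country}" for city, country in pairs]
print(queries)
```

If the real attribute turns out to have a different shape, only the flattening step would need to change.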

Last But Not Least

While a text might mention many places, it's probably focused on one or two, so Geograpy also breaks down countries, regions and cities by number of mentions.

  • places.country_mentions
  • places.region_mentions
  • places.city_mentions

Each of these returns a list of tuples. The first item in the tuple is the place name and the second item is the number of mentions. For example:

[('Russian Federation', 14), ('Ukraine', 11), ('Lithuania', 1)]
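Because each entry is a (name, count) tuple, ordinary list operations are enough to find the places a text focuses on. The following is a minimal sketch using the example list above; the helper name top_places and its limit parameter are not part of Geograpy and are introduced here only for illustration.

```python
def top_places(mentions, limit=2):
    """Return the names of the most frequently mentioned places."""
    # Sort by the count (second tuple item), highest first.
    ranked = sorted(mentions, key=lambda item: item[1], reverse=True)
    return [name for name, _count in ranked[:limit]]


# Example data copied from the list of tuples shown above.
country_mentions = [("Russian Federation", 14), ("Ukraine", 11), ("Lithuania", 1)]

focus = top_places(country_mentions)
print(focus)  # ['Russian Federation', 'Ukraine']
```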

If You're Really Serious

You can of course use each of Geograpy's modules on their own. For example:

from geograpy import extraction

e = extraction.Extractor(url='http://www.bbc.com/news/world-europe-26919928')
e.find_entities()

# You can now access all of the places found by the Extractor
print(e.places)

Place context is handled in the places module. For example:

from geograpy import places

pc = places.PlaceContext(['Cleveland', 'Ohio', 'United States'])

pc.set_countries()
print(pc.countries)  # ['United States']

pc.set_regions()
print(pc.regions)  # ['Ohio']

pc.set_cities()
print(pc.cities)  # ['Cleveland']

print(pc.address_strings)  # ['Cleveland, Ohio, United States']

And of course all of the other information shown above (country_regions etc) is available after the corresponding set_ method is called.
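If you use the modules on their own, it can be convenient to call the set_ methods in one place. The sketch below is one possible helper, not part of Geograpy: it calls set_countries, set_regions and set_cities in the same order as the example above, on the assumption that this order matters. The FakePlaceContext class is a hypothetical stand-in so that the snippet runs without the library installed.

```python
# Method names taken from the PlaceContext example above.
SETTER_ORDER = ("set_countries", "set_regions", "set_cities")


def populate_place_context(pc):
    """Call each set_ method on a PlaceContext-like object, in order."""
    for method_name in SETTER_ORDER:
        getattr(pc, method_name)()
    return pc


class FakePlaceContext:
    """Hypothetical stand-in that only records which setters were called."""

    def __init__(self):
        self.calls = []

    def set_countries(self):
        self.calls.append("set_countries")

    def set_regions(self):
        self.calls.append("set_regions")

    def set_cities(self):
        self.calls.append("set_cities")


pc = populate_place_context(FakePlaceContext())
print(pc.calls)
```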

Credits

Geograpy uses the following excellent libraries:

Geograpy uses the following data sources:

Hat tip to Chris Albon for the name.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

geograpy3-0.1.0.tar.gz (1.3 MB view details)

Uploaded Source

Built Distributions

geograpy3-0.1.0-py3.5.egg (1.3 MB view details)

Uploaded Source

geograpy3-0.1.0-py3-none-any.whl (1.3 MB view details)

Uploaded Python 3

File details

Details for the file geograpy3-0.1.0.tar.gz.

File metadata

  • Download URL: geograpy3-0.1.0.tar.gz
  • Upload date:
  • Size: 1.3 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.11.0 pkginfo/1.4.2 requests/2.19.1 setuptools/39.1.0 requests-toolbelt/0.8.0 tqdm/4.25.0 CPython/3.5.2

File hashes

Hashes for geograpy3-0.1.0.tar.gz
Algorithm Hash digest
SHA256 fd5e0e5f915e563b2bc7d938d8e98ec17d6ae3c01235fa2bf4a711b5a519f04b
MD5 39fa5e0c27e15509f162e60098b13f5b
BLAKE2b-256 b96c21950ec1077839fda09d871b32b14d0b449c7ba424f4586eeaa3185b8ddd

File details

Details for the file geograpy3-0.1.0-py3.5.egg.

File metadata

  • Download URL: geograpy3-0.1.0-py3.5.egg
  • Upload date:
  • Size: 1.3 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.11.0 pkginfo/1.4.2 requests/2.19.1 setuptools/39.1.0 requests-toolbelt/0.8.0 tqdm/4.25.0 CPython/3.5.2

File hashes

Hashes for geograpy3-0.1.0-py3.5.egg
Algorithm Hash digest
SHA256 cb6d7240fe84bfd07e381d7aa5ed186e641d332eb415b104ac0d7dfd5bc815de
MD5 d0635eef80f5ac627a7f940d2851d1ca
BLAKE2b-256 ef32686c813411d7f66c34403eb9e87a4ec73b32e131362f6cda4a307ab6ab20

File details

Details for the file geograpy3-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: geograpy3-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 1.3 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.11.0 pkginfo/1.4.2 requests/2.19.1 setuptools/39.1.0 requests-toolbelt/0.8.0 tqdm/4.25.0 CPython/3.5.2

File hashes

Hashes for geograpy3-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 478d12859d770a8d9c4a139ab48f88773a0043c8caa4280aa6d90875ed17f605
MD5 9955ad494529de0c2483830b7321b485
BLAKE2b-256 00eaa390a80d2295169ffb1510486aa235d9e58f3bdacd4d9922ba7300159861
