Extract countries, regions and cities from a URL or text

geograpy3

geograpy3 is a fork of geograpy2, which is itself a fork of geograpy. It inherits most of geograpy's functionality but solves several problems, such as support for UTF-8, place names with multiple words, and confusion over homonyms. Also, unlike geograpy2, geograpy3 is compatible with Python 3.

Since geograpy3 0.0.2, cities, countries and regions are matched against a database derived from the corresponding Wikidata entries.
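Matching against a Wikidata-derived database is also how homonyms can be resolved; the Credits below note that Wikidata is used "with disambiguation via population". The following is a minimal, self-contained sketch of that idea, assuming a simple "largest population wins" rule. `CityRecord`, `CITY_TABLE` and `disambiguate` are hypothetical names, the population values are invented for illustration, and this is not geograpy3's actual lookup code.

```python
from typing import NamedTuple, Optional


class CityRecord(NamedTuple):
    """Hypothetical row of a Wikidata-derived city table."""
    name: str
    region: str
    population: int


# Invented toy rows: several places share the name "Springfield".
# Population values are made up for illustration, not real data.
CITY_TABLE = [
    CityRecord("Springfield", "Illinois", 100),
    CityRecord("Springfield", "Missouri", 300),
    CityRecord("Springfield", "Oregon", 50),
]


def disambiguate(name: str, table: list) -> Optional[CityRecord]:
    """Return the most populous record matching `name`, or None if none match."""
    matches = [row for row in table if row.name == name]
    if not matches:
        return None
    # Assumption: when a name is ambiguous, prefer the largest population.
    return max(matches, key=lambda row: row.population)


print(disambiguate("Springfield", CITY_TABLE))
# CityRecord(name='Springfield', region='Missouri', population=300)
```

A population rule is cheap and often picks the place a writer most likely meant, but it will be wrong for texts about smaller namesakes; context from other mentioned places is the usual complement.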

What it is

geograpy extracts place names from a URL or text and adds context to those names, for example by distinguishing between a country, a region and a city.

The extraction is a two-step process. The first step is a Natural Language Processing task that analyzes a text for potential mentions of geographic locations. In the second step, the words that represent such locations are looked up using the Locator.

If you already know that your content contains geographic information, you might want to use the Locator interface directly.
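The two-step flow can be illustrated with a toy, dependency-free sketch. Here a regular expression over runs of capitalized words stands in for the NLTK-based entity recognition, and a small dictionary stands in for the Locator's database. `GAZETTEER`, `extract_candidates` and `locate` are hypothetical names, not part of geograpy3; calling `locate` on names you already have corresponds to using the Locator directly.

```python
import re

# Hypothetical toy gazetteer standing in for geograpy3's Wikidata-derived
# database; the real Locator data and API are not reproduced here.
GAZETTEER = {
    "France": "country",
    "Ohio": "region",
    "Paris": "city",
    "Cleveland": "city",
}

# Runs of capitalized words, so "New York" stays one candidate.
CANDIDATE_RE = re.compile(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*\b")


def extract_candidates(text):
    """Step 1 (stand-in for NLP entity recognition): find capitalized phrases."""
    return CANDIDATE_RE.findall(text)


def locate(candidates):
    """Step 2 (stand-in for the Locator): keep only names found in the gazetteer."""
    return {name: GAZETTEER[name] for name in candidates if name in GAZETTEER}


text = "The torch passed through Paris, France, and later Cleveland, Ohio."
print(locate(extract_candidates(text)))
# {'Paris': 'city', 'France': 'country', 'Cleveland': 'city', 'Ohio': 'region'}
```

Sentence-initial words such as "The" also become candidates in step 1 and are only filtered out by the lookup in step 2, which is why a proper NLP step and a large database both matter.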

Examples/Tutorial

See the Examples/Tutorial Wiki.

Install & Setup

Grab the package using pip (this may take a few minutes):

pip install geograpy3

geograpy3 uses NLTK for entity recognition, so you'll also need to download the models it uses. Fortunately, there's a command that takes care of this for you:

geograpy-nltk

Getting the source code

git clone https://github.com/somnathrakshit/geograpy3
cd geograpy3
scripts/install

Basic Usage

Import the module, give some text or a URL, and presto.

import geograpy
url = 'https://en.wikipedia.org/wiki/2012_Summer_Olympics_torch_relay'
places = geograpy.get_geoPlace_context(url=url)

Now you have access to information about all the places mentioned in the linked article.

places.countries contains a list of country names
places.regions contains a list of region names
places.cities contains a list of city names
places.other lists everything that wasn't clearly a country, region or city

Note that the other list can be useful for shorter texts, to pull out information such as street names and points of interest, but it is currently somewhat noisy for longer texts that contain adjectival forms of proper nouns (like "Russian" instead of "Russia").
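As a rough illustration of how recognized names might be sorted into these lists, here is a sketch using hypothetical lookup sets (`COUNTRIES`, `REGIONS`, `CITIES`) and a hypothetical `ToyPlaces` container. It checks country, then region, then city, and puts everything unmatched into `other`; geograpy3's real matching against its Wikidata-derived database, including its handling of homonyms, is more involved.

```python
from dataclasses import dataclass, field

# Hypothetical lookup sets; geograpy3 matches against a Wikidata-derived database.
COUNTRIES = {"Russia", "Ukraine", "United States"}
REGIONS = {"Ohio"}
CITIES = {"Cleveland", "Moscow"}


@dataclass
class ToyPlaces:
    """Illustrative container mirroring the countries/regions/cities/other split."""
    countries: list = field(default_factory=list)
    regions: list = field(default_factory=list)
    cities: list = field(default_factory=list)
    other: list = field(default_factory=list)


def classify(names):
    """Sort names into lists; each name is checked as country, then region, then city."""
    places = ToyPlaces()
    for name in names:
        if name in COUNTRIES:
            places.countries.append(name)
        elif name in REGIONS:
            places.regions.append(name)
        elif name in CITIES:
            places.cities.append(name)
        else:
            # Unmatched names, such as the adjective "Russian", end up here.
            places.other.append(name)
    return places


p = classify(["Moscow", "Russia", "Russian", "Ohio", "Red Square"])
print(p.countries, p.regions, p.cities, p.other)
# ['Russia'] ['Ohio'] ['Moscow'] ['Russian', 'Red Square']
```

The example shows why other mixes useful entries ("Red Square") with noise ("Russian"): anything that is not an exact country, region or city match falls through to it.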

But Wait, There's More

In addition to listing the names of discovered places, you'll also get some information about the relationships between places.

places.country_regions: regions broken down by country
places.country_cities: cities broken down by country
places.address_strings: "city, region, country" strings useful for geocoding
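To make these relationships concrete, the sketch below derives similar structures from a hypothetical list of already-resolved (city, region, country) triples. The names `RESOLVED`, `group_by_country` and `make_address_strings` are illustrative only, and the dictionary-of-lists shape is an assumption rather than a description of geograpy3's internal types.

```python
from collections import defaultdict

# Hypothetical resolved places as (city, region, country) triples.
RESOLVED = [
    ("Cleveland", "Ohio", "United States"),
    ("Columbus", "Ohio", "United States"),
    ("Toronto", "Ontario", "Canada"),
]


def group_by_country(records):
    """Build country -> regions and country -> cities maps, without duplicates."""
    country_regions = defaultdict(list)
    country_cities = defaultdict(list)
    for city, region, country in records:
        if region not in country_regions[country]:
            country_regions[country].append(region)
        if city not in country_cities[country]:
            country_cities[country].append(city)
    return dict(country_regions), dict(country_cities)


def make_address_strings(records):
    """Join each triple into a 'city, region, country' string for geocoding."""
    return [f"{city}, {region}, {country}" for city, region, country in records]


country_regions, country_cities = group_by_country(RESOLVED)
print(country_regions)  # {'United States': ['Ohio'], 'Canada': ['Ontario']}
print(country_cities)   # {'United States': ['Cleveland', 'Columbus'], 'Canada': ['Toronto']}
print(make_address_strings(RESOLVED)[0])  # Cleveland, Ohio, United States
```

Lists with a membership check keep first-seen order, which is convenient for display; for large inputs, a dict or set per country would avoid the linear scan.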

Last But Not Least

While a text might mention many places, it's probably focused on one or two, so geograpy3 also breaks down countries, regions and cities by number of mentions.

places.country_mentions
places.region_mentions
places.city_mentions

Each of these is a list of tuples: the first item in each tuple is the place name and the second is the number of mentions. For example:

[('Russian Federation', 14), ('Ukraine', 11), ('Lithuania', 1)]
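A list of this shape can be produced with Python's collections.Counter. The sketch below uses a hypothetical `recognized` list of names and is only an illustration; it is not taken from geograpy3's source.

```python
from collections import Counter

# Hypothetical sequence of country names as they were recognized in a text.
recognized = [
    "Russian Federation", "Ukraine", "Russian Federation",
    "Lithuania", "Ukraine", "Russian Federation",
]

# most_common() returns (name, count) tuples, highest count first.
country_mentions = Counter(recognized).most_common()
print(country_mentions)
# [('Russian Federation', 3), ('Ukraine', 2), ('Lithuania', 1)]
```

Ties are returned in the order the names were first seen, so the result is deterministic for a given input.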

If You're Really Serious

You can of course use each of Geograpy's modules on their own. For example:

from geograpy import extraction

e = extraction.Extractor(url='https://en.wikipedia.org/wiki/2012_Summer_Olympics_torch_relay')
e.find_geoEntities()

# You can now access all of the places found by the Extractor
print(e.places)

Place context is handled in the places module. For example:

from geograpy import places

pc = places.PlaceContext(['Cleveland', 'Ohio', 'United States'])

pc.set_countries()
print(pc.countries)  # ['United States']

pc.set_regions()
print(pc.regions)  # ['Ohio']

pc.set_cities()
print(pc.cities)  # ['Cleveland']

print(pc.address_strings)  # ['Cleveland, Ohio, United States']

And of course, all of the other information shown above (country_regions, etc.) is available once the corresponding set_ method has been called.

Stack Overflow

Questions tagged with 'geograpy'

Credits

geograpy3 uses the following excellent libraries:

NLTK for entity recognition
newspaper for text extraction from HTML
jellyfish for fuzzy text match
pylodstorage for storage and retrieval of tabular data from SQL and SPARQL sources

geograpy3 uses the following data sources:

ISO3166ErrorDictionary for common country misspellings, via Sara-Jayne Terp
Wikidata for country/region/city information with disambiguation via population

Hat tip to Chris Albon for the name.


Release history

0.2.7: Sep 29, 2023 (this version)
0.2.6: Apr 8, 2023
0.2.5: Dec 22, 2022
0.2.4: Oct 26, 2022
0.2.3: May 31, 2022
0.2.2: Nov 29, 2021
0.2.1: Aug 20, 2021
0.2.0: Aug 20, 2021
0.1.31: Aug 12, 2021
0.1.30: Aug 11, 2021
0.1.29: Jul 12, 2021
0.1.28: Jul 5, 2021
0.1.27: Jun 23, 2021
0.1.25: Jun 21, 2021
0.1.24: Oct 10, 2020
0.1.22: Oct 10, 2020
0.1.20: Oct 10, 2020
0.1.19: Sep 29, 2020
0.1.18: Sep 27, 2020
0.1.16: Sep 26, 2020
0.1.15: Sep 26, 2020
0.1.14: Sep 21, 2020
0.1.12: Sep 20, 2020
0.1.11: Sep 20, 2020
0.1.9: Sep 19, 2020
0.1.7: Sep 18, 2020
0.1.6: Sep 11, 2020
0.1.5: Sep 11, 2020
0.1.4: Sep 10, 2020
0.1.4rc1 (pre-release): Sep 10, 2020
0.1.3: Sep 10, 2020
0.1.2: Sep 21, 2018
0.1.1: Sep 21, 2018
0.1.0: Sep 21, 2018

Download files

Source Distribution

geograpy3-0.2.7.tar.gz (51.7 kB)

Uploaded Sep 29, 2023 Source

Built Distribution

geograpy3-0.2.7-py3-none-any.whl (35.4 kB)

Uploaded Sep 29, 2023 Python 3

File details

Details for the file geograpy3-0.2.7.tar.gz.

File metadata

Download URL: geograpy3-0.2.7.tar.gz
Upload date: Sep 29, 2023
Size: 51.7 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.2 CPython/3.9.18

File hashes

Hashes for geograpy3-0.2.7.tar.gz
Algorithm	Hash digest
SHA256	`9fed9fd8e1cf3757d5d696cdf2670241e61dc03731b4de7c2c053b98b0c5954c`
MD5	`7c583fcec5415d7dae89505d36989933`
BLAKE2b-256	`67814f3cf76cdf4118aaaf594f2d1f34c220ecacf87574a7d42a4a690caefbed`

File details

Details for the file geograpy3-0.2.7-py3-none-any.whl.

File metadata

Download URL: geograpy3-0.2.7-py3-none-any.whl
Upload date: Sep 29, 2023
Size: 35.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.2 CPython/3.9.18

File hashes

Hashes for geograpy3-0.2.7-py3-none-any.whl
Algorithm	Hash digest
SHA256	`a290a5e95e3320b49abd5fa4810bfb890924e40cf30846073cb9e8492b46f3bb`
MD5	`e8f2631023f627934ec9bb50a1e12497`
BLAKE2b-256	`bae1a6f68aa31163c2572e2b24f63783d1c10aada78388ae8bc798b679cb1539`
