
World Guess is a package to identify subject countries in documents


worldguess

Summary

This Python package guesses which country a text, a place name, or a list of place names is about, based on place-name frequencies. It works with any language or alphabet.
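The idea of guessing from place-name frequencies can be illustrated with a toy sketch. The gazetteer `TOY_GAZETTEER` and the function `guess_country` below are hypothetical and are not the package's implementation; they only show how counting votes per country might work.

```python
from collections import Counter

# Hypothetical toy gazetteer: lowercase place name -> country.
# The real package draws on GeoNames; this table is for illustration only.
TOY_GAZETTEER = {
    "london": "United Kingdom",
    "manchester": "United Kingdom",
    "bristol": "United Kingdom",
    "scotland": "United Kingdom",
    "berlin": "Germany",
    "munich": "Germany",
}


def guess_country(places):
    """Rank countries by how many of the given place names map to them."""
    votes = Counter()
    for place in places:
        country = TOY_GAZETTEER.get(place.strip().lower())
        if country is not None:
            votes[country] += 1
    if not votes:
        return ["Unknown"]
    # most_common() orders countries by descending vote count.
    return [country for country, _ in votes.most_common()]


ranking = guess_country(["London", "Manchester", "BRISTOL", "Scotland", "Berlin"])
print(ranking)  # ['United Kingdom', 'Germany']
```

Lowercasing each name before the lookup is what lets "BRISTOL" and "Bristol" count as the same place in this sketch.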

Warning

Originally, this library was made to be used with a list of places extracted by an NER program such as spaCy.

I heavily recommend using it that way.
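One way to prepare such a list is to keep only the entities that an NER model labels as places. The sketch below assumes the entities arrive as (text, label) pairs and that the labels `GPE` and `LOC` mark places, as in spaCy's English models; the helper name `place_names` is hypothetical.

```python
# Labels treated as places. GPE and LOC follow the scheme used by spaCy's
# English models; other models may use different labels (assumption).
PLACE_LABELS = {"GPE", "LOC"}


def place_names(entities):
    """Return the text of every entity whose label marks it as a place."""
    return [text for text, label in entities if label in PLACE_LABELS]


# Hand-written entities standing in for real NER output.
entities = [
    ("London", "GPE"),
    ("Jane Doe", "PERSON"),
    ("the Thames", "LOC"),
    ("BBC", "ORG"),
]
places = place_names(entities)
print(places)  # ['London', 'the Thames']
```

With spaCy, the pairs could be built as `[(ent.text, ent.label_) for ent in doc.ents]`, and the filtered list passed to `wg.from_list`.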

It can also be used on raw text, but precision is poor, because an ordinary word in one language can match a place name in another.
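The precision problem is easy to reproduce with a toy example. The gazetteer and the `naive_matches` helper below are hypothetical and much simpler than the package itself; they only show how ordinary words collide with real place names when every word of a text is looked up.

```python
# Hypothetical toy gazetteer: lowercase place name -> country.
TOY_GAZETTEER = {
    "nice": "France",
    "reading": "United Kingdom",
    "mobile": "United States",
    "tokyo": "Japan",
}


def naive_matches(text):
    """Match every whitespace-separated word against the gazetteer."""
    hits = []
    for word in text.split():
        token = word.strip(".,;:!?").lower()
        if token in TOY_GAZETTEER:
            hits.append((token, TOY_GAZETTEER[token]))
    return hits


sentence = "I was reading a nice review of a mobile game made in Tokyo."
for token, country in naive_matches(sentence):
    print(token, "->", country)
```

Only "Tokyo" is meant as a place in that sentence, yet three other words also match. An NER step that extracts places first avoids most of these false hits.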

It is also still a work in progress. I wrote a first version of this library during a past internship to quickly identify and classify documents by country. I thought it was a cool tool to share, so I recently rewrote it from scratch at home (with my former boss's permission).

It is an easy way to identify the source country of a news article, for example, and to tag the article with that country automatically.

Usage

With a list:

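from worldguess import WorldGuesser  # import path assumed from the package name

wg = WorldGuesser()
places = ["London", "Manchester", "UK", "BRISTOL", "Scotland", "Berlin"]
result = wg.from_list(places)
assert result[0] == "United Kingdom"  # the best guess comes first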

With a name:

wg = WorldGuesser()
text = "санкт-петербург"  # "Saint Petersburg" in Russian
result = wg.from_place(text)
assert result[0] == "Russia"  # the best guess comes first

If no country is found, the first result in the list will be "Unknown".
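When tagging documents automatically, the "Unknown" marker usually needs to be handled explicitly. The helper below is a hypothetical sketch, not part of the package; it assumes only what is stated above, namely that the result is a list whose first element is either a country name or "Unknown".

```python
def country_tag(result):
    """Return the top-ranked country, or None when no country was found."""
    if not result or result[0] == "Unknown":
        return None
    return result[0]


print(country_tag(["United Kingdom", "Germany"]))  # United Kingdom
print(country_tag(["Unknown"]))  # None
```

Returning `None` rather than the string "Unknown" keeps the marker from being stored as if it were a country name.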

Data Sources

The data comes from the GeoNames database: https://www.geonames.org/


Download files


Files for worldguess, version 0.0.1
Filename, size | File type | Python version
worldguess-0.0.1-py3-none-any.whl (3.0 kB) | Wheel | py3
worldguess-0.0.1.tar.gz (2.0 kB) | Source | None
