World Guess is a package to identify the subject country of documents
worldguess
Summary
This Python package guesses the country that a text, a place name, or a list of places is about, based on place-name frequencies. It works with any language or alphabet.
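The frequency idea can be illustrated with a minimal sketch. This is not worldguess's actual implementation: `GAZETTEER` and `guess_countries` are hypothetical names, and the toy gazetteer (place name to the countries that have a place with that name) stands in for a real place-name database. Each recognised name casts one vote for every country it may belong to, and countries are ranked by vote count.

```python
from collections import Counter

# Hypothetical toy gazetteer: casefolded place name -> countries that have a
# place with that name. A real gazetteer would be far larger.
GAZETTEER = {
    "london": ("United Kingdom",),
    "manchester": ("United Kingdom", "United States"),
    "bristol": ("United Kingdom",),
    "scotland": ("United Kingdom",),
    "uk": ("United Kingdom",),
    "berlin": ("Germany", "United States"),
    "санкт-петербург": ("Russia",),
}


def guess_countries(places, gazetteer=GAZETTEER):
    """Rank countries by how many of the given place names point to them."""
    counts = Counter()
    for place in places:
        # casefold() gives case-insensitive matching in any script.
        for country in gazetteer.get(place.casefold(), ()):
            counts[country] += 1
    if not counts:
        return ["Unknown"]  # no place name was recognised
    # Most-voted country first.
    return [country for country, _ in counts.most_common()]
```

With this toy data, `guess_countries(["London", "Manchester", "UK", "BRISTOL", "Scotland", "Berlin"])` returns `["United Kingdom", "United States", "Germany"]`: ambiguous names such as Manchester and Berlin add a vote to several countries, but the country mentioned most often still comes first.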
Warning
Originally, this library was made to be used with a list of places extracted by a named-entity recognition (NER) tool such as spaCy.
I heavily recommend using it that way.
It can also be used on raw text, but precision is much lower, because some words in one language are place names in another.
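The precision problem with raw text can be shown with a small, hypothetical word matcher (not the library's code; `GAZETTEER` and `guess_from_text` are illustrative names). "Nice" (France) and "Split" (Croatia) are real city names but also ordinary English words, so matching every word of a sentence against a gazetteer produces false votes.

```python
import re
from collections import Counter

# Hypothetical toy gazetteer (casefolded name -> country). "split" and "nice"
# are real city names, but also common English words.
GAZETTEER = {
    "london": "United Kingdom",
    "split": "Croatia",
    "nice": "France",
}


def guess_from_text(text, gazetteer=GAZETTEER):
    """Naively match every word of a raw text against the gazetteer."""
    words = re.findall(r"\w+", text.casefold())
    counts = Counter(gazetteer[w] for w in words if w in gazetteer)
    # Ties keep first-seen order, so an early false match can rank first.
    return [country for country, _ in counts.most_common()] or ["Unknown"]
```

For "We had a nice trip to London and decided to split the bill." this returns `["France", "United Kingdom", "Croatia"]`: only London is an actual place mention, yet France comes first. Feeding the guesser only the entities an NER tool tagged as places avoids most of these false matches.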
It is also still a work in progress. I wrote a first version of this library during a past internship, to quickly identify and classify documents by country, and thought it was a cool tool to share, so I recently rewrote it from scratch at home (with my former boss's permission).
For example, it is an easy way to identify the source country of a news article and tag it automatically.
Usage
With a list:
from worldguess import WorldGuesser  # import path assumed from the package name
wg = WorldGuesser()
places = ["London", "Manchester", "UK", "BRISTOL", "Scotland", "Berlin"]
result = wg.from_list(places)
assert result[0] == "United Kingdom"
With a single place name:
from worldguess import WorldGuesser  # import path assumed from the package name
wg = WorldGuesser()
place = "санкт-петербург"
result = wg.from_place(place)
assert result[0] == "Russia"
If no country is found, the first element of the result list is "Unknown".
Data Sources
The data comes from the GeoNames database: https://www.geonames.org/
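As a sketch of how a gazetteer could be built from GeoNames data (whether worldguess does it this way is an assumption), the tab-separated GeoNames dumps such as `cities15000.txt` have 19 columns per row; the second is the name, the third the ASCII name, the fourth a comma-separated list of alternate names in many languages and scripts, and the ninth the ISO 3166 two-letter country code. The hypothetical `build_gazetteer` below maps each casefolded name to the set of country codes it occurs in; mapping codes to country names is left out.

```python
from collections import defaultdict

# 0-based column positions in the GeoNames main dump format (19 columns).
NAME, ASCIINAME, ALTERNATENAMES, COUNTRY_CODE = 1, 2, 3, 8
N_COLUMNS = 19


def build_gazetteer(lines):
    """Map each casefolded place name to the set of ISO country codes using it."""
    gazetteer = defaultdict(set)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < N_COLUMNS:
            continue  # skip malformed or truncated rows
        country = fields[COUNTRY_CODE]
        names = {fields[NAME], fields[ASCIINAME]}
        # Alternate names cover other languages and alphabets.
        names.update(n for n in fields[ALTERNATENAMES].split(",") if n)
        for name in names:
            gazetteer[name.casefold()].add(country)
    return dict(gazetteer)
```

Typical usage would be `with open("cities15000.txt", encoding="utf-8") as f: gazetteer = build_gazetteer(f)`. Keeping a set of countries per name preserves ambiguity (many towns share a name), which frequency voting then resolves.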
Hashes for worldguess-0.0.1-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 3b79f6972753754a02373ecb19856737580ebec7daec395319708aa63629d634
MD5 | 8bddcfb30b414797e0ba3dbeb3479600
BLAKE2b-256 | ee78075d3bdca64626485eba35a354413caf27cf5d5e2cf62ab19ad6ab623a97