
Finds countries in a string

Project description

Country named entity recognition

Developed by Fast Data Science, https://fastdatascience.com

Source code at https://github.com/fastdatascience/country_named_entity_recognition

Python library for finding country names in a string.

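Please note that this library finds only high-confidence country mentions. Ambiguous text such as “America”, which could mean either the United States or the Americas as a whole, is therefore not matched.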

It also finds only the English names of countries; names in local languages (for example “Deutschland” for Germany) are not supported.

Requirements

Python 3.9 and above

pycountry 22.1.10

Installation

pip install country-named-entity-recognition

Usage examples

Example 1

from country_named_entity_recognition import find_countries
find_countries("We are expanding in the UK")

returns a list of tuples, each pairing a pycountry Country object with the re.Match object for the matched text:

[(Country(alpha_2='GB', alpha_3='GBR', flag='🇬🇧', name='United Kingdom', numeric='826', official_name='United Kingdom of Great Britain and Northern Ireland'),
  <re.Match object; span=(1, 15), match='united kingdom'>)]
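To make the shape of this output concrete, here is a minimal, standard-library-only sketch of one way such a finder could work: a table of name variants is compiled into a single regular expression, and each hit is returned as a (country, match) pair. This is not the library's implementation; ToyCountry, VARIANTS, COUNTRIES and toy_find_countries are hypothetical names used only for illustration.

```python
import re
from typing import NamedTuple


class ToyCountry(NamedTuple):
    """Hypothetical stand-in for the pycountry Country records shown above."""
    alpha_2: str
    name: str


# Hypothetical variant table: surface form -> ISO 3166-1 alpha-2 code.
VARIANTS = {
    "United Kingdom": "GB",
    "UK": "GB",
    "France": "FR",
}
COUNTRIES = {
    "GB": ToyCountry("GB", "United Kingdom"),
    "FR": ToyCountry("FR", "France"),
}


def toy_find_countries(text: str) -> list[tuple[ToyCountry, re.Match]]:
    """Return (country, match) pairs for every whole-word variant in `text`."""
    # Try longer variants first: re alternation takes the first alternative
    # that matches at a position, not the longest one.
    alternatives = sorted(VARIANTS, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, alternatives)) + r")\b")
    return [(COUNTRIES[VARIANTS[m.group()]], m) for m in pattern.finditer(text)]


for country, match in toy_find_countries("We are expanding in the UK"):
    print(country.alpha_2, match.span(), match.group())  # prints: GB (24, 26) UK
```

Sorting the alternatives longest-first matters because Python's re module stops at the first alternative that matches, so a short variant could otherwise pre-empt a longer one starting at the same position.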

Example 2

By default, the tool assumes that country names are correctly capitalised and punctuated:

from country_named_entity_recognition import find_countries
find_countries("I want to visit france.")

returns an empty list, because “france” is not capitalised.

However, if your text comes from social media or another non-moderated source, you might want to allow case-insensitive matching:

from country_named_entity_recognition import find_countries
find_countries("I want to visit france.", is_ignore_case=True)
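The difference between the two calls comes down to case-sensitive versus case-insensitive matching. The stdlib-only sketch below shows the effect with Python's re.IGNORECASE flag; the toy_match helper is hypothetical and not part of the library. It also suggests one plausible reason why case sensitivity is a sensible default: some lower-case English words, such as “turkey” (the bird), also spell country names.

```python
import re


def toy_match(name: str, text: str, ignore_case: bool = False) -> list[str]:
    """Return every whole-word occurrence of `name` in `text` (hypothetical helper)."""
    flags = re.IGNORECASE if ignore_case else 0
    pattern = r"\b" + re.escape(name) + r"\b"
    return [m.group() for m in re.finditer(pattern, text, flags)]


print(toy_match("France", "I want to visit france."))                    # []
print(toy_match("France", "I want to visit france.", ignore_case=True))  # ['france']

# The trade-off: case-insensitive matching also catches ordinary words.
print(toy_match("Turkey", "We roasted a turkey.", ignore_case=True))     # ['turkey']
```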

Example 3

This example shows how the tool takes context into account. By default, it assumes that the string “Georgia” refers to the US state, not the country:

from country_named_entity_recognition import find_countries
find_countries("Gladys Knight and the Pips wrote the Midnight Train to Georgia")

will return an empty list.

But what happens if we include a clear contextual clue?

from country_named_entity_recognition import find_countries
find_countries("Salome Zourabichvili is the current president of Georgia.")

returns

[(Country(alpha_2='GE', alpha_3='GEO', flag='🇬🇪', name='Georgia', numeric='268'), <re.Match object; span=(34, 41), match='Georgia'>)]

You can also force the tool to treat “Georgia” as the country by passing is_georgia_probably_the_country=True:

from country_named_entity_recognition import find_countries
find_countries("I want to visit Georgia.", is_georgia_probably_the_country=True)
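The following sketch illustrates the general idea of context-based disambiguation with a deliberately crude heuristic: count cue words for each reading of “Georgia” and fall back to the US state when the evidence does not favour the country. The cue lists, the toy_georgia_is_country function and its force parameter are all hypothetical; the library's actual rules are not described here and may differ.

```python
import re
from typing import Optional

# Hypothetical cue words suggesting the country (capital, region, currency, etc.).
COUNTRY_CUES = {"president", "tbilisi", "caucasus", "georgian", "lari"}
# Hypothetical cue words suggesting the US state.
STATE_CUES = {"atlanta", "savannah", "state", "peach"}


def toy_georgia_is_country(text: str, force: Optional[bool] = None) -> bool:
    """Guess whether "Georgia" in `text` means the country (toy heuristic)."""
    if force is not None:
        return force  # an explicit choice overrides the heuristic
    words = set(re.findall(r"[a-z]+", text.lower()))
    # Default to the US state unless country cues outnumber state cues.
    return len(words & COUNTRY_CUES) > len(words & STATE_CUES)


print(toy_georgia_is_country("I took the midnight train to Georgia"))   # False
print(toy_georgia_is_country("Tbilisi is the capital of Georgia."))     # True
print(toy_georgia_is_country("I want to visit Georgia.", force=True))   # True
```

Here force=None means “use the heuristic”, while True or False overrides it, loosely mirroring the idea of forcing the behaviour described above.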

Adding custom variants

If you find that a variant of a country name is missing, you can add it with the add_custom_variants function.

Let’s imagine we want to add Neverneverland as a synonym for the United Arab Emirates (ISO 3166-1 alpha-2 code AE):

from country_named_entity_recognition import find_countries, add_custom_variants
add_custom_variants(["Neverneverland"], "AE")
find_countries("I want to visit Neverneverland")
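Conceptually, adding a custom variant extends the table of surface forms that the matcher searches for. The stdlib-only sketch below shows that idea; variants, toy_add_custom_variants and toy_find_codes are hypothetical names, and the real add_custom_variants may store and compile variants differently.

```python
import re

# Hypothetical variant table: surface form -> ISO 3166-1 alpha-2 code.
variants: dict[str, str] = {"United Arab Emirates": "AE", "UAE": "AE"}


def toy_add_custom_variants(names: list[str], alpha_2: str) -> None:
    """Register extra surface forms for a country code (hypothetical helper)."""
    for name in names:
        variants[name] = alpha_2


def toy_find_codes(text: str) -> list[str]:
    """Return the alpha-2 code for every known variant found in `text`."""
    # Rebuilt on every call so that newly added variants are picked up.
    alternatives = sorted(variants, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, alternatives)) + r")\b")
    return [variants[m.group()] for m in pattern.finditer(text)]


print(toy_find_codes("I want to visit Neverneverland"))  # []
toy_add_custom_variants(["Neverneverland"], "AE")
print(toy_find_codes("I want to visit Neverneverland"))  # ['AE']
```

Rebuilding the pattern on every call keeps the sketch simple and guarantees that new variants are seen; a production implementation would more likely cache the compiled pattern and invalidate the cache whenever a variant is added.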

Raising issues

If you find a problem, you are welcome either to raise an issue at https://github.com/fastdatascience/country_named_entity_recognition/issues or to make a pull request and I will merge it into the project.

Who to contact

Thomas Wood at https://fastdatascience.com



Download files


Source Distributions

No source distribution files are available for this release.

Built Distribution

File details

Details for the file country_named_entity_recognition-0.4-py3-none-any.whl.


File hashes

Hashes for country_named_entity_recognition-0.4-py3-none-any.whl:

Algorithm      Hash digest
SHA256         3fa0db624598d9629fb3eb0b22d9024ca3d20d8cac7dcf774d46f02054e5781a
MD5            48dd5d45a0c2ed29fad17c69f27d8939
BLAKE2b-256    1c983de3ebee76b09c56e15e60588391cc298e257fba22f3e7cd2734f46e046a
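To check that a downloaded wheel matches the published SHA256 digest, you can hash it yourself. This is a minimal sketch using Python's hashlib; it assumes the wheel has been saved in the current directory under its original file name.

```python
import hashlib
from pathlib import Path

# SHA256 digest listed above for the 0.4 wheel.
EXPECTED_SHA256 = "3fa0db624598d9629fb3eb0b22d9024ca3d20d8cac7dcf774d46f02054e5781a"


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash a file in fixed-size chunks so it never has to fit in memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected: str = EXPECTED_SHA256) -> bool:
    """Return True if the file's SHA256 digest equals the expected hex string."""
    return sha256_of(path) == expected.lower()


if __name__ == "__main__":
    # Assumes the wheel was downloaded into the current directory.
    wheel = Path("country_named_entity_recognition-0.4-py3-none-any.whl")
    if wheel.exists():
        print("OK" if verify(wheel) else "MISMATCH")
```

pip can perform the same check at install time in hash-checking mode, by adding a --hash=sha256: option with the digest above to the pinned requirement in a requirements file.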

