Extract and count countries and cities (+their synonyms) from text
flashgeotext ⚡🌍
Extract and count countries and cities (+ their synonyms) from text, like GeoText on steroids, using FlashText, an Aho-Corasick implementation. flashgeotext is a fast, batteries-included (and BYOD, bring your own data) native Python library that extracts one or more sets of given city and country names (+ synonyms) from an input text.
documentation: https://flashgeotext.iwpnd.pw/
introductory blogpost: https://iwpnd.pw/articles/2020-02/flashgeotext-library
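To make the idea of "names plus synonyms" concrete, here is a minimal and deliberately naive sketch in plain Python. It is not how flashgeotext or FlashText are implemented (they rely on a trie-based keyword search rather than a regular expression); the `LOOKUP` data and the `extract_places` function are hypothetical and only illustrate how every synonym maps back to one canonical name and how counts, spans, and matched forms can be collected.

```python
import re

# Hypothetical lookup data: canonical name -> list of accepted synonyms.
LOOKUP = {
    "Washington, D.C.": ["Washington, D.C.", "Washington"],
    "United States": ["United States", "US", "USA"],
}


def extract_places(text, lookup):
    """Naive synonym matcher; an illustration, not the library's algorithm."""
    # Invert the lookup so each synonym points at its canonical name.
    synonym_to_name = {syn: name for name, syns in lookup.items() for syn in syns}

    # Try longer synonyms first so "Washington, D.C." wins over "Washington".
    alternatives = sorted(synonym_to_name, key=len, reverse=True)
    pattern = re.compile(
        r"(?<!\w)(?:%s)(?!\w)" % "|".join(re.escape(a) for a in alternatives)
    )

    found = {}
    for match in pattern.finditer(text):
        name = synonym_to_name[match.group()]
        entry = found.setdefault(
            name, {"count": 0, "span_info": [], "found_as": []}
        )
        entry["count"] += 1
        entry["span_info"].append(match.span())  # (start, end), end exclusive
        entry["found_as"].append(match.group())
    return found


print(extract_places("imports from the US. Washington welcomes the decision.", LOOKUP))
```

A regular expression is used here only to keep the sketch short; its cost grows with the number of synonyms, which is the kind of overhead a trie-based search is meant to avoid.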
Usage
from flashgeotext.geotext import GeoText
geotext = GeoText()
input_text = '''Shanghai. The Chinese Ministry of Finance in Shanghai said that China plans
        to cut tariffs on $75 billion worth of goods that the country
        imports from the US. Washington welcomes the decision.'''
geotext.extract(input_text=input_text)
>> {
    'cities': {
        'Shanghai': {
            'count': 2,
            'span_info': [(0, 8), (45, 53)],
            'found_as': ['Shanghai', 'Shanghai'],
        },
        'Washington, D.C.': {
            'count': 1,
            'span_info': [(175, 185)],
            'found_as': ['Washington'],
        }
    },
    'countries': {
        'China': {
            'count': 1,
            'span_info': [(64, 69)],
            'found_as': ['China'],
        },
        'United States': {
            'count': 1,
            'span_info': [(171, 173)],
            'found_as': ['US'],
        }
    }
}
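The returned dictionary is plain Python data, so it can be post-processed directly. The sketch below hard-codes the first sentence of the example text and the matching part of the result shown above, so it runs without flashgeotext installed. The helpers `rank_by_count` and `matched_substrings` are hypothetical and not part of the library, and the code assumes that each `span_info` entry is a `(start, end)` pair with an exclusive end, as the output above suggests.

```python
# Hypothetical post-processing helpers; they are not part of flashgeotext.

# First sentence of the example text and the corresponding part of the result.
text = "Shanghai. The Chinese Ministry of Finance in Shanghai said that China plans"
result = {
    "cities": {
        "Shanghai": {
            "count": 2,
            "span_info": [(0, 8), (45, 53)],
            "found_as": ["Shanghai", "Shanghai"],
        },
    },
    "countries": {
        "China": {
            "count": 1,
            "span_info": [(64, 69)],
            "found_as": ["China"],
        },
    },
}


def rank_by_count(result):
    """Flatten the nested result into (category, name, count), most frequent first."""
    rows = [
        (category, name, info["count"])
        for category, places in result.items()
        for name, info in places.items()
    ]
    return sorted(rows, key=lambda row: row[2], reverse=True)


def matched_substrings(text, info):
    """Slice the text with each (start, end) span; assumes `end` is exclusive."""
    return [text[start:end] for start, end in info["span_info"]]


print(rank_by_count(result))
print(matched_substrings(text, result["cities"]["Shanghai"]))
```

For this input, the slices recovered from `span_info` are the same strings listed in `found_as`, which is a quick way to check that spans and text still line up after any preprocessing.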
Getting Started
These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.
Installing
pip:
pip install flashgeotext
conda:
conda install flashgeotext
for development:
git clone https://github.com/iwpnd/flashgeotext.git
cd flashgeotext/
poetry install
Running the tests
poetry run pytest . -v
Authors
- Benjamin Ramser - Initial work - iwpnd
See also the list of contributors who participated in this project.
License
This project is licensed under the MIT License; see the LICENSE.md file for details.
Demo data: cities from http://www.geonames.org, licensed under the Creative Commons Attribution 3.0 License.
Acknowledgments