Extract and count countries and cities (+their synonyms) from text
Toponym
Build grammatical cases for words in Slavic languages from pre-defined recipes.
documentation: https://toponym.iwpnd.pw/
Getting Started
These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.
Installing
for usage:
pip install toponym
for development:
git clone https://github.com/iwpnd/toponym.git
pip install flit
flit install toponym --symlink
Description
Problem
In Slavic languages, a word changes form depending on its grammatical role in a sentence. For example, the city Moscow (Москва) becomes Москве in the prepositional case.
So a naive substring check for the nominative form fails:
"Москва" in "В Москве с начала года отремонтировали 3 тысячи подъездов"
>> False
Solution
This is where Toponym comes in. Using pre-defined recipes, it naively builds grammatical cases based on the ending of the input word. A recipe looks as follows:
Recipe
recipe = {
    "а": {  # ending of the input word
        "nominative": [[""], 0],
        "genitive": [  # case that we need
            ["ы", "и"],  # endings of the output word
            1,  # number of chars to delete before the output ending is added
        ],
        "dative": [["е"], 1],
        "accusative": [["у"], 1],
        "instrumental": [...]
    }
}
If multiple endings are given, one toponym is created for each ending. Some of the created toponyms do not make sense or are not used in the wild. If you have an idea for how to filter out the unrealistic ones, please contact me.
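To make the recipe mechanics concrete, here is a minimal sketch of how such a recipe could be applied to a word. It does not use the library itself: `RECIPE` and `build_forms` are hypothetical names, the recipe is a trimmed copy of the example above, and both the longest-ending-first lookup and the fallback for unmatched words are assumptions rather than documented behaviour of Toponym.

```python
from typing import Dict, List, Tuple

# Hypothetical, trimmed recipe in the same shape as the example above:
# input ending -> case -> (output endings, number of chars to delete).
RECIPE: Dict[str, Dict[str, Tuple[List[str], int]]] = {
    "а": {
        "nominative": ([""], 0),
        "genitive": (["ы", "и"], 1),
        "dative": (["е"], 1),
        "accusative": (["у"], 1),
    }
}


def build_forms(word: str, recipe: Dict[str, Dict[str, Tuple[List[str], int]]]) -> Dict[str, List[str]]:
    """Return a mapping of case -> candidate forms for `word`."""
    # Assumption: prefer the longest recipe ending that matches the word.
    for ending in sorted(recipe, key=len, reverse=True):
        if word.endswith(ending):
            forms = {}
            for case, (out_endings, n_delete) in recipe[ending].items():
                # Drop the last n_delete characters, then append each output ending.
                stem = word[: len(word) - n_delete]
                forms[case] = [stem + out for out in out_endings]
            return forms
    # Assumption: a word with no matching ending is returned unchanged.
    return {"nominative": [word]}


forms = build_forms("Москва", RECIPE)
print(forms["genitive"])  # two candidates, one per listed ending
```

Because the genitive entry lists two endings, two candidate forms are produced, which is why some generated toponyms may not exist in practice.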
With the built toponyms you can now check:
from toponym.recipes import Recipes
from toponym.toponym import Toponym
recipes_russian = Recipes()
recipes_russian.load_from_language(language='russian')
city = "Москва"
t = Toponym(input_word=city, recipes=recipes_russian)
t.build()
print(t.list_toponyms())
>> ['Москвой', 'Москвы', 'Москви', 'Москве', 'Москву', 'Москва']
any([word in "В Москве с начала года отремонтировали 3 тысячи подъездов" for word in t.list_toponyms()])
>> True
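A plain substring check like the one above also matches inside longer words. As a hedged sketch of one way to count whole-word occurrences instead, the helper below takes any list of forms (here hard-coded from the output shown above, so it runs without the library). `count_toponyms` is a hypothetical name and is not part of Toponym.

```python
import re
from collections import Counter
from typing import Iterable


def count_toponyms(text: str, toponyms: Iterable[str]) -> Counter:
    """Count whole-word occurrences of each toponym form in `text`."""
    # Longest forms first, so the alternation tries the longest candidate first.
    alternatives = sorted(set(toponyms), key=len, reverse=True)
    if not alternatives:
        return Counter()
    # \b is Unicode-aware for str patterns, so it works for Cyrillic words.
    pattern = re.compile(r"\b(?:" + "|".join(re.escape(a) for a in alternatives) + r")\b")
    return Counter(pattern.findall(text))


# Forms copied from the list_toponyms() output shown above.
forms = ["Москвой", "Москвы", "Москви", "Москве", "Москву", "Москва"]
text = "В Москве с начала года отремонтировали 3 тысячи подъездов"
print(count_toponyms(text, forms))
```

The word-boundary anchors are a design choice for this sketch: they trade recall for precision by ignoring forms embedded in longer tokens.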
supported languages:
full name iso code
croatian hr
russian ru
ukrainian uk
romanian ro
latvian lv
hungarian hu
greek el
polish pl
Running the tests
pytest toponym/tests/