Pure Python spell checker, utilizing Spylls, a port of Hunspell
Phunspell
A pure Python spell checker utilizing spylls, a port of Hunspell.
NOTE: If you only need to support English, Russian, or Swedish, use spylls directly (pip install spylls).
This library includes dictionaries for all languages supported by LibreOffice.
A note giving credit where it's due: spylls is a fantastic project that deserves all the credit (as, of course, does Hunspell itself). There is a corresponding blog entry that is a good read.
Usage
import phunspell
pspell = phunspell.Phunspell('en_US')
print(pspell.lookup("phunspell")) # False
print(pspell.lookup("about")) # True
misspelled = pspell.lookup_list("Bill's TV is borken".split(" "))
print(misspelled) # ["borken"]
for suggestion in pspell.suggest('phunspell'):
    print(suggestion) # Hunspell
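The lookup and suggest calls above can be combined into a small helper that checks a piece of text and collects suggestions for every unknown word. This is a minimal sketch that relies only on the `lookup` and `suggest` methods shown in the usage example; `check_text` and the `SpellChecker` protocol are hypothetical names, not part of phunspell, and tokenization is plain whitespace splitting.

```python
from typing import Dict, Iterable, List, Protocol


class SpellChecker(Protocol):
    """Anything exposing phunspell-style lookup() and suggest() methods."""

    def lookup(self, word: str) -> bool: ...

    def suggest(self, word: str) -> Iterable[str]: ...


def check_text(checker: SpellChecker, text: str) -> Dict[str, List[str]]:
    """Map each unknown word in `text` to its list of suggestions.

    Hypothetical helper: tokenizes on whitespace only, so punctuation
    stays attached to words.
    """
    results: Dict[str, List[str]] = {}
    seen = set()
    for word in text.split():
        if word in seen:
            continue  # each distinct word is checked only once
        seen.add(word)
        if not checker.lookup(word):
            # suggest() may return a generator, so materialize it
            results[word] = list(checker.suggest(word))
    return results
```

With phunspell installed, passing `phunspell.Phunspell('en_US')` as the checker should flag `borken`; the exact suggestions depend on the dictionary.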
Installation
pip install phunspell
Supported Languages
Language | Language Code |
---|---|
Afrikaans | af_ZA |
Aragonese | an_ES |
Arabic | ar |
Belarusian | be_BY |
Bulgarian | bg_BG |
Breton | br_FR |
Catalan | ca_ES |
Czech | cs_CZ |
Danish | da_DK |
German (Austria) | de_AT |
German (Switzerland) | de_CH |
German (Germany) | de_DE |
Greek | el_GR |
English (Australian) | en_AU |
English (Canada) | en_CA |
English (Great Britain) | en_GB |
English (US) | en_US |
English (South African) | en_ZA |
Spanish (all variants) | es |
Spanish | es_AR |
Spanish | es_BO |
Spanish | es_CL |
Spanish | es_CO |
Spanish | es_CR |
Spanish | es_CU |
Spanish | es_DO |
Spanish | es_EC |
Spanish | es_ES |
Spanish | es_GQ |
Spanish | es_GT |
Spanish | es_HN |
Spanish | es_MX |
Spanish | es_NI |
Spanish | es_PA |
Spanish | es_PE |
Spanish | es_PH |
Spanish | es_PR |
Spanish | es_PY |
Spanish | es_SV |
Spanish | es_US |
Spanish | es_UY |
Spanish | es_VE |
Estonian | et_EE |
French | fr_FR |
Scottish Gaelic | gd_GB |
Gujarati | gu_IN |
Guarani | gug_PY |
Hebrew | he_IL |
Hindi | hi_IN |
Croatian | hr_HR |
Hungarian | hu_HU (TODO) |
Icelandic | is |
Indonesian | id_ID |
Italian | it_IT |
Kurdish (Turkey) | ku_TR |
Lithuanian | lt_LT |
Latvian | lv_LV |
Mapudüngun | md (arn) (TODO) |
Dutch | nl_NL |
Norwegian (Bokmål) | nb_NO |
Norwegian (Nynorsk) | nn_NO |
Occitan | oc_FR |
Polish | pl_PL |
Brazilian Portuguese | pt_BR |
Portuguese | pt_PT |
Romanian | ro_RO |
Sinhala | si_LK |
Slovak | sk_SK |
Slovenian | sl_SI |
Serbian (Cyrillic) | sr |
Serbian (Latin) | sr-Latn |
Swedish | sv_SE |
Swahili | sw_TZ |
Tamil | Ta (TODO) |
Thai | th_TH |
Turkish | tr_TR |
Ukrainian | uk_UA |
Vietnamese | vi_VN |
Tests
python -m unittest discover -s phunspell/tests -p "test_*.py"
Experimental
# Extended, optional usage:
# first-time setup creates a directory of dictionaries stored as
# pickled objects, which makes loading/access much faster
from phunspell import Phunspell, PhunspellObjectStore  # assumed import path

storage_path = "/home/dvwright/data/phunspell/dictionary_objects"

# run once only:
pspell_object_create = PhunspellObjectStore(path=storage_path)

# then, typical usage:
pspell = Phunspell(object_storage=storage_path)

dicts_words = {
    "an_ES": "vengar",
    "be_BY": "ідалапаклонніцкі",
    "bg_BG": "удържехме",
}
for loc, word in dicts_words.items():
    print(pspell.lookup(word, locs=loc))
There is an option to build and store all the dictionaries as pickled data. Since there are security risks associated with pickled data, we do not include that data in the distribution.
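The idea behind the object store is to pay the cost of parsing each dictionary once, pickle the result to disk, and later unpickle it on demand, which is typically much faster than re-parsing. The sketch below illustrates that caching pattern with the standard library's `pickle`, one file per locale code; it is not phunspell's actual implementation, and `PickleStore`, its methods, and the `build` callback are hypothetical names. As the paragraph above warns, only unpickle files you created yourself, because loading a pickle can execute arbitrary code.

```python
import os
import pickle
from typing import Any, Callable, Dict


class PickleStore:
    """Hypothetical per-locale cache: one pickle file per locale code."""

    def __init__(self, path: str) -> None:
        self.path = path
        self._loaded: Dict[str, Any] = {}  # in-memory cache of unpickled objects
        os.makedirs(path, exist_ok=True)

    def _file(self, loc: str) -> str:
        return os.path.join(self.path, loc)

    def save(self, loc: str, obj: Any) -> None:
        # Serialize once so later runs can skip the expensive build step.
        with open(self._file(loc), "wb") as fh:
            pickle.dump(obj, fh, protocol=pickle.HIGHEST_PROTOCOL)

    def load(self, loc: str, build: Callable[[str], Any]) -> Any:
        """Return the object for `loc`, unpickling or building it on first use."""
        if loc not in self._loaded:
            try:
                with open(self._file(loc), "rb") as fh:
                    # Only safe for pickle files you created yourself.
                    self._loaded[loc] = pickle.load(fh)
            except FileNotFoundError:
                obj = build(loc)  # slow path: parse the dictionary
                self.save(loc, obj)
                self._loaded[loc] = obj
        return self._loaded[loc]
```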
To create your own local pickled dictionaries, start a Python shell:
$ python
from phunspell import PhunspellObjectStore  # assumed import path
storage_path = "/home/dvwright/data/phunspell/dictionary_objects"
pspell = PhunspellObjectStore(path=storage_path)
NOTE: This step is optional, and you only have to do it once before using the library (it will consume a lot of resources!).
Once it completes, you should have a pickled object for every dictionary supported by this library.
$ ls /home/dwright/python/phunspell/pickled_data/
af_ZA
an_ES
be_BY
bg_BG
bn_BD
br_FR
bs_BA
cs_CZ
da_DK
de_AT
de_CH
...
...
...
NOTE: the pickled store takes up a lot of disk space (about 1.4 GB, per the du output below):
$ du -sh .
1.4G
For all future uses of the library, just pass the directory as the object_storage argument:
from phunspell import Phunspell
storage_path = "/home/dvwright/data/phunspell/dictionary_objects"
pspell = Phunspell(object_storage=storage_path)
# the specific locale is loaded on lookup
pspell.lookup_list(['us-word1', 'us-word2'], locs='en_US')
pspell.lookup('german-word', locs='de_DE')
Phunspell should then find the pickled dictionaries and load them quickly.
NOTE: If you ever update the dictionary data, you will need to create a new pickle store for it.
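One simple way to tell when a pickle store has gone stale is to compare modification times: if any source dictionary file is newer than its pickle, rebuild it. A minimal sketch, assuming Hunspell-style `.aff`/`.dic` source files on disk; `needs_rebuild` is a hypothetical helper, not part of phunspell.

```python
import os
from typing import Iterable


def needs_rebuild(pickle_path: str, source_files: Iterable[str]) -> bool:
    """True if the pickle is missing or older than any of its source files."""
    if not os.path.exists(pickle_path):
        return True
    pickled_at = os.path.getmtime(pickle_path)
    return any(os.path.getmtime(src) > pickled_at for src in source_files)
```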
Misc
python, python3, hunspell, libreoffice, spell, spell checking
Download files
Source Distribution
File details
Details for the file phunspell-0.1.6.tar.gz
File metadata
- Download URL: phunspell-0.1.6.tar.gz
- Upload date:
- Size: 47.5 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.49.0 CPython/3.8.2
File hashes
Algorithm | Hash digest |
---|---|
SHA256 | d9747767ed69e0b8d662df3268965ff5ded7f1aac7220b897a10086331f308c9 |
MD5 | 5c3f3e9e1df8e00b0895404a671214ab |
BLAKE2b-256 | 7f1e7e36e7cdcf160e1e3484d1edb5d5beff26c1967ffb4739672840be85d646 |
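The digests above can be used to check that a downloaded archive is intact. A minimal sketch with the standard library's `hashlib`, using the SHA256 value listed for phunspell-0.1.6.tar.gz; `sha256_of` and `verify` are hypothetical helpers, and the file is read in 1 MiB chunks because the archive is about 47.5 MB.

```python
import hashlib

# SHA256 digest listed above for phunspell-0.1.6.tar.gz
EXPECTED_SHA256 = "d9747767ed69e0b8d662df3268965ff5ded7f1aac7220b897a10086331f308c9"


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hex SHA256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, expected: str = EXPECTED_SHA256) -> bool:
    """Compare a file's digest against an expected hex string (case-insensitive)."""
    return sha256_of(path) == expected.lower()
```

For example, `verify("phunspell-0.1.6.tar.gz")` should return True for an unmodified download of that file.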