Pure Python spell checker utilizing Spylls, a port of Hunspell

Phunspell

A pure Python spell checker utilizing spylls, a port of Hunspell.

NOTE: If you only need to support English, Russian, or Swedish, use spylls directly (pip install spylls).

This library includes dictionaries for all languages supported by LibreOffice.

Credit where it's due: spylls is a fantastic project and deserves all the credit (as, of course, does Hunspell itself). There is a corresponding blog entry that is a good read.

Usage

import phunspell

pspell = phunspell.Phunspell('en_US')
print(pspell.lookup("phunspell")) # False
print(pspell.lookup("about")) # True

misspelled = pspell.lookup_list("Bill's TV is borken".split(" "))
print(misspelled) # ["borken"]

for suggestion in pspell.suggest('phunspell'):
    print(suggestion) # Hunspell
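
The calls above can be combined into a small helper that maps each misspelled word in a sentence to a few suggestions. This is only a sketch: `suggest_corrections` and its `limit` parameter are hypothetical names, not part of phunspell, and the helper assumes the checker exposes `lookup_list()` (returning the misspelled words) and `suggest()` (yielding suggestions) as shown in the usage example.

```python
from itertools import islice
from typing import Dict, List


def suggest_corrections(checker, text: str, limit: int = 3) -> Dict[str, List[str]]:
    """Map each misspelled word in `text` to at most `limit` suggestions.

    `checker` is assumed to behave like phunspell.Phunspell('en_US'):
    lookup_list(words) returns the misspelled words and suggest(word)
    yields candidate corrections. This helper is illustrative only.
    """
    words = text.split(" ")
    return {
        # islice stops consuming suggestions once `limit` have been taken
        word: list(islice(checker.suggest(word), limit))
        for word in checker.lookup_list(words)
    }
```

With the library installed, suggest_corrections(phunspell.Phunspell('en_US'), "Bill's TV is borken") should return a dictionary keyed by "borken"; the suggestions themselves depend on the dictionary.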

Installation

pip install phunspell

Supported Languages

Language Language Code
Afrikaans af_ZA
Aragonese an_ES
Arabic ar
Belarusian be_BY
Bulgarian bg_BG
Breton br_FR
Catalan ca_ES
Czech cs_CZ
Danish da_DK
German de_AT
German de_CH
German de_DE
Greek el_GR
English (Australian) en_AU
English (Canada) en_CA
English (Great Britain) en_GB
English (US) en_US
English (South African) en_ZA
Spanish (all variants) es
Spanish es_AR
Spanish es_BO
Spanish es_CL
Spanish es_CO
Spanish es_CR
Spanish es_CU
Spanish es_DO
Spanish es_EC
Spanish es_ES
Spanish es_GQ
Spanish es_GT
Spanish es_HN
Spanish es_MX
Spanish es_NI
Spanish es_PA
Spanish es_PE
Spanish es_PH
Spanish es_PR
Spanish es_PY
Spanish es_SV
Spanish es_US
Spanish es_UY
Spanish es_VE
Estonian et_EE
French fr_FR
Scottish Gaelic gd_GB
Gujarati gu_IN
Guarani gug_PY
Hebrew he_IL
Hindi hi_IN
Croatian hr_HR
Hungarian hu_HU
Icelandic is
Indonesian id_ID
Italian it_IT
Kurdish (Turkey) ku_TR
Lithuanian lt_LT
Latvian lv_LV
Mapudüngun md (arn) (TODO)
Dutch (Netherlands) nl_NL
Norwegian nb_NO
Norwegian nn_NO
Occitan oc_FR
Polish pl_PL
Brazilian Portuguese pt_BR
Portuguese pt_PT
Romanian ro_RO
Sinhala si_LK
Slovak sk_SK
Slovenian sl_SI
Serbian (Cyrillic) sr
Serbian (Latin) sr-Latn
Swedish sv_SE
Swahili sw_TZ
Tamil Ta (TODO)
Thai th_TH
Turkish tr_TR
Ukrainian uk_UA
Vietnamese vi_VN

Tests

python -m unittest discover -s phunspell/tests -p "test_*.py"

Experimental

There is an option to build and store all the dictionaries as pickled data. Because pickled data carries security risks, that data is not included in the distribution.
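
To illustrate the security concern in general terms (this is standard pickle behavior, not phunspell code): unpickling can call arbitrary callables chosen by whoever produced the data. The sketch below uses a harmless `print`, and the `Payload` class is a made-up name for the demonstration.

```python
import pickle


class Payload:
    """Hypothetical object whose pickled form runs a callable on load."""

    def __reduce__(self):
        # pickle stores "call print(...)" and executes it during loads().
        # A malicious pickle could name any importable callable instead.
        return (print, ("this ran during unpickling",))


data = pickle.dumps(Payload())
result = pickle.loads(data)  # the print call happens here, on load
```

This is why pickled dictionaries should only be loaded from a store you built yourself.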

To create your own local pickled dictionaries, set an environment variable.

Linux/macOS:

$ export PICKLED_DATADIR="/home/dwright/python/phunspell/pickled_data/"

Then start a Python shell and load all dictionaries:

$ python
>>> from phunspell import Phunspell
>>> Phunspell(loc_lang="en_US", load_all=True)

NOTE: this will consume a lot of resources!
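
The environment setup can also be done from Python rather than the shell. The following is a minimal sketch: `configure_pickle_dir` is a hypothetical helper, not part of phunspell, and it assumes the library reads PICKLED_DATADIR from the process environment, so it should be called before phunspell is imported or used. The trailing path separator mirrors the export example; whether it is required is not stated here.

```python
import os
from pathlib import Path


def configure_pickle_dir(path: str) -> Path:
    """Create the pickle directory if needed and export PICKLED_DATADIR.

    Hypothetical helper: it only prepares the environment variable and
    does not call into phunspell itself.
    """
    directory = Path(path).expanduser()
    directory.mkdir(parents=True, exist_ok=True)
    # Keep a trailing separator, as in the shell export example.
    os.environ["PICKLED_DATADIR"] = str(directory) + os.sep
    return directory
```

After calling it, building the store is the same Phunspell(loc_lang="en_US", load_all=True) call as in the shell session.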

Once completed, you should have a pickled object for every dictionary supported by this library.

$ ls /home/dwright/python/phunspell/pickled_data/
af_ZA
an_ES
be_BY
bg_BG
bn_BD
br_FR
bs_BA
cs_CZ
da_DK
de_AT
de_CH
...
...
...

NOTE: the pickle store takes up well over 1 GB of disk space (about 1.4 GB in this run):

$ du -sh .
1.4G
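
If du is not available, the size of the store can be estimated from Python. This is a sketch with a made-up function name; it sums apparent file sizes, so the result may differ somewhat from du, which counts disk blocks.

```python
import os


def directory_size_bytes(path: str) -> int:
    """Return the total size in bytes of all regular files under `path`."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            # Skip symlinks so linked files are not counted twice
            if not os.path.islink(file_path):
                total += os.path.getsize(file_path)
    return total
```

For example, directory_size_bytes(os.environ["PICKLED_DATADIR"]) / 1e9 gives the size of the store in GB.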

As long as you keep that environment variable set for all future runs, just use the library as:

pspell = Phunspell()

and it should find the dictionaries and load them quickly.

NOTE: If you ever update dictionary data, you will need to create a new pickle store for it.

Misc

python, python3, hunspell, libreoffice, spell, spell checking
