A wrapper on hunspell for use in Python
CyHunspell
Cython wrapper on Hunspell Dictionary
Description
This repository provides a wrapper on Hunspell to be used natively in Python. The module uses Cython to link the C++ and Python code and adds some features of its own. There is very little Python overhead because all the heavy lifting is done on the C++ side of the module interface, so performance stays close to that of native Hunspell.
The hunspell library caches any corrections. Caching is in-memory by default; for persistent caching, pass the disk_cache_dir argument to the Hunspell constructor (see Asynchronous Caching).
Installing
For the simplest install, run:
pip install cyhunspell
This will install the hunspell 1.7.0 C++ bindings on your behalf for your platform.
Dependencies
cacheman -- for (optionally asynchronous) persistent caching
Non-Python Dependencies
hunspell
The library installs hunspell version 1.7.0. As new versions of hunspell become available, this library will provide new versions to match.
Features
Spell checking & spell suggestions
How to use
Below are some simple examples for how to use the repository.
Creating a Hunspell object
from hunspell import Hunspell
h = Hunspell()
You now have a usable hunspell object that can make basic queries for you.
h.spell('test') # True
Spelling
It's a simple task to ask if a particular word is in the dictionary.
h.spell('correct') # True
h.spell('incorect') # False
This will only ever return True or False, and won't give suggestions about why it might be wrong. It also depends on your choice of dictionary.
Suggestions
If you want suggestions from Hunspell, pass it a string and it returns a tuple of candidate corrections.
h.suggest('incorect') # ('incorrect', 'correction', 'corrector', 'correct', 'injector')
The suggestions are sorted by closeness: the lower the index, the closer the suggestion is to the input string.
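A common pattern is to combine spell and suggest: keep a word that is already in the dictionary, and otherwise take the closest suggestion. The sketch below assumes only the spell and suggest methods shown above. The helper name first_correction is hypothetical and not part of the library; it takes the checker as a parameter so it can be used with any Hunspell instance.

```python
def first_correction(checker, word):
    """Return `word` if it is spelled correctly, else the top suggestion.

    `checker` is assumed to be a hunspell.Hunspell instance (any object with
    compatible spell() and suggest() methods works). Returns None when the
    word is misspelled and no suggestions are available.
    """
    if checker.spell(word):
        return word
    suggestions = checker.suggest(word)
    # Suggestions are ordered closest-first, so index 0 is the best guess.
    return suggestions[0] if suggestions else None
```

With the h object created earlier, first_correction(h, 'incorect') would return the first entry of h.suggest('incorect').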
Suffix Match
h.suffix_suggest('do') # ('doing', 'doth', 'doer', 'doings', 'doers', 'doest')
Stemming
The module can also stem words, providing the stems for pluralization and other inflections.
h.stem('testers') # ('tester', 'test')
h.stem('saves') # ('save',)
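Stems are handy for grouping inflected forms together. The following is a minimal sketch, assuming stem returns a tuple of candidate stems as in the examples above. group_by_stem is a hypothetical helper, not a library function; because a word can have several stems, it indexes each word under every stem rather than picking one.

```python
from collections import defaultdict


def group_by_stem(checker, words):
    """Map each stem to the sorted list of input words that reduce to it.

    `checker` is assumed to expose stem(word) -> tuple of stems.
    """
    groups = defaultdict(set)
    for word in words:
        # A word may have several stems; index it under every one of them.
        for stem in checker.stem(word):
            groups[stem].add(word)
    return {stem: sorted(members) for stem, members in groups.items()}
```

Calling group_by_stem(h, ['testers', 'saves']) with the h object from above would place 'testers' under both 'tester' and 'test', given the stem output shown earlier.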
Analyze
Like stemming, but returns a morphological analysis of the input instead.
h.analyze('permanently') # (' st:permanent fl:Y',)
Generate
Generate methods are NOT provided at this time because the 1.7.0 build does not produce results for any input, including the documented one. If this is fixed, or someone identifies the issue in the call pattern, they will be added to the library in the future.
Bulk Requests
You can also request bulk actions against Hunspell. This triggers a threaded request, run without holding the GIL, to perform the requested action. Currently 'suggest', 'suffix_suggest', 'stem' and 'analyze' are bulk requestable.
h.bulk_suggest(['correct', 'incorect'])
# {'incorect': ('incorrect', 'correction', 'corrector', 'correct', 'injector'), 'correct': ('correct',)}
h.bulk_suffix_suggest(['cat', 'do'])
# {'do': ('doing', 'doth', 'doer', 'doings', 'doers', 'doest'), 'cat': ('cater', 'cats', "cat's", 'caters')}
h.bulk_stem(['stems', 'currencies'])
# {'currencies': ('currency',), 'stems': ('stem',)}
h.bulk_analyze(['dog', 'permanently'])
# {'permanently': (' st:permanent fl:Y',), 'dog': (' st:dog',)}
By default, a bulk request spawns one thread per CPU to perform the operation. You can override the concurrency as well.
h.set_concurrency(4) # Four threads will now be used for bulk requests
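Bulk requests pair naturally with spell when checking a whole passage: filter out the known words first, then ask for suggestions on the rest in one call. This is a sketch under stated assumptions. The helper name misspellings and its regex tokenizer are illustrative choices, not part of the library; only spell and bulk_suggest come from the examples above.

```python
import re


def misspellings(checker, text):
    """Return {misspelled_word: suggestions} for every unknown word in text.

    `checker` is assumed to expose spell(word) and bulk_suggest(words).
    """
    # Naive tokenizer (an assumption of this sketch): runs of ASCII letters
    # and apostrophes count as words; everything else is a separator.
    words = set(re.findall(r"[A-Za-z']+", text))
    unknown = sorted(w for w in words if not checker.spell(w))
    # One bulk call lets the library spread the lookups across its threads.
    return checker.bulk_suggest(unknown) if unknown else {}
```

For example, misspellings(h, 'an incorect sentence') would return a dictionary keyed by 'incorect', provided the other words are in the chosen dictionary.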
Dictionaries
You can also specify the language or dictionary you wish to use.
h = Hunspell('en_CA') # Canadian English
By default you have the following dictionaries available
- en_AU
- en_CA
- en_GB
- en_NZ
- en_US
- en_ZA
However you can download your own and point Hunspell to your custom dictionaries.
h = Hunspell('en_GB-large', hunspell_data_dir='/custom/dicts/dir')
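A Hunspell dictionary consists of a .dic word list and a matching .aff affix file, so the names available in a custom directory can be listed by looking for those pairs. The helper below is a hypothetical sketch that uses only the standard library and does not call into this package at all.

```python
from pathlib import Path


def available_dictionaries(data_dir):
    """List dictionary names in data_dir that have both a .dic and an .aff file."""
    root = Path(data_dir)
    return sorted(
        dic.stem                      # 'en_GB-large.dic' -> 'en_GB-large'
        for dic in root.glob("*.dic")
        if dic.with_suffix(".aff").is_file()
    )
```

Each returned name could then be passed as the first constructor argument together with hunspell_data_dir, as in the example above.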
Adding Dictionaries
You can also add new dictionaries at runtime by calling the add_dic method.
import os
h.add_dic(os.path.join(PATH_TO, 'special.dic')) # PATH_TO is the directory that holds special.dic
Adding words
You can add individual words to a dictionary at runtime.
h.add('sillly')
Furthermore, you can attach an affix to the word when doing this by providing a second argument.
h.add('silllies', "is:plural")
Removing words
Much like adding, you can remove words.
h.remove('sillly')
Asynchronous Caching
If you want to have Hunspell cache suggestions and stems you can pass it a directory to house such caches.
h = Hunspell(disk_cache_dir='/tmp/hunspell/cache/dir')
This will save all suggestion and stem requests periodically and in the background. The cache will fork after a number of new requests over particular time ranges and save the cache contents while the rest of the program continues onward. You'll never have to explicitly save your caches to disk, but you can if you so choose.
h.save_cache()
Otherwise the Hunspell object will cache such requests locally in memory and not persist that memory.
Language Preferences
- Google Style Guide
- Object Oriented (with a few exceptions)
Known Workarounds
- On Windows, very long file paths, or paths saved in a different encoding than the system, require special handling by Hunspell to load dictionary files. To circumvent this on Windows setups, either set system_encoding='UTF-8' in the Hunspell constructor or set the environment variable HUNSPELL_PATH_ENCODING=UTF-8. Then you must re-encode your hunspell_data_dir in UTF-8 by passing that argument name to the Hunspell constructor or setting the HUNSPELL_DATA environment variable. This is a restriction of Hunspell / Windows operations.
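One way to keep the Windows workaround out of application code is to build the constructor keyword arguments in a single place. The sketch below is an assumption-laden illustration: hunspell_kwargs is a hypothetical helper, and it only assembles the system_encoding and hunspell_data_dir arguments named above. It does not re-encode the path itself.

```python
import sys


def hunspell_kwargs(data_dir, platform=sys.platform):
    """Build Hunspell constructor keyword arguments for the current platform.

    On Windows the UTF-8 path-encoding workaround is added; elsewhere only
    the dictionary directory is passed through.
    """
    kwargs = {"hunspell_data_dir": data_dir}
    if platform.startswith("win"):
        # Workaround described above: ask Hunspell to treat paths as UTF-8.
        kwargs["system_encoding"] = "UTF-8"
    return kwargs
```

The result can be unpacked into the constructor, for example Hunspell('en_US', **hunspell_kwargs('/custom/dicts/dir')).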
Author
Author(s): Tim Rodriguez and Matthew Seal
License
MIT