WizardSpell
WizardSpell is a Python library for dictionary-based spell checking with Unicode-aware tokenization and light text normalization. It supports 62 languages via compressed Marisa-Trie dictionaries and returns a compact report with the total number of misspellings and the list of offending tokens.
Installation
Requires Python 3.9+.
pip install wizardspell
Quick start
import wizardspell as ws
res = ws.spell_checking("Thiss sentense has a typo.", language="en")
print(res)
Spell Checking
Behavior
- Normalizes common Unicode quirks (e.g., smart quotes, zero-width joiners).
- Ignores numbers and leading/trailing punctuation when deciding correctness.
- Treats '/’ variants as equivalent.
- Looks up each token against the selected language dictionary.
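The steps listed above can be approximated with the standard library alone. The following is a minimal sketch, not WizardSpell's actual implementation: the helper names (`normalize_token`, `is_number`, `find_misspellings`), the whitespace tokenizer, and the plain `set` standing in for the trie dictionary are all assumptions made for illustration.

```python
import unicodedata

# Hypothetical helpers for illustration; WizardSpell's internals may differ.
_QUOTE_MAP = str.maketrans({
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark / typographic apostrophe
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
})
# Code points mapped to None are deleted by str.translate.
_ZERO_WIDTH = dict.fromkeys(map(ord, "\u200b\u200c\u200d\ufeff"))


def normalize_token(token: str) -> str:
    """Unify quote variants, drop zero-width characters, trim edge punctuation, case-fold."""
    token = unicodedata.normalize("NFC", token)
    token = token.translate(_QUOTE_MAP).translate(_ZERO_WIDTH)
    start, end = 0, len(token)
    # Unicode general categories starting with "P" are punctuation.
    while start < end and unicodedata.category(token[start]).startswith("P"):
        start += 1
    while end > start and unicodedata.category(token[end - 1]).startswith("P"):
        end -= 1
    return token[start:end].casefold()


def is_number(token: str) -> bool:
    """True for tokens made only of digits and common numeric separators."""
    has_digit = any(c.isdigit() for c in token)
    return has_digit and all(c.isdigit() or c in ".,-" for c in token)


def find_misspellings(text: str, known_words: set[str]) -> list[str]:
    """Return normalized tokens that are not in known_words (a stand-in dictionary)."""
    errors = []
    for raw in text.split():  # assumption: simple whitespace tokenization
        word = normalize_token(raw)
        if not word or is_number(word):
            continue
        if word not in known_words:
            errors.append(word)
    return errors
```

Because the lookup happens after normalization, a token such as `“Don’t!”` is compared against the dictionary as `don't`.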
Parameters
| Parameter | Description |
|---|---|
| text | (str) Raw input text. |
| language | (str, default "en") ISO-639 code. |
| dict_dir | (str \| Path \| None) Directory containing one or more *.marisa.zst (or decompressed *.marisa) dictionaries. If None: uses a per-user cache directory and auto-downloads the required dictionary if missing. |
| use_mmap | (bool, default False) True → memory-map the on-disk .marisa file (lowest RAM; fastest startup). False → load the entire trie into RAM (higher RAM; highest steady-state throughput). |
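The trade-off behind use_mmap can be illustrated with Python's standard `mmap` module. This is a generic sketch of the two loading strategies on a small stand-in file, not WizardSpell's loader; the function names and the sample file are assumptions.

```python
import mmap
import tempfile
from pathlib import Path


def load_into_ram(path: Path) -> bytes:
    """Read the whole file up front: higher RAM use, no file I/O afterwards."""
    return path.read_bytes()


def open_mapped(path: Path) -> mmap.mmap:
    """Map the file read-only: the OS loads pages lazily on first access."""
    with path.open("rb") as fh:
        # The mapping stays valid after the file object is closed.
        return mmap.mmap(fh.fileno(), 0, access=mmap.ACCESS_READ)


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.bin"  # stand-in for a .marisa file
    sample.write_bytes(b"stand-in for dictionary bytes")

    in_ram = load_into_ram(sample)
    mapped = open_mapped(sample)
    try:
        # Both strategies expose the same bytes; only the loading cost differs.
        same_content = mapped[:] == in_ram
    finally:
        mapped.close()  # release the mapping before the temp dir is removed
```

Mapping returns almost immediately regardless of file size, while reading everything pays the full cost once and then serves lookups from memory.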
Return value
dict with:
- errors_count (int): total number of misspellings
- errors (list[str]): misspelled tokens (normalized/case-folded)
import wizardspell as ws
check = ws.spell_checking("Thiss sentense has a typo.", language="en")
print(check)
Output
{"errors_count": 2, "errors": ["thiss", "sentense"]}
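Since the result is a plain dict, it can be post-processed without any library support. The helper below is a hypothetical example (`format_report` is not part of WizardSpell) that turns the documented structure into a one-line summary; it is fed the sample output shown above rather than a live call.

```python
def format_report(result: dict) -> str:
    """Build a short human-readable summary from a spell_checking-style result."""
    count = result["errors_count"]
    if count == 0:
        return "No misspellings found."
    return f"{count} misspelling(s): " + ", ".join(result["errors"])


# Sample value copied from the documented output.
sample = {"errors_count": 2, "errors": ["thiss", "sentense"]}
report = format_report(sample)
print(report)
```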
Examples
Basic
import wizardspell as ws
res = ws.spell_checking("Thiss sentense has a typo.", language="en")
print(res)
Output
{"errors_count": 2, "errors": ["thiss", "sentense"]}
Italian example
import wizardspell as ws
print(ws.spell_checking("Queso è un tes , di preva.", language="it"))
Output
{"errors_count": 3, "errors": ["queso", "tes", "preva."]}
Custom dictionary directory & mmap
import wizardspell as ws
from pathlib import Path
res = ws.spell_checking(
    "Coloar centre thetre",
    language="en",
    # Path does not expand "~" by itself, so expand it explicitly.
    dict_dir=Path("~/WizardSpell_dicts").expanduser(),
    use_mmap=True,
)
print(res)
Output
{"errors_count": 2, "errors": ["coloar", "thetre"]}
Operational notes
- Cache location (when dict_dir=None): a per-user data directory is used. You can override it with the first of these environment variables that is set: WIZARDSPELL_DATA_DIR, WIZARDSPELL_DICT_DIR, WIZARDSPELL_HOME.
- Auto-download: when a dictionary is missing and dict_dir is not set, WizardSpell downloads the compressed *.marisa.zst once and reuses it subsequently.
- File formats:
  - *.marisa.zst files are decompressed on the fly (into memory) or to an adjacent *.marisa file when use_mmap=True.
  - If you already have an uncompressed *.marisa file in dict_dir, it is used directly.
- Performance:
  - use_mmap=True → minimal RAM, fastest startup; excellent for large dictionaries or constrained environments.
  - use_mmap=False → maximal throughput once loaded; best when RAM is plentiful.
- Chinese requires jieba; all other languages work out-of-the-box.
- Output tokens in errors are normalized/case-folded; they may differ in casing from the original text.
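The environment-variable override described in the first note amounts to a "first one set wins" lookup. The sketch below shows that pattern under stated assumptions: `resolve_dict_dir` is a hypothetical name, and the `~/.wizardspell` fallback is a placeholder, since the actual per-user directory is platform dependent and not specified here.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

# Variable names taken from the note above, in the listed order.
ENV_VARS = ("WIZARDSPELL_DATA_DIR", "WIZARDSPELL_DICT_DIR", "WIZARDSPELL_HOME")


def resolve_dict_dir(
    env: Mapping[str, str] = os.environ,
    fallback: Optional[Path] = None,
) -> Path:
    """Return the directory named by the first non-empty variable in ENV_VARS."""
    for name in ENV_VARS:
        value = env.get(name)
        if value:
            return Path(value).expanduser()
    # Assumed placeholder; the real default location is not documented here.
    return fallback if fallback is not None else Path.home() / ".wizardspell"
```

Passing the environment as a mapping keeps the function easy to test, for example `resolve_dict_dir({"WIZARDSPELL_HOME": "/srv/dicts"})`.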
Available dictionaries
| Code | Language | Code | Language |
|---|---|---|---|
| af | Afrikaans | an | Aragonese |
| ar | Arabic | as | Assamese |
| be | Belarusian | bg | Bulgarian |
| bn | Bengali | bo | Tibetan |
| br | Breton | bs | Bosnian |
| ca | Catalan | cs | Czech |
| da | Danish | de | German |
| el | Greek | en | English |
| eo | Esperanto | es | Spanish |
| fa | Persian | fr | French |
| gd | Scottish Gaelic | gn | Guarani |
| gu | Gujarati (gu_IN) | he | Hebrew |
| hi | Hindi | hr | Croatian |
| id | Indonesian | is | Icelandic |
| it | Italian | ja | Japanese |
| kmr | Kurmanji Kurdish | kn | Kannada |
| ku | Central Kurdish | lo | Lao |
| lt | Lithuanian | lv | Latvian |
| mr | Marathi | nb | Norwegian Bokmål |
| ne | Nepali | nl | Dutch |
| nn | Norwegian Nynorsk | oc | Occitan |
| or | Odia | pa | Punjabi |
| pl | Polish | pt | Portuguese (EU) |
| ro | Romanian | ru | Russian |
| sa | Sanskrit | si | Sinhala |
| sk | Slovak | sl | Slovenian |
| sq | Albanian | sr | Serbian |
| sv | Swedish | sw | Swahili |
| ta | Tamil | te | Telugu |
| th | Thai | tr | Turkish |
| uk | Ukrainian | vi | Vietnamese |
License
Resources
Contact & Author
Author: Mattia Rubino
Email: textwizard.dev@gmail.com