
Dictionary-based spell checking with Unicode-aware tokenization and light normalization. Supports 62 languages via compressed Marisa-Trie dictionaries and returns a compact report of misspellings.



WizardSpell


WizardSpell is a Python library for dictionary-based spell checking with Unicode-aware tokenization and light text normalization. It supports 62 languages via compressed Marisa-Trie dictionaries and returns a compact report with the total number of misspellings and the list of offending tokens.



Installation

Requires Python 3.9+.

pip install wizardspell

Quick start

import wizardspell as ws

res = ws.spell_checking("Thiss sentense has a typo.", language="en")
print(res)

Spell Checking

Behavior

  • Normalizes common Unicode quirks (e.g., smart quotes, zero-width joiners).
  • Ignores numbers and leading/trailing punctuation when deciding correctness.
  • Treats apostrophe variants (e.g., ' and ’) as equivalent.
  • Looks up each token against the selected language dictionary.
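
The behavior listed above can be approximated with the standard library alone. The following is a minimal sketch, not WizardSpell's actual implementation: the helper names (`normalize`, `strip_punct`, `candidate_tokens`, `check`), the quote mapping, the set of stripped zero-width characters, and the whitespace-based tokenization are all assumptions made for illustration, and a plain Python set stands in for the Marisa-Trie dictionary.

```python
import re
import unicodedata

# Assumed mapping of typographic quotes to ASCII (illustrative only).
_QUOTE_MAP = str.maketrans({
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
})
# Assumed set of invisible characters to drop: ZWSP, ZWJ, BOM.
_INVISIBLE = dict.fromkeys(map(ord, "\u200b\u200d\ufeff"))


def normalize(text: str) -> str:
    """Compose characters (NFC), unify quotes, drop invisible characters."""
    text = unicodedata.normalize("NFC", text)
    return text.translate(_QUOTE_MAP).translate(_INVISIBLE)


def strip_punct(token: str) -> str:
    """Remove leading/trailing punctuation and symbols (categories P* and S*)."""
    start, end = 0, len(token)
    while start < end and unicodedata.category(token[start])[0] in "PS":
        start += 1
    while end > start and unicodedata.category(token[end - 1])[0] in "PS":
        end -= 1
    return token[start:end]


def candidate_tokens(text: str) -> list[str]:
    """Split on whitespace, strip edge punctuation, skip numbers, case-fold."""
    tokens = []
    for raw in normalize(text).split():
        core = strip_punct(raw).casefold()
        if not core or re.fullmatch(r"[\d.,]+", core):
            continue  # empty after stripping, or a plain number
        tokens.append(core)
    return tokens


def check(text: str, dictionary: set[str]) -> dict:
    """Build a report shaped like the one described in this README."""
    errors = [tok for tok in candidate_tokens(text) if tok not in dictionary]
    return {"errors_count": len(errors), "errors": errors}


# A tiny stand-in dictionary; real dictionaries are compressed tries.
print(check("Thiss sentense has a typo.", {"has", "a", "typo"}))
```

Whitespace splitting is enough for this illustration, but it would not work for languages written without spaces; treat the tokenization step in particular as a placeholder.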

Parameters

  • text (str): Raw input text.
  • language (str, default "en"): ISO-639 code.
  • dict_dir (str | Path | None): Directory containing one or more *.marisa.zst (or decompressed *.marisa) dictionaries. If None: uses a per-user cache directory and auto-downloads the required dictionary if missing.
  • use_mmap (bool, default False): True → memory-map the on-disk .marisa file (lowest RAM; fastest startup). False → load the entire trie into RAM (higher RAM; highest steady-state throughput).

Return value

dict with:

  • errors_count: int, total number of misspellings
  • errors: list[str] of misspelled tokens (normalized/case-folded)

import wizardspell as ws

check = ws.spell_checking("Thiss sentense has a typo.", language="en")
print(check)

Output

{"errors_count": 2, "errors": ["thiss", "sentense"]}

Examples

Basic

import wizardspell as ws

res = ws.spell_checking("Thiss sentense has a typo.", language="en")
print(res)

Output

{"errors_count": 2, "errors": ["thiss", "sentense"]}
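
Because the reported tokens are normalized and case-folded, pointing a user at the original spelling takes one extra step. The sketch below works on a report dict shaped like the output above and does not call the library; `find_original_spellings` is a hypothetical helper, and its simple word regex is an assumption that may not match WizardSpell's own tokenizer.

```python
import re


def find_original_spellings(text: str, report: dict) -> list[tuple[int, str]]:
    """Return (offset, original word) pairs for words listed in report["errors"]."""
    wanted = set(report["errors"])
    hits = []
    # Assumed tokenization: runs of word characters, allowing inner apostrophes.
    for match in re.finditer(r"\w+(?:'\w+)*", text):
        word = match.group()
        if word.casefold() in wanted:
            hits.append((match.start(), word))
    return hits


report = {"errors_count": 2, "errors": ["thiss", "sentense"]}
print(find_original_spellings("Thiss sentense has a typo.", report))
# → [(0, 'Thiss'), (6, 'sentense')]
```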

Italian example

import wizardspell as ws
print(ws.spell_checking("Queso è un tes , di preva.", language="it"))

Output

{"errors_count": 3, "errors": ["queso", "tes", "preva."]}

Custom dictionary directory & mmap

import wizardspell as ws
from pathlib import Path

res = ws.spell_checking(
    "Coloar centre thetre",
    language="en",
    dict_dir=Path("~/WizardSpell_dicts").expanduser(),  # Path does not expand "~" on its own
    use_mmap=True,
)
print(res)

Output

{"errors_count": 2, "errors": ["coloar", "thetre"]}
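
The trade-off behind use_mmap can be illustrated with the standard library. This sketch does not show how WizardSpell loads its tries; it only contrasts reading a whole file into memory with memory-mapping it and touching a small slice. The file name and helper names are made up for the demonstration.

```python
import mmap
import os
import tempfile


def read_fully(path: str) -> bytes:
    """Load the entire file into RAM (analogous to use_mmap=False)."""
    with open(path, "rb") as fh:
        return fh.read()


def peek_with_mmap(path: str, offset: int, length: int) -> bytes:
    """Map the file and copy out only a slice (analogous to use_mmap=True)."""
    with open(path, "rb") as fh:
        with mmap.mmap(fh.fileno(), 0, access=mmap.ACCESS_READ) as view:
            # Only the pages backing this slice need to be read from disk.
            return view[offset:offset + length]


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.bin")  # hypothetical stand-in for a dictionary file
    with open(path, "wb") as fh:
        fh.write(b"abcdefghij" * 1000)

    whole = read_fully(path)
    part = peek_with_mmap(path, 10, 5)
    print(len(whole), part)
    # → 10000 b'abcde'
```

Mapping defers disk reads until data is accessed, which is why startup is cheap; holding everything in RAM avoids page faults later, which is why steady-state lookups can be faster.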

Operational notes

  • Cache location (when dict_dir=None): a per-user data directory is used. You can override it with an environment variable; the first one found among WIZARDSPELL_DATA_DIR, WIZARDSPELL_DICT_DIR, and WIZARDSPELL_HOME is used.
  • Auto-download: when a dictionary is missing and dict_dir is not set, WizardSpell downloads the compressed *.marisa.zst once and reuses it subsequently.
  • File formats:
    • *.marisa.zst files are decompressed on the fly into memory or, when use_mmap=True, to an adjacent *.marisa file.
    • If you already have an uncompressed *.marisa file in dict_dir, it is used directly.
  • Performance:
    • use_mmap=True → minimal RAM, fastest startup; excellent for large dictionaries or constrained environments.
    • use_mmap=False → maximal throughput once loaded; best when RAM is plentiful.
  • Chinese requires jieba; all other languages work out-of-the-box.
  • Output tokens in errors are normalized/case-folded; they may differ in casing from the original text.
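
The environment-variable precedence described in the notes above could look roughly like this. It is a sketch under stated assumptions: it reads "first existing" as "first variable that is set and non-empty", the function name `resolve_dict_dir` is hypothetical, and the fallback path is a placeholder rather than the directory WizardSpell actually uses.

```python
import os
from pathlib import Path

# Order taken from the operational notes; earlier names win.
ENV_VARS = ("WIZARDSPELL_DATA_DIR", "WIZARDSPELL_DICT_DIR", "WIZARDSPELL_HOME")


def resolve_dict_dir(environ=None, fallback=None) -> Path:
    """Pick the dictionary directory from the first set environment variable."""
    env = os.environ if environ is None else environ
    for name in ENV_VARS:
        value = env.get(name)
        if value:  # skip unset and empty variables
            return Path(value).expanduser()
    if fallback is not None:
        return fallback
    # Hypothetical default; the real per-user location is not documented here.
    return Path.home() / ".cache" / "wizardspell"


# Passing a plain dict instead of os.environ keeps the example deterministic.
print(resolve_dict_dir({"WIZARDSPELL_DICT_DIR": "/srv/dicts", "WIZARDSPELL_HOME": "/opt/ws"}))
```

Passing the resolved path explicitly as dict_dir is one way to make the choice visible in application code instead of relying on ambient configuration.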

Available dictionaries

| Code | Language | Code | Language |
| --- | --- | --- | --- |
| af | Afrikaans | an | Aragonese |
| ar | Arabic | as | Assamese |
| be | Belarusian | bg | Bulgarian |
| bn | Bengali | bo | Tibetan |
| br | Breton | bs | Bosnian |
| ca | Catalan | cs | Czech |
| da | Danish | de | German |
| el | Greek | en | English |
| eo | Esperanto | es | Spanish |
| fa | Persian | fr | French |
| gd | Scottish Gaelic | gn | Guarani |
| gu | Gujarati (gu_IN) | he | Hebrew |
| hi | Hindi | hr | Croatian |
| id | Indonesian | is | Icelandic |
| it | Italian | ja | Japanese |
| kmr | Kurmanji Kurdish | kn | Kannada |
| ku | Central Kurdish | lo | Lao |
| lt | Lithuanian | lv | Latvian |
| mr | Marathi | nb | Norwegian Bokmål |
| ne | Nepali | nl | Dutch |
| nn | Norwegian Nynorsk | oc | Occitan |
| or | Odia | pa | Punjabi |
| pl | Polish | pt | Portuguese (EU) |
| ro | Romanian | ru | Russian |
| sa | Sanskrit | si | Sinhala |
| sk | Slovak | sl | Slovenian |
| sq | Albanian | sr | Serbian |
| sv | Swedish | sw | Swahili |
| ta | Tamil | te | Telugu |
| th | Thai | tr | Turkish |
| uk | Ukrainian | vi | Vietnamese |
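
When the language code comes from user input, it can help to validate it before calling the library. The snippet below is an illustrative guard built only from the codes listed in the table above; `require_supported` is a hypothetical helper, the lowercasing step is an assumption, and the library may accept codes not listed here or perform its own validation.

```python
# Codes copied from the table above (62 entries).
SUPPORTED_LANGUAGES = frozenset(
    """
    af an ar as be bg bn bo br bs ca cs da de el en eo es fa fr
    gd gn gu he hi hr id is it ja kmr kn ku lo lt lv mr nb ne nl
    nn oc or pa pl pt ro ru sa si sk sl sq sr sv sw ta te th tr
    uk vi
    """.split()
)


def require_supported(code: str) -> str:
    """Return the trimmed, lowercased code, or raise if it is not in the table."""
    normalized = code.strip().lower()
    if normalized not in SUPPORTED_LANGUAGES:
        raise ValueError(f"no dictionary listed for language code {code!r}")
    return normalized


print(require_supported(" IT "))
# → it
```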

License

AGPL-3.0-or-later

Contact & Author

Author: Mattia Rubino
Email: textwizard.dev@gmail.com
