Dictionary-based spell checking with Unicode-aware tokenization and light normalization. Supports 62 languages via compressed Marisa-Trie dictionaries and returns a compact report of misspellings.

WizardSpell

WizardSpell is a Python library for dictionary-based spell checking with Unicode-aware tokenization and light text normalization. It supports 62 languages via compressed Marisa-Trie dictionaries and returns a compact report containing the total number of misspellings and the list of offending tokens.

Installation
Quick start
Spell Checking
License
Resources
Contact & Author

Installation

Requires Python 3.9+.

pip install wizardspell

Quick start

import wizardspell as ws

res = ws.spell_checking("Thiss sentense has a typo.", language="en")
print(res)

Spell Checking

Behavior

Normalizes common Unicode quirks (e.g., smart quotes, zero-width joiners).
Ignores numbers and leading/trailing punctuation when deciding correctness.
Treats ' / ’ variants as equivalent.
Looks up each token against the selected language dictionary.
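The behavior listed above can be pictured with a small standalone sketch. It is not WizardSpell's implementation: the character tables, the `normalize_token` and `find_misspellings` helpers, and the whitespace tokenizer are all assumptions made for illustration, and a plain Python set stands in for the trie dictionary.

```python
import unicodedata

# Hypothetical tables; the library's real normalization rules are not documented here.
_APOSTROPHES = {"\u2019": "'", "\u2018": "'", "\u02bc": "'"}
_ZERO_WIDTH = {"\u200b", "\u200c", "\u200d", "\ufeff"}


def _is_punct(ch: str) -> bool:
    # Unicode general categories starting with "P" are punctuation.
    return unicodedata.category(ch).startswith("P")


def normalize_token(token: str) -> str:
    """Return a case-folded lookup key, or '' when the token should be skipped."""
    token = unicodedata.normalize("NFC", token)
    # Drop zero-width characters and unify apostrophe variants.
    token = "".join(_APOSTROPHES.get(ch, ch) for ch in token if ch not in _ZERO_WIDTH)
    # Strip leading and trailing punctuation only; inner apostrophes survive.
    start, end = 0, len(token)
    while start < end and _is_punct(token[start]):
        start += 1
    while end > start and _is_punct(token[end - 1]):
        end -= 1
    core = token[start:end].casefold()
    # Skip empty tokens and plain numbers such as "42" or "3.14".
    if not core or core.replace(".", "").replace(",", "").isdigit():
        return ""
    return core


def find_misspellings(text: str, dictionary: set) -> dict:
    """Look up every whitespace-separated token and collect the unknown ones."""
    errors = []
    for raw in text.split():
        key = normalize_token(raw)
        if key and key not in dictionary:
            errors.append(key)
    return {"errors_count": len(errors), "errors": errors}
```

With a toy dictionary such as `{"this", "sentence", "has", "a", "typo"}`, calling `find_misspellings("Thiss sentense has a typo.", ...)` reports `thiss` and `sentense`, mirroring the shape of the report described in this README.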

Parameters

Parameter	Description
`text`	(str) Raw input text.
`language`	(str, default `"en"`) ISO-639 code.
`dict_dir`	(str | Path | None) Directory containing one or more `.marisa.zst` (or decompressed `.marisa`) dictionaries. If `None`, a per-user cache directory is used and the required dictionary is downloaded automatically if it is missing.
`use_mmap`	(bool, default `False`) True → memory-map the on-disk `.marisa` file (lowest RAM; fastest startup). False → load the entire trie into RAM (higher RAM; highest steady-state throughput).
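To make the `use_mmap` trade-off concrete, here is a generic sketch using only the standard library. It does not show how WizardSpell loads its tries; `load_eager` and `open_mapped` are hypothetical helper names, and any non-empty file can stand in for a `.marisa` dictionary.

```python
import mmap
from pathlib import Path


def load_eager(path: Path) -> bytes:
    """Read the whole file into process memory up front (the use_mmap=False idea)."""
    return path.read_bytes()


def open_mapped(path: Path) -> mmap.mmap:
    """Map the file read-only; the OS pages data in lazily (the use_mmap=True idea)."""
    with path.open("rb") as fh:
        # Length 0 maps the entire file; the mapping stays valid after fh is closed.
        return mmap.mmap(fh.fileno(), 0, access=mmap.ACCESS_READ)
```

The eager variant pays the full read cost and memory once, after which every lookup hits RAM. The mapped variant returns almost immediately and lets the operating system decide which pages stay resident, which is why it suits large dictionaries on constrained machines.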

Return value

dict with:

errors_count – int total misspellings
errors – list[str] of misspelled tokens (normalized/case-folded)

import wizardspell as ws

check = ws.spell_checking("Thiss sentense has a typo.", language="en")
print(check)

Output

{"errors_count": 2, "errors": ["thiss", "sentense"]}
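Because the tokens in `errors` are normalized and case-folded, it can be handy to map them back to the original text. The helper below is a sketch under assumptions: `locate_errors` is a hypothetical name, and its regex word splitter is only an approximation of whatever tokenization the library applies, so treat the matches as best effort.

```python
import re

# Runs of Unicode letters, optionally joined by straight or curly apostrophes.
_WORD = re.compile(r"[^\W\d_]+(?:['\u2019][^\W\d_]+)*")


def locate_errors(text: str, report: dict) -> list[tuple[int, str]]:
    """Return (offset, original_word) for each word whose case-folded form was reported."""
    flagged = set(report["errors"])
    hits = []
    for match in _WORD.finditer(text):
        if match.group().casefold() in flagged:
            hits.append((match.start(), match.group()))
    return hits
```

Feeding it the sentence and report from the example above yields the original spellings `Thiss` and `sentense` together with their character offsets, which is enough to underline them in a UI.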

Examples

Basic

import wizardspell as ws

res = ws.spell_checking("Thiss sentense has a typo.", language="en")
print(res)

Output

{"errors_count": 2, "errors": ["thiss", "sentense"]}

Italian example

import wizardspell as ws
print(ws.spell_checking("Queso è un tes , di preva.", language="it"))

Output

{"errors_count": 3, "errors": ["queso", "tes", "preva."]}

Custom dictionary directory & mmap

import wizardspell as ws
from pathlib import Path

res = ws.spell_checking(
    "Coloar centre thetre",
    language="en",
    dict_dir=Path("~/WizardSpell_dicts").expanduser(),  # pathlib does not expand "~" on its own
    use_mmap=True,
)
print(res)

Output

{"errors_count": 2, "errors": ["coloar", "thetre"]}

Operational notes

Cache location (when dict_dir=None): a per-user data directory is used. To override it, set one of the environment variables WIZARDSPELL_DATA_DIR, WIZARDSPELL_DICT_DIR, or WIZARDSPELL_HOME; they are checked in that order and the first one that exists is used.
Auto-download: when a dictionary is missing and dict_dir is not set, WizardSpell downloads the compressed *.marisa.zst once and reuses it subsequently.
File formats:
- *.marisa.zst files are decompressed on the fly (into memory) or to an adjacent *.marisa file when use_mmap=True.
- If you already have an uncompressed *.marisa file in dict_dir, it is used directly.
Performance:
- use_mmap=True → minimal RAM, fastest startup; excellent for large dictionaries or constrained environments.
- use_mmap=False → maximal throughput once loaded; best when RAM is plentiful.
Chinese requires jieba; all other languages work out-of-the-box.
Output tokens in errors are normalized/case-folded; they may differ in casing from the original text.
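The environment-variable precedence described in the notes above can be sketched as follows. This is an illustration, not the library's code: `resolve_cache_dir` is a hypothetical helper, it assumes "first existing" means the first variable that is set to a non-empty value, and the fallback is left to the caller because the actual per-user default path is not stated here.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

# Order taken from the operational notes: earlier names win.
ENV_VARS = ("WIZARDSPELL_DATA_DIR", "WIZARDSPELL_DICT_DIR", "WIZARDSPELL_HOME")


def resolve_cache_dir(
    env: Optional[Mapping[str, str]] = None,
    default: Optional[Path] = None,
) -> Optional[Path]:
    """Return the directory named by the first set variable, else the default."""
    env = os.environ if env is None else env
    for name in ENV_VARS:
        value = env.get(name)
        if value:
            return Path(value).expanduser()
    return default
```

Passing a plain dict as `env` makes the precedence easy to check in isolation, for example `resolve_cache_dir({"WIZARDSPELL_HOME": "/c", "WIZARDSPELL_DICT_DIR": "/b"})` picks `/b`.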

Available dictionaries

Code	Language	Code	Language
`af`	Afrikaans	`an`	Aragonese
`ar`	Arabic	`as`	Assamese
`be`	Belarusian	`bg`	Bulgarian
`bn`	Bengali	`bo`	Tibetan
`br`	Breton	`bs`	Bosnian
`ca`	Catalan	`cs`	Czech
`da`	Danish	`de`	German
`el`	Greek	`en`	English
`eo`	Esperanto	`es`	Spanish
`fa`	Persian	`fr`	French
`gd`	Scottish Gaelic	`gn`	Guarani
`gu`	Gujarati (`gu_IN`)	`he`	Hebrew
`hi`	Hindi	`hr`	Croatian
`id`	Indonesian	`is`	Icelandic
`it`	Italian	`ja`	Japanese
`kmr`	Kurmanji Kurdish	`kn`	Kannada
`ku`	Central Kurdish	`lo`	Lao
`lt`	Lithuanian	`lv`	Latvian
`mr`	Marathi	`nb`	Norwegian Bokmål
`ne`	Nepali	`nl`	Dutch
`nn`	Norwegian Nynorsk	`oc`	Occitan
`or`	Odia	`pa`	Punjabi
`pl`	Polish	`pt`	Portuguese (EU)
`ro`	Romanian	`ru`	Russian
`sa`	Sanskrit	`si`	Sinhala
`sk`	Slovak	`sl`	Slovenian
`sq`	Albanian	`sr`	Serbian
`sv`	Swedish	`sw`	Swahili
`ta`	Tamil	`te`	Telugu
`th`	Thai	`tr`	Turkish
`uk`	Ukrainian	`vi`	Vietnamese
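When the language code comes from user input, it may help to validate it before calling the checker. The snippet below copies the codes from the table above into a set; `SUPPORTED_LANGUAGES` and `check_language` are hypothetical names, and lower-casing and trimming the input is a convenience assumption rather than documented library behavior.

```python
# Codes transcribed from the "Available dictionaries" table (62 entries).
SUPPORTED_LANGUAGES = frozenset(
    """
    af an ar as be bg bn bo br bs ca cs da de el en eo es fa fr
    gd gn gu he hi hr id is it ja kmr kn ku lo lt lv mr nb ne nl
    nn oc or pa pl pt ro ru sa si sk sl sq sr sv sw ta te th tr
    uk vi
    """.split()
)


def check_language(code: str) -> str:
    """Return the normalized code, or raise ValueError if it is not in the table."""
    normalized = code.strip().lower()
    if normalized not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language code: {code!r}")
    return normalized
```

A caller could then write `language=check_language(user_value)` and fail fast with a clear message instead of triggering a dictionary lookup for an unknown code.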

License

AGPL-3.0-or-later

Resources

Contact & Author

Author: Mattia Rubino
Email: textwizard.dev@gmail.com

Project details

This version: 1.0.0 (Aug 28, 2025)

Download files

Source Distribution

wizardspell-1.0.0.tar.gz (59.5 kB)

Uploaded Aug 28, 2025 Source

Built Distribution

wizardspell-1.0.0-py3-none-any.whl (47.9 kB)

Uploaded Aug 28, 2025 Python 3

File details

Details for the file wizardspell-1.0.0.tar.gz.

File metadata

Download URL: wizardspell-1.0.0.tar.gz
Upload date: Aug 28, 2025
Size: 59.5 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.11.6

File hashes

Hashes for wizardspell-1.0.0.tar.gz
Algorithm	Hash digest
SHA256	`b15725b86b9c1b0ad7c45f5eb42bbaafcbace20b310e68a38444fbec3070459a`
MD5	`41ed262ed186f956232677e38234ee58`
BLAKE2b-256	`e52bd54986f265aa761af69afec5d54a991d747e33296ed06369db69ecf9db50`

File details

Details for the file wizardspell-1.0.0-py3-none-any.whl.

File metadata

Download URL: wizardspell-1.0.0-py3-none-any.whl
Upload date: Aug 28, 2025
Size: 47.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.11.6

File hashes

Hashes for wizardspell-1.0.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`4878cd6732287de839da10f1fa1a87c7057d18cccba20a5c9b0cc177ab285a20`
MD5	`c0300ebd755231e593778b6d2a370cb8`
BLAKE2b-256	`e4b0521e90ef2eb951047a35e64f6108a4347488dd35d786ea6d619e1840376d`
