Language identification via character n-gram profiles. Candidate gating guided by priors and linguistic cues, then probability estimation for each language. Supports 161 languages. Returns a top-1 ISO code or a probability-ordered list.
WizardLangID
WizardLangID is a Python library for language identification via character n-gram profiles. It gates candidate languages using priors and linguistic cues, then estimates a probability for each language. It supports 161 languages and returns either a top-1 ISO code or a probability-ordered list.
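To make the idea of character n-gram profiles concrete, here is a toy sketch of the general technique. It is not WizardLangID's implementation: the function names (`char_ngrams`, `build_profile`, `score`), the trigram size, the add-alpha smoothing, and the tiny training samples are all assumptions chosen for illustration, and the candidate-gating step is omitted entirely.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Return overlapping character n-grams of lowercased, space-padded text."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def build_profile(samples, n=3):
    """Count n-gram occurrences over the training samples of one language."""
    counts = Counter()
    for sample in samples:
        counts.update(char_ngrams(sample, n))
    return counts


def score(text, profiles, n=3, alpha=1.0):
    """Return (lang, prob) pairs, most probable first.

    Each language gets a log-likelihood under an add-alpha smoothed
    n-gram model; a softmax turns the log-likelihoods into probabilities.
    """
    grams = char_ngrams(text, n)
    vocab = set(grams)
    for profile in profiles.values():
        vocab.update(profile)

    log_scores = {}
    for lang, profile in profiles.items():
        total = sum(profile.values()) + alpha * len(vocab)
        log_scores[lang] = sum(
            math.log((profile[g] + alpha) / total) for g in grams
        )

    # Subtract the maximum before exponentiating for numerical stability.
    top = max(log_scores.values())
    weights = {lang: math.exp(s - top) for lang, s in log_scores.items()}
    norm = sum(weights.values())
    return sorted(
        ((lang, w / norm) for lang, w in weights.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )


# Hypothetical, deliberately tiny training data.
profiles = {
    "en": build_profile(["the quick brown fox", "hello world the dog"]),
    "it": build_profile(["ciao come stai oggi", "buongiorno a tutti"]),
}
ranked = score("the dog is quick", profiles)
print(ranked[0][0])
```

Real profiles are trained on far larger corpora; the sketch only shows why a probability-ordered list falls out naturally from this kind of model.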
Installation
Requires Python 3.9+.
pip install wizardlangid
Quick start
import wizardlangid as wl
text = "hello world"
lang = wl.lang_detect(text, return_top1=True)
print(lang)
Language detection
Parameters
- text: Input string (Unicode).
- top_k: Number of candidates to return (default 3).
- profiles_dir: Override the bundled profiles directory.
- use_mmap: If True, memory-map the profile tries (lower RAM; first access may be slightly slower).
- return_top1: If True, return only the best language code; otherwise a list of (lang, prob).
Examples
Top-1 (single code)
import wizardlangid as wl
text = "Ciao, come stai oggi?"
lang = wl.lang_detect(text, return_top1=True)
print(lang)
Output
it
Top-k distribution
import wizardlangid as wl
text = "The quick brown fox jumps over the lazy dog."
langs = wl.lang_detect(text, top_k=5, return_top1=False)
print(langs)
Output
[('en', 0.9999376335362183), ('mg', 4.719212057614953e-05), ('fy', 1.4727973350205069e-05), ('rm', 2.8718519851832537e-07), ('la', 1.5918465665694727e-07)]
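A probability-ordered list makes it possible to reject low-confidence guesses. The following sketch post-processes a list shaped like the output above; `pick_language`, its `min_prob` threshold, and the `"und"` fallback (the ISO 639 code for undetermined) are hypothetical helpers, not part of WizardLangID.

```python
def pick_language(candidates, min_prob=0.9, fallback="und"):
    """Return the most probable code, or `fallback` if it is not confident enough.

    `candidates` is a list of (lang, prob) pairs, as returned with
    return_top1=False.
    """
    if not candidates:
        return fallback
    lang, prob = max(candidates, key=lambda pair: pair[1])
    return lang if prob >= min_prob else fallback


# The distribution shown in the output above.
candidates = [
    ("en", 0.9999376335362183),
    ("mg", 4.719212057614953e-05),
    ("fy", 1.4727973350205069e-05),
    ("rm", 2.8718519851832537e-07),
    ("la", 1.5918465665694727e-07),
]
print(pick_language(candidates))  # en
```

A suitable threshold depends on text length and domain, so 0.9 is only a starting point.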
Batch example
import wizardlangid as wl
tests = [
    "これは日本語のテスト文です。",
    "Alex parle un peu français, aber nicht so viel.",
    "¿Dónde está la estación de tren?",
]
for s in tests:
    print("TOP1:", wl.lang_detect(s, return_top1=True))
Output
TOP1: ja
TOP1: fr
TOP1: es
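The second sentence in the batch mixes French and German, yet a single top-1 call reports only fr. One possible workaround is to split the text into clauses and detect each one separately. In the sketch below, `detect_segments` and the punctuation-based splitting rule are assumptions, and `fake_detect` is a stand-in so the example runs without the library; in practice you would pass something like `lambda s: wl.lang_detect(s, return_top1=True)`.

```python
import re


def detect_segments(text, detect):
    """Split text on clause punctuation and run `detect` on each segment."""
    parts = [p.strip() for p in re.split(r"[,.;!?。]+", text)]
    return [(p, detect(p)) for p in parts if p]


def fake_detect(segment):
    """Stand-in detector for demonstration only; not the library's logic."""
    return "de" if "aber" in segment else "fr"


segments = detect_segments(
    "Alex parle un peu français, aber nicht so viel.", fake_detect
)
for segment, lang in segments:
    print(lang, "|", segment)
```

Shorter segments carry fewer n-grams, so per-segment results are generally less reliable than results for the full text.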
Custom profiles & mmap
from pathlib import Path
import wizardlangid as wl
langs = wl.lang_detect(
    "Buongiorno a tutti!",
    profiles_dir=Path("/opt/wizardlangid/profiles"),  # custom profiles
    use_mmap=True,  # lower RAM
    top_k=3,
)
print(langs)
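For readers unfamiliar with memory mapping, the sketch below uses Python's standard `mmap` module to show the general trade-off behind an option like use_mmap: the file is mapped rather than read into memory up front, and pages are loaded on first access. The file name, its byte layout, and `read_slice_mmap` are hypothetical; WizardLangID's actual profile format is not described here.

```python
import mmap
import tempfile
from pathlib import Path


def read_slice_mmap(path, offset, length):
    """Read `length` bytes at `offset` without loading the whole file."""
    with open(path, "rb") as f:
        # Length 0 maps the entire file; pages are faulted in lazily.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return mm[offset:offset + length]


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical binary file standing in for a profile.
    path = Path(tmp) / "profile.bin"
    path.write_bytes(b"HEADER" + bytes(1024) + b"TAIL")

    header = read_slice_mmap(path, 0, 6)
    tail = read_slice_mmap(path, 6 + 1024, 4)

print(header, tail)
```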
License
RESOURCES
Contact & Author
Author: Mattia Rubino
Email: textwizard.dev@gmail.com
File details
Details for the file wizardlangid-1.0.0.tar.gz.
File metadata
- Download URL: wizardlangid-1.0.0.tar.gz
- Upload date:
- Size: 21.7 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.11.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 0e7dd49142c858a809ea26889f9712773c944f1c9d8804eedc9fbe48ddf57cc2 |
| MD5 | 41ea7048f23b39d173cb69b872004353 |
| BLAKE2b-256 | 70c588761d3a36bc55f0406f30cb620dfffd0cc240b2108c693ddf32d4028cfc |
File details
Details for the file wizardlangid-1.0.0-py3-none-any.whl.
File metadata
- Download URL: wizardlangid-1.0.0-py3-none-any.whl
- Upload date:
- Size: 22.1 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.11.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 87c6a2871ed74712859408136f744dc3f6fc30109ca5dd55aaa5dff6a272c520 |
| MD5 | 85e9c333dc5ac571c205b7ea208e9cde |
| BLAKE2b-256 | 6e89e0ab0fbce2b6230e12d25d620dc2e05cebc84dc8d7e57edd475e0b9458b9 |