

WizardLangID

WizardLangID is a Python library for language identification based on character n-gram profiles. It first gates candidate languages using priors and linguistic cues, then estimates a probability for each language. It supports 161 languages and returns either a top-1 ISO code or a probability-ordered list.
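To make the idea of character n-gram profiles concrete, here is a toy sketch. It is not WizardLangID's implementation: the two tiny profiles, the add-one smoothing, and the names `char_ngrams`, `build_profile`, `score`, and `toy_detect` are all assumptions made for illustration, and the candidate-gating step is omitted.

```python
import math
from collections import Counter

def char_ngrams(text, n=3):
    """Return overlapping character n-grams of a padded, lowercased string."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]

def build_profile(samples, n=3):
    """Count n-grams over a few training sentences for one language."""
    counts = Counter()
    for sample in samples:
        counts.update(char_ngrams(sample, n))
    return counts

def score(text, profile, vocab_size, n=3):
    """Log-likelihood of the text under an add-one-smoothed n-gram profile."""
    total = sum(profile.values())
    return sum(
        math.log((profile[gram] + 1) / (total + vocab_size))
        for gram in char_ngrams(text, n)
    )

def toy_detect(text, profiles, n=3):
    """Return (lang, prob) pairs sorted by probability, highest first."""
    vocab = set()
    for profile in profiles.values():
        vocab.update(profile)
    vocab_size = len(vocab) + 1  # one extra slot for unseen n-grams
    scores = {lang: score(text, p, vocab_size, n) for lang, p in profiles.items()}
    # Softmax over log-likelihoods turns the scores into a distribution.
    top = max(scores.values())
    weights = {lang: math.exp(s - top) for lang, s in scores.items()}
    norm = sum(weights.values())
    return sorted(
        ((lang, w / norm) for lang, w in weights.items()),
        key=lambda pair: -pair[1],
    )

# Hypothetical two-language profiles built from a handful of words.
TOY_PROFILES = {
    "en": build_profile(["the quick brown fox", "hello world how are you today"]),
    "it": build_profile(["ciao come stai oggi", "buongiorno a tutti quanti"]),
}

result = toy_detect("come stai", TOY_PROFILES)
print(result)
```

The real library ships trained profiles for 161 languages; the sketch only shows how n-gram counts can be turned into a probability-ordered list.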


Installation

Requires Python 3.9+.

pip install wizardlangid

Quick start

import wizardlangid as wl

text = "hello world"
lang = wl.lang_detect(text, return_top1=True)
print(lang) 

Language detection

Parameters

  • text: Input string (Unicode).
  • top_k: Number of candidates to return (default 3).
  • profiles_dir: Override the bundled profiles directory.
  • use_mmap: If True, memory-map the profile tries (lower RAM; first access may be slightly slower).
  • return_top1: If True, return only the best language code; otherwise return a list of (lang, prob) tuples.

Examples

Top-1 (single code)

import wizardlangid as wl

text = "Ciao, come stai oggi?"
lang = wl.lang_detect(text, return_top1=True)
print(lang) 

Output

it

Top-k distribution

import wizardlangid as wl

text = "The quick brown fox jumps over the lazy dog."
langs = wl.lang_detect(text, top_k=5, return_top1=False)
print(langs) 

Output

[('en', 0.9999376335362183), ('mg', 4.719212057614953e-05), ('fy', 1.4727973350205069e-05), ('rm', 2.8718519851832537e-07), ('la', 1.5918465665694727e-07)]
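Because the top-k result carries probabilities, a caller can reject low-confidence guesses. The helper below is a sketch under assumptions: `pick_confident`, the `min_prob` threshold, and the `"und"` fallback code are hypothetical names and choices, not part of the library. The sample list is copied from the output above so the snippet runs without WizardLangID installed.

```python
def pick_confident(candidates, min_prob=0.9, fallback="und"):
    """Return the best language code if its probability reaches min_prob."""
    if not candidates:
        return fallback
    lang, prob = max(candidates, key=lambda pair: pair[1])
    return lang if prob >= min_prob else fallback

# Values copied from the top-k output shown above.
langs = [
    ("en", 0.9999376335362183),
    ("mg", 4.719212057614953e-05),
    ("fy", 1.4727973350205069e-05),
    ("rm", 2.8718519851832537e-07),
    ("la", 1.5918465665694727e-07),
]

print(pick_confident(langs))                    # en
print(pick_confident(langs, min_prob=0.99999))  # und
```

A suitable threshold depends on the input; short or mixed-language strings may call for a lower one.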

Batch example

import wizardlangid as wl

tests = [
    "これは日本語のテスト文です。",
    "Alex parle un peu français, aber nicht so viel.",
    "¿Dónde está la estación de tren?",
]
for s in tests:
    print("TOP1:", wl.lang_detect(s, return_top1=True))

Output

TOP1: ja
TOP1: fr
TOP1: es
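A common follow-up to batch detection is grouping texts by their detected language. The sketch below takes the detector as a parameter so it runs without the library; `group_by_language` and `stub_detect` are hypothetical helpers, and the stub's answers are copied from the batch output above.

```python
from collections import defaultdict

def group_by_language(texts, detect):
    """Group texts by the language code returned from detect(text)."""
    groups = defaultdict(list)
    for text in texts:
        groups[detect(text)].append(text)
    return dict(groups)

# Stand-in detector; its answers mirror the batch output above.
KNOWN = {
    "これは日本語のテスト文です。": "ja",
    "Alex parle un peu français, aber nicht so viel.": "fr",
    "¿Dónde está la estación de tren?": "es",
}

def stub_detect(text):
    return KNOWN.get(text, "und")

groups = group_by_language(list(KNOWN), stub_detect)
for lang, items in groups.items():
    print(lang, items)
```

With WizardLangID installed, the stub can be replaced by `lambda s: wl.lang_detect(s, return_top1=True)`.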

Custom profiles & mmap

from pathlib import Path
import wizardlangid as wl

langs = wl.lang_detect(
    "Buongiorno a tutti!",
    profiles_dir=Path("/opt/wizardlangid/profiles"),  # custom profiles
    use_mmap=True,                                   # lower RAM
    top_k=3,
)
print(langs)
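For readers unfamiliar with memory mapping, the following generic sketch shows what the trade-off behind `use_mmap` looks like with Python's standard `mmap` module. It does not reflect how WizardLangID stores or maps its profile tries; the temporary file and its contents are stand-ins invented for the example.

```python
import mmap
import os
import tempfile

# Write a small stand-in "profile" file (hypothetical content).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"HEADER" + b"\x00" * 4096 + b"TAIL")
    path = tmp.name

try:
    with open(path, "rb") as fh:
        # Length 0 maps the whole file; the OS loads pages only when touched.
        with mmap.mmap(fh.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            head = mm[:6]   # reads only the first page
            tail = mm[-4:]  # reads only the last page
finally:
    os.remove(path)

print(head, tail)
```

Only the pages that are actually read need to be resident in memory, which is why mapping can lower RAM use at the cost of a slower first access.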

License

AGPL-3.0-or-later.

Contact & Author

Author: Mattia Rubino
Email: textwizard.dev@gmail.com
