papagan

Fast language detection for Python, powered by Rust (via PyO3 + maturin).

10 languages bundled, weighted per-word output, fully typed (PEP 561).

Install

pip install papagan

Pre-built wheels ship for Linux (x86_64, aarch64), macOS (x86_64, arm64), and Windows (x86_64). Requires Python 3.10 or later.

Quick start

from papagan import Detector

detector = Detector()

# Document-level detection
output = detector.detect("Die Katze sitzt auf der Matte")
lang, confidence = output.top()
print(f"{lang}: {confidence:.3f}")
# de: 0.996

# Full distribution
for lang, score in output.distribution():
    print(f"  {lang}: {score:.3f}")
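Since the distribution is a sequence of (language, score) pairs, one way to judge how decisive a result is might be to compare the top score with the runner-up. The sketch below works on plain pairs so it runs without papagan installed; `top_margin` is a hypothetical helper, not part of the library, and the sample scores are hand-written.

```python
from typing import Iterable


def top_margin(distribution: Iterable[tuple[str, float]]) -> tuple[str, float]:
    """Return the best language and its lead over the runner-up."""
    ranked = sorted(distribution, key=lambda pair: pair[1], reverse=True)
    if not ranked:
        raise ValueError("empty distribution")
    best_lang, best_score = ranked[0]
    # With a single candidate there is no runner-up, so the lead is the score itself.
    runner_up = ranked[1][1] if len(ranked) > 1 else 0.0
    return best_lang, best_score - runner_up


# Hand-written scores for illustration, not real papagan output.
sample = [("en", 0.48), ("de", 0.52)]
lang, margin = top_margin(sample)
print(lang, round(margin, 2))
```

With a real detector, the same helper could presumably be fed `output.distribution()` directly; a small margin suggests the text is ambiguous or mixed.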

Per-word detail

Useful for mixed-language text or debugging:

detailed = detector.detect_detailed("The cat is black. Die Katze ist schwarz.")

for word in detailed.words:
    top_lang, top_score = max(word.scores, key=lambda x: x[1])
    print(f"  {word.token:<10} [{word.source}]  {top_lang} ({top_score:.2f})")
# the        [dict]   en (0.85)
# cat        [ngram]  en (0.99)
# ...
# katze      [ngram]  de (1.00)

# The aggregate handles mixed input gracefully:
print(detailed.aggregate.distribution())
# [('de', 0.52), ('en', 0.48)]
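For mixed-language input, the per-word scores can be collapsed into consecutive runs of the same language. This is a minimal sketch on plain (token, scores) pairs shaped like `word.token` and `word.scores` above; `top_lang` and `language_runs` are hypothetical helpers, and the numbers are hand-written rather than real output.

```python
from itertools import groupby

# (token, scores) pairs mirroring word.token / word.scores.
# Values are illustrative, not produced by papagan.
words = [
    ("the", [("en", 0.85), ("de", 0.15)]),
    ("cat", [("en", 0.99), ("de", 0.01)]),
    ("die", [("de", 0.70), ("en", 0.30)]),
    ("katze", [("de", 1.00), ("en", 0.00)]),
]


def top_lang(scores):
    """Pick the highest-scoring language code for one word."""
    return max(scores, key=lambda pair: pair[1])[0]


def language_runs(words):
    """Collapse consecutive words sharing a top language into runs."""
    keyed = [(top_lang(scores), token) for token, scores in words]
    return [
        (lang, [token for _, token in group])
        for lang, group in groupby(keyed, key=lambda item: item[0])
    ]


print(language_runs(words))
# [('en', ['the', 'cat']), ('de', ['die', 'katze'])]
```

To apply it to a real result, the input pairs could be built with `[(w.token, w.scores) for w in detailed.words]`.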

Restrict to specific languages

Detection is faster and more confident when you know the set of possible input languages in advance:

detector = Detector(only=["en", "de"])
# or with the builder:
detector = Detector.builder().only(["en", "de"]).build()
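This README does not say how `only` reacts to a code outside the bundled set, so a caller may prefer to validate user-supplied codes first. The sketch below is an assumption-laden helper (`checked_codes` and `SUPPORTED` are hypothetical names); the codes themselves come from the table under Supported languages.

```python
# The ten codes listed under "Supported languages".
SUPPORTED = frozenset({"de", "en", "es", "fr", "it", "nl", "pl", "pt", "ru", "tr"})


def checked_codes(codes):
    """Return codes de-duplicated in input order, rejecting unsupported ones."""
    unknown = sorted(set(codes) - SUPPORTED)
    if unknown:
        raise ValueError(f"unsupported language codes: {unknown}")
    # dict.fromkeys keeps first-seen order while dropping duplicates.
    return list(dict.fromkeys(codes))


print(checked_codes(["en", "de", "en"]))
# ['en', 'de']
```

The result could then be passed along as `Detector(only=checked_codes(user_codes))`.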

Configuration

detector = Detector(
    only=["en", "de", "fr"],       # restrict to a subset
    unknown_threshold=0.25,         # below this => ("?", ...) aka Lang.Unknown
    parallel_threshold=128,         # parallelize at 128+ words
)
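To make the `unknown_threshold` comment concrete, here is a rough sketch of the behavior it describes. It assumes the threshold is compared against the top score and that the score is returned alongside `"?"`; the library's actual logic may differ, and `apply_unknown_threshold` is a hypothetical name.

```python
def apply_unknown_threshold(distribution, unknown_threshold=0.25):
    """Return the top (lang, score), or ("?", score) when the score is too weak.

    Assumption: the threshold applies to the highest score only.
    """
    lang, score = max(distribution, key=lambda pair: pair[1])
    if score < unknown_threshold:
        return "?", score
    return lang, score


# Hand-written scores for illustration, not real papagan output.
print(apply_unknown_threshold([("en", 0.20), ("de", 0.18), ("fr", 0.17)]))
# ('?', 0.2)
print(apply_unknown_threshold([("en", 0.90), ("de", 0.10)]))
# ('en', 0.9)
```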

Supported languages

Code  Language   Code  Language
de    German     it    Italian
en    English    nl    Dutch
es    Spanish    pl    Polish
fr    French     pt    Portuguese
ru    Russian    tr    Turkish

All 10 languages are bundled; there are no feature flags to set.

Type hints

The package ships .pyi stubs and a py.typed marker (PEP 561):

from papagan import Detector, Lang, Output, WordScore, LangCode, MatchSource

def classify(text: str) -> LangCode:
    lang, _score = Detector().detect(text).top()
    return lang  # typed as Literal["de", "en", ..., "?"]

Your type checker (mypy, pyright) will see full signatures for all classes, including the LangCode and MatchSource Literal types.

Accuracy

About 99.4% accuracy on a 5,000-sentence Tatoeba evaluation across the 10 supported languages. Per-language precision and recall are best on isolated scripts (Russian and Turkish are both perfect) and slightly weaker on the close Iberian pair (Spanish and Portuguese show about 1.5% cross-confusion at dict-5k).
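For readers who want to run a similar evaluation on their own data, the metrics mentioned above can be computed from (gold, predicted) label pairs. This is a generic sketch, not the project's evaluation script; the function names are hypothetical and the labels are toy data, not Tatoeba results.

```python
from collections import Counter

# (gold, predicted) pairs; toy labels for illustration only.
pairs = [("es", "es"), ("es", "pt"), ("pt", "pt"), ("pt", "pt"), ("ru", "ru")]


def accuracy(pairs):
    """Fraction of pairs where the prediction matches the gold label."""
    return sum(gold == pred for gold, pred in pairs) / len(pairs)


def confusions(pairs):
    """Count each (gold, predicted) mismatch."""
    return Counter((gold, pred) for gold, pred in pairs if gold != pred)


def precision_recall(pairs, lang):
    """Precision and recall for one language; 0.0 when undefined."""
    tp = sum(g == lang and p == lang for g, p in pairs)
    predicted = sum(p == lang for _, p in pairs)
    actual = sum(g == lang for g, _ in pairs)
    precision = tp / predicted if predicted else 0.0
    recall = tp / actual if actual else 0.0
    return precision, recall


print(accuracy(pairs))
# 0.8
print(confusions(pairs))
# Counter({('es', 'pt'): 1})
```

In practice the predicted label would come from something like `detector.detect(sentence).top()[0]` for each evaluation sentence.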

License

Dual-licensed under MIT or Apache-2.0, at your option.
