papagan

Fast language detection for Python, powered by Rust (via PyO3 + maturin).

10 languages bundled, weighted per-word output, fully typed (PEP 561).

Install

pip install papagan

Pre-built wheels ship for Linux (x86_64, aarch64), macOS (x86_64, arm64), and Windows (x86_64). Requires Python 3.10 or later.

Quick start

from papagan import Detector

detector = Detector()

# Document-level detection
output = detector.detect("Die Katze sitzt auf der Matte")
lang, confidence = output.top()
print(f"{lang}: {confidence:.3f}")
# de: 0.996

# Full distribution
for lang, score in output.distribution():
    print(f"  {lang}: {score:.3f}")
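
One common follow-up to the distribution is checking how decisive the result is. The sketch below is not part of papagan: `top_margin` is a hypothetical helper that works on any list of `(lang, score)` pairs shaped like the output of `distribution()`. The `clear` scores are made up for illustration; the `mixed` scores are the ones shown for mixed input elsewhere in this README.

```python
from typing import List, Tuple


def top_margin(distribution: List[Tuple[str, float]]) -> float:
    """Gap between the best and second-best score (hypothetical helper)."""
    # Sort locally so the helper does not depend on the input order.
    ranked = sorted((score for _, score in distribution), reverse=True)
    if not ranked:
        return 0.0
    if len(ranked) == 1:
        return ranked[0]
    return ranked[0] - ranked[1]


# Illustrative data only; in real use pass output.distribution().
clear = [("de", 0.996), ("nl", 0.003), ("en", 0.001)]
mixed = [("de", 0.52), ("en", 0.48)]

print(f"clear margin: {top_margin(clear):.3f}")
print(f"mixed margin: {top_margin(mixed):.3f}")
```

A small margin suggests mixed or ambiguous input, in which case the per-word output is usually the better tool.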

Per-word detail

Useful for mixed-language text or debugging:

detailed = detector.detect_detailed("The cat is black. Die Katze ist schwarz.")

for word in detailed.words:
    top_lang, top_score = max(word.scores, key=lambda x: x[1])
    print(f"  {word.token:<10} [{word.source}]  {top_lang} ({top_score:.2f})")
# the        [dict]   en (0.85)
# cat        [ngram]  en (0.99)
# ...
# katze      [ngram]  de (1.00)

# The aggregate handles mixed input gracefully:
print(detailed.aggregate.distribution())
# [('de', 0.52), ('en', 0.48)]
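
The per-word scores can also be used to split mixed text into same-language runs. The following is a minimal sketch, not a papagan feature: `language_runs` is a hypothetical helper, and the sample data is hand-written to stand in for `(word.token, word.scores)` pairs, assuming `scores` is a list of `(lang, score)` tuples as in the loop above.

```python
from itertools import groupby
from typing import List, Sequence, Tuple

Scores = Sequence[Tuple[str, float]]


def top_language(scores: Scores) -> str:
    """Return the language code with the highest score."""
    return max(scores, key=lambda pair: pair[1])[0]


def language_runs(words: Sequence[Tuple[str, Scores]]) -> List[Tuple[str, List[str]]]:
    """Group consecutive tokens that share the same top language."""
    labelled = [(top_language(scores), token) for token, scores in words]
    return [
        (lang, [token for _, token in group])
        for lang, group in groupby(labelled, key=lambda item: item[0])
    ]


# Hand-written sample standing in for detailed.words (scores are illustrative).
sample = [
    ("the", [("en", 0.85), ("de", 0.15)]),
    ("cat", [("en", 0.99), ("de", 0.01)]),
    ("die", [("de", 0.70), ("en", 0.30)]),
    ("katze", [("de", 1.00), ("en", 0.00)]),
]

runs = language_runs(sample)
print(runs)
# [('en', ['the', 'cat']), ('de', ['die', 'katze'])]
```

With a real detector, the input would be built as `[(w.token, w.scores) for w in detailed.words]`.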

Restrict to specific languages

Restricting the candidate set makes detection faster and more confident when you know in advance which languages the input can contain:

detector = Detector(only=["en", "de"])
# or with the builder:
detector = Detector.builder().only(["en", "de"]).build()

Configuration

detector = Detector(
    only=["en", "de", "fr"],       # restrict to a subset
    unknown_threshold=0.25,         # below this => ("?", ...) aka Lang.Unknown
    parallel_threshold=128,         # parallelize at 128+ words
)
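
To make the effect of `unknown_threshold` concrete, here is a rough sketch of the idea in plain Python. It is an illustration only, not papagan's implementation: `top_or_unknown` is a hypothetical function, and returning the weak score alongside `"?"` is an assumption, since the comment above does not say what the second element is.

```python
from typing import List, Tuple

UNKNOWN = "?"  # the code shown above for Lang.Unknown


def top_or_unknown(
    distribution: List[Tuple[str, float]], unknown_threshold: float
) -> Tuple[str, float]:
    """Return the best (lang, score) pair, or ("?", score) if it is too weak.

    Assumption: the weak score is kept in the result; papagan may differ.
    """
    if not distribution:
        return (UNKNOWN, 0.0)
    lang, score = max(distribution, key=lambda pair: pair[1])
    if score < unknown_threshold:
        return (UNKNOWN, score)
    return (lang, score)


confident = top_or_unknown([("en", 0.6), ("de", 0.4)], unknown_threshold=0.25)
weak = top_or_unknown([("en", 0.2), ("de", 0.19)], unknown_threshold=0.25)
print(confident)
print(weak)
```

Raising the threshold trades coverage for fewer wrong guesses on short or noisy input.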

Supported languages

Code  Language    Code  Language
de    German      it    Italian
en    English     nl    Dutch
es    Spanish     pl    Polish
fr    French      pt    Portuguese
ru    Russian     tr    Turkish

All 10 languages are bundled; there are no feature flags to set.

Type hints

The package ships .pyi stubs and a py.typed marker (PEP 561):

from papagan import Detector, Lang, Output, WordScore, LangCode, MatchSource

def classify(text: str) -> LangCode:
    lang, _score = Detector().detect(text).top()
    return lang  # typed as Literal["de", "en", ..., "?"]

Your type checker (mypy, pyright) will see full signatures for all classes, including the LangCode and MatchSource Literal types.
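
A Literal type is most useful at the boundary where untyped strings enter your code. The sketch below shows one way to narrow a plain `str` to a language code. It defines a local stand-in alias so that it runs without papagan installed; the member list is an assumption based on the supported-languages table plus `"?"`, and `parse_lang_code` and `greeting_for` are hypothetical helpers.

```python
from typing import Literal, Optional, get_args

# Stand-in alias (assumed members); in real code import LangCode from papagan.
LangCode = Literal["de", "en", "es", "fr", "it", "nl", "pl", "pt", "ru", "tr", "?"]

GREETINGS: dict[LangCode, str] = {"de": "Hallo", "en": "Hello", "fr": "Bonjour"}


def parse_lang_code(raw: str) -> Optional[LangCode]:
    """Narrow an arbitrary string to LangCode, or return None if it is not one."""
    for code in get_args(LangCode):
        if raw == code:
            return code
    return None


def greeting_for(lang: LangCode) -> str:
    """Look up a greeting, falling back to English for unlisted codes."""
    return GREETINGS.get(lang, "Hello")


print(parse_lang_code("de"))
print(parse_lang_code("xx"))
print(greeting_for("fr"))
```

With the alias in place, a type checker rejects calls such as `greeting_for("xx")` before the code runs.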

Accuracy

About 99.4% accuracy on a 5,000-sentence Tatoeba evaluation across the 10 supported languages. Per-language precision and recall are best on isolated scripts (Russian and Turkish are both perfect) and slightly weaker on the close Iberian pair (Spanish and Portuguese show about 1.5% cross-confusion at dict-5k).
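
For readers who want to run a similar evaluation on their own data, the following sketch computes per-language precision and recall and a pairwise confusion rate from `(gold, predicted)` label pairs. It is not the project's evaluation script. The function names are hypothetical, the toy data is made up, and the definition of cross-confusion used here (sentences from either language labelled as the other, divided by all sentences from the pair) is an assumption.

```python
from collections import Counter
from typing import Dict, List, Tuple

Pairs = List[Tuple[str, str]]  # (gold label, predicted label)


def per_language_metrics(pairs: Pairs) -> Dict[str, Tuple[float, float]]:
    """Map each gold language to its (precision, recall)."""
    true_pos = Counter(gold for gold, pred in pairs if gold == pred)
    gold_total = Counter(gold for gold, _ in pairs)
    pred_total = Counter(pred for _, pred in pairs)
    metrics = {}
    for lang in gold_total:
        precision = true_pos[lang] / pred_total[lang] if pred_total[lang] else 0.0
        recall = true_pos[lang] / gold_total[lang]
        metrics[lang] = (precision, recall)
    return metrics


def confusion_rate(pairs: Pairs, a: str, b: str) -> float:
    """Share of sentences from a or b that were labelled as the other one."""
    relevant = [(gold, pred) for gold, pred in pairs if gold in (a, b)]
    if not relevant:
        return 0.0
    swapped = sum(1 for gold, pred in relevant if pred in (a, b) and pred != gold)
    return swapped / len(relevant)


# Made-up toy data: one Spanish sentence is mislabelled as Portuguese.
toy = [("es", "es"), ("es", "pt"), ("pt", "pt"), ("pt", "pt"), ("ru", "ru")]

metrics = per_language_metrics(toy)
print(metrics["es"])  # (1.0, 0.5)
print(confusion_rate(toy, "es", "pt"))  # 0.25
```

In practice the predicted labels would come from `detector.detect(sentence).top()[0]` for each evaluation sentence.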

License

Dual-licensed under MIT or Apache-2.0, at your option.

Download files

Source distribution

papagan-0.1.5.tar.gz (33.2 kB)

Built distributions

papagan-0.1.5-cp310-abi3-win_amd64.whl (813.7 kB): CPython 3.10+, Windows x86-64
papagan-0.1.5-cp310-abi3-musllinux_1_2_x86_64.whl (1.9 MB): CPython 3.10+, musllinux (musl 1.2+) x86-64
papagan-0.1.5-cp310-abi3-musllinux_1_2_aarch64.whl (1.9 MB): CPython 3.10+, musllinux (musl 1.2+) ARM64
papagan-0.1.5-cp310-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.4 MB): CPython 3.10+, manylinux (glibc 2.17+) x86-64
papagan-0.1.5-cp310-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (1.8 MB): CPython 3.10+, manylinux (glibc 2.17+) ARM64
papagan-0.1.5-cp310-abi3-macosx_11_0_arm64.whl (1.0 MB): CPython 3.10+, macOS 11.0+ ARM64
papagan-0.1.5-cp310-abi3-macosx_10_12_x86_64.whl (1.2 MB): CPython 3.10+, macOS 10.12+ x86-64
