papagan

Fast language detection for Python, powered by Rust (via PyO3 + maturin).

10 languages bundled, weighted per-word output, fully typed (PEP 561).

Install

uv add papagan
# or
pip install papagan

Pre-built wheels ship for Linux (x86_64, aarch64), macOS (x86_64, arm64), and Windows (x86_64). Python 3.10+.

Quick start

from papagan import Detector

detector = Detector()

# Document-level detection
output = detector.detect("Die Katze sitzt auf der Matte")
lang, confidence = output.top()
print(f"{lang}: {confidence:.3f}")
# de: 0.996

# Full distribution
for lang, score in output.distribution():
    print(f"  {lang}: {score:.3f}")
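When the top two scores are close, it can be safer to treat the result as undecided. The sketch below works on a plain list of `(lang, score)` pairs, which is the shape `distribution()` appears to return in the examples here. `top_with_margin`, `min_margin`, and `fallback` are hypothetical names for illustration, not part of papagan.

```python
from typing import Sequence


def top_with_margin(
    distribution: Sequence[tuple[str, float]],
    min_margin: float = 0.2,
    fallback: str = "?",
) -> str:
    """Return the best language only if it clearly beats the runner-up."""
    # Sort a copy so the helper does not depend on the input order.
    ranked = sorted(distribution, key=lambda pair: pair[1], reverse=True)
    if not ranked:
        return fallback
    if len(ranked) == 1:
        return ranked[0][0]
    (best_lang, best_score), (_, second_score) = ranked[0], ranked[1]
    return best_lang if best_score - second_score >= min_margin else fallback


# Hand-written scores for illustration, not real detector output.
print(top_with_margin([("de", 0.90), ("nl", 0.06), ("en", 0.04)]))  # de
print(top_with_margin([("de", 0.52), ("en", 0.48)]))                # ?
```

In practice you would pass `output.distribution()` as the first argument; the margin value is a tuning choice, not a recommendation from the library.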

Per-word detail

Useful for mixed-language text or debugging:

detailed = detector.detect_detailed("The cat is black. Die Katze ist schwarz.")

for word in detailed.words:
    top_lang, top_score = max(word.scores, key=lambda x: x[1])
    print(f"  {word.token:<10} [{word.source}]  {top_lang} ({top_score:.2f})")
# the        [dict]   en (0.85)
# cat        [ngram]  en (0.99)
# ...
# katze      [ngram]  de (1.00)

# The aggregate handles mixed input gracefully:
print(detailed.aggregate.distribution())
# [('de', 0.52), ('en', 0.48)]
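Per-word scores can also be used to split mixed text into same-language runs. This is a minimal sketch under assumptions: `Word` is a hypothetical stand-in that mirrors only the `token` and `scores` fields used above, so the example runs without the library, and the scores are hand-written.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass
class Word:
    """Hypothetical stand-in with the `token` and `scores` fields used above."""
    token: str
    scores: list[tuple[str, float]]


def top_lang(word: Word) -> str:
    # Same selection rule as the loop above: the highest score wins.
    return max(word.scores, key=lambda pair: pair[1])[0]


def language_runs(words: list[Word]) -> list[tuple[str, list[str]]]:
    """Group consecutive words that share the same top language."""
    return [
        (lang, [w.token for w in group])
        for lang, group in groupby(words, key=top_lang)
    ]


# Hand-written scores for illustration only.
words = [
    Word("the", [("en", 0.85), ("de", 0.15)]),
    Word("cat", [("en", 0.99), ("de", 0.01)]),
    Word("die", [("de", 0.70), ("en", 0.30)]),
    Word("katze", [("de", 1.00), ("en", 0.00)]),
]
print(language_runs(words))
# [('en', ['the', 'cat']), ('de', ['die', 'katze'])]
```

With real output, `detailed.words` would presumably take the place of the hand-built list, provided its items expose the same two attributes.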

Restrict to specific languages

Detection is faster and more confident when you know the candidate languages in advance:

detector = Detector(only=["en", "de"])
# or with the builder:
detector = Detector.builder().only(["en", "de"]).build()

Configuration

detector = Detector(
    only=["en", "de", "fr"],       # restrict to a subset
    unknown_threshold=0.25,         # below this => ("?", ...) aka Lang.Unknown
    parallel_threshold=128,         # parallelize at 128+ words
)
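The `unknown_threshold` comment above indicates that results below the threshold come back as `"?"`. The sketch below mimics that rule in plain Python to show what callers should expect. It is not papagan's implementation: `apply_unknown_threshold` is a hypothetical helper, and keeping the original score alongside `"?"` is an assumption.

```python
UNKNOWN = "?"


def apply_unknown_threshold(
    lang: str,
    confidence: float,
    unknown_threshold: float = 0.25,
) -> tuple[str, float]:
    """Replace a low-confidence result with the unknown marker."""
    if confidence < unknown_threshold:
        # Assumption: the score is kept so callers can still log it.
        return (UNKNOWN, confidence)
    return (lang, confidence)


print(apply_unknown_threshold("de", 0.996))  # ('de', 0.996)
print(apply_unknown_threshold("nl", 0.10))   # ('?', 0.1)
```

Raising the threshold trades coverage for fewer wrong labels, so code that consumes `top()` should handle `"?"` explicitly.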

Supported languages

Code  Language   Code  Language
de    German     it    Italian
en    English    nl    Dutch
es    Spanish    pl    Polish
fr    French     pt    Portuguese
ru    Russian    tr    Turkish

All 10 languages are bundled — no feature flags to set.
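If language codes come from user input or configuration, it may help to check them against the list above before building a restricted detector. This is a minimal sketch: `SUPPORTED` is copied from the table, `validate_codes` is a hypothetical helper, and how papagan itself reacts to an unsupported code is not stated here.

```python
from typing import Iterable

# Codes copied from the supported-languages table.
SUPPORTED = frozenset({"de", "en", "es", "fr", "it", "nl", "pl", "pt", "ru", "tr"})


def validate_codes(codes: Iterable[str]) -> list[str]:
    """Normalize codes, reject unsupported ones, and drop duplicates."""
    normalized = [code.strip().lower() for code in codes]
    unsupported = sorted(set(normalized) - SUPPORTED)
    if unsupported:
        raise ValueError(f"unsupported language codes: {unsupported}")
    # dict.fromkeys removes duplicates while preserving first-seen order.
    return list(dict.fromkeys(normalized))


print(validate_codes(["EN", " de", "en"]))  # ['en', 'de']
```

The result can then be passed on, for example as `Detector(only=validate_codes(user_codes))`.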

Type hints

The package ships .pyi stubs and a py.typed marker (PEP 561):

from papagan import Detector, Lang, Output, WordScore, LangCode, MatchSource

def classify(text: str) -> LangCode:
    lang, _score = Detector().detect(text).top()
    return lang  # typed as Literal["de", "en", ..., "?"]

Your type checker (mypy, pyright) will see full signatures for all classes, including the LangCode and MatchSource Literal types.

Accuracy

~99.4% accuracy on a 5000-sentence Tatoeba evaluation across the 10 supported languages. Per-language precision and recall are best for languages with a distinctive script or orthography (Russian and Turkish are both perfect) and slightly weaker for the closely related Iberian pair (Spanish/Portuguese).
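To reproduce per-language figures on your own data, precision and recall can be computed from `(gold, predicted)` label pairs. The sketch below shows the standard definitions; it is not the evaluation script behind the numbers above, and the sample pairs are made up.

```python
from collections import Counter
from typing import Iterable


def precision_recall(
    pairs: Iterable[tuple[str, str]],
) -> dict[str, tuple[float, float]]:
    """Map each language to (precision, recall) from (gold, predicted) pairs."""
    gold: Counter[str] = Counter()
    predicted: Counter[str] = Counter()
    correct: Counter[str] = Counter()
    for gold_lang, predicted_lang in pairs:
        gold[gold_lang] += 1
        predicted[predicted_lang] += 1
        if gold_lang == predicted_lang:
            correct[gold_lang] += 1

    result: dict[str, tuple[float, float]] = {}
    for lang in sorted(set(gold) | set(predicted)):
        # Precision: share of predictions for this language that were right.
        precision = correct[lang] / predicted[lang] if predicted[lang] else 0.0
        # Recall: share of this language's sentences that were found.
        recall = correct[lang] / gold[lang] if gold[lang] else 0.0
        result[lang] = (precision, recall)
    return result


# Made-up labels: one Spanish sentence is mistaken for Portuguese.
pairs = [("es", "es"), ("es", "pt"), ("pt", "pt"), ("pt", "pt"), ("ru", "ru")]
for lang, (precision, recall) in precision_recall(pairs).items():
    print(f"{lang}: precision={precision:.2f} recall={recall:.2f}")
# es: precision=1.00 recall=0.50
# pt: precision=0.67 recall=1.00
# ru: precision=1.00 recall=1.00
```

A confusion like the one in the sample lowers recall for Spanish and precision for Portuguese, which is the pattern to expect for closely related languages.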

License

Dual-licensed under MIT or Apache-2.0, at your option.
