Fast language detection for Python powered by Rust

papagan

Fast language detection for Python, powered by Rust (via PyO3 + maturin).

10 languages bundled, weighted per-word output, fully typed (PEP 561).

Install

pip install papagan

Pre-built wheels ship for Linux (x86_64, aarch64), macOS (x86_64, arm64), and Windows (x86_64). Requires Python 3.10+.

Quick start

from papagan import Detector

detector = Detector()

# Document-level detection
output = detector.detect("Die Katze sitzt auf der Matte")
lang, confidence = output.top()
print(f"{lang}: {confidence:.3f}")
# de: 0.996

# Full distribution
for lang, score in output.distribution():
    print(f"  {lang}: {score:.3f}")
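
When acting on a result, it can help to look at the gap between the best and second-best language as well as the top score. The following is a minimal sketch, assuming the distribution is a sequence of (code, score) pairs as in the loop above. `top_with_margin` is a hypothetical helper, not part of papagan, and the sample scores are made up for illustration.

```python
from typing import Sequence


def top_with_margin(
    distribution: Sequence[tuple[str, float]],
) -> tuple[str, float, float]:
    """Return (best_lang, best_score, margin over the runner-up)."""
    if not distribution:
        raise ValueError("distribution is empty")
    # Sort a copy so the caller's ordering is left untouched.
    ranked = sorted(distribution, key=lambda pair: pair[1], reverse=True)
    best_lang, best_score = ranked[0]
    runner_up = ranked[1][1] if len(ranked) > 1 else 0.0
    return best_lang, best_score, best_score - runner_up


# Made-up scores standing in for output.distribution()
sample = [("en", 0.25), ("de", 0.5), ("nl", 0.125)]
lang, score, margin = top_with_margin(sample)
print(lang, score, margin)
# de 0.5 0.25
```

A small margin suggests the text is ambiguous between two languages even when the top score looks acceptable.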

Per-word detail

Useful for mixed-language text or debugging:

detailed = detector.detect_detailed("The cat is black. Die Katze ist schwarz.")

for word in detailed.words:
    top_lang, top_score = max(word.scores, key=lambda x: x[1])
    print(f"  {word.token:<10} [{word.source}]  {top_lang} ({top_score:.2f})")
# the        [dict]   en (0.85)
# cat        [ngram]  en (0.99)
# ...
# katze      [ngram]  de (1.00)

# The aggregate handles mixed input gracefully:
print(detailed.aggregate.distribution())
# [('de', 0.52), ('en', 0.48)]
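
Per-word scores can also be used to split mixed text into single-language runs. The sketch below is an assumption-laden illustration: `Word` is a hypothetical stand-in for the per-word result (only `token` and `scores` are modelled), the scores are made up, and `language_runs` is not a papagan function.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass
class Word:
    """Hypothetical stand-in for a per-word result: a token plus (lang, score) pairs."""
    token: str
    scores: list[tuple[str, float]]


def top_lang(word: Word) -> str:
    """Language code with the highest score for this word."""
    return max(word.scores, key=lambda pair: pair[1])[0]


def language_runs(words: list[Word]) -> list[tuple[str, list[str]]]:
    """Group consecutive words that share the same top-scoring language."""
    return [
        (lang, [w.token for w in group])
        for lang, group in groupby(words, key=top_lang)
    ]


# Made-up scores for illustration only
words = [
    Word("the", [("en", 0.75), ("de", 0.25)]),
    Word("cat", [("en", 0.75), ("de", 0.25)]),
    Word("katze", [("de", 1.0), ("en", 0.0)]),
]
print(language_runs(words))
# [('en', ['the', 'cat']), ('de', ['katze'])]
```

With real results, the same grouping could be applied to `detailed.words`, provided each item exposes `token` and `scores` as in the example above.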

Restrict to specific languages

Faster and more confident when you know the input's language set in advance:

detector = Detector(only=["en", "de"])
# or with the builder:
detector = Detector.builder().only(["en", "de"]).build()
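
One way to see why a restricted set tends to yield higher confidence: score mass that would otherwise go to excluded languages is redistributed over the allowed ones. The sketch below shows only that arithmetic on made-up scores. It assumes scores behave like a probability distribution and is not a description of papagan's internals; `restrict` is a hypothetical helper.

```python
def restrict(
    distribution: list[tuple[str, float]], allowed: set[str]
) -> list[tuple[str, float]]:
    """Keep only allowed languages and renormalize their scores to sum to 1."""
    kept = [(lang, score) for lang, score in distribution if lang in allowed]
    total = sum(score for _, score in kept)
    if total == 0:
        return kept  # nothing to renormalize
    return [(lang, score / total) for lang, score in kept]


# Made-up scores over four languages
full = [("en", 0.375), ("nl", 0.375), ("de", 0.125), ("fr", 0.125)]
print(restrict(full, {"en", "de"}))
# [('en', 0.75), ('de', 0.25)]
```

In this toy case the top score for `en` rises from 0.375 to 0.75 once `nl` and `fr` are ruled out.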

Configuration

detector = Detector(
    only=["en", "de", "fr"],       # restrict to a subset
    unknown_threshold=0.25,         # below this => ("?", ...) aka Lang.Unknown
    parallel_threshold=128,         # parallelize at 128+ words
)
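
The effect of `unknown_threshold` can be pictured as a cutoff on the top score. The following is a sketch of that rule as described in the comment above, written against plain (code, score) pairs. `apply_unknown_threshold` is a hypothetical helper, the scores are made up, and returning the original score alongside "?" is an assumption.

```python
UNKNOWN = "?"  # the code used for an unknown language


def apply_unknown_threshold(
    distribution: list[tuple[str, float]], threshold: float = 0.25
) -> tuple[str, float]:
    """Return the top (lang, score), or ("?", score) if the score is below threshold."""
    if not distribution:
        return UNKNOWN, 0.0
    lang, score = max(distribution, key=lambda pair: pair[1])
    # "Below this" is read as strictly less than the threshold.
    return (lang, score) if score >= threshold else (UNKNOWN, score)


print(apply_unknown_threshold([("en", 0.5), ("de", 0.25)]))
# ('en', 0.5)
print(apply_unknown_threshold([("en", 0.125), ("de", 0.0625)]))
# ('?', 0.125)
```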

Supported languages

Code  Language    Code  Language
de    German      it    Italian
en    English     nl    Dutch
es    Spanish     pl    Polish
fr    French      pt    Portuguese
ru    Russian     tr    Turkish

All 10 languages are bundled; there are no feature flags to set.

Type hints

The package ships .pyi stubs and a py.typed marker (PEP 561):

from papagan import Detector, Lang, Output, WordScore, LangCode, MatchSource

def classify(text: str) -> LangCode:
    lang, _score = Detector().detect(text).top()
    return lang  # typed as Literal["de", "en", ..., "?"]

Your type checker (mypy, pyright) will see full signatures for all classes, including the LangCode and MatchSource Literal types.

Accuracy

About 99.4% accuracy on a 5,000-sentence Tatoeba evaluation across the 10 supported languages. Per-language precision and recall are best on isolated scripts (Russian and Turkish, both perfect) and slightly weaker on the close Iberian pair (Spanish and Portuguese, with about 1.5% cross-confusion at dict-5k).
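
For readers who want to run a similar evaluation on their own labelled data, per-language precision and recall can be computed from (gold, predicted) pairs. This is a minimal sketch, not the evaluation script behind the figures above. `precision_recall` is a hypothetical helper and the toy pairs are invented, not taken from the Tatoeba run.

```python
from collections import Counter


def precision_recall(
    pairs: list[tuple[str, str]],
) -> dict[str, tuple[float, float]]:
    """Map each language to (precision, recall) given (gold, predicted) pairs."""
    true_pos: Counter[str] = Counter()
    predicted: Counter[str] = Counter()
    actual: Counter[str] = Counter()
    for gold, pred in pairs:
        actual[gold] += 1
        predicted[pred] += 1
        if gold == pred:
            true_pos[gold] += 1
    result = {}
    for lang in sorted(set(actual) | set(predicted)):
        precision = true_pos[lang] / predicted[lang] if predicted[lang] else 0.0
        recall = true_pos[lang] / actual[lang] if actual[lang] else 0.0
        result[lang] = (precision, recall)
    return result


# Toy data: one Spanish sentence misclassified as Portuguese
toy = [("es", "es"), ("es", "pt"), ("pt", "pt"), ("pt", "pt"), ("ru", "ru")]
for lang, (p, r) in precision_recall(toy).items():
    print(f"{lang}: precision={p:.2f} recall={r:.2f}")
# es: precision=1.00 recall=0.50
# pt: precision=0.67 recall=1.00
# ru: precision=1.00 recall=1.00
```

Cross-confusion between a close pair shows up as lowered recall for one language and lowered precision for the other, as with `es` and `pt` in the toy data.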

License

Dual-licensed under MIT or Apache-2.0, at your option.
