Fast language detection for Python powered by Rust

papagan

Fast language detection for Python, powered by Rust (via PyO3 + maturin).

Bundles 10 languages, reports weighted per-word output, and is fully typed (PEP 561).

Install

pip install papagan

Pre-built wheels ship for Linux (x86_64, aarch64), macOS (x86_64, arm64), and Windows (x86_64). Requires Python 3.10 or later.

Quick start

from papagan import Detector

detector = Detector()

# Document-level detection
output = detector.detect("Die Katze sitzt auf der Matte")
lang, confidence = output.top()
print(f"{lang}: {confidence:.3f}")
# de: 0.996

# Full distribution
for lang, score in output.distribution():
    print(f"  {lang}: {score:.3f}")
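Because `distribution()` yields `(language, score)` pairs, downstream code often only needs the strongest few candidates. A minimal sketch of such a helper follows. `top_k` is a hypothetical name, not part of papagan, and it works on a plain list of pairs so that it makes no assumption about the order in which the library returns them.

```python
from typing import Iterable


def top_k(distribution: Iterable[tuple[str, float]], k: int = 3) -> list[tuple[str, float]]:
    """Return the k highest-scoring (lang, score) pairs, best first."""
    # Sort explicitly rather than relying on the order of the input.
    ranked = sorted(distribution, key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Illustrative scores only, not real papagan output.
sample = [("en", 0.02), ("de", 0.95), ("nl", 0.03)]
best_two = top_k(sample, k=2)
```

With a real detector this would presumably be called as `top_k(detector.detect(text).distribution())`.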

Per-word detail

detect_detailed exposes per-word scores, which is useful for mixed-language text or debugging:

detailed = detector.detect_detailed("The cat is black. Die Katze ist schwarz.")

for word in detailed.words:
    top_lang, top_score = max(word.scores, key=lambda x: x[1])
    print(f"  {word.token:<10} [{word.source}]  {top_lang} ({top_score:.2f})")
# the        [dict]   en (0.85)
# cat        [ngram]  en (0.99)
# ...
# katze      [ngram]  de (1.00)

# The aggregate handles mixed input gracefully:
print(detailed.aggregate.distribution())
# [('de', 0.52), ('en', 0.48)]
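For mixed-language input it can help to split the word list into contiguous runs of the same language. The sketch below assumes only the two attributes used above, `token` and `scores`. `language_runs` and the `FakeWord` stand-in are hypothetical names introduced for illustration, so the example runs without the library installed.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass
class FakeWord:
    """Stand-in exposing the two attributes used above: token and scores."""
    token: str
    scores: list[tuple[str, float]]


def language_runs(words):
    """Group consecutive words by their top-scoring language.

    Returns a list of (lang, [tokens]) runs in input order.
    """
    def top_lang(word):
        return max(word.scores, key=lambda pair: pair[1])[0]

    return [(lang, [w.token for w in run]) for lang, run in groupby(words, key=top_lang)]


# Illustrative data only, not real papagan output.
words = [
    FakeWord("the", [("en", 0.85), ("de", 0.15)]),
    FakeWord("cat", [("en", 0.99), ("de", 0.01)]),
    FakeWord("katze", [("de", 1.00), ("en", 0.00)]),
]
runs = language_runs(words)
```

With the library installed, the same function should accept `detector.detect_detailed(text).words` directly, provided the word objects expose `token` and `scores` as shown in the README.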

Restrict to specific languages

Restricting the candidate set makes detection faster and more confident when you know the input's languages in advance:

detector = Detector(only=["en", "de"])
# or with the builder:
detector = Detector.builder().only(["en", "de"]).build()

Configuration

detector = Detector(
    only=["en", "de", "fr"],       # restrict to a subset
    unknown_threshold=0.25,         # below this => ("?", ...) aka Lang.Unknown
    parallel_threshold=128,         # parallelize at 128+ words
)
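On the calling side, a result below `unknown_threshold` comes back with the `"?"` code, so application code usually needs one place that decides whether a label is usable. The following is a sketch under that reading. `confident_label` and `min_confidence` are hypothetical, application-level names and are separate from the detector's own `unknown_threshold`.

```python
from typing import Optional

UNKNOWN = "?"  # the code the README shows for Lang.Unknown


def confident_label(top: tuple[str, float], min_confidence: float = 0.5) -> Optional[str]:
    """Return the language code, or None if unknown or below the caller's cutoff."""
    lang, confidence = top
    if lang == UNKNOWN or confidence < min_confidence:
        return None
    return lang
```

A typical call would be `confident_label(detector.detect(text).top())`, which lets the application apply a stricter cutoff than the detector without reconfiguring it.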

Supported languages

Code  Language   Code  Language
de    German     it    Italian
en    English    nl    Dutch
es    Spanish    pl    Polish
fr    French     pt    Portuguese
ru    Russian    tr    Turkish

All 10 languages are bundled; there are no feature flags to set.
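The README does not say how the detector reacts to a code outside this table, so when the language list comes from user input or configuration it may be worth validating it before passing it to `only=`. A minimal caller-side sketch follows. `SUPPORTED` and `validate_langs` are hypothetical names, and the set is copied from the table above.

```python
SUPPORTED = frozenset({"de", "en", "es", "fr", "ru", "it", "nl", "pl", "pt", "tr"})


def validate_langs(codes):
    """Normalize codes to lowercase, drop duplicates, and reject unsupported ones."""
    normalized = []
    for code in codes:
        code = code.strip().lower()
        if code not in SUPPORTED:
            raise ValueError(f"unsupported language code: {code!r}")
        if code not in normalized:
            normalized.append(code)
    return normalized
```

The result can then be passed on, for example as `Detector(only=validate_langs(user_codes))`.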

Type hints

The package ships .pyi stubs and a py.typed marker (PEP 561):

from papagan import Detector, Lang, Output, WordScore, LangCode, MatchSource

def classify(text: str) -> LangCode:
    lang, _score = Detector().detect(text).top()
    return lang  # typed as Literal["de", "en", ..., "?"]

Your type checker (mypy, pyright) will see full signatures for all classes, including the LangCode and MatchSource Literal types.

Accuracy

Accuracy is about 99.4% on a 5000-sentence Tatoeba evaluation across the 10 supported languages. Per-language precision and recall are best on isolated scripts (Russian and Turkish, both perfect on this set) and slightly weaker on the close Iberian pair (Spanish and Portuguese, with about 1.5% cross-confusion at dict-5k).
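To make the per-language figures concrete, the sketch below shows how precision and recall can be computed from `(gold, predicted)` label pairs. This is not the project's evaluation script. `per_language_metrics` is a hypothetical helper and the labels are made up.

```python
from collections import Counter


def per_language_metrics(pairs):
    """Compute (precision, recall) per gold language from (gold, predicted) pairs."""
    true_pos = Counter()
    gold_total = Counter()
    pred_total = Counter()
    for gold, predicted in pairs:
        gold_total[gold] += 1
        pred_total[predicted] += 1
        if gold == predicted:
            true_pos[gold] += 1

    metrics = {}
    for lang in gold_total:
        # Precision: of everything predicted as lang, how much really was lang.
        precision = true_pos[lang] / pred_total[lang] if pred_total[lang] else 0.0
        # Recall: of everything that really was lang, how much was found.
        recall = true_pos[lang] / gold_total[lang]
        metrics[lang] = (precision, recall)
    return metrics


# Made-up labels for illustration; not the Tatoeba evaluation data.
pairs = [("es", "es"), ("es", "pt"), ("pt", "pt"), ("pt", "pt"), ("ru", "ru")]
metrics = per_language_metrics(pairs)
```

In this toy data one Spanish sentence is misread as Portuguese, which lowers Spanish recall and Portuguese precision while leaving Russian untouched. Cross-confusion between a close pair shows up in exactly this way.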

License

Dual-licensed under MIT or Apache-2.0, at your option.

Download files

Source Distribution

papagan-0.1.1.tar.gz (33.2 kB view details)

Uploaded Source

Built Distributions

papagan-0.1.1-cp310-abi3-win_amd64.whl (813.5 kB view details)

Uploaded CPython 3.10+Windows x86-64

papagan-0.1.1-cp310-abi3-musllinux_1_2_x86_64.whl (1.9 MB view details)

Uploaded CPython 3.10+musllinux: musl 1.2+ x86-64

papagan-0.1.1-cp310-abi3-musllinux_1_2_aarch64.whl (1.9 MB view details)

Uploaded CPython 3.10+musllinux: musl 1.2+ ARM64

papagan-0.1.1-cp310-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.4 MB view details)

Uploaded CPython 3.10+manylinux: glibc 2.17+ x86-64

papagan-0.1.1-cp310-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (1.8 MB view details)

Uploaded CPython 3.10+manylinux: glibc 2.17+ ARM64

papagan-0.1.1-cp310-abi3-macosx_11_0_arm64.whl (1.0 MB view details)

Uploaded CPython 3.10+macOS 11.0+ ARM64

papagan-0.1.1-cp310-abi3-macosx_10_12_x86_64.whl (1.2 MB view details)

Uploaded CPython 3.10+macOS 10.12+ x86-64

File details

Details for the file papagan-0.1.1.tar.gz.

File metadata

  • Download URL: papagan-0.1.1.tar.gz
  • Size: 33.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.20

File hashes

Hashes for papagan-0.1.1.tar.gz
Algorithm Hash digest
SHA256 9a76721ea159d12221ce6150b323c8368b68b8851fd41c3dda738ef11b2cd49a
MD5 ce7dbf28c2b8677c9aea274aa91dc739
BLAKE2b-256 335b4201449f3cd4b676243665a6b41a6f1277b947c4a5c4b9b00d1e8d439c18
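A downloaded file can be checked against the published SHA256 digest with the standard library. The sketch below uses only `hashlib` and `pathlib`. The helper names `sha256_of` and `matches` are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected_hex):
    """Compare a file's SHA256 digest with a published value, ignoring case."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For the source distribution, this would be `matches("papagan-0.1.1.tar.gz", "9a76721ea159d12221ce6150b323c8368b68b8851fd41c3dda738ef11b2cd49a")`, assuming the file sits in the current directory.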

See more details on using hashes here.

File details

Details for the file papagan-0.1.1-cp310-abi3-win_amd64.whl.

File metadata

  • Download URL: papagan-0.1.1-cp310-abi3-win_amd64.whl
  • Size: 813.5 kB
  • Tags: CPython 3.10+, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.20

File hashes

Hashes for papagan-0.1.1-cp310-abi3-win_amd64.whl
Algorithm Hash digest
SHA256 57627578e623fa703a991af935133032b4d286e8291d0ff7b58123ce65cfa26c
MD5 5f717ce68c3f1be0973ab6f080e2eaab
BLAKE2b-256 f05a45cf370b051df3f4a0ae0428fd92366600512c8a6f43084f0ded65118b33

File details

Details for the file papagan-0.1.1-cp310-abi3-musllinux_1_2_x86_64.whl.

File hashes

Hashes for papagan-0.1.1-cp310-abi3-musllinux_1_2_x86_64.whl
Algorithm Hash digest
SHA256 925b909ecaf96a0b2bc2a66c5be5d29e7c2906f2917844c4ad8877d528eee617
MD5 2ef5bab35a649943f95624fa08fd7db4
BLAKE2b-256 86b11b1ed03adb0a1934f0a22da112ccb5505a33cd9db5866650a611242919e5

File details

Details for the file papagan-0.1.1-cp310-abi3-musllinux_1_2_aarch64.whl.

File hashes

Hashes for papagan-0.1.1-cp310-abi3-musllinux_1_2_aarch64.whl
Algorithm Hash digest
SHA256 5dc2217aacec1d5421da8f83e48653f47197447bb4dd5e9831c1ef12da28f155
MD5 a8ca19f48c4b9cd7a51f2659e04047c0
BLAKE2b-256 8379b753fd59dc799829016059a8ba9763d955a90d316f7ec7fdbf607b4f7314

File details

Details for the file papagan-0.1.1-cp310-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for papagan-0.1.1-cp310-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 d6dec5d30a3537d2b0a84225f9b3a72399bc5cda83e23c0dfef7d709a4c1d1a1
MD5 c1e8762aaa462cb4ff0acd8c21b398b5
BLAKE2b-256 1da582c28b5f61787ddbc3f7c36a91ecbdb599c693b6a0f75d99a93ecd89613f

File details

Details for the file papagan-0.1.1-cp310-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File hashes

Hashes for papagan-0.1.1-cp310-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 f8b261e7df5611c38e9b09054268a27236d796ede5c6a8aa9fe1e568e37d0c4a
MD5 530f195f522c7a5a45c48ef9d08f653a
BLAKE2b-256 977bee9a88dde8fbd5f315356eb4532cfaee1eefb3c4b2863e070ae8274b7b5d

File details

Details for the file papagan-0.1.1-cp310-abi3-macosx_11_0_arm64.whl.

File hashes

Hashes for papagan-0.1.1-cp310-abi3-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 60dba8fee276200ec7fd5eb56dd260b262043368855d825b25eb0c151d85a8b6
MD5 99f452405884a14a033751957ae85f67
BLAKE2b-256 d91f6c71a2dd11f6c4ecedbcd2416b4398ff50cd8d56e1da83d4e20dc38e8fc1

File details

Details for the file papagan-0.1.1-cp310-abi3-macosx_10_12_x86_64.whl.

File hashes

Hashes for papagan-0.1.1-cp310-abi3-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 dd947ae4995b68a4c967e19b96547e6c58ee9e04027364634d3594c706aa037b
MD5 8f633cde2b3236dd2b55e5e64b3a99be
BLAKE2b-256 5db853c5dbab2653290caf7a7eb65634629da1fef0f25fba0e27873b5b523dbe
