Fast language detection for Python powered by Rust

papagan

Fast language detection for Python, powered by Rust (via PyO3 + maturin).

10 languages bundled, weighted per-word scores alongside the document-level result, and full type hints (PEP 561).

Install

pip install papagan

Pre-built wheels ship for Linux (x86_64, aarch64), macOS (x86_64, arm64), and Windows (x86_64). Python 3.10+.

Quick start

from papagan import Detector

detector = Detector()

# Document-level detection
output = detector.detect("Die Katze sitzt auf der Matte")
lang, confidence = output.top()
print(f"{lang}: {confidence:.3f}")
# de: 0.996

# Full distribution
for lang, score in output.distribution():
    print(f"  {lang}: {score:.3f}")
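The distribution also gives a simple ambiguity signal: the gap between the two highest scores. The helper below is hypothetical (papagan does not document such a function). It assumes only that `distribution()` yields `(code, score)` pairs, as the loop above suggests, and the 0.2 cut-off in the usage comment is an arbitrary placeholder.

```python
from collections.abc import Iterable


def top_two_margin(distribution: Iterable[tuple[str, float]]) -> float:
    """Gap between the best and second-best score (hypothetical helper).

    Accepts (language_code, score) pairs in any order; a single
    candidate yields its own score as the margin.
    """
    # Sort scores descending so the input order doesn't matter.
    scores = sorted((score for _lang, score in distribution), reverse=True)
    if not scores:
        raise ValueError("distribution is empty")
    if len(scores) == 1:
        return scores[0]
    return scores[0] - scores[1]


# Usage with papagan (not executed here):
#   margin = top_two_margin(output.distribution())
#   if margin < 0.2:  # placeholder cut-off, tune on your own data
#       print("ambiguous input")
```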

Per-word detail

Useful for mixed-language text or debugging:

detailed = detector.detect_detailed("The cat is black. Die Katze ist schwarz.")

for word in detailed.words:
    top_lang, top_score = max(word.scores, key=lambda x: x[1])
    print(f"  {word.token:<10} {'[' + word.source + ']':<8} {top_lang} ({top_score:.2f})")
# the        [dict]   en (0.85)
# cat        [ngram]  en (0.99)
# ...
# katze      [ngram]  de (1.00)

# For mixed input, the aggregate splits its weight across both languages:
print(detailed.aggregate.distribution())
# [('de', 0.52), ('en', 0.48)]
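For mixed-language text, a common next step is to collapse the per-word results into runs of the same language. The sketch below is not part of papagan: `segment_by_language` is a hypothetical helper that relies only on the `token` and `scores` attributes used in the loop above, and `FakeWord` is a stand-in so the logic can be tried without the library.

```python
from dataclasses import dataclass
from typing import Any, Sequence


@dataclass
class FakeWord:
    """Stand-in for papagan's per-word objects, for trying this offline."""
    token: str
    scores: list[tuple[str, float]]


def segment_by_language(words: Sequence[Any]) -> list[tuple[str, list[str]]]:
    """Group consecutive tokens that share the same top-scoring language.

    Each item only needs `.token` (str) and `.scores`, a non-empty
    list of (code, score) pairs.
    """
    segments: list[tuple[str, list[str]]] = []
    for word in words:
        top_lang, _ = max(word.scores, key=lambda pair: pair[1])
        if segments and segments[-1][0] == top_lang:
            # Same language as the previous token: extend the current run.
            segments[-1][1].append(word.token)
        else:
            # First token, or the language changed: start a new run.
            segments.append((top_lang, [word.token]))
    return segments
```

With papagan installed, the intended call would be `segment_by_language(detailed.words)`. The sample output above suggests tokens are lowercased, so the runs would contain normalized tokens rather than the original spelling.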

Restrict to specific languages

If you know in advance which languages the input can be in, restricting the detector to them makes detection faster and its scores more confident:

detector = Detector(only=["en", "de"])
# or with the builder:
detector = Detector.builder().only(["en", "de"]).build()
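A rough intuition for the confidence gain: if scores are normalized to sum to 1, removing candidates hands their share to the languages that remain. The function below only illustrates that arithmetic on a plain `(code, score)` list. It is a sketch with a hypothetical name, and this page does not document how papagan applies `only` internally.

```python
def restrict_and_renormalize(
    distribution: list[tuple[str, float]], only: list[str]
) -> list[tuple[str, float]]:
    """Keep only the allowed languages and rescale their scores to sum to 1.

    Illustrative arithmetic only; not papagan's implementation.
    """
    allowed = set(only)
    kept = [(lang, score) for lang, score in distribution if lang in allowed]
    total = sum(score for _lang, score in kept)
    if total == 0:
        raise ValueError("no probability mass left after restriction")
    # Return the rescaled pairs ranked from highest to lowest score.
    return sorted(
        ((lang, score / total) for lang, score in kept),
        key=lambda pair: pair[1],
        reverse=True,
    )
```

For example, restricting `[("de", 0.5), ("nl", 0.3), ("en", 0.2)]` to `["en", "de"]` raises `de` from 0.50 to about 0.71 (0.5 / 0.7).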

Configuration

detector = Detector(
    only=["en", "de", "fr"],       # restrict to a subset
    unknown_threshold=0.25,         # below this => ("?", ...) aka Lang.Unknown
    parallel_threshold=128,         # parallelize at 128+ words
)
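The comments above describe two rules, which can be restated as plain functions for previewing a threshold against results you have already collected. This is a hypothetical restatement, not papagan code. Two details are assumptions: that `unknown_threshold` is compared with the top confidence (such as the second value returned by `top()`), and that a score exactly equal to the threshold is kept rather than replaced by `"?"`.

```python
UNKNOWN = "?"  # the code the README associates with Lang.Unknown


def label_with_threshold(lang: str, score: float, unknown_threshold: float = 0.25) -> str:
    """Return `lang`, or "?" when its score is strictly below the threshold.

    Assumption: equality counts as known; papagan's exact edge behavior
    is not documented on this page.
    """
    return UNKNOWN if score < unknown_threshold else lang


def would_parallelize(word_count: int, parallel_threshold: int = 128) -> bool:
    """True at or above the threshold, per "parallelize at 128+ words"."""
    return word_count >= parallel_threshold
```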

Supported languages

Code  Language    Code  Language
de    German      it    Italian
en    English     nl    Dutch
es    Spanish     pl    Polish
fr    French      pt    Portuguese
ru    Russian     tr    Turkish

All 10 languages are bundled, so there are no feature flags to set.
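When displaying results, a code-to-name lookup is often handy. The table below is transcribed from the list above. The page does not say whether papagan exports a similar mapping, so `LANGUAGE_NAMES` and `display_name` are local, hypothetical names.

```python
# Code-to-name table transcribed from the supported-language list.
LANGUAGE_NAMES: dict[str, str] = {
    "de": "German",
    "en": "English",
    "es": "Spanish",
    "fr": "French",
    "it": "Italian",
    "nl": "Dutch",
    "pl": "Polish",
    "pt": "Portuguese",
    "ru": "Russian",
    "tr": "Turkish",
}


def display_name(code: str) -> str:
    """Human-readable name; "?" maps to "Unknown", other codes pass through."""
    if code == "?":
        return "Unknown"
    return LANGUAGE_NAMES.get(code, code)
```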

Type hints

The package ships .pyi stubs and a py.typed marker (PEP 561):

from papagan import Detector, Lang, Output, WordScore, LangCode, MatchSource

def classify(text: str) -> LangCode:
    lang, _score = Detector().detect(text).top()
    return lang  # typed as Literal["de", "en", ..., "?"]

Your type checker (mypy, pyright) will see full signatures for all classes, including the LangCode and MatchSource Literal types.

Accuracy

Accuracy is about 99.4% on a 5,000-sentence Tatoeba evaluation across the 10 supported languages. Per-language precision and recall are best for languages with distinctive scripts or letters (Russian and Turkish are perfect) and slightly weaker for the closely related Iberian pair, Spanish and Portuguese, which show about 1.5% cross-confusion at dict-5k.
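For scale, 99.4% of 5,000 sentences is about 4,970 correct predictions, or roughly 30 errors. To run a comparable check on your own labelled data, a harness only needs `(gold, predicted)` code pairs. The sketch below is a minimal, hypothetical example; the evaluation script behind the figures above is not included on this page.

```python
from collections import Counter
from collections.abc import Iterable


def evaluate(pairs: Iterable[tuple[str, str]]) -> tuple[float, Counter[tuple[str, str]]]:
    """Return overall accuracy and counts of (gold, predicted) mistakes.

    `pairs` holds (gold_code, predicted_code) for each test sentence.
    """
    total = 0
    correct = 0
    confusions: Counter[tuple[str, str]] = Counter()
    for gold, predicted in pairs:
        total += 1
        if gold == predicted:
            correct += 1
        else:
            # Record which language was mistaken for which.
            confusions[(gold, predicted)] += 1
    if total == 0:
        raise ValueError("no evaluation pairs given")
    return correct / total, confusions
```

One way to estimate Spanish/Portuguese cross-confusion is to add `confusions[("es", "pt")]` and `confusions[("pt", "es")]` and divide by the number of Spanish and Portuguese sentences.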

License

Dual-licensed under MIT or Apache-2.0, at your option.

Download files

Source Distribution

papagan-0.1.3.tar.gz (33.2 kB)

Uploaded Source

Built Distributions

papagan-0.1.3-cp310-abi3-win_amd64.whl (813.7 kB)

Uploaded CPython 3.10+, Windows x86-64

papagan-0.1.3-cp310-abi3-musllinux_1_2_x86_64.whl (1.9 MB)

Uploaded CPython 3.10+, musllinux: musl 1.2+ x86-64

papagan-0.1.3-cp310-abi3-musllinux_1_2_aarch64.whl (1.9 MB)

Uploaded CPython 3.10+, musllinux: musl 1.2+ ARM64

papagan-0.1.3-cp310-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.4 MB)

Uploaded CPython 3.10+, manylinux: glibc 2.17+ x86-64

papagan-0.1.3-cp310-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (1.8 MB)

Uploaded CPython 3.10+, manylinux: glibc 2.17+ ARM64

papagan-0.1.3-cp310-abi3-macosx_11_0_arm64.whl (1.0 MB)

Uploaded CPython 3.10+, macOS 11.0+ ARM64

papagan-0.1.3-cp310-abi3-macosx_10_12_x86_64.whl (1.2 MB)

Uploaded CPython 3.10+, macOS 10.12+ x86-64

File details

Details for the file papagan-0.1.3.tar.gz.

File metadata

  • Download URL: papagan-0.1.3.tar.gz
  • Upload date:
  • Size: 33.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.20

File hashes

Hashes for papagan-0.1.3.tar.gz
Algorithm Hash digest
SHA256 1cd8d1141adc76c75c3c73f8247ab545a3d94a9171d145639f3cf190025efed9
MD5 90bf47937818fa9b1576136acb3bc621
BLAKE2b-256 79c16247ae11ae8f8992ddaf6154f02b1b9d524f83efd926c21aefe31b63fdb4
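The published digests let you check a manually downloaded file before installing it. A minimal sketch using only the standard library follows; the function names are illustrative. For pip installs, hash-checking mode does the same automatically when a requirements file pins each package with `--hash=sha256:<digest>` (or when `--require-hashes` is passed).

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex SHA256 digest of a file, read in 1 MiB chunks to bound memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        # iter() with a sentinel stops when read() returns b"" at end of file.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: Path, expected_sha256: str) -> bool:
    """Compare a file against a published SHA256 digest (case-insensitive)."""
    return sha256_of(path) == expected_sha256.strip().lower()


# Example (adjust the path to wherever you saved the sdist):
#   verify_download(Path("papagan-0.1.3.tar.gz"),
#                   "1cd8d1141adc76c75c3c73f8247ab545a3d94a9171d145639f3cf190025efed9")
```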

File details

Details for the file papagan-0.1.3-cp310-abi3-win_amd64.whl.

File metadata

  • Download URL: papagan-0.1.3-cp310-abi3-win_amd64.whl
  • Upload date:
  • Size: 813.7 kB
  • Tags: CPython 3.10+, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.20

File hashes

Hashes for papagan-0.1.3-cp310-abi3-win_amd64.whl
Algorithm Hash digest
SHA256 fded1ce0dae50c9af7091bd0159e3decc7d6758cee2995e4b6beea1ef19d0cde
MD5 a373c08d2c900cb7f783c861773367cf
BLAKE2b-256 b9a3e2eeecfc6d04a69c89368630409429f9316b372810c66c76304e8b41d66d

File details

Details for the file papagan-0.1.3-cp310-abi3-musllinux_1_2_x86_64.whl.

File hashes

Hashes for papagan-0.1.3-cp310-abi3-musllinux_1_2_x86_64.whl
Algorithm Hash digest
SHA256 05c2c0bcf888849ce6c2c4295844ab50e164ef8a9ab6049c0153a8a9259d7de9
MD5 e1fb2dfbdeeb2c4152f4f61d26591f9b
BLAKE2b-256 8d713708eca76352034101b5f03f66ea3efe8f64b74e5f062e4956ac82ee345a

File details

Details for the file papagan-0.1.3-cp310-abi3-musllinux_1_2_aarch64.whl.

File hashes

Hashes for papagan-0.1.3-cp310-abi3-musllinux_1_2_aarch64.whl
Algorithm Hash digest
SHA256 72c9f1b567ccaad27f9b4a1cbf808734ead5a8b74b4531cede60e30060af4387
MD5 5affb1fc8a1cbbc54cb1e9c7fa42ee55
BLAKE2b-256 d64764bcb334876326f11d2a8f0cbc4b6448b79d6a195120daacfd7a5fe888db

File details

Details for the file papagan-0.1.3-cp310-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for papagan-0.1.3-cp310-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 4c7355cd7a1c0caf1b4cf1bed899c45cd6e7f9212dafd5d7e802e999378269c3
MD5 119381f3bf82a04a3e6805d3aeec8a56
BLAKE2b-256 2ea780fe54fca4dd94c39a9986adb8e908b595fc6b76ca43b122def374faa32b

File details

Details for the file papagan-0.1.3-cp310-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File hashes

Hashes for papagan-0.1.3-cp310-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 96f3d9a11d6c29deff4d53834c49dea57e6e72c95b3557ab9b776881d3b59de6
MD5 8c9eef447af740eeef42c602d45cd310
BLAKE2b-256 c794dce36a9710f74bbc684e730daa4a9e2a852ca3e0eddcabcd879dd479af0a

File details

Details for the file papagan-0.1.3-cp310-abi3-macosx_11_0_arm64.whl.

File hashes

Hashes for papagan-0.1.3-cp310-abi3-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 7da77706783b0c83e656f9efad92e713ccddc041605c564444d39775c4cf5ceb
MD5 ed54238a73836c886c0fd1d3fe5876cd
BLAKE2b-256 08e0cb601979ee36bcfe5a9037cf02f69fe9fb764960654bd99c888255465641

File details

Details for the file papagan-0.1.3-cp310-abi3-macosx_10_12_x86_64.whl.

File hashes

Hashes for papagan-0.1.3-cp310-abi3-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 dcc7e0b4ec0b64aeb6172564a5eaab3dd9d895218220a73176a1290a9f9187c9
MD5 b256d0ab89d7831600ac208b0f24f7d6
BLAKE2b-256 2c2dda257de97bdce8b274a631eac261a88248885d9442db1477aa2a9cafd3db
