Fast language detection for Python powered by Rust

papagan

Fast language detection for Python, powered by Rust (via PyO3 + maturin).

10 languages bundled, weighted per-word output, fully typed (PEP 561).

Install

pip install papagan

Pre-built wheels ship for Linux (x86_64, aarch64), macOS (x86_64, arm64), and Windows (x86_64). Requires Python 3.10 or later.

Quick start

from papagan import Detector

detector = Detector()

# Document-level detection
output = detector.detect("Die Katze sitzt auf der Matte")
lang, confidence = output.top()
print(f"{lang}: {confidence:.3f}")
# de: 0.996

# Full distribution
for lang, score in output.distribution():
    print(f"  {lang}: {score:.3f}")
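When the top two scores are close, it can be safer to treat the result as ambiguous. The sketch below works on plain (code, score) pairs, the shape the distribution examples in this README print. `top_with_margin` and its `min_margin` default are hypothetical names and values, not part of papagan, and the sample scores are hand-written.

```python
from typing import Optional, Sequence, Tuple

Distribution = Sequence[Tuple[str, float]]  # (language code, score) pairs


def top_with_margin(distribution: Distribution, min_margin: float = 0.2) -> Optional[str]:
    """Return the best language only if it beats the runner-up by min_margin."""
    ranked = sorted(distribution, key=lambda pair: pair[1], reverse=True)
    if not ranked:
        return None
    if len(ranked) == 1:
        return ranked[0][0]
    best, runner_up = ranked[0], ranked[1]
    return best[0] if best[1] - runner_up[1] >= min_margin else None


# Hand-written sample data; with papagan you would pass
# list(detector.detect(text).distribution()) instead.
clear = [("de", 0.996), ("nl", 0.003), ("en", 0.001)]
mixed = [("de", 0.52), ("en", 0.48)]

print(top_with_margin(clear))  # de
print(top_with_margin(mixed))  # None
```

Returning None for the ambiguous case leaves the decision (ask the user, fall back to a default, run per-word analysis) to the caller.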

Per-word detail

Useful for mixed-language text or debugging:

detailed = detector.detect_detailed("The cat is black. Die Katze ist schwarz.")

for word in detailed.words:
    top_lang, top_score = max(word.scores, key=lambda x: x[1])
    print(f"  {word.token:<10} [{word.source}]  {top_lang} ({top_score:.2f})")
# the        [dict]   en (0.85)
# cat        [ngram]  en (0.99)
# ...
# katze      [ngram]  de (1.00)

# The aggregate handles mixed input gracefully:
print(detailed.aggregate.distribution())
# [('de', 0.52), ('en', 0.48)]
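One use of the per-word output is to split mixed text into same-language runs. This is a minimal sketch, assuming each word exposes `token` and `scores` as in the loop above. The `Word` dataclass is a stand-in so the example runs without papagan, `language_runs` is a hypothetical helper, and the sample scores are made up.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import List, Tuple


@dataclass
class Word:
    """Stand-in for a per-word result (only the fields used here)."""
    token: str
    scores: List[Tuple[str, float]]


def top_lang(word: Word) -> str:
    # Same selection rule as the README loop: highest-scoring language wins.
    return max(word.scores, key=lambda pair: pair[1])[0]


def language_runs(words: List[Word]) -> List[Tuple[str, List[str]]]:
    """Group consecutive tokens that share the same top language."""
    return [(lang, [w.token for w in group]) for lang, group in groupby(words, key=top_lang)]


# Made-up scores for illustration only.
sample = [
    Word("the", [("en", 0.85), ("de", 0.15)]),
    Word("cat", [("en", 0.99), ("de", 0.01)]),
    Word("die", [("de", 0.70), ("en", 0.30)]),
    Word("katze", [("de", 1.00), ("en", 0.00)]),
]
print(language_runs(sample))
# [('en', ['the', 'cat']), ('de', ['die', 'katze'])]
```

With the real library, `detailed.words` would take the place of `sample`, provided the attribute names match the ones used above.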

Restrict to specific languages

Detection is faster and more confident when you know the set of possible input languages in advance:

detector = Detector(only=["en", "de"])
# or with the builder:
detector = Detector.builder().only(["en", "de"]).build()
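For intuition about why a smaller candidate set tends to raise the top score, consider renormalizing a distribution over the allowed languages only. This is an illustration, not a description of papagan's internals. `renormalize` is a hypothetical helper and the scores are made up.

```python
from typing import Iterable, List, Tuple


def renormalize(distribution: Iterable[Tuple[str, float]], allowed: Iterable[str]) -> List[Tuple[str, float]]:
    """Drop languages outside `allowed` and rescale the rest to sum to 1."""
    allowed_set = set(allowed)
    kept = [(lang, score) for lang, score in distribution if lang in allowed_set]
    total = sum(score for _, score in kept)
    if total == 0:
        return []
    return [(lang, score / total) for lang, score in kept]


# Made-up scores: English leads, but Dutch takes a share of the mass.
full = [("en", 0.60), ("nl", 0.25), ("de", 0.15)]
for lang, score in renormalize(full, ["en", "de"]):
    print(f"{lang}: {score:.2f}")
# en: 0.80
# de: 0.20
```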

Configuration

detector = Detector(
    only=["en", "de", "fr"],       # restrict to a subset
    unknown_threshold=0.25,         # below this => ("?", ...) aka Lang.Unknown
    parallel_threshold=128,         # parallelize at 128+ words
)
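On the calling side, the unknown marker usually needs its own branch. A minimal sketch, assuming only what the comment above states: results below `unknown_threshold` come back with the code "?". `known_or_none` is a hypothetical helper, and the confidence value in the second call is a placeholder.

```python
from typing import Optional, Tuple

UNKNOWN = "?"  # code shown in this README for Lang.Unknown


def known_or_none(top: Tuple[str, float]) -> Optional[str]:
    """Map the unknown marker to None so callers can branch with `is None`."""
    lang, _confidence = top
    return None if lang == UNKNOWN else lang


# With papagan you would pass detector.detect(text).top() here.
print(known_or_none(("de", 0.996)))  # de
print(known_or_none(("?", 0.1)))     # None
```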

Supported languages

Code  Language   Code  Language
de    German     it    Italian
en    English    nl    Dutch
es    Spanish    pl    Polish
fr    French     pt    Portuguese
ru    Russian    tr    Turkish

All 10 languages are bundled — no feature flags to set.
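If language codes come from user input or configuration, it may help to validate them before building a restricted detector. This sketch hard-codes the table above. `SUPPORTED` and `validate_codes` are hypothetical names, and how papagan itself reacts to an unsupported code is not covered here.

```python
from typing import Iterable, List

# Copied from the supported-languages table in this README.
SUPPORTED = {
    "de": "German", "en": "English", "es": "Spanish", "fr": "French",
    "it": "Italian", "nl": "Dutch", "pl": "Polish", "pt": "Portuguese",
    "ru": "Russian", "tr": "Turkish",
}


def validate_codes(codes: Iterable[str]) -> List[str]:
    """Return codes de-duplicated in input order, or raise on unsupported ones."""
    codes = list(codes)
    unsupported = sorted(set(codes) - set(SUPPORTED))
    if unsupported:
        raise ValueError(f"unsupported language codes: {', '.join(unsupported)}")
    return list(dict.fromkeys(codes))  # dict preserves first-seen order


print(validate_codes(["en", "de", "en"]))  # ['en', 'de']
```

The validated list could then be passed as `only=` when constructing the detector.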

Type hints

The package ships .pyi stubs and a py.typed marker (PEP 561):

from papagan import Detector, Lang, Output, WordScore, LangCode, MatchSource

def classify(text: str) -> LangCode:
    lang, _score = Detector().detect(text).top()
    return lang  # typed as Literal["de", "en", ..., "?"]

Your type checker (mypy, pyright) will see full signatures for all classes, including the LangCode and MatchSource Literal types.
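For unit tests that should not depend on the native extension, one option is to type your own code against a small protocol and pass the detector in. The protocols below are hypothetical and mirror only the two methods used in this README (`detect` and `top`). A real `Detector` should satisfy them structurally if its signatures match the examples above, but that is an assumption.

```python
from typing import Protocol, Tuple


class HasTop(Protocol):
    def top(self) -> Tuple[str, float]: ...


class SupportsDetect(Protocol):
    def detect(self, text: str) -> HasTop: ...


def classify(detector: SupportsDetect, text: str) -> str:
    """Return the top language code; the detector is injected, so it is built once."""
    lang, _score = detector.detect(text).top()
    return lang


# Fakes for unit tests: no native extension needed.
class FakeOutput:
    def __init__(self, lang: str, score: float) -> None:
        self._top = (lang, score)

    def top(self) -> Tuple[str, float]:
        return self._top


class FakeDetector:
    def detect(self, text: str) -> FakeOutput:
        # Canned answer, for illustration only.
        return FakeOutput("de", 0.9)


print(classify(FakeDetector(), "Die Katze sitzt auf der Matte"))  # de
```

Passing the detector as an argument also avoids constructing a new one on every call.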

Accuracy

About 99.4% accuracy on a 5000-sentence Tatoeba evaluation across the 10 supported languages. Per-language precision and recall are best on isolated scripts (Russian and Turkish are perfect) and slightly weaker on the close Iberian pair (Spanish/Portuguese, with about 1.5% cross-confusion at dict-5k).
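To run a similar measurement on your own data, overall accuracy and pairwise confusion can be computed from (gold, predicted) label pairs. This is not the evaluation script behind the numbers above, and this README does not define "cross-confusion"; the definition below is one plausible reading. The toy data is made up.

```python
from typing import List, Tuple

Pair = Tuple[str, str]  # (gold label, predicted label)


def accuracy(pairs: List[Pair]) -> float:
    """Fraction of sentences whose prediction matches the gold label (non-empty input)."""
    return sum(gold == pred for gold, pred in pairs) / len(pairs)


def cross_confusion(pairs: List[Pair], a: str, b: str) -> float:
    """Among sentences whose gold label is a or b, the share predicted as the other one."""
    in_pair = [(gold, pred) for gold, pred in pairs if gold in (a, b)]
    swapped = sum((gold, pred) in ((a, b), (b, a)) for gold, pred in in_pair)
    return swapped / len(in_pair)


# Toy data: one Spanish sentence mistaken for Portuguese.
toy = [("es", "es"), ("es", "pt"), ("pt", "pt"), ("pt", "pt"), ("ru", "ru")]
print(accuracy(toy))                      # 0.8
print(cross_confusion(toy, "es", "pt"))   # 0.25
```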

License

Dual-licensed under MIT or Apache-2.0, at your option.
