papagan
Fast language detection for Python, powered by Rust (via PyO3 + maturin).
10 languages bundled, weighted per-word output, fully typed (PEP 561).
Install
pip install papagan
Pre-built wheels ship for Linux (x86_64, aarch64), macOS (x86_64, arm64), and Windows (x86_64). Requires Python 3.10 or newer.
Quick start
from papagan import Detector
detector = Detector()
# Document-level detection
output = detector.detect("Die Katze sitzt auf der Matte")
lang, confidence = output.top()
print(f"{lang}: {confidence:.3f}")
# de: 0.996
# Full distribution
for lang, score in output.distribution():
    print(f"  {lang}: {score:.3f}")
Per-word detail
Useful for mixed-language text or debugging:
detailed = detector.detect_detailed("The cat is black. Die Katze ist schwarz.")
for word in detailed.words:
    top_lang, top_score = max(word.scores, key=lambda x: x[1])
    print(f"  {word.token:<10} [{word.source}] {top_lang} ({top_score:.2f})")
# the [dict] en (0.85)
# cat [ngram] en (0.99)
# ...
# katze [ngram] de (1.00)
# The aggregate handles mixed input gracefully:
print(detailed.aggregate.distribution())
# [('de', 0.52), ('en', 0.48)]
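For mixed-language text, the per-word scores can be folded into contiguous language runs. The sketch below is a minimal illustration that does not import papagan: it works on hand-written `(token, scores)` pairs that mirror the shape shown above, and the helper names `top_language` and `language_runs` are hypothetical, not part of the package.

```python
from itertools import groupby

# Hand-written sample mirroring the (token, scores) shape shown above.
# The numbers are illustrative, not real papagan output.
words = [
    ("the", [("en", 0.85), ("de", 0.15)]),
    ("cat", [("en", 0.99), ("de", 0.01)]),
    ("die", [("de", 0.70), ("en", 0.30)]),
    ("katze", [("de", 1.00), ("en", 0.00)]),
]


def top_language(scores):
    """Return the language code with the highest score."""
    return max(scores, key=lambda pair: pair[1])[0]


def language_runs(words):
    """Group consecutive tokens that share the same top language."""
    tagged = [(top_language(scores), token) for token, scores in words]
    return [
        (lang, [token for _, token in group])
        for lang, group in groupby(tagged, key=lambda item: item[0])
    ]


print(language_runs(words))
# [('en', ['the', 'cat']), ('de', ['die', 'katze'])]
```

With real output, the same idea should apply by passing `(word.token, word.scores)` for each entry in `detailed.words`.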
Restrict to specific languages
Faster and more confident when you know the input's language set in advance:
detector = Detector(only=["en", "de"])
# or with the builder:
detector = Detector.builder().only(["en", "de"]).build()
Configuration
detector = Detector(
    only=["en", "de", "fr"],   # restrict to a subset
    unknown_threshold=0.25,    # below this => ("?", ...) aka Lang.Unknown
    parallel_threshold=128,    # parallelize at 128+ words
)
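To make the `unknown_threshold` comment concrete, here is a rough sketch of the behaviour it describes: when the best score falls below the threshold, the result is reported as `"?"`. This is an illustration only, not papagan's internal implementation; the function name `top_or_unknown` is hypothetical, and returning the original top score alongside `"?"` is an assumption.

```python
def top_or_unknown(distribution, unknown_threshold=0.25):
    """Return the best (lang, score), or ("?", score) if it is below the threshold.

    `distribution` is a list of (lang, score) pairs, like the output of
    `distribution()` shown earlier. Keeping the top score next to "?" is an
    assumption made for this sketch.
    """
    if not distribution:
        return ("?", 0.0)
    lang, score = max(distribution, key=lambda pair: pair[1])
    if score < unknown_threshold:
        return ("?", score)
    return (lang, score)


print(top_or_unknown([("de", 0.52), ("en", 0.48)]))
# ('de', 0.52)
print(top_or_unknown([("en", 0.20), ("de", 0.18), ("fr", 0.17)]))
# ('?', 0.2)
```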
Supported languages
| Code | Language | Code | Language |
|---|---|---|---|
| de | German | it | Italian |
| en | English | nl | Dutch |
| es | Spanish | pl | Polish |
| fr | French | pt | Portuguese |
| ru | Russian | tr | Turkish |
All 10 languages are bundled — no feature flags to set.
Type hints
The package ships .pyi stubs and a py.typed marker (PEP 561):
from papagan import Detector, Lang, Output, WordScore, LangCode, MatchSource
def classify(text: str) -> LangCode:
    lang, _score = Detector().detect(text).top()
    return lang  # typed as Literal["de", "en", ..., "?"]
Your type checker (mypy, pyright) will see full signatures for all classes, including the LangCode and MatchSource Literal types.
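Literal types are only checked statically, so strings arriving from configuration or user input may still need a runtime check. The sketch below shows one way to do that with `typing.get_args`. To stay self-contained it defines its own stand-in alias, `LangCodeSketch`, written out from the table of supported codes plus `"?"`; it is an assumption that this matches the package's `LangCode`, and `is_lang_code` is a hypothetical helper.

```python
from typing import Literal, get_args

# Stand-in for papagan's LangCode, written out from the supported-language
# table plus "?". Assumed, not imported from the package.
LangCodeSketch = Literal[
    "de", "en", "es", "fr", "it", "nl", "pl", "pt", "ru", "tr", "?"
]


def is_lang_code(value: str) -> bool:
    """Return True if `value` is one of the codes in the Literal alias."""
    return value in get_args(LangCodeSketch)


print(is_lang_code("de"))  # True
print(is_lang_code("xx"))  # False
```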
Accuracy
~99.4% accuracy on a 5,000-sentence Tatoeba evaluation across the 10 supported languages. Per-language precision and recall are best for the languages with the most distinctive writing (Russian and Turkish are both perfect) and slightly weaker on the close Iberian pair (Spanish/Portuguese, with about 1.5% cross-confusion at dict-5k).
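For readers who want to reproduce this kind of per-language breakdown on their own data, the sketch below computes precision and recall from `(true, predicted)` label pairs. The eight pairs are made up to show a Spanish sentence misread as Portuguese; they are not the Tatoeba results, and `precision_recall` is a hypothetical helper rather than part of papagan.

```python
from collections import Counter


def precision_recall(pairs, lang):
    """Compute (precision, recall) for `lang` from (true, predicted) pairs."""
    counts = Counter(pairs)
    tp = counts[(lang, lang)]
    # Predicted as `lang` but actually another language.
    fp = sum(n for (true, pred), n in counts.items() if pred == lang and true != lang)
    # Actually `lang` but predicted as another language.
    fn = sum(n for (true, pred), n in counts.items() if true == lang and pred != lang)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy data: one Spanish sentence is misclassified as Portuguese.
pairs = [("es", "es")] * 3 + [("es", "pt")] + [("pt", "pt")] * 4

print(precision_recall(pairs, "es"))  # (1.0, 0.75)
print(precision_recall(pairs, "pt"))  # (0.8, 1.0)
```

A single confusion lowers recall for the true language and precision for the predicted one, which is why a close pair such as Spanish/Portuguese shows up in both columns.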
License
Dual-licensed under MIT or Apache-2.0, at your option.
Related
- Rust crate — the core library
- Node.js package — Node.js bindings
- GitHub — source, issues, development