Fast language detection for Python powered by Rust
papagan
Fast language detection for Python, powered by Rust (via PyO3 + maturin).
10 languages bundled, weighted per-word output, fully typed (PEP 561).
Install
uv add papagan
# or
pip install papagan
Pre-built wheels ship for Linux (x86_64, aarch64), macOS (x86_64, arm64), and Windows (x86_64). Requires Python 3.10+.
Quick start
from papagan import Detector
detector = Detector()
# Document-level detection
output = detector.detect("Die Katze sitzt auf der Matte")
lang, confidence = output.top()
print(f"{lang}: {confidence:.3f}")
# de: 0.996
# Full distribution
for lang, score in output.distribution():
    print(f" {lang}: {score:.3f}")
Per-word detail
Useful for mixed-language text or debugging:
detailed = detector.detect_detailed("The cat is black. Die Katze ist schwarz.")
for word in detailed.words:
    top_lang, top_score = max(word.scores, key=lambda x: x[1])
    print(f" {word.token:<10} [{word.source}] {top_lang} ({top_score:.2f})")
# the [dict] en (0.85)
# cat [ngram] en (0.99)
# ...
# katze [ngram] de (1.00)
# The aggregate handles mixed input gracefully:
print(detailed.aggregate.distribution())
# [('de', 0.52), ('en', 0.48)]
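As a rough illustration of working with per-word scores, the sketch below tallies the top language of each word. It runs without papagan installed: `words` is hypothetical hand-written data shaped like `(word.token, word.scores)`, and `top_language` is a helper invented for this example. The tally is a plain vote count and does not reproduce however papagan weights words in `detailed.aggregate`.

```python
from collections import Counter

# Hypothetical stand-in for `detailed.words`: (token, scores) pairs, where
# scores is a list of (lang, score) tuples shaped like `word.scores`.
words = [
    ("the", [("en", 0.85), ("de", 0.10), ("nl", 0.05)]),
    ("cat", [("en", 0.99), ("de", 0.01)]),
    ("katze", [("de", 1.00)]),
    ("schwarz", [("de", 0.97), ("nl", 0.03)]),
]


def top_language(scores):
    """Return the (lang, score) pair with the highest score."""
    return max(scores, key=lambda pair: pair[1])


# Count how many words each language wins; an unweighted vote, not papagan's aggregate.
votes = Counter(top_language(scores)[0] for _, scores in words)
print(votes.most_common())
```

With real output, the same loop could be fed `[(w.token, w.scores) for w in detailed.words]`, assuming `scores` is a list of `(lang, score)` pairs as the snippet above suggests.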
Batch detection
For multi-document workloads, detect_batch fans out across cores via rayon and releases the GIL while running — so concurrent Python threads can do other work and scale-out on ThreadPoolExecutor behaves as expected:
docs = ["The cat sat", "Die Katze sitzt", "Le chat est assis", "El gato está sentado"]
results = detector.detect_batch(docs) # list[Output]
detailed = detector.detect_detailed_batch(docs) # list[Detailed]
for o in results:
    print(o.top())
On a 1000-paragraph batch (Leipzig news, avg 84 words each, 8-core M-series), detect_batch is ~3.5× faster than calling detect() in a Python loop — 90 ms → 26 ms. On 1870 short titles it's ~5× faster (16 ms → 3 ms) since rayon setup amortizes better over dict-hit-heavy tokens.
Batches smaller than 4 fall back through the normal per-call path so there's no small-batch regression.
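Because the GIL is released during a batch call, one possible pattern is to split a large corpus into chunks and hand each chunk to a worker thread. The sketch below is only an illustration under that assumption: `detect_in_chunks` and `fake_detect_batch` are hypothetical names, and the stand-in function exists so the example runs without papagan installed. Whether a single `Detector` should be shared across threads is not stated here, so treat that as something to verify.

```python
from concurrent.futures import ThreadPoolExecutor


def detect_in_chunks(detect_batch, docs, chunk_size=256, workers=4):
    """Split docs into chunks, run detect_batch on each chunk in a thread, keep order."""
    chunks = [docs[i:i + chunk_size] for i in range(0, len(docs), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in submission order, so output order matches docs.
        per_chunk = list(pool.map(detect_batch, chunks))
    # Flatten the list of per-chunk result lists back into one list.
    return [result for chunk in per_chunk for result in chunk]


# Stand-in so the sketch runs anywhere; in real use pass `detector.detect_batch`.
def fake_detect_batch(chunk):
    return [len(doc.split()) for doc in chunk]


docs = ["The cat sat", "Die Katze sitzt", "Le chat est assis", "El gato está sentado"]
# chunk_size=2 only keeps the toy example small.
results = detect_in_chunks(fake_detect_batch, docs, chunk_size=2, workers=2)
print(results)
```

In practice a chunk size of at least 4 would make sense, since smaller batches fall back to the per-call path.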
Restrict to specific languages
Detection is faster and more confident when you know the input's language set in advance:
detector = Detector(only=["en", "de"])
# or with the builder:
detector = Detector.builder().only(["en", "de"]).build()
Configuration
detector = Detector(
    only=["en", "de", "fr"],  # restrict to a subset
    unknown_threshold=0.25,   # below this => ("?", ...) aka Lang.Unknown
    parallel_threshold=32,    # parallelize per-word work at 32+ tokens (default)
    # set parallel_threshold to a very large number to opt out of rayon entirely
)
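To make the effect of `unknown_threshold` concrete, here is a minimal sketch of the idea in plain Python. It is not papagan's implementation: `top_or_unknown` is a hypothetical helper, and both the strict `<` comparison and returning the original score alongside `"?"` are assumptions made for illustration.

```python
def top_or_unknown(distribution, unknown_threshold=0.25):
    """Pick the best (lang, score) pair, or report "?" when the score is too low."""
    lang, score = max(distribution, key=lambda pair: pair[1])
    if score < unknown_threshold:
        # Assumed behaviour: keep the score but replace the language with "?".
        return ("?", score)
    return (lang, score)


print(top_or_unknown([("de", 0.9), ("en", 0.1)]))
print(top_or_unknown([("en", 0.2), ("de", 0.15)]))
```

Raising the threshold trades coverage for fewer low-confidence guesses; lowering it does the opposite.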
Supported languages
| Code | Language | Code | Language |
|---|---|---|---|
| de | German | it | Italian |
| en | English | nl | Dutch |
| es | Spanish | pl | Polish |
| fr | French | pt | Portuguese |
| ru | Russian | tr | Turkish |
All 10 languages are bundled — no feature flags to set.
Type hints
The package ships .pyi stubs and a py.typed marker (PEP 561):
from papagan import Detector, Lang, Output, WordScore, LangCode, MatchSource
def classify(text: str) -> LangCode:
    lang, _score = Detector().detect(text).top()
    return lang  # typed as Literal["de", "en", ..., "?"]
Your type checker (mypy, pyright) will see full signatures for all classes, including the LangCode and MatchSource Literal types.
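For readers unfamiliar with `Literal` types, the sketch below shows roughly what such an alias looks like and how it can be inspected at runtime. `LangCodeSketch` and `is_known` are hypothetical names written for this example; the real `LangCode` in the shipped stubs may be defined differently. The codes are the ten listed under Supported languages plus `"?"` for unknown.

```python
from typing import Literal, get_args

# Hypothetical mirror of the package's LangCode alias, for illustration only.
LangCodeSketch = Literal["de", "en", "es", "fr", "it", "nl", "pl", "pt", "ru", "tr", "?"]


def is_known(code: str) -> bool:
    """True for one of the ten language codes, False for "?" or anything else."""
    return code in get_args(LangCodeSketch) and code != "?"


print(get_args(LangCodeSketch))
print(is_known("de"), is_known("?"), is_known("xx"))
```

A type checker would flag `x: LangCodeSketch = "xx"` as an error, which is the benefit the Literal alias provides.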
Benchmarks
Measured on Darwin arm64, 2026-04-22. Open fixtures: Tatoeba sentences (CC-BY 2.0 FR) and Leipzig news paragraphs (CC-BY 4.0). ns/tok is the per-token rate — the cleanest cross-library comparison since it normalizes out workload size. Full cross-binding matrix and reproduction commands live in the repository README.
| Library | Tokens | Bytes | Loop (ms) | Loop (ns/tok) | Batch (ms) | Batch (ns/tok) |
|---|---|---|---|---|---|---|
| papagan | 35k | 222 KB | 32.94 | 949 | 9.67 | 279 |
| papagan | 87k | 620 KB | 79.94 | 923 | 23.74 | 274 |
| py3langid | 35k | 222 KB | 223.59 | 6 442 | — | — |
| py3langid | 87k | 620 KB | 120.81 | 1 395 | — | — |
| langdetect | 35k | 222 KB | 6 959.25 | 200 526 | — | — |
| langdetect | 87k | 620 KB | 1 348.72 | 15 570 | — | — |
| lingua (all langs) | 35k | 222 KB | 3 700.01 | 106 613 | — | — |
| lingua (all langs) | 87k | 620 KB | 2 675.98 | 30 893 | — | — |
papagan runs at ~950 ns/token in a loop and ~275 ns/token with detect_batch, flat across workload size. On the 35k-token short-sentence workload that is ~7× ahead of py3langid, ~110× ahead of lingua, and ~210× ahead of langdetect; on the 87k-token workload the gaps narrow to roughly 1.5×, 33×, and 17× respectively. detect_batch releases the GIL so ThreadPoolExecutor scales as expected.
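The ratios quoted above can be rechecked from the table. The short script below copies the Loop (ns/tok) and Batch (ns/tok) columns and divides each library's per-token rate by papagan's; the dictionary layout and variable names are just a convenience for this example.

```python
# Loop (ns/tok) values copied from the benchmark table, keyed by workload size.
loop_ns_per_token = {
    "35k": {"papagan": 949, "py3langid": 6442, "langdetect": 200526, "lingua": 106613},
    "87k": {"papagan": 923, "py3langid": 1395, "langdetect": 15570, "lingua": 30893},
}
# Batch (ns/tok) values for papagan, the only library with a batch column.
batch_ns_per_token = {"35k": 279, "87k": 274}

ratios = {}
for workload, rates in loop_ns_per_token.items():
    base = rates["papagan"]
    # How many times slower each other library is per token, relative to papagan.
    ratios[workload] = {
        lib: rate / base for lib, rate in rates.items() if lib != "papagan"
    }
    batch_speedup = base / batch_ns_per_token[workload]
    print(workload, {lib: round(r, 1) for lib, r in ratios[workload].items()},
          f"batch speedup {batch_speedup:.1f}x")
```

Using ns/token rather than wall-clock milliseconds keeps the two workloads comparable even though they differ in size.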
Accuracy
99.42 % on Tatoeba (5,000 sentences) and 99.86 % on FLORES-200 devtest (10,120 sentences) across the 10 supported languages. Per-language precision and recall are best on isolated scripts (Russian, Turkish, Polish, all near-perfect) and slightly weaker on the close Romance cluster (Spanish/Portuguese/Italian). The full per-language table is in the repository README.
License
Dual-licensed under MIT or Apache-2.0, at your option.
Related
- Rust crate — the core library
- Node.js package — Node.js bindings
- GitHub — source, issues, development