python-po-lint
Lint .po translation files for contamination, wrong languages, missing translations, shifts, and garbled text.
Uses fastText language identification with carrier phrase confirmation and confused language score merging for high accuracy with zero false positives.
Features
- Wrong language detection — fastText-based with top-5 scoring, confused language merging, and carrier phrase confirmation
- Wrong script detection — catches Cyrillic in a Dutch file, Arabic in French, Latin in Chinese, etc.
- Distinctive character detection — catches Russian-specific chars in Ukrainian and vice versa
- Fuzzy entry detection — flags entries with the fuzzy flag that need review
- Obsolete entry detection — flags obsolete entries that should be removed
- Untranslated entry detection — flags missing translations, auto-detects source language
- Shifted entry detection — finds translations that got shifted to the wrong msgid
- Garbled text detection — catches corrupted/broken unicode
- Ignore rules — .po-lint-ignore file with language scoping and msgctxt support
- Configurable checks — disable individual checks via pyproject.toml or CLI
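To make the script and distinctive character checks listed above more concrete, here is a minimal sketch using only the standard library. It is not po-lint's actual implementation: the `EXPECTED_SCRIPT` table, the letter sets, and both function names are assumptions chosen for illustration.

```python
import unicodedata

# Hypothetical mapping from language code to the script prefix used in
# Unicode character names. Not taken from po-lint itself.
EXPECTED_SCRIPT = {
    "nl": "LATIN", "fr": "LATIN",
    "ru": "CYRILLIC", "uk": "CYRILLIC",
    "ar": "ARABIC", "zh": "CJK",
}

# Letters of the Russian alphabet that Ukrainian does not use:
# y-eru, e-reversed, hard sign, yo (lower and upper case).
RUSSIAN_ONLY = set("\u044b\u044d\u044a\u0451\u042b\u042d\u042a\u0401")
# Letters of the Ukrainian alphabet that Russian does not use:
# i, yi, ye, ghe with upturn (lower and upper case).
UKRAINIAN_ONLY = set("\u0456\u0457\u0454\u0491\u0406\u0407\u0404\u0490")


def script_mismatch_ratio(text: str, language: str) -> float:
    """Return the fraction of letters that are not in the expected script."""
    expected = EXPECTED_SCRIPT[language]
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    wrong = sum(
        1 for ch in letters
        if not unicodedata.name(ch, "").startswith(expected)
    )
    return wrong / len(letters)


def foreign_letters(text: str, language: str) -> set[str]:
    """Return letters that belong to the sibling language only."""
    if language == "uk":
        return set(text) & RUSSIAN_ONLY
    if language == "ru":
        return set(text) & UKRAINIAN_ONLY
    return set()
```

A ratio close to 1.0 suggests the whole translation is in the wrong writing system, while a non-empty result from `foreign_letters` hints at Russian/Ukrainian cross-contamination. Where the real tool draws its thresholds is not stated here.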
Installation
pip install python-po-lint
Or with uv:
uv add python-po-lint
The fastText language model (~126MB) is downloaded automatically on first run to ~/.cache/po-lint/.
Usage
# Lint a locale directory
po-lint locale/
# Lint with config from pyproject.toml
po-lint
# Only check specific languages
po-lint locale/ --languages fr de nl
# Use compact model (917KB, less accurate)
po-lint locale/ --compact-model
# JSON output
po-lint locale/ --format json
# Custom confidence threshold
po-lint locale/ --confidence 0.6
# Custom minimum detection length
po-lint locale/ --min-detection-length 25
# Specify source language (default: en)
po-lint locale/ --source-language en
# Disable specific checks
po-lint locale/ --disable untranslated fuzzy
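If you want to call the CLI from a Python script (for example in a CI step), the following sketch assembles the command from the flags shown above. The helper names are hypothetical, and nothing is assumed about the layout of the JSON output or the meaning of the exit code; both are simply handed back to the caller.

```python
import json
import subprocess


def build_po_lint_command(path="locale/", languages=(), disable=(),
                          output_format=None, confidence=None):
    """Assemble an argv list using only flags from the usage examples."""
    command = ["po-lint", path]
    if languages:
        command += ["--languages", *languages]
    if disable:
        command += ["--disable", *disable]
    if output_format:
        command += ["--format", output_format]
    if confidence is not None:
        command += ["--confidence", str(confidence)]
    return command


def run_po_lint_json(path="locale/"):
    """Run po-lint and return (exit code, parsed JSON or None).

    The structure of the JSON document is not interpreted here.
    """
    command = build_po_lint_command(path, output_format="json")
    result = subprocess.run(command, capture_output=True, text=True, check=False)
    report = json.loads(result.stdout) if result.stdout.strip() else None
    return result.returncode, report
```

The path is placed before the multi-value flags, mirroring the order used in the examples above.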
Configuration
Add to your pyproject.toml:
[tool.po-lint]
# Explicit locale directories (relative to project root)
paths = ["locale"]
# Auto-discover locale dirs from installed Python packages
packages = ["myapp", "myotherapp"]
# Only check these languages (empty = all)
languages = []
# Source language — detections matching this are allowed (borrowed words)
source_language = "en"
# Minimum confidence to flag wrong language (0.0 - 1.0)
confidence_threshold = 0.5
# Minimum cleaned text length for language detection
min_detection_length = 30
# Skip entries with msgstr shorter than this
min_text_length = 3
# Use compact fastText model instead of full
compact_model = false
# Disable specific checks
# Valid: wrong_language, wrong_script, shifted_entry, garbled_text, untranslated, fuzzy, obsolete
disable = []
# Regex patterns to ignore (matched against msgid and msgstr)
ignore_patterns = []
Ignore file
Create a .po-lint-ignore file in your locale directory:
# Ignore for all languages
Some msgid that causes false positives
# Ignore only for specific languages
[ar,hi] Some msgid
# Ignore with specific msgctxt
screening status::Some msgid
# Both language scope and context
[ar] screening status::Some msgid
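The line format above can be read as an optional `[lang,...]` scope, an optional `msgctxt::` prefix, and the msgid. The parser below is a sketch of that reading, not po-lint's own code. `IgnoreRule`, `parse_ignore_line`, and `is_ignored` are hypothetical names, and the assumption that a rule without a context matches entries with any context is mine.

```python
import re
from typing import NamedTuple, Optional


class IgnoreRule(NamedTuple):
    languages: frozenset      # empty set means "all languages"
    msgctxt: Optional[str]    # None means "no context given in the rule"
    msgid: str


# Leading "[ar,hi]" language scope followed by the rest of the line.
_SCOPE = re.compile(r"^\[([^\]]+)\]\s*(.*)$")


def parse_ignore_line(line: str) -> Optional[IgnoreRule]:
    """Parse one line of the ignore file; return None for blanks and comments."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    languages = frozenset()
    match = _SCOPE.match(line)
    if match:
        languages = frozenset(code.strip() for code in match.group(1).split(","))
        line = match.group(2)
    msgctxt, separator, msgid = line.partition("::")
    if not separator:
        return IgnoreRule(languages, None, line)
    return IgnoreRule(languages, msgctxt, msgid)


def is_ignored(rule: IgnoreRule, language: str,
               msgctxt: Optional[str], msgid: str) -> bool:
    """Assumed matching semantics: scope and context restrict, msgid must match."""
    return (
        (not rule.languages or language in rule.languages)
        and (rule.msgctxt is None or rule.msgctxt == msgctxt)
        and rule.msgid == msgid
    )
```

This simple reading cannot express a msgid that itself starts with `[` or contains `::`; how the real tool treats those cases is not covered here.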
How it works
- Fuzzy entry check — flags entries marked as fuzzy that need review.
- Obsolete entry check — flags obsolete entries that should be removed.
- Untranslated entry check — flags entries with empty msgstr. The source language is auto-detected (the locale where all entries are untranslated) or can be set explicitly. Skipped for the source language.
- Wrong script check — fast, no model needed. Checks if the translation uses the expected writing system.
- Distinctive character check — detects cross-contamination between languages sharing a script (e.g. Russian/Ukrainian).
- Garbled text check — flags corrupted unicode.
- Shifted entry check — flags suspiciously short translations for long source strings.
- Wrong language check — uses fastText with three layers of false positive prevention:
  - Confused language score merging — redistributes scores from commonly confused languages (e.g. Danish/Norwegian, Portuguese/Spanish)
  - Source language allowance — borrowed words from the source language are common and allowed
  - Carrier phrase confirmation — re-tests with a language-specific phrase prepended to distinguish false positives from real contamination
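The first two layers of the wrong language check can be sketched without loading a model, given a dictionary of language scores such as a top-5 prediction. This is an illustration under assumptions, not the tool's implementation: `CONFUSED_GROUPS` contains only the two pairs named above (with illustrative language codes), the function names are hypothetical, and carrier phrase confirmation is left out because it needs the fastText model.

```python
# The two confused pairs named in the text; the codes are illustrative.
CONFUSED_GROUPS = [{"da", "no"}, {"pt", "es"}]


def merge_confused_scores(scores: dict, expected: str) -> dict:
    """Fold scores of languages confused with `expected` into `expected`."""
    merged = dict(scores)
    for group in CONFUSED_GROUPS:
        if expected in group:
            for other in group - {expected}:
                merged[expected] = merged.get(expected, 0.0) + merged.pop(other, 0.0)
    return merged


def detect_wrong_language(scores: dict, expected: str,
                          source_language: str = "en",
                          threshold: float = 0.5):
    """Return the suspected wrong language, or None if nothing is flagged."""
    merged = merge_confused_scores(scores, expected)
    if not merged:
        return None
    best, best_score = max(merged.items(), key=lambda item: item[1])
    # Matches for the expected or the source language are allowed,
    # and low-confidence detections are not reported.
    if best in (expected, source_language) or best_score < threshold:
        return None
    return best
```

For a Danish file, scores of `{"no": 0.5, "da": 0.25, "sv": 0.25}` merge into `{"da": 0.75, "sv": 0.25}`, so the entry is not flagged even though Norwegian had the highest raw score.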
License
MIT