Terminology linter for a given subject area text
termlint
Terminology linter for projects — extracts terms from code/docs and verifies coverage against your glossary/ontology.
What Is termlint?
termlint is a CLI tool for terminology quality checks in text/documentation workflows.
- extracts term candidates from text
- verifies terms against your glossary (exact/fuzzy)
- generates JSON reports (verification, ontology_update, quality_gate, extraction)
- helps bootstrap and evolve glossaries (glossary from-report, glossary merge)
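To make the exact/fuzzy distinction concrete, here is a minimal sketch of both matching modes. It is not termlint's implementation: it uses difflib from the standard library as a stand-in for rapidfuzz, so the scores will differ from what termlint reports, and the names GLOSSARY, known_forms, match_exact and match_fuzzy are hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical glossary data: label -> synonyms.
GLOSSARY = {
    "machine learning": ["ML"],
    "artificial intelligence": ["AI"],
}


def known_forms(glossary):
    """Return every label and synonym, lower-cased."""
    forms = set()
    for label, synonyms in glossary.items():
        forms.add(label.lower())
        forms.update(s.lower() for s in synonyms)
    return forms


def match_exact(term, glossary):
    """Exact mode: the term must equal a label or synonym (case-insensitive)."""
    return term.lower() in known_forms(glossary)


def match_fuzzy(term, glossary, threshold=85.0):
    """Fuzzy mode: the best similarity score (0-100) must reach the threshold."""
    best = max(
        (SequenceMatcher(None, term.lower(), form).ratio() * 100
         for form in known_forms(glossary)),
        default=0.0,
    )
    return best >= threshold
```

With this sketch, a slightly misspelled term such as "machine learnin" fails the exact check but passes the fuzzy check at a threshold of 85, while an unrelated term such as "data analytics" fails both.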
Concept
Raw Text → Parallel Extractors → Async Pipeline → Glossary Match → Quality Report
↓ (rules,cvalue,keybert) (norm,filter,rank) ↓
TextEntityStream ────────────────────────→ Coverage 90%
termlint is built as an async functional pipeline with composable stages and a universal TextEntity model.
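The idea of composable async stages can be sketched with plain async generators. This is an illustration of the pattern only, not termlint's API: Entity is a hypothetical stand-in for TextEntity, and extract, normalize, dedupe, compose and collect are invented names.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator, Callable, List


@dataclass(frozen=True)
class Entity:
    """Hypothetical stand-in for the TextEntity model."""
    text: str


# A stage takes a stream of entities and returns a new stream.
Stage = Callable[[AsyncIterator[Entity]], AsyncIterator[Entity]]


async def extract(text: str) -> AsyncIterator[Entity]:
    """Toy extractor: one entity per non-empty comma-separated chunk."""
    for chunk in text.split(","):
        if chunk.strip():
            yield Entity(chunk.strip())


async def normalize(stream: AsyncIterator[Entity]) -> AsyncIterator[Entity]:
    """Lower-case every entity."""
    async for entity in stream:
        yield Entity(entity.text.lower())


async def dedupe(stream: AsyncIterator[Entity]) -> AsyncIterator[Entity]:
    """Drop entities whose text was already seen."""
    seen = set()
    async for entity in stream:
        if entity.text not in seen:
            seen.add(entity.text)
            yield entity


def compose(*stages: Stage) -> Stage:
    """Chain stages left to right into a single stage."""
    def run(stream: AsyncIterator[Entity]) -> AsyncIterator[Entity]:
        for stage in stages:
            stream = stage(stream)
        return stream
    return run


async def collect(text: str) -> List[str]:
    """Run the toy pipeline and gather the resulting entity texts."""
    pipeline = compose(normalize, dedupe)
    return [entity.text async for entity in pipeline(extract(text))]
```

Calling asyncio.run(collect("Machine Learning, machine learning, AI")) yields the two normalized, de-duplicated entities. Because each stage only consumes and produces a stream, stages can be reordered or swapped without touching the others.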
Alpha Status
termlint is currently alpha.
Implemented and supported now:
- rule-based extraction (RuleExtractor / spaCy)
- verification: exact, fuzzy
- report export: JSON (extraction, verification, ontology_update, quality_gate)
- glossary tooling: glossary from-report, glossary merge
Planned / not implemented yet:
- extractors: CValue, KeyBERT
- processing stages: filter, rank
- verification stages: semantic, ensemble
- exporters: HTML, JUnit
Compatibility Matrix
| Dimension | Current support |
|---|---|
| OS | Linux, macOS, Windows (CLI, JSON workflows) |
| Python | 3.12.x |
| Required extras | termlint[base] |
| Core deps from extras | spacy, rapidfuzz |
| Default spaCy model | ru_core_news_sm |
| Console/output language | English-only CLI and report metadata |
| Tested text languages | Russian (ru_core_news_sm), English (en_core_web_sm) |
| Other languages | Possible via rules.model, but not yet validated in the alpha test matrix |
Language Support Policy
- The termlint pipeline is language-agnostic in design, but extraction quality depends on the selected spaCy model.
- Officially tested in alpha:
  - Russian with ru_core_news_sm
  - English with en_core_web_sm
- Other spaCy language models can be used via [tool.termlint.extraction.rules].model, but should be treated as experimental until formally tested.
- CLI messages and generated report metadata are in English.
Quick Start
- Install:
# Recommended for CLI usage (isolated global tool)
pipx install "termlint[base]"
# Alternative: project/venv install
pip install --pre "termlint[base]"
# Install spaCy model into the same environment
python -m spacy download en_core_web_sm
For pipx, install model inside the pipx environment:
pipx runpip termlint install en-core-web-sm
# or for Russian
pipx runpip termlint install ru-core-news-sm
- Create a minimal glossary (glossary.json):
[
{ "id": "ml:001", "label": "machine learning", "synonyms": ["ML"] },
{ "id": "ml:002", "label": "artificial intelligence", "synonyms": ["AI"] }
]
- Create an input text file (input.txt):
Artificial intelligence and machine learning are used in data analytics.
- Run verification:
termlint verify input.txt --source glossary.json --verifier fuzzy --threshold 85
- Expected output (example):
Files ... 100%
✅ input.txt ... 100%
📊 Coverage: 33.3% (2/6)
⚠️ Quality Gate would FAIL in CI mode
Generated reports:
- reports/verification.json
- reports/ontology_update.json
- reports/quality_gate.json
Exit behavior:
- verify typically exits 0 on a successful run (even if the quality gate would fail in CI mode)
- ci exits 1 when quality gates fail
- the full contract is listed in Exit Codes
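The coverage figure in the example output is simple arithmetic: 2 matched terms out of 6 extracted candidates. The sketch below reproduces that calculation and a threshold check. How termlint actually counts candidates and which threshold its quality gate applies are not documented here, so the function names and the 90% value are assumptions for illustration.

```python
def coverage(matched: int, total: int) -> float:
    """Percentage of extracted term candidates that were found in the glossary."""
    if total == 0:
        return 0.0  # assumption: an empty extraction is treated as 0% here
    return 100.0 * matched / total


def gate_passes(matched: int, total: int, min_coverage: float) -> bool:
    """True if coverage reaches the (hypothetical) minimum threshold."""
    return coverage(matched, total) >= min_coverage


# Mirrors the example output above: 2 of 6 candidates matched.
summary = f"Coverage: {coverage(2, 6):.1f}% (2/6)"
```

Here summary evaluates to "Coverage: 33.3% (2/6)", and gate_passes(2, 6, 90.0) is False, which is consistent with the warning that the quality gate would fail in CI mode.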
Glossary JSON Schema
termlint expects a glossary file as a JSON array of objects.
Required fields per entity:
- id (string)
- label (string)
Optional fields:
- synonyms (string[], default [])
- relations (object<string, string[]>, default {})
- definition (string | null)
- source (string | null)
Minimal valid example:
[
{
"id": "ml:001",
"label": "machine learning"
}
]
Extended example:
[
{
"id": "ml:001",
"label": "machine learning",
"synonyms": ["ML"],
"relations": {
"related_to": ["ml:002"]
},
"definition": "Field focused on learning patterns from data.",
"source": "internal-glossary"
}
]
Common validation/runtime errors:
- File not found: Glossary file not found: <path>
- Invalid JSON syntax: Invalid JSON in <path>: ...
- Invalid entity shape/type: Failed to initialize glossary source '<path>': ...
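If you want to check a glossary file before handing it to termlint, the schema above is small enough to validate by hand. The following is a minimal sketch using only the standard library. GlossaryEntry and parse_glossary are hypothetical names, and the error messages are this sketch's own, not termlint's.

```python
import json
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class GlossaryEntry:
    """Mirrors the documented fields and defaults of one glossary object."""
    id: str
    label: str
    synonyms: List[str] = field(default_factory=list)
    relations: Dict[str, List[str]] = field(default_factory=dict)
    definition: Optional[str] = None
    source: Optional[str] = None


def parse_glossary(raw: str) -> List[GlossaryEntry]:
    """Parse a JSON string and check the required fields of every entry."""
    data = json.loads(raw)  # raises json.JSONDecodeError on invalid JSON
    if not isinstance(data, list):
        raise ValueError("glossary must be a JSON array of objects")
    entries = []
    for index, item in enumerate(data):
        if not isinstance(item, dict):
            raise ValueError(f"entry {index}: expected an object")
        for key in ("id", "label"):
            if not isinstance(item.get(key), str):
                raise ValueError(f"entry {index}: '{key}' must be a string")
        entries.append(GlossaryEntry(
            id=item["id"],
            label=item["label"],
            synonyms=list(item.get("synonyms", [])),
            relations=dict(item.get("relations", {})),
            definition=item.get("definition"),
            source=item.get("source"),
        ))
    return entries
```

For example, parse_glossary(Path("glossary.json").read_text(encoding="utf-8")) returns a list of entries for a valid file and raises ValueError when an entry lacks id or label.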
Glossary Tooling
Create glossary from ontology_update report:
termlint glossary from-report \
--report reports/ontology_update.json \
--out glossary.generated.json \
--min-score 0.7 \
--min-frequency 1 \
--namespace auto
Merge generated glossary into an existing glossary:
termlint glossary merge \
--base glossary.json \
--updates glossary.generated.json \
--out glossary.merged.json \
--on-match merge-synonyms \
--on-conflict report \
--conflicts-out merge.conflicts.json \
--summary-out merge.summary.json
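The exact merge semantics are not spelled out here, so the sketch below shows only one plausible reading of --on-match merge-synonyms combined with --on-conflict report: entries with the same id and the same label have their synonyms unioned, entries with the same id but a different label are reported as conflicts and left unchanged, and new ids are appended. The function merge_glossaries and the shape of the conflict records are assumptions, not termlint's behavior.

```python
def merge_glossaries(base, updates):
    """Return (merged_entries, conflicts) for two lists of glossary dicts."""
    # Copy base entries so the inputs are not mutated.
    merged = {
        entry["id"]: {**entry, "synonyms": list(entry.get("synonyms", []))}
        for entry in base
    }
    conflicts = []
    for entry in updates:
        current = merged.get(entry["id"])
        if current is None:
            # New id: add the entry as-is.
            merged[entry["id"]] = {**entry, "synonyms": list(entry.get("synonyms", []))}
        elif current["label"].lower() == entry["label"].lower():
            # Match: union the synonyms, keeping the existing order.
            for synonym in entry.get("synonyms", []):
                if synonym not in current["synonyms"]:
                    current["synonyms"].append(synonym)
        else:
            # Same id, different label: record it and keep the base entry.
            conflicts.append({
                "id": entry["id"],
                "base_label": current["label"],
                "update_label": entry["label"],
            })
    return list(merged.values()), conflicts
```

Check the generated merge.conflicts.json and merge.summary.json files to see what the real command does with your data.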
Development
poetry config virtualenvs.in-project true --local
poetry env use python3.12
poetry install --with dev --extras "base"
Logging
termlint follows common linter-style verbosity controls:
termlint -v verify <file> # INFO logs
termlint -vv verify <file> # DEBUG logs
termlint -q verify <file> # ERROR only
termlint --log-level DEBUG verify <file>
termlint --log-file reports/termlint.log verify <file>
termlint --config ./pyproject.toml verify <file> --source ./glossary.json
You can also set defaults in pyproject.toml:
[tool.termlint.logging]
level = "WARNING"
log_file = "reports/termlint.log"
fmt = "%(asctime)s [%(name)s] %(levelname)-8s %(message)s"
datefmt = "%Y-%m-%d %H:%M:%S"
max_bytes = 10485760
backup_count = 5
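The logging options read like a thin layer over Python's standard logging module. The sketch below shows how the verbosity flags and the [tool.termlint.logging] keys could map onto it. This mapping is an assumption based on the option names. verbosity_to_level and build_file_handler are hypothetical helpers, and WARNING as the level when no flag is given is taken from the config example, not from a documented default.

```python
import logging
from logging.handlers import RotatingFileHandler


def verbosity_to_level(verbose: int = 0, quiet: bool = False) -> int:
    """Map -q / -v / -vv style flags to a logging level."""
    if quiet:
        return logging.ERROR
    if verbose >= 2:
        return logging.DEBUG
    if verbose == 1:
        return logging.INFO
    return logging.WARNING  # assumed default when no flag is given


def build_file_handler(path, fmt, datefmt, max_bytes, backup_count):
    """Create a size-rotated file handler from the config values."""
    handler = RotatingFileHandler(
        path,
        maxBytes=max_bytes,        # rotate once the file reaches this size
        backupCount=backup_count,  # number of rotated files to keep
        delay=True,                # open the file only on first write
    )
    handler.setFormatter(logging.Formatter(fmt=fmt, datefmt=datefmt))
    return handler
```

With the values from the example config, max_bytes = 10485760 corresponds to 10 MiB per log file, and backup_count = 5 keeps up to five rotated files next to the active one.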
spaCy model download is disabled by default during lint runs. Configure extraction like:
[tool.termlint.extraction]
extractors = ["rule"]
rules = { model = "en_core_web_sm", auto_download_model = false }
Set auto_download_model = true only if you explicitly want runtime model download (not recommended for CI).
Config Discovery
Config lookup order:
- --config <PATH>
- nearest pyproject.toml (searching upward from current directory), section [tool.termlint]
- user-level config:
  - $XDG_CONFIG_HOME/termlint/config.toml (if set)
  - ~/.config/termlint/config.toml
  - %APPDATA%/termlint/config.toml (Windows)
  - ~/.termlint/config.toml
- built-in defaults
User-level config may use either:
- [tool.termlint] (same as project config)
- [termlint] (short form for standalone user config files)
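The lookup order can be sketched as a generator of candidate paths that is consumed until the first existing file is found. This is an illustration under assumptions: candidate_configs and find_config are hypothetical names, the sketch does not check that a pyproject.toml actually contains a [tool.termlint] section, and how termlint handles a missing --config path is not covered.

```python
import os
from pathlib import Path
from typing import Iterator, Mapping, Optional


def candidate_configs(
    cwd: Path,
    explicit: Optional[Path] = None,
    env: Optional[Mapping[str, str]] = None,
) -> Iterator[Path]:
    """Yield config locations in the documented lookup order."""
    env = os.environ if env is None else env
    if explicit is not None:
        yield explicit  # --config <PATH>
    # Nearest pyproject.toml, searching upward from the current directory.
    for directory in (cwd, *cwd.parents):
        yield directory / "pyproject.toml"
    # User-level config locations.
    xdg = env.get("XDG_CONFIG_HOME")
    if xdg:
        yield Path(xdg) / "termlint" / "config.toml"
    home = Path.home()
    yield home / ".config" / "termlint" / "config.toml"
    appdata = env.get("APPDATA")
    if appdata:
        yield Path(appdata) / "termlint" / "config.toml"
    yield home / ".termlint" / "config.toml"


def find_config(
    cwd: Path,
    explicit: Optional[Path] = None,
    env: Optional[Mapping[str, str]] = None,
) -> Optional[Path]:
    """Return the first existing candidate, or None for built-in defaults."""
    for path in candidate_configs(cwd, explicit, env):
        if path.is_file():
            return path
    return None
```

For example, find_config(Path.cwd()) run from a subdirectory of a project returns that project's pyproject.toml if one exists in the directory or any of its parents.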
Exit Codes
termlint uses a stable exit code contract:
- 0: successful run
- 1: quality gate failed (ci command)
- 2: usage/configuration error (invalid options/config/source)
- 3: internal pipeline/runtime error
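When wrapping termlint in a script, it can help to give the exit codes names. The enum below simply restates the contract above. ExitCode and describe are hypothetical helpers, not part of termlint.

```python
from enum import IntEnum


class ExitCode(IntEnum):
    """Names for the documented exit codes."""
    OK = 0
    QUALITY_GATE_FAILED = 1
    USAGE_ERROR = 2
    INTERNAL_ERROR = 3


def describe(code: int) -> str:
    """Return the name of a documented exit code, or UNKNOWN otherwise."""
    try:
        return ExitCode(code).name
    except ValueError:
        return "UNKNOWN"
```

A wrapper script can then compare the returncode of a subprocess.run call against ExitCode.QUALITY_GATE_FAILED to tell a failed gate apart from a configuration or runtime error.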