LexiconWeaver
A robust, human-in-the-loop framework for web novel translation with terminology consistency enforcement.
Overview
LexiconWeaver addresses Term Drift in the machine translation of web novels (Xianxia, LitRPG, Fantasy). Traditional translators optimize for sentence-level fluency and ignore novel-wide consistency. As a result, proper nouns are translated inconsistently across chapters (e.g., "Spirit Severing" becomes "Spirit Cutting" in Chapter 1 and "Soul Split" in Chapter 2).
The Solution
LexiconWeaver introduces a middleware layer that, through a human-in-the-loop workflow, enforces terminology constraints before the AI generates text:
- Scout: The machine identifies potential terms using heuristics
- Annotate: The human defines terms once
- Weave: The machine generates text strictly adhering to those definitions
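The Weave step can be pictured as prepending the relevant term definitions to the translation request. The following is a minimal sketch under stated assumptions, not LexiconWeaver's actual implementation: `select_terms`, `build_prompt`, the prompt wording, and the placeholder renderings `TERM_A` and `TERM_B` are all hypothetical.

```python
def select_terms(paragraph: str, glossary: dict[str, str]) -> dict[str, str]:
    """Return only the glossary entries whose source term occurs in the paragraph."""
    return {src: tgt for src, tgt in glossary.items() if src in paragraph}


def build_prompt(paragraph: str, glossary: dict[str, str]) -> str:
    """Prepend the relevant term definitions to the translation request."""
    terms = select_terms(paragraph, glossary)
    lines = ["Translate the text below."]
    if terms:
        lines.append("Use exactly these translations for the listed terms:")
        # Longest source terms first, so multi-word terms precede their substrings.
        for src in sorted(terms, key=len, reverse=True):
            lines.append(f"- {src} => {terms[src]}")
    lines.append("")
    lines.append(paragraph)
    return "\n".join(lines)


# Hypothetical glossary: the right-hand values stand in for human-approved renderings.
glossary = {"Spirit Severing": "TERM_A", "Azure Sect": "TERM_B"}
prompt = build_prompt("He reached Spirit Severing at last.", glossary)
print(prompt)
```

Only terms that occur in the paragraph are injected, which keeps the prompt short as the glossary grows.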
Features
- Intelligent Term Discovery: Heuristic-based Scout engine identifies potential terms using frequency, capitalization, and structural patterns
- Human-in-the-Loop Workflow: Interactive TUI for efficient term annotation
- Consistent Translation: Dynamic glossary injection ensures terminology consistency
- Translation Caching: Avoid re-translating identical paragraphs
- Dual Interface: Both CLI for automation and TUI for interactive use
- Robust Error Handling: Graceful degradation and crash recovery
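To illustrate the translation caching feature, here is a small sketch of a cache keyed by a hash of the paragraph. It is an assumption-laden illustration rather than the project's code: the `TranslationCache` class, the in-memory dictionary, and the choice to include the glossary in the key (so that redefining a term invalidates stale entries) are all hypothetical.

```python
import hashlib
from collections.abc import Callable


class TranslationCache:
    """In-memory cache keyed by a hash of the paragraph and the glossary in force."""

    def __init__(self) -> None:
        self._store: dict[str, str] = {}

    @staticmethod
    def key(paragraph: str, glossary: dict[str, str]) -> str:
        # Including the glossary means a changed term definition yields a new key.
        glossary_part = "\n".join(f"{s}\t{t}" for s, t in sorted(glossary.items()))
        payload = paragraph.strip() + "\x00" + glossary_part
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def get_or_translate(
        self,
        paragraph: str,
        glossary: dict[str, str],
        translate: Callable[[str], str],
    ) -> str:
        """Return the cached translation, calling translate() only on a miss."""
        k = self.key(paragraph, glossary)
        if k not in self._store:
            self._store[k] = translate(paragraph)
        return self._store[k]


calls: list[str] = []


def fake_translate(text: str) -> str:
    """Stand-in for the LLM call; records each invocation."""
    calls.append(text)
    return text.upper()


cache = TranslationCache()
first = cache.get_or_translate("Same paragraph.", {}, fake_translate)
second = cache.get_or_translate("Same paragraph.", {}, fake_translate)
print(len(calls))  # the second lookup is served from the cache
```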
Installation
Prerequisites
- Python 3.12+
- Ollama installed and running with a language model
Install LexiconWeaver
pip install -e .
Or install with development dependencies:
pip install -e ".[dev]"
Download spaCy Model (Optional)
For enhanced POS tagging:
python -m spacy download en_core_web_sm
Quick Start
1. Configure LexiconWeaver
Create a configuration file (or use the template):
lexiconweaver config init
lexiconweaver config path
Or manually create ~/.config/lexiconweaver/config.toml:
[ollama]
url = "http://localhost:11434"
model = "llama2"
timeout = 300
[database]
path = "" # Uses default location if empty
[scout]
min_confidence = 0.3
max_ngram_size = 4
2. Create a Project
lexiconweaver project create "My Novel"
3. Launch TUI
lexiconweaver tui chapter1.txt --project "My Novel"
4. Use CLI Commands
Discover terms:
lexiconweaver scout chapter1.txt --project "My Novel"
Translate:
lexiconweaver translate chapter1.txt --project "My Novel" --output translated.txt
Usage
TUI Interface
The TUI provides an interactive workspace:
- Left Panel: Chapter text with highlighting
- Green: Confirmed terms (already in glossary)
- Yellow: Candidate terms (suggested by Scout)
- Right Panel: Candidate queue sorted by confidence
- Keybindings:
- R: Run Scout to discover terms
- Enter: Edit/Confirm selected candidate
- Del: Ignore selected candidate
- S: Skip candidate
- Q: Quit
CLI Commands
# Config
lexiconweaver config path
lexiconweaver config init [--force]
# Project management
lexiconweaver project create <name>
lexiconweaver project list
lexiconweaver project select <name>
lexiconweaver project delete <name>
# Term discovery
lexiconweaver scout <file> [--project <name>] [--output <file>] [--min-confidence <0.0-1.0>]
# Translation
lexiconweaver translate <file> [--project <name>] [--output <file>]
# Launch TUI
lexiconweaver tui [<file>] [--project <name>]
Architecture
LexiconWeaver consists of three core engines:
- Scout (Discovery Engine): Heuristic-based term discovery with confidence scoring
- Annotator (Interaction Engine): Textual-based TUI for term management
- Weaver (Generation Engine): LLM translation with dynamic glossary injection
Data Flow
Raw Text → Scout → Candidate List → Annotator (User) → Glossary DB → Weaver → Translated Text
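The first stage of this flow, turning raw text into a scored candidate list, might look roughly like the sketch below. It is a deliberately simplified stand-in for the Scout engine: the `scout` function, the regular expression, and the scoring formula (frequency relative to the most frequent candidate) are assumptions, and the real heuristics may differ substantially.

```python
import re
from collections import Counter


def scout(
    text: str, max_ngram_size: int = 4, min_confidence: float = 0.3
) -> list[tuple[str, float]]:
    """Score runs of two or more capitalized words by how often they recur."""
    counts: Counter[str] = Counter()
    # A run is two or more consecutive capitalized words, e.g. "Spirit Severing".
    for run in re.findall(r"[A-Z][a-z]+(?: [A-Z][a-z]+)+", text):
        if len(run.split()) <= max_ngram_size:
            counts[run] += 1
    if not counts:
        return []
    top = max(counts.values())
    # Confidence here is simply frequency relative to the most frequent candidate.
    scored = [(term, count / top) for term, count in counts.items()]
    kept = [(term, conf) for term, conf in scored if conf >= min_confidence]
    # Highest confidence first; ties broken alphabetically for stable output.
    return sorted(kept, key=lambda item: (-item[1], item[0]))


sample = (
    "Lin Fan reached Spirit Severing. The elders of Azure Sect feared "
    "Spirit Severing. Lin Fan smiled."
)
candidates = scout(sample)
for term, confidence in candidates:
    print(f"{confidence:.2f}  {term}")
```

A capitalization-only rule like this one will also pick up sentence-initial phrases such as "The Azure Sect", which is one reason a human annotation step follows.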
Configuration
Configuration can be provided via:
- TOML file: ~/.config/lexiconweaver/config.toml (Linux/Mac) or %APPDATA%/lexiconweaver/config.toml (Windows)
- Environment variables: Prefix with LEXICONWEAVER_ (e.g., LEXICONWEAVER_OLLAMA__URL=http://localhost:11434)
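The example variable name suggests that a double underscore separates the config section from the key. The sketch below shows one way such overrides could be layered on top of file values. It rests on that inference: `apply_env_overrides`, the lowercase mapping, the lack of type coercion, and the host name `gpu-box` are hypothetical.

```python
def apply_env_overrides(
    config: dict[str, dict],
    environ: dict[str, str],
    prefix: str = "LEXICONWEAVER_",
) -> dict[str, dict]:
    """Return a copy of config with PREFIX + SECTION__KEY variables applied."""
    merged = {section: dict(values) for section, values in config.items()}
    for name, value in environ.items():
        if not name.startswith(prefix):
            continue
        # Assumed convention: "OLLAMA__URL" maps to section "ollama", key "url".
        section, sep, key = name[len(prefix):].lower().partition("__")
        if not sep or not key:
            continue  # Not in SECTION__KEY form; ignore it.
        # Values stay strings here; a real loader would likely coerce types.
        merged.setdefault(section, {})[key] = value
    return merged


file_config = {"ollama": {"url": "http://localhost:11434", "model": "llama2"}}
env = {"LEXICONWEAVER_OLLAMA__URL": "http://gpu-box:11434", "PATH": "/usr/bin"}
merged = apply_env_overrides(file_config, env)
print(merged["ollama"]["url"])
```

In practice the `environ` argument would be `os.environ`; passing a plain dictionary keeps the sketch easy to test.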
Config commands:
- lexiconweaver config path: Show where config is read from
- lexiconweaver config init: Write a template config file (use --force to overwrite)
See config/default.toml for all available options.
Development
Setup Development Environment
git clone <repository>
cd WeaveCodex
pip install -e ".[dev]"
Run Tests
pytest
With coverage:
pytest --cov=lexiconweaver --cov-report=html
Code Quality
# Linting
ruff check .
# Type checking
mypy src/lexiconweaver
# Formatting
black .
Project Structure
WeaveCodex/
├── src/lexiconweaver/ # Main package
│ ├── database/ # Database models and management
│ ├── engines/ # Scout and Weaver engines
│ ├── tui/ # Textual TUI interface
│ ├── cli/ # CLI commands
│ └── utils/ # Utility functions
├── tests/ # Test suite
├── docs/ # Documentation
└── config/ # Configuration templates
License
AGPLv3 License
Contributing
Contributions are welcome! Please open an issue or submit a pull request.
File details
Details for the file lexiconweaver-0.1.0.tar.gz.
File metadata
- Download URL: lexiconweaver-0.1.0.tar.gz
- Upload date:
- Size: 2.6 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 6753ad9e3ed39897e7da88760cedaa01955fe0da0e110e5589dc530e2af5b58c |
| MD5 | 1446d015f5900c9199b030b54cda2b0b |
| BLAKE2b-256 | c760125cb83c32b56e597173944880582d40e82eb6b262888922f8a71cb31d73 |
File details
Details for the file lexiconweaver-0.1.0-py3-none-any.whl.
File metadata
- Download URL: lexiconweaver-0.1.0-py3-none-any.whl
- Upload date:
- Size: 2.7 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 5dc7fefda79c9169db8bf9f1b88438ae54dea5e9ccc6f2fc8d5f4dcc1c3e48ab |
| MD5 | 33da7c8464d9d8c52411b0c789d9256e |
| BLAKE2b-256 | 56795adb00fbfc9f851326ad83760ea6e3bbd8c5e91673a22abafbedea5c6091 |