A robust, human-in-the-loop framework for web novel translation

LexiconWeaver

A robust, human-in-the-loop framework for web novel translation with terminology consistency enforcement.

Overview

LexiconWeaver addresses the problem of Term Drift in machine translation of web novels (Xianxia, LitRPG, Fantasy). Traditional machine translators optimize for sentence-level fluency and ignore novel-wide consistency. As a result, proper nouns are translated inconsistently across chapters (e.g., a term that should read "Spirit Severing" comes out as "Spirit Cutting" in Chapter 1 and "Soul Split" in Chapter 2).

The Solution

LexiconWeaver introduces a Middleware Layer that enforces terminology constraints before the AI generates text. It does so through a human-in-the-loop workflow:

  1. Scout: The machine identifies potential terms using heuristics
  2. Annotate: The human defines terms once
  3. Weave: The machine generates text strictly adhering to those definitions

Features

  • Intelligent Term Discovery: Heuristic-based Scout engine identifies potential terms using frequency, capitalization, and structural patterns
  • Human-in-the-Loop Workflow: Interactive TUI for efficient term annotation
  • Consistent Translation: Dynamic glossary injection ensures terminology consistency
  • Translation Caching: Avoid re-translating identical paragraphs
  • Dual Interface: Both CLI for automation and TUI for interactive use
  • Robust Error Handling: Graceful degradation and crash recovery
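The translation caching feature listed above could work along the following lines. This is a minimal sketch, not LexiconWeaver's actual implementation: the class name ParagraphCache, the glossary_version argument, and the in-memory dictionary are all assumptions made for illustration. The idea is to key each cached translation on a hash of the paragraph text, plus the glossary state so that a glossary change invalidates stale entries.

```python
import hashlib
from typing import Callable


class ParagraphCache:
    """Hypothetical in-memory cache keyed by paragraph text and glossary version."""

    def __init__(self) -> None:
        self._store: dict[str, str] = {}
        self.misses = 0  # number of times the translator was actually called

    @staticmethod
    def _key(paragraph: str, glossary_version: str) -> str:
        # Normalise whitespace so trivially reflowed paragraphs still match.
        normalised = " ".join(paragraph.split())
        payload = f"{glossary_version}\n{normalised}".encode("utf-8")
        return hashlib.sha256(payload).hexdigest()

    def translate(
        self,
        paragraph: str,
        glossary_version: str,
        translate_fn: Callable[[str], str],
    ) -> str:
        """Return the cached translation, calling translate_fn only on a miss."""
        key = self._key(paragraph, glossary_version)
        if key not in self._store:
            self.misses += 1
            self._store[key] = translate_fn(paragraph)
        return self._store[key]
```

Including the glossary version in the key is a design choice: without it, a paragraph translated before a term was annotated would keep returning the old, inconsistent rendering.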

Installation

Prerequisites

  • Python 3.12+
  • Ollama installed and running with a language model

Install LexiconWeaver

From the root of a source checkout:

pip install -e .

Or install with development dependencies:

pip install -e ".[dev]"

Download spaCy Model (Optional)

For enhanced POS tagging:

python -m spacy download en_core_web_sm

Quick Start

1. Configure LexiconWeaver

Write a template configuration file, then check where the configuration is read from:

lexiconweaver config init
lexiconweaver config path

Or manually create ~/.config/lexiconweaver/config.toml:

[ollama]
url = "http://localhost:11434"
model = "llama2"
timeout = 300

[database]
path = ""  # Uses default location if empty

[scout]
min_confidence = 0.3
max_ngram_size = 4

2. Create a Project

lexiconweaver project create "My Novel"

3. Launch TUI

lexiconweaver tui chapter1.txt --project "My Novel"

4. Use CLI Commands

Discover terms:

lexiconweaver scout chapter1.txt --project "My Novel"

Translate:

lexiconweaver translate chapter1.txt --project "My Novel" --output translated.txt

Usage

TUI Interface

The TUI provides an interactive workspace:

  • Left Panel: Chapter text with highlighting
    • Green: Confirmed terms (already in glossary)
    • Yellow: Candidate terms (suggested by Scout)
  • Right Panel: Candidate queue sorted by confidence
  • Keybindings:
    • R: Run Scout to discover terms
    • Enter: Edit/Confirm selected candidate
    • Del: Ignore selected candidate
    • S: Skip candidate
    • Q: Quit

CLI Commands

# Config
lexiconweaver config path
lexiconweaver config init [--force]

# Project management
lexiconweaver project create <name>
lexiconweaver project list
lexiconweaver project select <name>
lexiconweaver project delete <name>

# Term discovery
lexiconweaver scout <file> [--project <name>] [--output <file>] [--min-confidence <0.0-1.0>]

# Translation
lexiconweaver translate <file> [--project <name>] [--output <file>]

# Launch TUI
lexiconweaver tui [<file>] [--project <name>]

Architecture

LexiconWeaver consists of three core engines:

  1. Scout (Discovery Engine): Heuristic-based term discovery with confidence scoring
  2. Annotator (Interaction Engine): Textual-based TUI for term management
  3. Weaver (Generation Engine): LLM translation with dynamic glossary injection
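As a rough illustration of the Scout engine's heuristic approach, the sketch below scores candidate terms by capitalization and frequency. It is not the project's actual algorithm: the function name scout, the scoring weights, and the sentence-splitting rule are assumptions, and it only handles Latin-script text. The max_ngram_size and min_confidence parameters mirror the [scout] configuration keys.

```python
import re
from collections import Counter


def scout(
    text: str, max_ngram_size: int = 4, min_confidence: float = 0.3
) -> list[tuple[str, float]]:
    """Return (candidate, confidence) pairs, highest confidence first."""
    counts: Counter[str] = Counter()
    # Split on sentence-ending punctuation so n-grams never span sentences.
    for sentence in re.split(r"[.!?]+", text):
        words = re.findall(r"[A-Za-z][A-Za-z'-]*", sentence)
        for n in range(1, max_ngram_size + 1):
            for i in range(len(words) - n + 1):
                gram = words[i:i + n]
                # Capitalization heuristic: every word is capitalized and the
                # n-gram does not open the sentence, where capitals say little.
                if i > 0 and all(w[0].isupper() for w in gram):
                    counts[" ".join(gram)] += 1
    if not counts:
        return []
    top = max(counts.values())
    scored = []
    for term, count in counts.items():
        # Frequency relative to the most common candidate, plus a small
        # bonus for multi-word terms (arbitrary illustrative weights).
        confidence = min(1.0, count / top * 0.8 + (0.2 if " " in term else 0.0))
        if confidence >= min_confidence:
            scored.append((term, round(confidence, 2)))
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))
```

Raising min_confidence shortens the candidate queue the human has to review, at the cost of missing rarer terms.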

Data Flow

Raw Text → Scout → Candidate List → Annotator (User) → Glossary DB → Weaver → Translated Text
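The last hop of this flow, from Glossary DB to Weaver, is where dynamic glossary injection happens. The sketch below shows one plausible way to do it: select only the glossary entries that occur in the paragraph and prepend them to the prompt. The function names, the prompt wording, and the target_language parameter are hypothetical. The payload shape follows Ollama's /api/generate endpoint, but whether LexiconWeaver uses that endpoint is an assumption.

```python
def select_glossary(paragraph: str, glossary: dict[str, str]) -> dict[str, str]:
    """Keep only the entries whose source term occurs in the paragraph."""
    # Plain substring matching keeps the sketch script-agnostic.
    return {src: dst for src, dst in glossary.items() if src in paragraph}


def build_prompt(paragraph: str, glossary: dict[str, str], target_language: str) -> str:
    """Build a translation prompt with the relevant glossary entries injected."""
    entries = select_glossary(paragraph, glossary)
    lines = [f"Translate the following paragraph into {target_language}."]
    if entries:
        lines.append("Use these term translations exactly as given:")
        lines.extend(f"- {src} => {dst}" for src, dst in entries.items())
    lines.append("")  # blank line between instructions and source text
    lines.append(paragraph)
    return "\n".join(lines)


def ollama_payload(prompt: str, model: str) -> dict:
    """Request body for a non-streaming POST to Ollama's /api/generate (assumed endpoint)."""
    return {"model": model, "prompt": prompt, "stream": False}
```

Injecting only the matching entries keeps the prompt short even when the glossary grows to hundreds of terms.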

Configuration

Configuration can be provided via:

  1. TOML file: ~/.config/lexiconweaver/config.toml (Linux/macOS) or %APPDATA%/lexiconweaver/config.toml (Windows)
  2. Environment variables: Prefix with LEXICONWEAVER_ and, as the example suggests, separate the section from the key with a double underscore (e.g., LEXICONWEAVER_OLLAMA__URL=http://localhost:11434)
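The environment-variable convention can be sketched as follows. This is an illustration only: the function name apply_env_overrides and the type-coercion rules are assumptions, and the reading of the double underscore as a section/key separator is inferred from the single example above rather than confirmed.

```python
import os
from typing import Mapping

PREFIX = "LEXICONWEAVER_"


def apply_env_overrides(config: dict, environ: Mapping[str, str] = os.environ) -> dict:
    """Return a copy of config with LEXICONWEAVER_SECTION__KEY values applied."""
    merged = {section: dict(values) for section, values in config.items()}
    for name, raw in environ.items():
        if not name.startswith(PREFIX) or "__" not in name:
            continue
        section, key = name[len(PREFIX):].lower().split("__", 1)
        current = merged.setdefault(section, {}).get(key)
        # Coerce to the type of the existing value so "300" becomes 300.
        if isinstance(current, bool):
            value = raw.strip().lower() in {"1", "true", "yes"}
        elif isinstance(current, (int, float)):
            value = type(current)(raw)
        else:
            value = raw
        merged[section][key] = value
    return merged
```

With this sketch, setting LEXICONWEAVER_OLLAMA__TIMEOUT=60 would override the timeout key in the [ollama] section while leaving the values from the TOML file otherwise untouched.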

Config commands:

  • lexiconweaver config path — Show where config is read from
  • lexiconweaver config init — Write a template config file (use --force to overwrite)

See config/default.toml for all available options.

Development

Setup Development Environment

git clone <repository>
cd WeaveCodex
pip install -e ".[dev]"

Run Tests

pytest

With coverage:

pytest --cov=lexiconweaver --cov-report=html

Code Quality

# Linting
ruff check .

# Type checking
mypy src/lexiconweaver

# Formatting
black .

Project Structure

WeaveCodex/
├── src/lexiconweaver/     # Main package
│   ├── database/          # Database models and management
│   ├── engines/           # Scout and Weaver engines
│   ├── tui/               # Textual TUI interface
│   ├── cli/               # CLI commands
│   └── utils/             # Utility functions
├── tests/                 # Test suite
├── docs/                  # Documentation
└── config/                # Configuration templates

License

AGPLv3 License

Contributing

Contributions are welcome! Please open an issue or submit a pull request.
