Skip to main content

A type-safe, AI-powered framework for structured language translation

Project description

Yaduha

A framework for translating into low-resource and endangered languages using LLMs with grammatical constraints. Implements LLM-Assisted Rule-Based Machine Translation (LLM-RBMT) -- the LLM never needs to "know" the target language. Instead, it decomposes English input into structured forms that linguistic rules synthesize into the target language.

Based on the paper: LLM-Assisted Rule Based Machine Translation for Low/No-Resource Languages

Install

pip install yaduha          # core (pydantic only)
pip install yaduha[full]    # + LLM backends, API server, evaluation metrics

Language packages are installed separately:

pip install yaduha-ovp      # Owens Valley Paiute

Usage

from yaduha.agent.openai import OpenAIAgent
from yaduha.translator.pipeline import PipelineTranslator

translator = PipelineTranslator.from_language(
    "ovp",
    agent=OpenAIAgent(model="gpt-4o-mini", api_key="..."),
)

result = translator("The dog is sleeping.")
print(result.target)                    # target language output
print(result.back_translation.source)   # back-translation for verification

How it works

Sentence types are Pydantic models that encode a language's grammar. The LLM fills them via constrained decoding (structured output), guaranteeing every output is grammatically valid by construction. No parallel corpora required -- only a lexicon and grammatical rules.

English input
    -> LLM decomposes into structured Sentence models
    -> Sentence.__str__() renders target language text
    -> (optional) back-translate for verification

Translation strategies

Pipeline -- grammar-guaranteed via structured output. The LLM maps English into one or more Sentence subclasses; the __str__ method renders the target language. Output is always grammatically correct.

from yaduha.translator.pipeline import PipelineTranslator
translator = PipelineTranslator(agent=agent, SentenceType=(SVO, SV))

Agentic -- free-form with tool assistance. The LLM reasons freely and can call tools (dictionary lookup, pipeline translator, etc.) to produce a translation.

from yaduha.translator.agentic import AgenticTranslator
translator = AgenticTranslator(agent=agent, tools=[dictionary, pipeline])

Creating a language package

Define sentence types as Pydantic models and register via entrypoint:

from yaduha.language import Sentence

class SVSentence(Sentence):
    subject: Subject
    verb: Verb

    def __str__(self) -> str:
        return f"{self.subject.render()} {self.verb.render()}"

    @classmethod
    def get_examples(cls) -> list[tuple[str, "SVSentence"]]:
        return [("I sleep.", cls(subject=..., verb=...))]
# pyproject.toml
[project.entry-points."yaduha.languages"]
my_lang = "my_package:language"

See yaduha-ovp for a complete example.

LLM backends

  • OpenAI (yaduha.agent.openai)
  • Anthropic (yaduha.agent.anthropic)
  • Google Gemini (yaduha.agent.gemini)
  • Ollama (yaduha.agent.ollama)

Evaluation

Built-in evaluators for back-translation quality: chrF, BLEU, BERTScore, COMET, and OpenAI embedding similarity.

from yaduha.evaluator.chrf import ChrfEvaluator
from yaduha.evaluator import batch_evaluate

results = batch_evaluate(translations, ChrfEvaluator())

CLI

yaduha languages list              # list installed language packages
yaduha languages info ovp          # show language details
yaduha languages validate ovp      # validate a language implementation
yaduha serve                       # start FastAPI server + dashboard

Development

pip install yaduha[dev]
pytest tests/ -q
ruff check yaduha/ tests/
pyright yaduha/

Citation

@article{coleman2024llm,
  title={LLM-Assisted Rule Based Machine Translation for Low/No-Resource Languages},
  author={Coleman, Jared and Cuadros, Diego and Leeds, Nicholas and Krishnamachari, Bhaskar and Toal, Kira and Rosales, Ruben and Iskarous, Khalil},
  journal={arXiv preprint arXiv:2405.08997},
  year={2024}
}

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

yaduha-0.3.4.tar.gz (54.5 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

yaduha-0.3.4-py3-none-any.whl (53.1 kB view details)

Uploaded Python 3

File details

Details for the file yaduha-0.3.4.tar.gz.

File metadata

  • Download URL: yaduha-0.3.4.tar.gz
  • Upload date:
  • Size: 54.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for yaduha-0.3.4.tar.gz
Algorithm Hash digest
SHA256 fd749c1db031146a9d1f8ab25a5c805cf0f67a7d25a62bf3aa773005d2f1f501
MD5 cc1c9ae2ec2ed272cb706001259eb3b2
BLAKE2b-256 0f39a430c192b5e35056b6f545eb4ef8cb9c8e9885ac855791db6efbc66f581e

See more details on using hashes here.

Provenance

The following attestation bundles were made for yaduha-0.3.4.tar.gz:

Publisher: publish.yml on kubishi/yaduha

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file yaduha-0.3.4-py3-none-any.whl.

File metadata

  • Download URL: yaduha-0.3.4-py3-none-any.whl
  • Upload date:
  • Size: 53.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for yaduha-0.3.4-py3-none-any.whl
Algorithm Hash digest
SHA256 958f940f00f77993c0948ac6332f241d4a63c660428d8e50a4cdbd04022c4a75
MD5 a6cef1d45116c6863747250bf6a53e74
BLAKE2b-256 e41cc8f99267e59d37547086f49a206632def7cec3db2ddb9c1405bd18e274f3

See more details on using hashes here.

Provenance

The following attestation bundles were made for yaduha-0.3.4-py3-none-any.whl:

Publisher: publish.yml on kubishi/yaduha

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page