A type-safe, AI-powered framework for structured language translation

Yaduha

A framework for translating into low-resource and endangered languages using LLMs with grammatical constraints. Yaduha implements LLM-Assisted Rule-Based Machine Translation (LLM-RBMT): the LLM never needs to "know" the target language. Instead, it decomposes English input into structured forms, and hand-written linguistic rules synthesize those forms into target-language text.

Based on the paper: LLM-Assisted Rule Based Machine Translation for Low/No-Resource Languages

Install

pip install yaduha          # core (pydantic only)
pip install yaduha[full]    # + LLM backends, API server, evaluation metrics

Language packages are installed separately:

pip install yaduha-ovp      # Owens Valley Paiute

Usage

from yaduha.agent.openai import OpenAIAgent
from yaduha.translator.pipeline import PipelineTranslator

translator = PipelineTranslator.from_language(
    "ovp",
    agent=OpenAIAgent(model="gpt-4o-mini", api_key="..."),
)

result = translator("The dog is sleeping.")
print(result.target)                    # target language output
print(result.back_translation.source)   # back-translation for verification

How it works

Sentence types are Pydantic models that encode a language's grammar. The LLM fills them via constrained decoding (structured output), so every output conforms to the encoded grammar by construction. No parallel corpora are required, only a lexicon and grammatical rules.

English input
    -> LLM decomposes into structured Sentence models
    -> Sentence.__str__() renders target language text
    -> (optional) back-translate for verification
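
The flow above can be sketched without any LLM at all. The following is a minimal illustration, not Yaduha's implementation: it uses stdlib dataclasses as a stand-in for Pydantic models, and the class names (`Noun`, `Verb`, `Tense`, `SubjectVerbSentence`) and lexicon tokens are invented placeholders, not real words from any language.

```python
from dataclasses import dataclass
from enum import Enum


class Noun(str, Enum):
    # Placeholder lexicon entries: invented tokens, not real target-language words.
    DOG = "noun-dog"
    CAT = "noun-cat"


class Verb(str, Enum):
    SLEEP = "verb-sleep"
    RUN = "verb-run"


class Tense(str, Enum):
    PRESENT = "-prs"
    PAST = "-pst"


@dataclass(frozen=True)
class SubjectVerbSentence:
    """Hypothetical sentence type: the fields are the only choices left open."""

    subject: Noun
    verb: Verb
    tense: Tense

    def __str__(self) -> str:
        # Rendering is a deterministic rule: subject, then verb stem plus tense suffix.
        return f"{self.subject.value} {self.verb.value}{self.tense.value}"


# In the real pipeline an LLM would pick these field values from English input.
sentence = SubjectVerbSentence(Noun.DOG, Verb.SLEEP, Tense.PRESENT)
rendered = str(sentence)
print(rendered)
```

Because the model only chooses among enumerated fields, word order and morphology stay entirely in the rendering rule.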

Translation strategies

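Pipeline -- grammar-guaranteed via structured output. The LLM maps English into one or more Sentence subclasses; the __str__ method renders the target language. Output always conforms to the grammar that the sentence types encode, though the LLM can still choose the wrong words or sentence type for a given input.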

from yaduha.translator.pipeline import PipelineTranslator
translator = PipelineTranslator(agent=agent, SentenceType=(SVO, SV))
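
To show why structured output constrains the result, here is a rough sketch of validating a JSON payload against a closed vocabulary. It is an assumption-laden stand-in (stdlib dataclasses and enums instead of Pydantic, and hypothetical names `Subject`, `Action`, `SV`, `parse_sv`), not the library's own parsing code.

```python
import json
from dataclasses import dataclass
from enum import Enum


class Subject(str, Enum):
    FIRST_PERSON = "I"
    DOG = "dog"


class Action(str, Enum):
    SLEEP = "sleep"
    EAT = "eat"


@dataclass(frozen=True)
class SV:
    """Hypothetical subject-verb sentence type with a closed vocabulary."""

    subject: Subject
    verb: Action


def parse_sv(payload: str) -> SV:
    """Build an SV from JSON, rejecting unknown fields and unknown words."""
    data = json.loads(payload)
    extra = set(data) - {"subject", "verb"}
    if extra:
        raise ValueError(f"unexpected fields: {sorted(extra)}")
    # Enum lookup by value raises ValueError for out-of-vocabulary words.
    return SV(subject=Subject(data["subject"]), verb=Action(data["verb"]))


parsed = parse_sv('{"subject": "dog", "verb": "sleep"}')
print(parsed)

try:
    parse_sv('{"subject": "dog", "verb": "teleport"}')
    rejected = False
except ValueError:
    rejected = True
print(rejected)
```

A payload either parses into a valid sentence object or fails loudly; there is no path that yields a malformed sentence.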

Agentic -- free-form with tool assistance. The LLM reasons freely and can call tools (dictionary lookup, pipeline translator, etc.) to produce a translation.

from yaduha.translator.agentic import AgenticTranslator
translator = AgenticTranslator(agent=agent, tools=[dictionary, pipeline])
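
The tool-calling idea can be illustrated with a small dispatch table. This is only a sketch under stated assumptions: the scripted `plan` stands in for decisions an LLM would make, and `dictionary_lookup`, `TOOLS`, and `run_tool_call` are hypothetical names with an invented toy lexicon, not part of Yaduha's API.

```python
from typing import Callable


def dictionary_lookup(word: str) -> str:
    """Toy dictionary tool; entries are invented placeholder tokens."""
    toy_lexicon = {"dog": "noun-dog", "sleep": "verb-sleep"}
    return toy_lexicon.get(word.lower(), "<unknown>")


# Registry mapping tool names to callables the agent is allowed to invoke.
TOOLS: dict[str, Callable[[str], str]] = {"dictionary": dictionary_lookup}


def run_tool_call(name: str, argument: str) -> str:
    """Dispatch one tool call, failing on tools that are not registered."""
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](argument)


# A scripted plan stands in for the tool calls an LLM would choose to make.
plan = [("dictionary", "dog"), ("dictionary", "sleep")]
observations = [run_tool_call(name, arg) for name, arg in plan]
draft = " ".join(observations)
print(draft)
```

Unlike the pipeline strategy, nothing here forces the final text through a grammar, which is the trade-off the agentic approach accepts for flexibility.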

Creating a language package

Define sentence types as Pydantic models and register via entrypoint:

from yaduha.language import Sentence

class SVSentence(Sentence):
    subject: Subject
    verb: Verb

    def __str__(self) -> str:
        return f"{self.subject.render()} {self.verb.render()}"

    @classmethod
    def get_examples(cls) -> list[tuple[str, "SVSentence"]]:
        return [("I sleep.", cls(subject=..., verb=...))]

# pyproject.toml
[project.entry-points."yaduha.languages"]
my_lang = "my_package:language"

See yaduha-ovp for a complete example.

LLM backends

  • OpenAI (yaduha.agent.openai)
  • Anthropic (yaduha.agent.anthropic)
  • Google Gemini (yaduha.agent.gemini)
  • Ollama (yaduha.agent.ollama)

Evaluation

Built-in evaluators for back-translation quality: chrF, BLEU, BERTScore, COMET, and OpenAI embedding similarity.

from yaduha.evaluator.chrf import ChrfEvaluator
from yaduha.evaluator import batch_evaluate

results = batch_evaluate(translations, ChrfEvaluator())
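
To give a feel for what a chrF-style score measures on a back-translation, here is a simplified character n-gram F-score in pure Python. It is an illustrative approximation with assumed defaults (n-grams up to 6, beta of 2, spaces ignored), not the implementation behind `ChrfEvaluator`, and `simple_chrf` is a hypothetical name.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams, ignoring spaces."""
    chars = text.replace(" ", "")
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def simple_chrf(hypothesis: str, reference: str,
                max_n: int = 6, beta: float = 2.0) -> float:
    """Average n-gram precision and recall over n, then combine as F-beta."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # one side is too short to have n-grams of this order
        overlap = sum((hyp & ref).values())  # clipped matching n-gram count
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    return (1 + beta ** 2) * p * r / (beta ** 2 * p + r)


source = "the dog is sleeping"
back_translation = "the dog sleeps"
score = simple_chrf(back_translation, source)
print(round(score, 3))
```

A score near 1 means the back-translation shares most character sequences with the original English; it says nothing directly about the quality of the target-language text itself.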

CLI

yaduha languages list              # list installed language packages
yaduha languages info ovp          # show language details
yaduha languages validate ovp      # validate a language implementation
yaduha serve                       # start FastAPI server + dashboard

Development

pip install yaduha[dev]
pytest tests/ -q
ruff check yaduha/ tests/
pyright yaduha/

Citation

@article{coleman2024llm,
  title={LLM-Assisted Rule Based Machine Translation for Low/No-Resource Languages},
  author={Coleman, Jared and Cuadros, Diego and Leeds, Nicholas and Krishnamachari, Bhaskar and Toal, Kira and Rosales, Ruben and Iskarous, Khalil},
  journal={arXiv preprint arXiv:2405.08997},
  year={2024}
}
