A type-safe, AI-powered framework for structured language translation
Yaduha
A framework for translating into low-resource and endangered languages using LLMs with grammatical constraints. Implements LLM-Assisted Rule-Based Machine Translation (LLM-RBMT) -- the LLM never needs to "know" the target language. Instead, it decomposes English input into structured forms that linguistic rules synthesize into the target language.
Based on the paper: LLM-Assisted Rule Based Machine Translation for Low/No-Resource Languages
Install
pip install yaduha # core (pydantic only)
pip install "yaduha[full]" # + LLM backends, API server, evaluation metrics (quoted so shells like zsh do not expand the brackets)
Language packages are installed separately:
pip install yaduha-ovp # Owens Valley Paiute
Usage
from yaduha.agent.openai import OpenAIAgent
from yaduha.translator.pipeline import PipelineTranslator
translator = PipelineTranslator.from_language(
    "ovp",
    agent=OpenAIAgent(model="gpt-4o-mini", api_key="..."),
)
result = translator("The dog is sleeping.")
print(result.target) # target language output
print(result.back_translation.source) # back-translation for verification
How it works
Sentence types are Pydantic models that encode a language's grammar. The LLM fills them via constrained decoding (structured output), so every output conforms to the encoded grammar by construction. No parallel corpora are required, only a lexicon and grammatical rules.
English input
-> LLM decomposes into structured Sentence models
-> Sentence.__str__() renders target language text
-> (optional) back-translate for verification
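The flow above can be sketched without any LLM. The following is a minimal illustration, not Yaduha's actual code: it uses standard-library dataclasses as a stand-in for Pydantic models, and the lexicon, the placeholder target forms (`N-DOG`, `V-SLEEP`, ...), the `lemma`/`tense` fields, and the `from_structured` helper are all assumptions made up for this example.

```python
from dataclasses import dataclass

# Hypothetical toy lexicon: the target forms are placeholders, not real words.
NOUNS = {"dog": "N-DOG", "cat": "N-CAT"}
VERBS = {"sleep": "V-SLEEP", "run": "V-RUN"}
TENSES = {"present": "-PRS", "past": "-PST"}


@dataclass(frozen=True)
class Subject:
    lemma: str

    def __post_init__(self) -> None:
        # Reject anything outside the lexicon, mimicking schema validation.
        if self.lemma not in NOUNS:
            raise ValueError(f"unknown noun: {self.lemma!r}")

    def render(self) -> str:
        return NOUNS[self.lemma]


@dataclass(frozen=True)
class Verb:
    lemma: str
    tense: str = "present"

    def __post_init__(self) -> None:
        if self.lemma not in VERBS:
            raise ValueError(f"unknown verb: {self.lemma!r}")
        if self.tense not in TENSES:
            raise ValueError(f"unknown tense: {self.tense!r}")

    def render(self) -> str:
        # Toy rule: the tense marker is suffixed to the verb stem.
        return VERBS[self.lemma] + TENSES[self.tense]


@dataclass(frozen=True)
class SVSentence:
    subject: Subject
    verb: Verb

    def __str__(self) -> str:
        # The rules, not the LLM, produce the target-language text.
        return f"{self.subject.render()} {self.verb.render()}"


def from_structured(data: dict) -> SVSentence:
    """Build a sentence from a dict shaped like an LLM's structured output."""
    return SVSentence(Subject(**data["subject"]), Verb(**data["verb"]))


if __name__ == "__main__":
    llm_output = {"subject": {"lemma": "dog"}, "verb": {"lemma": "sleep"}}
    print(from_structured(llm_output))
```

In this sketch the only way to obtain target text is through a validated `SVSentence`, which is the sense in which output is constrained by the grammar: a value outside the lexicon raises an error instead of producing text.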
Translation strategies
Pipeline -- grammar-guaranteed via structured output. The LLM maps English into one or more Sentence subclasses; the __str__ method renders the target language. Because the text is produced by the rules rather than by the LLM, the output always conforms to the encoded grammar.
from yaduha.translator.pipeline import PipelineTranslator
translator = PipelineTranslator(agent=agent, SentenceType=(SVO, SV))
Agentic -- free-form with tool assistance. The LLM reasons freely and can call tools (dictionary lookup, pipeline translator, etc.) to produce a translation.
from yaduha.translator.agentic import AgenticTranslator
translator = AgenticTranslator(agent=agent, tools=[dictionary, pipeline])
Creating a language package
Define sentence types as Pydantic models and register via entrypoint:
from yaduha.language import Sentence
class SVSentence(Sentence):
    subject: Subject
    verb: Verb

    def __str__(self) -> str:
        return f"{self.subject.render()} {self.verb.render()}"

    @classmethod
    def get_examples(cls) -> list[tuple[str, "SVSentence"]]:
        return [("I sleep.", cls(subject=..., verb=...))]
# pyproject.toml
[project.entry-points."yaduha.languages"]
my_lang = "my_package:language"
See yaduha-ovp for a complete example.
LLM backends
- OpenAI (yaduha.agent.openai)
- Anthropic (yaduha.agent.anthropic)
- Google Gemini (yaduha.agent.gemini)
- Ollama (yaduha.agent.ollama)
Evaluation
Built-in evaluators for back-translation quality: chrF, BLEU, BERTScore, COMET, and OpenAI embedding similarity.
from yaduha.evaluator.chrf import ChrfEvaluator
from yaduha.evaluator import batch_evaluate
results = batch_evaluate(translations, ChrfEvaluator())
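For intuition about what a character n-gram metric measures when comparing a back-translation with the original English, here is a simplified, dependency-free sketch. It is not `ChrfEvaluator` and will not reproduce the scores of a reference chrF implementation: it ignores whitespace, averages precision and recall over n-gram orders 1 to 6, and combines them with an F-beta score (beta = 2). The function names are assumptions for this example.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams after removing all whitespace."""
    chars = "".join(text.split())
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def simple_chrf(hypothesis: str, reference: str,
                max_n: int = 6, beta: float = 2.0) -> float:
    """Simplified chrF-style score in [0, 1]; higher means more overlap."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp, ref = char_ngrams(hypothesis, n), char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # one side is too short to have n-grams of this order
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    # F-beta with beta > 1 weights recall more heavily than precision.
    return (1 + beta**2) * p * r / (beta**2 * p + r)


if __name__ == "__main__":
    source = "The dog is sleeping."
    back_translation = "The dog sleeps."
    print(round(simple_chrf(back_translation, source), 3))
```

A back-translation identical to the source scores 1.0, and one sharing no characters scores 0.0; paraphrases fall in between, which is why such scores are best read as a rough signal rather than a verdict on translation quality.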
CLI
yaduha languages list # list installed language packages
yaduha languages info ovp # show language details
yaduha languages validate ovp # validate a language implementation
yaduha serve # start FastAPI server + dashboard
Development
pip install "yaduha[dev]"
pytest tests/ -q
ruff check yaduha/ tests/
pyright yaduha/
Citation
@article{coleman2024llm,
title={LLM-Assisted Rule Based Machine Translation for Low/No-Resource Languages},
author={Coleman, Jared and Cuadros, Diego and Leeds, Nicholas and Krishnamachari, Bhaskar and Toal, Kira and Rosales, Ruben and Iskarous, Khalil},
journal={arXiv preprint arXiv:2405.08997},
year={2024}
}