Skip to main content

Library that allows you to define structured utterances

Project description

Table of contents

grammatron

grammatron is a library for producing natural-language utterances from templates. Instead of raw f-strings, templates use typed dubs (dubbing units) that convert values into spoken-style text. Grammatron ensures the grammatical coherence of the generated text in the supported languages, which are English, German and Russian.

Templates and Utterances

A Template is built from an f-string that embeds VariableDub placeholders. CardinalDub converts integers to their spoken form:

from grammatron import Template, VariableDub, CardinalDub, DubParameters

AMOUNT = VariableDub("amount", CardinalDub())

template = Template(f"You've got {AMOUNT} messages")

Calling template.utter(value) (or the shorthand template(value)) produces an Utterance. Call .to_str() to get the final string:

result = template.utter(2).to_str()

The result will be:

"You've got two messages"

Pass DubParameters(spoken=False) to get the numeric form instead:

result = template(3).to_str(DubParameters(spoken=False))

result will be:

"You've got 3 messages"

Multiple variables

When a template has several dubs, pass assignments explicitly:

UNREAD = VariableDub("unread", CardinalDub())

template = Template(f"You've got {AMOUNT} messages, {UNREAD} unread")
result = template(AMOUNT.assign(2), UNREAD.assign(1)).to_str()

The result will be:

"You've got two messages, one unread"

Keyword arguments are also supported:

result = template(amount=2, unread=1).to_str()

The result will be:

"You've got two messages, one unread"

Typed templates

Bind the template to a dataclass with .with_type() so a single object can be passed:

from dataclasses import dataclass

@dataclass
class Data:
    amount: int
    unread: int

typed_template = Template(f"You've got {AMOUNT} messages, {UNREAD} unread").with_type(Data)

result = typed_template(Data(amount=2, unread=1)).to_str()

The result will be:

"You've got two messages, one unread"

Dubs

A dub describes how to convert a value of certain type to a spoken and written string. Several built-in dubs are available.

OptionsDub

OptionsDub maps enum members (or plain strings) to their human-readable forms. Enum names with underscores are automatically converted to space-separated words:

from grammatron import VariableDub, Template, OptionsDub
from enum import Enum

class PaymentMethod(Enum):
    credit_card = 0
    paypal = "Paypal"

METHOD = VariableDub("method", OptionsDub(PaymentMethod))
template = Template(f"Your payment method is {METHOD}")

credit_card = template.to_str(PaymentMethod.credit_card)
paypal = template.to_str(PaymentMethod.paypal)

credit_card will be:

'Your payment method is credit card'

And for paypal:

'Your payment method is Paypal'

DateDub

DateDub converts a datetime to its spoken form. Use .as_variable(name) as a shorthand for wrapping in a VariableDub:

import datetime
from grammatron import DateDub

template = Template(f"Today is {DateDub().as_variable('date')}")
result = template.to_str(datetime.datetime(2015, 1, 1))

result will be:

'Today is January, first, fifteenth'

TimedeltaDub

from grammatron import TimedeltaDub

template = Template(f"Now is {TimedeltaDub().as_variable('time')}")
result = template.to_str(datetime.timedelta(hours=15, minutes = 23))

result will be:

'Now is fifteen hours and twenty-three minutes'

Grammar

PluralAgreement

PluralAgreement wraps a numeric dub and a noun, producing the correctly inflected form:

from grammatron import Template, VariableDub, CardinalDub, PluralAgreement

AMOUNT = VariableDub("amount", CardinalDub())
template = Template(f"You've got {PluralAgreement(AMOUNT, 'message')}")

result = template(1).to_str()

result will be:

"You've got one message"
result = template(2).to_str()

And for 2:

"You've got two messages"

The noun argument can itself be a dub — the plural form of the chosen option is then used:

from grammatron import OptionsDub

ITEM = VariableDub("item", OptionsDub(['car', 'bike', 'truck']))
template = Template(f"You've ordered {PluralAgreement(AMOUNT, ITEM)}")

cars = template.utter(AMOUNT.assign(2), ITEM.assign('car')).to_str()
bikes = template.utter(AMOUNT.assign(1), ITEM.assign('bike')).to_str()

cars will be:

"You've ordered two cars"

And for one bike:

"You've ordered one bike"

I will also work in fairly complicated cases:

AMOUNT = VariableDub("amount", CardinalDub())
template = Template(f"You've ordered {PluralAgreement(AMOUNT, 'big glass of juice')}")
result = template(2).to_str()

The result will be:

"You've ordered two big glasses of juice"

However, this is not perfect:

template = Template(f"You've ordered {PluralAgreement(AMOUNT, 'small glass of juice')}")
result = template(2).to_str()

The result will be:

"You've ordered two smalls glass of juice"

This happens because "small" can be used as the noun in English.

German and Russian support

In germanic and slavic languages, this is more complicated:

English German Russian
I carry one small suitcase Ich trage einen kleinen Koffer Я несу один маленький чемодан
I travel with one small suitcase Ich fahre mit einem kleinen Koffer Я еду с одним маленьким чемоданом
I carry one small bag Ich trage eine kleine Tasche Я несу одну маленькую сумку
I travel with one small bag Ich fahre mit einer kleinen Tasche Я еду с одной маленькой сумкой
I carry two small suitcases Ich trage zwei kleine Koffer Я несу два маленьких чемодана
I travel with two small suitcases Ich fahre mit zwei kleinen Koffern Я еду с двумя маленькими чемоданами
I carry two small bags Ich trage zwei kleine Taschen Я несу две маленькие сумки
I traver with two small bags Ich fahre mit zwei kleinen Taschen Я еду с двумя маленькими сумками

So, to put correctly together a number and "kleine Tasche/klein Koffer" or "маленькая сумка/маленький чемодан", one must take into account:

  • The case of the noun group ("tragen" and "нести" requires accusative, "fahre mit" - dative, "ехать с" - prepositional case)
  • The genus of the noun (Koffer and чемодан are mascular, Tasche and сумка are feminar, neutrum is also possible)
  • The value of the numeral

These parameters determine the form of noun, adjective and the numeral. And this is not something you may ignore: natives see these errors instantly, and if the grammar of the phrases is wrong, their impression of the product immediately and deeply declines.

grammatron takes care of that too (but may fail in some cases, as it was with English):

from grammatron import CardinalDub, PluralAgreement, OptionsDub, Template, DubParameters
from grammatron.grammars.de import DeCasus
from enum import Enum

class GermanOptions(Enum):
    bag = "kleine Tasche"
    suitcase = "klein Koffer"

AMOUNT = CardinalDub().as_variable("amount")
OBJECT = OptionsDub(GermanOptions).as_variable("object")
carry_template = Template(f"Ich trage {PluralAgreement(AMOUNT, OBJECT).grammar.de(casus=DeCasus.AKKUSATIV)}")
travel_template = Template(f"Ich fahre mit {PluralAgreement(AMOUNT, OBJECT).grammar.de(casus=DeCasus.DATIV)}")

results = [
    template.utter(amount = amount, object = object).to_str(DubParameters(language='de'))
    for amount in [1,2]
    for object in [GermanOptions.suitcase, GermanOptions.bag]
    for template in [carry_template, travel_template]
]

The results will be, as in the table above:

['Ich trage einen kleinen Koffer',
 'Ich fahre mit einem kleinen Koffer',
 'Ich trage eine kleine Tasche',
 'Ich fahre mit einer kleinen Tasche',
 'Ich trage zwei kleine Koffer',
 'Ich fahre mit zwei kleinen Koffern',
 'Ich trage zwei kleine Taschen',
 'Ich fahre mit zwei kleinen Taschen']

For russian:

from grammatron.grammars.ru import RuCase

class RussianOptions(Enum):
    bag = "маленькая сумка"
    suitcase = "маленький чемодан"


AMOUNT = CardinalDub().as_variable("amount")
OBJECT = OptionsDub(RussianOptions).as_variable("object")
carry_template = Template(f"Я несу {PluralAgreement(AMOUNT, OBJECT).grammar.ru(case=RuCase.ACCUSATIVE)}")
travel_template = Template(f"Я еду с {PluralAgreement(AMOUNT, OBJECT).grammar.ru(case=RuCase.INSTRUMENTAL)}")

results = [
    template.utter(amount = amount, object = object).to_str(DubParameters(language='ru'))
    for amount in [1,2]
    for object in [RussianOptions.suitcase, RussianOptions.bag]
    for template in [carry_template, travel_template]
]

print(results)

Multi-language templates

It is possible to declare multi-language templates:

from grammatron import TimedeltaDub
TIME = TimedeltaDub().as_variable("time")
template = Template(
    f"It is {TIME}",
    de = f"Es ist {TIME}",
    ru = f"Сейчас {TIME}"
)
results = [
    template.utter(datetime.timedelta(hours=3,minutes=45)).to_str(DubParameters(language=language))
    for language in ['en', 'de', 'ru']
]

The results array is:

['It is three hours and forty-five minutes',
 'Es ist drei Stunden und fünfundvierzig Minuten',
 'Сейчас три часа и сорок пять минут']

Currently, this path is not well-explored, e.g., it's not compatible with OptionsDub. Kaia evolved more in the direction where the LLM produces the translations to the different languages instead of programmers writing them manually, so this branch of multi-language templates is unlikely to be resumed.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

kaia_grammatron-4.9.91.tar.gz (38.5 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

kaia_grammatron-4.9.91-py3-none-any.whl (55.2 kB view details)

Uploaded Python 3

File details

Details for the file kaia_grammatron-4.9.91.tar.gz.

File metadata

  • Download URL: kaia_grammatron-4.9.91.tar.gz
  • Upload date:
  • Size: 38.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.11

File hashes

Hashes for kaia_grammatron-4.9.91.tar.gz
Algorithm Hash digest
SHA256 551cc282d7b082c9e0a9d6d421c5fc9d89a72f278baa152d9b4eefaaf3fdaf5f
MD5 b9ce746105922139519236064c2cffc0
BLAKE2b-256 184ca80b375d58a5928528f3cf91ce38685cb993a53b7f532936ca779a01cc6f

See more details on using hashes here.

File details

Details for the file kaia_grammatron-4.9.91-py3-none-any.whl.

File metadata

File hashes

Hashes for kaia_grammatron-4.9.91-py3-none-any.whl
Algorithm Hash digest
SHA256 583582612e19032ce3b0c006c8d5c3682e04612cca1ecfd94880ea1181d0b0fe
MD5 4a95539900c9c991859e7cd147d8d39b
BLAKE2b-256 37b77b6fef7cb1d2d42bfc0a6032ad18fc72c50c5453ca9d2e2cc188152eec45

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page