
A lightweight grammar inducer for toy lexicons


Grammarette

Grammarette is a lightweight grammar inducer for toy lexicons. Given a set of meaning–signal pairs (the lexicon), Grammarette produces a regex-like grammar whose length in bits is as short as it can find. This may be useful for analyzing the results of artificial language learning experiments, where the complexity of a lexicon can be quantified as the length of its shortest description. Grammarette is a work in progress and is not guaranteed to produce the shortest grammar. It uses multiple sequence alignment to align the signals and then writes a grammar for each of the $2^{n-1}$ possible ways of partitioning the alignment. The shortest grammar that remains consistent with the observed lexicon is the grammarette.
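
The count $2^{n-1}$ is what one gets if $n$ is the number of alignment columns and a partition is a split of those columns into contiguous blocks: each of the $n-1$ gaps between adjacent columns is either a cut or not. That reading is an assumption, not something stated above, and `contiguous_partitions` is a hypothetical helper, not part of Grammarette's API. A minimal sketch of the enumeration:

```python
from itertools import product


def contiguous_partitions(n_columns):
    """Yield every split of columns 0..n_columns-1 into contiguous blocks.

    ASSUMPTION: "partitioning the alignment" means choosing cut points
    between adjacent alignment columns.
    """
    if n_columns < 1:
        return
    # Each of the n-1 gaps between adjacent columns is either cut or not.
    for cuts in product((False, True), repeat=n_columns - 1):
        blocks, start = [], 0
        for gap, is_cut in enumerate(cuts, start=1):
            if is_cut:
                blocks.append(tuple(range(start, gap)))
                start = gap
        blocks.append(tuple(range(start, n_columns)))
        yield blocks


partitions = list(contiguous_partitions(4))
print(len(partitions))  # 8, i.e. 2 ** (4 - 1)
print(partitions[0])    # [(0, 1, 2, 3)]
print(partitions[-1])   # [(0,), (1,), (2,), (3,)]
```

Because the number of partitions doubles with every extra column, exhaustive search of this kind is only practical for short alignments, which fits the toy-lexicon setting.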

Installation

Grammarette can be installed from PyPI:

pip install grammarette

Grammarette has two dependencies, NumPy and SciPy, and is compatible with Python 3.8+.

Example

import grammarette

lexicon = {
    (0, 0): "buvikoe",
    (0, 1): "buvikoh",
    (0, 2): "buvikoe",
    (0, 3): "buvichoe",
    (1, 0): "zeteekoe",
    (1, 1): "zeteekoh",
    (1, 2): "zeteekoe",
    (1, 3): "zeteechoe",
    (2, 0): "gafykoe",
    (2, 1): "gafykoh",
    (2, 2): "gafykoe",
    (2, 3): "gaffychoe",
    (3, 0): "wopykoe",
    (3, 1): "wopykoh",
    (3, 2): "wopykoe",
    (3, 3): "wopychoe",
}

grmr = grammarette.induce(lexicon, dims=(4, 4))

print(grmr)
# Grammarette[0?buvi|1?zetee|23gaffy|2?gafy|3?wopy+?3ch|??k+?1oh|??oe]

print(grmr.grammar)
# 0?buvi|1?zetee|23gaffy|2?gafy|3?wopy+?3ch|??k+?1oh|??oe

print(grmr.codelength)
# 230.2261782820019

print(grmr.regex)
# ^((?P<kejeboxu0_>buvi)|(?P<hutusifo1_>zetee)|(?P<cedakesu23>gaffy)|(?P<coxycatu2_>gafy)|(?P<kusenewo3_>wopy))?((?P<byfipyxi_3>ch)|(?P<fujuvohy__>k))?((?P<nydepazy_1>oh)|(?P<wyvelesi__>oe))?$

print(grmr.produce((2, 3)))  # use the induced grammar to produce a signal for meaning (2, 3)
# gaffychoe

print(grmr.comprehend("buvikoe"))  # use the induced grammar to infer meanings for "buvikoe"
# [(0, 0), (0, 1), (0, 2), (0, 3)] # this is currently incorrect, should be [(0, 0), (0, 2)]
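
The over-general result for "buvikoe" can be reproduced with the standard `re` module alone. The sketch below matches a signal against the regex printed above and reads a meaning constraint from each matched group name. It assumes that the last two characters of a group name encode one meaning dimension each, with `_` standing for "any value". That convention is inferred from the output, not documented, and `naive_meanings` is a hypothetical helper, not Grammarette's parser.

```python
import re
from itertools import product

# Regex copied from the grmr.regex output in the example.
REGEX = (
    r"^((?P<kejeboxu0_>buvi)|(?P<hutusifo1_>zetee)|(?P<cedakesu23>gaffy)"
    r"|(?P<coxycatu2_>gafy)|(?P<kusenewo3_>wopy))?"
    r"((?P<byfipyxi_3>ch)|(?P<fujuvohy__>k))?"
    r"((?P<nydepazy_1>oh)|(?P<wyvelesi__>oe))?$"
)


def naive_meanings(signal, dims=(4, 4)):
    """Meanings compatible with every named group that matched `signal`.

    ASSUMPTION: the last len(dims) characters of a group name hold one
    character per meaning dimension, where "_" means "any value".
    """
    match = re.match(REGEX, signal)
    if match is None:
        return []
    matched = [name for name, text in match.groupdict().items() if text is not None]
    suffixes = [name[-len(dims):] for name in matched]
    return [
        meaning
        for meaning in product(*(range(d) for d in dims))
        if all(
            ch == "_" or int(ch) == value
            for suffix in suffixes
            for ch, value in zip(suffix, meaning)
        )
    ]


print(naive_meanings("gaffychoe"))  # [(2, 3)]
print(naive_meanings("buvikoe"))    # [(0, 0), (0, 1), (0, 2), (0, 3)]
```

Under this reading, the groups for "k" and "oe" carry no constraint of their own, so nothing records that meanings with second dimension 3 or 1 would have used "ch" or "oh" instead.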

Known issues

  • In some cases, a grammarette may not infer the correct set of meanings for a given input signal. This is partly an issue with the grammarette parser and partly an issue with the grammarette itself, which, in some cases, does not properly preserve all meaning information (the compression may be lossy).

  • Grammarette is not well tested with more than two dimensions and may not perform well in such cases.
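
One way to sidestep the lossy parse on a toy lexicon is to comprehend by production: generate the signal for every meaning and keep the meanings whose signal equals the input. The sketch below does this directly on the grammar string from the example. It assumes that `+` separates slots, `|` separates rules that are tried in order, the first two characters of a rule are a meaning pattern with `?` as a wildcard, and each dimension value is a single digit. These are readings of the printed output, not documented behaviour, and `parse`, `produce` and `comprehend` here are hypothetical stand-ins, not the library's methods.

```python
from itertools import product

# Grammar string copied from the grmr.grammar output in the example.
GRAMMAR = "0?buvi|1?zetee|23gaffy|2?gafy|3?wopy+?3ch|??k+?1oh|??oe"


def parse(grammar, n_dims=2):
    """Split the grammar into slots ("+") of ordered (pattern, string) rules ("|")."""
    return [
        [(rule[:n_dims], rule[n_dims:]) for rule in slot.split("|")]
        for slot in grammar.split("+")
    ]


def produce(slots, meaning):
    """Concatenate, for each slot, the string of the first rule matching `meaning`."""
    signal = ""
    for rules in slots:
        for pattern, string in rules:
            if all(p == "?" or int(p) == v for p, v in zip(pattern, meaning)):
                signal += string
                break  # ASSUMPTION: earlier rules take priority over later ones
    return signal


def comprehend(slots, signal, dims=(4, 4)):
    """Return every meaning whose produced signal equals `signal`."""
    return [
        meaning
        for meaning in product(*(range(d) for d in dims))
        if produce(slots, meaning) == signal
    ]


slots = parse(GRAMMAR)
print(produce(slots, (2, 3)))        # gaffychoe
print(comprehend(slots, "buvikoe"))  # [(0, 0), (0, 2)]
```

The search visits every meaning in the space, which is cheap for a 4×4 lexicon but grows with the product of the dimension sizes.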

