A lightweight grammar inducer for toy lexicons
Project description
Grammarette
Grammarette is a lightweight grammar inducer for toy lexicons. Given a set of meaning–signal pairs (the lexicon), Grammarette produces a regex-like grammar that aims to be as short as possible, measured in bits. This may be useful for analyzing the results of artificial language learning experiments, where the complexity of a lexicon can be quantified as the length of its shortest description. Grammarette is a work in progress and is not guaranteed to find the shortest grammar. It uses multiple sequence alignment to align the signals and then writes grammars for all $2^{n-1}$ possible ways of partitioning the alignment into contiguous segments, where $n$ is the number of columns in the alignment. The shortest grammar that remains consistent with the observed lexicon is returned as the grammarette.
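Where the $2^{n-1}$ count comes from: an alignment with $n$ columns has $n-1$ gaps between adjacent columns, and each gap either is or is not a segment boundary. The sketch below enumerates those partitions to illustrate the counting argument only. It assumes $n$ is the number of alignment columns and that segments are contiguous; the function name `partitions` is hypothetical and is not taken from Grammarette's source.

```python
from itertools import combinations
from typing import Iterator, List, Tuple


def partitions(n: int) -> Iterator[List[Tuple[int, int]]]:
    """Yield every split of columns 0..n-1 into contiguous segments.

    Segments are half-open (start, end) pairs. Each of the n-1 gaps between
    adjacent columns is either a boundary or not, so 2**(n-1) splits exist.
    """
    gaps = range(1, n)  # gap i sits between column i-1 and column i
    for k in range(n):  # number of boundaries used: 0 .. n-1
        for cuts in combinations(gaps, k):
            bounds = [0, *cuts, n]
            yield [(bounds[i], bounds[i + 1]) for i in range(len(bounds) - 1)]


print(len(list(partitions(4))))  # 8
```

For a hypothetical 4-column alignment this yields 8 partitions, from the single segment `[(0, 4)]` to one segment per column. Because the number of candidates doubles with every extra column, writing a grammar for each partition is only practical for short, toy-sized alignments.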
Installation
Grammarette can be installed from PyPI:
pip install grammarette
Grammarette has two dependencies, NumPy and SciPy, and is compatible with Python 3.8+.
Example
import grammarette
lexicon = {
(0, 0): "buvikoe",
(0, 1): "buvikoh",
(0, 2): "buvikoe",
(0, 3): "buvichoe",
(1, 0): "zeteekoe",
(1, 1): "zeteekoh",
(1, 2): "zeteekoe",
(1, 3): "zeteechoe",
(2, 0): "gafykoe",
(2, 1): "gafykoh",
(2, 2): "gafykoe",
(2, 3): "gaffychoe",
(3, 0): "wopykoe",
(3, 1): "wopykoh",
(3, 2): "wopykoe",
(3, 3): "wopychoe",
}
grmr = grammarette.induce(lexicon, dims=(4, 4))
print(grmr)
# Grammarette[0?buvi|1?zetee|23gaffy|2?gafy|3?wopy+?3ch|??k+?1oh|??oe]
print(grmr.grammar)
# 0?buvi|1?zetee|23gaffy|2?gafy|3?wopy+?3ch|??k+?1oh|??oe
print(grmr.codelength)
# 230.2261782820019
print(grmr.regex)
# ^((?P<kejeboxu0_>buvi)|(?P<hutusifo1_>zetee)|(?P<cedakesu23>gaffy)|(?P<coxycatu2_>gafy)|(?P<kusenewo3_>wopy))?((?P<byfipyxi_3>ch)|(?P<fujuvohy__>k))?((?P<nydepazy_1>oh)|(?P<wyvelesi__>oe))?$
print(grmr.produce((2, 3)))  # use the induced grammar to produce a signal for meaning (2, 3)
# gaffychoe
print(grmr.comprehend("buvikoe"))  # use the induced grammar to infer meanings for "buvikoe"
# [(0, 0), (0, 1), (0, 2), (0, 3)]  # currently incorrect; should be [(0, 0), (0, 2)]
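The grammar string printed above appears to use a compact notation: `+` separates slots, `|` separates alternatives within a slot, and each alternative begins with one character per meaning dimension (a digit for a required value, `?` for any value), followed by the substring it contributes. The sketch below is an independent interpreter written under that reading, which is inferred from this one example rather than from Grammarette's documentation. It further assumes that the first matching alternative in each slot wins, that every meaning value is a single digit, and that substrings never contain `+` or `|`. The helpers `parse_grammar`, `matches`, `produce`, and `comprehend` are hypothetical and are not the library's methods.

```python
from itertools import product
from typing import List, Sequence, Tuple

Meaning = Tuple[int, ...]
Slot = List[Tuple[str, str]]  # (meaning pattern, substring) pairs in priority order


def parse_grammar(grammar: str, n_dims: int) -> List[Slot]:
    """Split a grammar string such as '0?buvi|1?zetee+??k' into slots (assumed notation)."""
    return [
        [(alt[:n_dims], alt[n_dims:]) for alt in slot.split("|")]
        for slot in grammar.split("+")
    ]


def matches(pattern: str, meaning: Meaning) -> bool:
    """'?' matches any value; a digit must equal that dimension's value."""
    return all(p == "?" or int(p) == v for p, v in zip(pattern, meaning))


def produce(slots: List[Slot], meaning: Meaning) -> str:
    """Concatenate, slot by slot, the substring of the first matching alternative."""
    out = []
    for slot in slots:
        for pattern, substring in slot:
            if matches(pattern, meaning):
                out.append(substring)
                break  # earlier (more specific) alternatives take priority
    return "".join(out)


def comprehend(slots: List[Slot], signal: str, dims: Sequence[int]) -> List[Meaning]:
    """Return every meaning whose produced signal equals `signal` (exhaustive search)."""
    return [m for m in product(*(range(d) for d in dims)) if produce(slots, m) == signal]


slots = parse_grammar("0?buvi|1?zetee|23gaffy|2?gafy|3?wopy+?3ch|??k+?1oh|??oe", 2)
print(produce(slots, (2, 3)))                # gaffychoe
print(comprehend(slots, "buvikoe", (4, 4)))  # [(0, 0), (0, 2)]
```

Under these assumptions the interpreter reproduces all 16 signals in the example lexicon. Because it comprehends a signal by generating the signal for every meaning and keeping exact matches, it returns the expected `[(0, 0), (0, 2)]` for "buvikoe". The cost is one production per meaning, so this brute-force approach only suits small meaning spaces.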
Known issues
- In some cases, a grammarette may not infer the correct set of meanings for a given input signal (see the `comprehend` output in the example above). This is partly an issue with the grammarette parser and partly an issue with the grammarette itself, which does not always preserve all meaning information (the compression may be lossy).
- Grammarette is not well tested with more than two meaning dimensions and may not perform well in such cases.
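One way an over-general answer like the `comprehend` output in the example can arise is sketched below. We have not checked how Grammarette's parser actually works; this is only a hypothesis. It assumes the last characters of each named group in `grmr.regex` encode a meaning pattern, with `_` standing for "any value" (e.g. `kejeboxu0_` for dimension 0 equal to 0). A parser that simply intersects the constraints of the groups that matched would accept every `(0, *)` meaning for "buvikoe", because the `k` and `oe` groups carry no constraint, even though the grammar only uses them when the more specific `ch` and `oh` alternatives do not apply. The function `naive_comprehend` is hypothetical.

```python
import re
from itertools import product
from typing import List, Sequence, Tuple

# Regex printed by grmr.regex in the example above, copied verbatim.
REGEX = (
    r"^((?P<kejeboxu0_>buvi)|(?P<hutusifo1_>zetee)|(?P<cedakesu23>gaffy)"
    r"|(?P<coxycatu2_>gafy)|(?P<kusenewo3_>wopy))?"
    r"((?P<byfipyxi_3>ch)|(?P<fujuvohy__>k))?"
    r"((?P<nydepazy_1>oh)|(?P<wyvelesi__>oe))?$"
)


def naive_comprehend(regex: str, signal: str, dims: Sequence[int]) -> List[Tuple[int, ...]]:
    """Intersect the meaning constraints of the named groups that matched.

    Assumes the last len(dims) characters of each group name encode a meaning
    pattern, with '_' meaning "any value". It ignores which alternatives were
    *not* chosen, which is why the result can be too broad.
    """
    match = re.match(regex, signal)
    if match is None:
        return []
    n = len(dims)
    patterns = [name[-n:] for name, text in match.groupdict().items() if text is not None]
    return [
        m for m in product(*(range(d) for d in dims))
        if all(all(c == "_" or int(c) == v for c, v in zip(p, m)) for p in patterns)
    ]


print(naive_comprehend(REGEX, "buvikoe", (4, 4)))
# [(0, 0), (0, 1), (0, 2), (0, 3)]
```

This reproduces the incorrect output shown in the example. A correct parser would also need to rule out meanings for which a higher-priority alternative in the same slot would have fired (here, `?1` for `oh` and `?3` for `ch`).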
Download files
Source Distribution
Built Distribution
File details
Details for the file grammarette-0.0.0.tar.gz.
File metadata
- Download URL: grammarette-0.0.0.tar.gz
- Upload date:
- Size: 24.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.11.0
File hashes
Algorithm | Hash digest
---|---
SHA256 | 56a8f89caf3b598275a0ddf98e286dc4824393e405bb5393a113c98509635bd5
MD5 | 70eb2cf97a21e3f292c7116aaed8e340
BLAKE2b-256 | 419cd7b7d4b5327447a89401161ad64e8c19e6f15ec1d87eb2e42c6a18312252
File details
Details for the file grammarette-0.0.0-py3-none-any.whl.
File metadata
- Download URL: grammarette-0.0.0-py3-none-any.whl
- Upload date:
- Size: 21.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.11.0
File hashes
Algorithm | Hash digest
---|---
SHA256 | 49209d00b0d8d766512a770439d5d79f132a191c07e9bc775b25f5ec5c1115a9
MD5 | 91ffb62c2e335848783011f68f9c2693
BLAKE2b-256 | 15cd9e4ed46c1522b19c37235319016a4c45e02bf85f3d5c10c062ad3ca4182c