Skip to main content

Utilities for lark's ambiguous trees

Project description

lark-ambig-tools

lark-ambig-tools is a collection of utilities for lark's ambiguous parse trees[^1].

[^1]: A Tree containing "_ambig" nodes often produced with the ambiguity="explicit" option.

Whether your ambiguous grammar is a bug or a feature, lark-ambig-tools helps you process ambiguity quickly and easily.

Features:

  • Count the total number of derivations in an ambiguous tree
  • Lazily iterate over the unambiguous derivations of a tree
  • Obtain all unambiguous trees faster and more efficiently than lark.visitors.CollapseAmbiguities

Requirements

  • Python 3.6+
  • lark 1.0+

Note: Only lark 1.0+ is officially supported, but lark-ambig-tools may work with older versions.

Installation

pip install lark-ambig-tools

Alternatively, include lark_ambig_tools.py in your Python project with a copy of the license.

Usage

CountedTree

CountedTree is a subclass of lark.Tree with an additional attribute: derivation_count. derivation_count contains the total number of unambiguous trees that are represented by the tree.

Examples

  1. Use CountedTree during parsing:
from lark import Lark
from lark_ambig_tools import CountedTree

parser = Lark(grammar, ambiguity="explicit", tree_class=CountedTree)
tree = parser.parse(text)
print(tree.derivation_count)
  1. Transform a Tree into a CountedTree:
from lark import Lark
from lark_ambig_tools import CountTrees

parser = Lark(grammar, ambiguity="explicit")
tree = parser.parse(text)
counted_tree = CountTrees().transform(tree)
print(counted_tree.derivation_count)

Note: It is not generally recommended to construct CountedTrees directly.

Disambiguator

Disambiguator is a lark.Interpreter that lazily iterates over the unambiguous trees represented by an ambiguous tree. It is a faster and more memory-efficient alternative to lark's CollapseAmbiguities. By providing trees in the same order trees as CollapseAmbiguities, Disambiguator is a drop-in replacement.

Example

from lark import Lark
from lark_ambig_tools import Disambiguator

parser = Lark(grammar, ambiguity="explicit")
ambig_tree = parser.parse(text)

disambiguator = Disambiguator()
for tree in disambiguator.visit(ambig_tree):
    # process unambiguous tree
    ...

Extra Lazy Disambiguator

When an instance of CountedTree is passed to Disambiguator.visit, Disambiguator takes advantage the known derivation counts to be even more lazy -- reducing computation and memory usage. Using Disambiguator with CountedTree is ideal when you do not need to iterate over all the trees. i.e. You stop iterating when you find one tree that meets your requirements. If you always need all the trees, most of the time it is better to pass a regular Tree.

For more insights into how to best use Disambiguator, see the benchmarks.

Benchmarks

Overview

benchmark.py contains benchmarks to test the performance of Disambiguator (with both Tree and CountedTree) and CollapseAmbiguities.

Tasks

The benchmarks cover two different use cases with the following tests:

  • Getting the first unambiguous tree
  • Getting all unambiguous trees

Trees

Each task is run with three different types of ambiguous trees:

  • A small tree that is neither deep nor of high degree (4 derivations)
  • A deep tree that is deep and of low degree (64 derivations)
  • A wide tree that is not deep and of a high degree (216 derivations)

Running

  1. Install the requirements:
pip install lark_ambig_tools[benchmark]
  1. Run the benchmarks:
pytest benchmark.py

Results

The following table summarizes some of the key metrics from one run of the benchmarks.

Name (time in us) Min Max Mean StdDev Median
test_disambiguator_counted_first[small] 13.6640 (1.0) 52.6320 (1.04) 14.2129 (1.0) 1.5089 (1.37) 14.0090 (1.0)
test_disambiguator_first[small] 15.5660 (1.14) 50.6990 (1.0) 16.1473 (1.14) 1.6425 (1.49) 15.9220 (1.14)
test_disambiguator_counted_first[wide] 21.0970 (1.54) 56.4830 (1.11) 21.7083 (1.53) 1.9241 (1.75) 21.4750 (1.53)
test_disambiguator_all[small] 28.4980 (2.09) 90.8440 (1.79) 29.1030 (2.05) 1.1015 (1.0) 28.9740 (2.07)
test_disambiguator_counted_all[small] 32.6300 (2.39) 95.0600 (1.87) 33.6988 (2.37) 2.9152 (2.65) 33.2920 (2.38)
test_collapse_ambiguities_all[small] 39.0030 (2.85) 97.3070 (1.92) 40.0052 (2.81) 2.9053 (2.64) 39.6080 (2.83)
test_disambiguator_counted_first[deep] 46.9520 (3.44) 120.7470 (2.38) 48.3013 (3.40) 4.5244 (4.11) 47.6360 (3.40)
test_disambiguator_first[wide] 83.5920 (6.12) 207.2310 (4.09) 86.4240 (6.08) 7.1662 (6.51) 85.4380 (6.10)
test_disambiguator_first[deep] 330.8410 (24.21) 11,565.8670 (228.13) 342.4290 (24.09) 236.3552 (214.58) 334.1530 (23.85)
test_disambiguator_counted_all[wide] 440.9970 (32.27) 1,000.8610 (19.74) 448.8748 (31.58) 25.9656 (23.57) 444.5130 (31.73)
test_disambiguator_all[wide] 643.2160 (47.07) 11,475.9880 (226.36) 728.5492 (51.26) 840.6795 (763.25) 650.9210 (46.46)
test_disambiguator_all[deep] 716.1010 (52.41) 12,895.9900 (254.36) 831.8742 (58.53) 966.1033 (877.12) 739.4460 (52.78)
test_collapse_ambiguities_all[deep] 924.8430 (67.68) 12,040.0150 (237.48) 1,019.8753 (71.76) 932.9566 (847.02) 931.7700 (66.51)
test_collapse_ambiguities_all[wide] 1,008.2290 (73.79) 12,064.4850 (237.96) 1,104.9380 (77.74) 910.6654 (826.78) 1,016.4430 (72.56)
test_disambiguator_counted_all[deep] 1,014.4010 (74.24) 12,216.7780 (240.97) 1,118.3364 (78.68) 952.4029 (864.68) 1,022.4965 (72.99)

Note: Getting the first tree from CollapseAmbiguities is the same as getting all the trees.

Insights

The following insights may be gathered from the above results:

  • Disambiguator with CountedTree is the fastest way to get the first tree.
  • Disambiguator with Tree beats CollapseAmbiguities in getting all trees for any tree type.
  • Deep ambiguous trees tend to require more computation than wide trees even when they have fewer total derivations.

Limitations

Of course, the performance of the different classes may vary depending on the hardware, environment, and workload. Furthermore, these benchmarks only test the runtime of the code. They do not take into account other relevant characteristics such as memory usage and performance with varying frequencies of requests.

However, hopefully these results offer a helpful starting point for using Disambiguator.

Testing

Run the tests with

python test_lark_ambig_tools.py

or with tox:

tox

License

This project is under the MIT license.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

lark-ambig-tools-1.0.0.tar.gz (6.7 kB view details)

Uploaded Source

Built Distribution

lark_ambig_tools-1.0.0-py3-none-any.whl (6.9 kB view details)

Uploaded Python 3

File details

Details for the file lark-ambig-tools-1.0.0.tar.gz.

File metadata

  • Download URL: lark-ambig-tools-1.0.0.tar.gz
  • Upload date:
  • Size: 6.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.10.0

File hashes

Hashes for lark-ambig-tools-1.0.0.tar.gz
Algorithm Hash digest
SHA256 9b9cb31ccae15a2f8b4d6b662fcc3bc54527059362229825d119367294921078
MD5 747524252d4a86965996acb2511d7c0d
BLAKE2b-256 ea37923760c10bb9ff642684519ee12d69b2211f8e546a96bfc22ce7036c742e

See more details on using hashes here.

File details

Details for the file lark_ambig_tools-1.0.0-py3-none-any.whl.

File metadata

File hashes

Hashes for lark_ambig_tools-1.0.0-py3-none-any.whl
Algorithm Hash digest
SHA256 dbc98f34a48e9caa1c5a1eee4892886c34427f23bc82f64cd3efc3ebc48df8c2
MD5 62c4d2a63a66341c0f68b2d97d1d3c32
BLAKE2b-256 8833cac9919082ba62b2198391165d088680a71d4a872ef0eeba7b6f985e82d5

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page