catgram

Basic tools for working with categorial grammars

This is a simple Python package providing some basic tools for working with categorial grammars. Development is on-again, off-again. Bug reports and feature requests are welcome, especially for items on the TODO list below, as you'd be providing extra motivation! :)

This package also includes a CCG dependency evaluation script that implements decomposed scoring as specified in "Decomposed scoring of CCG dependencies" (Bhargava and Penn, 2023). See below for examples. If you use decomposed scoring in your research, please cite the paper:

Aditya Bhargava and Gerald Penn. 2023. Decomposed scoring of CCG dependencies. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 1030–1040, Toronto, Canada. Association for Computational Linguistics.

The script can also do the regular CCG dependency evaluation (see Standard CCG scoring under Examples below).

In general, if you use this package in your research, please include a link to the GitHub repository, and remember to cite any appropriate research papers depending on your usage (e.g., for decomposed scoring as mentioned above; cite (Lewis and Steedman, 2014) if you use this package's implementation of their head-finding rules; etc.).

Requirements

Python 3.10+
lambda_calculus Python package (will be installed automatically by pip command below)

Installation

In your environment of choice:

$ pip3 install catgram

If you only want to run the evaluation script, you might want to consider using pipx to keep the installation isolated:

$ pipx install catgram
$ ccg_depeval -h

Or use it to run the script in a temporary environment:

$ pipx run --spec catgram ccg_depeval -h

Examples

Decomposed scoring

In addition to subcategorial labelling and alignment, decomposed scoring specifies the inclusion of root nodes. Most parsers do not explicitly specify these, and if they do, they must be extracted from heads specified in the .auto file (as far as I know, EasyCCG is the only parser that does this). The ccg_depeval script includes the facility for extracting root dependencies as necessary from parser .auto files (most statistical CCG parsers at least have the option to output these).

Usage is as follows:

$ ccg_depeval ground_truth_deps sys_deps ground_truth.auto sys.auto

where:

ground_truth_deps is the ground-truth dependencies, usually as produced by the parg2ccgbank_deps script from C&C (the actual filenames will be wsj00.ccgbank_deps for the dev set or wsj23.ccgbank_deps for the test set). The original PARG file format from CCGbank can also be used.
sys_deps is the dependencies predicted by a statistical parser, usually as produced by the generate program from C&C (I recommend using what's available in the Java version of C&C as it is updated compared to what's in the original C&C package).
ground_truth.auto is the ground-truth .auto file (e.g., straight from CCGbank). The heads specified in this file are followed directly according to the syntax specified in CCGbank.
sys.auto is the parse predicted by the statistical parser. By default, the head-finding rules of Lewis and Steedman (2014) are followed to extract the root node.

Note: instead of .auto files, you can also provide root node information directly in a .roots file. The format is as produced by the ccg_roots script (see Extracting roots below).
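
As a rough illustration of what such root information looks like, the sketch below parses lines shaped like the ccg_roots sample output shown under Extracting roots (e.g., will_8 S[dcl]). It assumes each line is a word_index token followed by whitespace and the root category. The function name parse_root_line is hypothetical and not part of catgram, and the exact .roots conventions (e.g., how a sentence without a root is written) should be checked against the script's own output.

```python
def parse_root_line(line):
    """Split one 'word_index category' line into (word, index, category).

    Assumption: the line looks like the sample ccg_roots output, e.g.
    'will_8 S[dcl]'. This helper is illustrative, not catgram's API.
    """
    head, category = line.split()
    # Split on the last underscore so words containing '_' stay intact.
    word, index = head.rsplit("_", 1)
    return word, int(index), category


if __name__ == "__main__":
    for sample in ["will_8 S[dcl]", "is_3 S[dcl]"]:
        print(parse_root_line(sample))
```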

A warning will be issued if there is no root available for a sentence, including if the last two arguments aren't specified. You can use the -r option to suppress this warning if you don't want to fuss with root nodes:

$ ccg_depeval -r ground_truth_deps sys_deps

This can be handy for, e.g., evaluating the Java version of the C&C parser, which doesn't produce a .auto file and instead produces a .deps file directly. Of course, omitting the root nodes will produce different scores. See (Bhargava and Penn, 2023) and (Bhargava, 2022, chapter 5) for examples of why you should include root nodes.
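
If you drive the evaluation from Python (say, over several system outputs), the two invocations above can be assembled as in the following sketch. It uses only the positional arguments and the -r flag described in this section. The helper name depeval_command and the file name sys.deps are hypothetical, and the choice to fall back to -r whenever either .auto file is missing is an assumption of this sketch, not behaviour of the script itself.

```python
import shlex


def depeval_command(gold_deps, sys_deps, gold_auto=None, sys_auto=None):
    """Build a ccg_depeval argument list (hypothetical helper)."""
    if gold_auto is not None and sys_auto is not None:
        # Decomposed scoring with root nodes taken from the .auto files.
        return ["ccg_depeval", gold_deps, sys_deps, gold_auto, sys_auto]
    # No root information: pass -r to suppress the missing-root warning.
    return ["ccg_depeval", "-r", gold_deps, sys_deps]


if __name__ == "__main__":
    cmd = depeval_command("wsj00.ccgbank_deps", "sys.deps")
    print(shlex.join(cmd))
    # To actually run it (requires catgram to be installed):
    #   import subprocess
    #   subprocess.run(cmd, check=True)
```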

Other options of the script allow you to control whether subcategorial labelling and/or alignment are used as well, or to print per-sentence scores. See the script's help for full details:

$ ccg_depeval -h

Standard CCG scoring

For convenience, you can use the -s flag when running ccg_depeval to revert to the standard CCG scoring method:

$ ccg_depeval -s ground_truth_deps sys_deps
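
For orientation, standard CCG dependency evaluation is usually reported as labelled precision, recall, and F1 over dependency tuples. The sketch below shows only that arithmetic. The tuple layout (head, category, argument slot, dependent) and the function name are illustrative assumptions; this is not how ccg_depeval represents, filters, or scores dependencies internally.

```python
from collections import Counter


def labelled_prf(gold, predicted):
    """Labelled precision, recall and F1 over two lists of dependency tuples."""
    gold_counts = Counter(gold)
    pred_counts = Counter(predicted)
    # Multiset intersection: a dependency counts as correct at most as
    # often as it occurs in both the gold and the predicted list.
    correct = sum((gold_counts & pred_counts).values())
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Toy dependencies: (head, category, argument slot, dependent).
    gold = [
        ("sees_2", r"(S[dcl]\NP)/NP", 1, "John_1"),
        ("sees_2", r"(S[dcl]\NP)/NP", 2, "Mary_3"),
    ]
    predicted = [
        ("sees_2", r"(S[dcl]\NP)/NP", 1, "John_1"),
        ("sees_2", r"(S[dcl]\NP)/NP", 2, "John_1"),  # wrong dependent
    ]
    print(labelled_prf(gold, predicted))
```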

Extracting roots

This package also includes a standalone root-extraction script. For example:

$ ccg_roots -m ls14 sys.auto
will_8 S[dcl]
is_3 S[dcl]
...

It's important to use -m ls14 for a .auto file generated by most statistical parsers and -m autofile (which is the same as omitting the -m option) for a .auto file where the heads specified as per the .auto file syntax are indeed the desired heads. The latter case is applicable to CCGbank's .auto files but not those produced by most parsers, since they do not indicate the semantic heads. (EasyCCG is the biggest exception to this, and indeed, the ls14 rules are the same as used by that parser.) If you use the ls14 option for something in your research, make sure to cite Lewis and Steedman (2014) as Section 3.5 of that paper is where the rules were originally specified.
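
To make the autofile case concrete: in CCGbank's .auto syntax, internal nodes are written as <T category head nchildren> and leaves as <L category pos pos word predarg>, where head is the index of the head child. The sketch below follows those indices from the top of a derivation down to a lexical head. It is an illustration under these assumptions, not catgram's implementation: the function names, the toy derivation, and the 1-based word numbering are hypothetical choices, and it does not implement the ls14 rules.

```python
import re


def parse_auto(line):
    """Parse one .auto derivation line into nested dicts."""
    # Angle-bracket labels are matched first so that parentheses inside
    # categories such as (S[dcl]\NP)/NP are not mistaken for tree brackets.
    tokens = re.findall(r"<[^>]*>|[()]", line)
    pos = 0
    n_leaves = 0

    def node():
        nonlocal pos, n_leaves
        assert tokens[pos] == "("
        fields = tokens[pos + 1][1:-1].split()
        pos += 2
        if fields[0] == "L":
            n_leaves += 1  # word positions are numbered from 1 (assumption)
            result = {"cat": fields[1], "word": fields[4], "index": n_leaves}
        else:
            children = []
            while tokens[pos] == "(":
                children.append(node())
            result = {"cat": fields[1], "head": int(fields[2]),
                      "children": children}
        assert tokens[pos] == ")"
        pos += 1
        return result

    return node()


def root_line(tree):
    """Follow the head indices to a leaf and format 'word_index category'."""
    leaf = tree
    while "children" in leaf:
        leaf = leaf["children"][leaf["head"]]
    return f"{leaf['word']}_{leaf['index']} {tree['cat']}"


# Toy derivation for "John sees Mary" (made up for this example).
TOY = (
    r"(<T S[dcl] 1 2> (<T NP 0 1> (<L N NNP NNP John N>) ) "
    r"(<T S[dcl]\NP 0 2> (<L (S[dcl]\NP)/NP VBZ VBZ sees (S[dcl]\NP_1)/NP_2>) "
    r"(<T NP 0 1> (<L N NNP NNP Mary N>) ) ) )"
)

if __name__ == "__main__":
    print(root_line(parse_auto(TOY)))  # prints: sees_2 S[dcl]
```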

See the script help for full usage details:

$ ccg_roots -h

TODO

Tests
Ability to directly evaluate CCG .auto files
- This would re-implement the functionality of the generate program from C&C (or the more directly-integrated version of this process in Java C&C) so that new parsers wouldn't need to go back to C&C to do the evaluations
Examples for basic usage (for CategoryTree and TermGraph)
- For now, take a look at dependencies.py for examples of how to use CategoryTree. If examples are even slightly of interest to you, please submit a GitHub issue asking for them as doing so will help motivate me to add them!
Other tools that might be useful?
- Evaluation scripts (e.g., for evaluating statistical parsers)
- Visualization tools (CCG dependency graphs, LCG term graphs; outputs to SVG, LaTeX...)

License

Unless otherwise stated, all files in this package are subject to the below copyright and license. The main exception is candc_ignore.py, which is derived from the original C&C package and thus covered by the C&C System Licence Agreement. The code therein is reproduced with permission for inclusion in this package.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use the files in this repository except in compliance with the License. You may obtain a copy of the License at https://www.apache.org/licenses/LICENSE-2.0 or in the LICENSE file in this repository.

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Release history

0.3.0: Feb 17, 2024 (this version)
0.2.0: Jun 25, 2023
0.1.0: May 26, 2023
