Basic tools for working with categorial grammars
catgram
This is a simple Python package providing some basic tools for working with categorial grammars. Development is on-again, off-again. Bug reports and feature requests are welcome, especially if they're for an item on the TODO list below, as you'd be providing extra motivation! :)
This package also includes a CCG dependency evaluation script that implements decomposed scoring as specified in Decomposed scoring of CCG dependencies. See below for examples. If you use decomposed scoring in your research, please cite the paper:
- Aditya Bhargava and Gerald Penn. 2023. Decomposed scoring of CCG dependencies. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 1030–1040, Toronto, Canada. Association for Computational Linguistics.
The script can also do the regular CCG dependency evaluation (examples).
In general, if you use this package in your research, please include a link to the GitHub repository, and remember to cite any appropriate research papers depending on your usage (e.g., for decomposed scoring as mentioned above; cite (Lewis and Steedman, 2014) if you use this package's implementation of their head-finding rules; etc.).
Requirements
- Python 3.10+
- lambda_calculus Python package (will be installed automatically by the pip command below)
Installation
In your environment of choice:
$ pip3 install catgram
If you only want to run the evaluation script, you might want to consider using
pipx to keep the installation isolated:
$ pipx install catgram
$ ccg_depeval -h
Or use it to run the script in a temporary environment:
$ pipx run --spec catgram ccg_depeval -h
Examples
Decomposed scoring
In addition to subcategorial labelling and alignment, decomposed scoring
specifies the inclusion of root nodes.
Most parsers do not explicitly specify these, and if they do, they must be
extracted from heads specified in the .auto file (as far as I know, EasyCCG
is the only parser that does this).
The ccg_depeval script includes the facility for extracting root dependencies
as necessary from parser .auto files (most statistical CCG parsers at least
have the option to output these).
Usage is as follows:
$ ccg_depeval ground_truth_deps sys_deps ground_truth.auto sys.auto
where:
- ground_truth_deps is the ground-truth dependencies, usually as produced by the parg2ccgbank_deps script from C&C (the actual filenames will be wsj00.ccgbank_deps for the dev set or wsj23.ccgbank_deps for the test set). The original PARG file format from CCGbank can also be used.
- sys_deps is the dependencies predicted by a statistical parser, usually as produced by the generate program from C&C (I recommend using what's available in the Java version of C&C as it is updated compared to what's in the original C&C package).
- ground_truth.auto is the ground-truth .auto file (e.g., straight from CCGbank). The heads specified in this file are followed directly according to the syntax specified in CCGbank.
- sys.auto is the parse predicted by the statistical parser. By default, the head-finding rules of Lewis and Steedman (2014) are followed to extract the root node.
Note: instead of .auto files, you can also provide root node information
directly in a .roots file.
The format is as produced by the ccg_roots script
(examples).
A warning will be issued if there is no root available for a sentence, including
if the last two arguments aren't specified.
You can use the -r option to suppress this warning if you don't want to fuss
with root nodes:
$ ccg_depeval -r ground_truth_deps sys_deps
This can be handy for, e.g., evaluating the Java version of the C&C parser,
which doesn't produce a .auto file and instead produces a .deps file
directly.
Of course, omitting the root nodes will produce different scores.
See (Bhargava and Penn, 2023)
and (Bhargava, 2022, chapter 5) for
examples of why you should include root nodes.
Other options of the script allow you to control whether subcategorial labelling and/or alignment are used as well, or to print per-sentence scores. See the script's help for full details:
$ ccg_depeval -h
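If you evaluate several parser outputs in a row, it can help to assemble the command line programmatically. The following is only a sketch: build_depeval_command is a hypothetical helper (not part of catgram), the file names are placeholders, and it uses only the positional arguments and the -r flag described above. Requiring both root sources together is a choice made for this sketch, not something the script is known to enforce.

```python
import shlex
from typing import Optional


def build_depeval_command(
    gold_deps: str,
    sys_deps: str,
    gold_roots: Optional[str] = None,
    sys_roots: Optional[str] = None,
    suppress_root_warning: bool = False,
) -> list[str]:
    """Assemble an argument list for ccg_depeval (hypothetical helper)."""
    # Sketch-level assumption: root sources are given as a pair or not at all.
    if (gold_roots is None) != (sys_roots is None):
        raise ValueError("provide both root sources or neither")

    command = ["ccg_depeval"]
    if suppress_root_warning:
        command.append("-r")  # suppress the missing-root warning
    command += [gold_deps, sys_deps]
    if gold_roots is not None and sys_roots is not None:
        # .auto files (or .roots files) supply the root nodes.
        command += [gold_roots, sys_roots]
    return command


# Placeholder file names; substitute your own paths.
command = build_depeval_command(
    "wsj00.ccgbank_deps", "sys_deps", suppress_root_warning=True
)
print(shlex.join(command))
```

The resulting list can be passed to subprocess.run(command, check=True) once the package is installed and the files exist.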
Standard CCG scoring
For convenience, you can use the -s flag when running ccg_depeval to revert
to the standard CCG scoring method:
$ ccg_depeval -s ground_truth_deps sys_deps
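For readers new to CCG dependency evaluation, the core of the standard measure is labelled precision, recall, and F1 over sets of dependencies. The sketch below illustrates only that idea. It is not the code used by ccg_depeval: the Dependency tuple, its field names, and the toy sentence are assumptions made for illustration, and real evaluation scripts handle further details that are omitted here.

```python
from typing import NamedTuple


class Dependency(NamedTuple):
    """One labelled dependency (hypothetical representation)."""
    sentence: int   # sentence number
    head: int       # token position of the head word
    category: str   # lexical category of the head
    slot: int       # argument slot of the category being filled
    argument: int   # token position of the argument word


def precision_recall_f1(
    gold: set[Dependency], predicted: set[Dependency]
) -> tuple[float, float, float]:
    """Labelled precision, recall, and F1 over two dependency sets."""
    correct = len(gold & predicted)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy sentence "John likes the cat" with tokens numbered from 0.
gold = {
    Dependency(0, 1, r"(S[dcl]\NP)/NP", 1, 0),  # likes -> John
    Dependency(0, 1, r"(S[dcl]\NP)/NP", 2, 3),  # likes -> cat
    Dependency(0, 2, "NP[nb]/N", 1, 3),         # the -> cat
}
# A made-up system output that misses the determiner dependency.
predicted = {
    Dependency(0, 1, r"(S[dcl]\NP)/NP", 1, 0),
    Dependency(0, 1, r"(S[dcl]\NP)/NP", 2, 3),
}

p, r, f = precision_recall_f1(gold, predicted)
print(f"P={p:.3f} R={r:.3f} F1={f:.3f}")
```

Because a dependency only counts as correct when every field matches, a wrong category or slot is penalized just as much as a wrong argument word.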
Extracting roots
This package also includes a standalone root-extraction script. For example:
$ ccg_roots -m ls14 sys.auto
will_8 S[dcl]
is_3 S[dcl]
...
It's important to use -m ls14 for a .auto file generated by most statistical
parsers and -m autofile (which is the same as omitting the -m option) for
a .auto file where the heads specified as per the .auto file syntax are
indeed the desired heads.
The latter case is applicable to CCGbank's .auto files but not those produced
by most parsers, since they do not indicate the semantic heads.
(EasyCCG is the biggest exception to this, and indeed, the ls14 rules are the
same as used by that parser.)
If you use the ls14 option for something in your research, make sure to cite
Lewis and Steedman (2014) as Section 3.5
of that paper is where the rules were originally specified.
See the script help for full usage details:
$ ccg_roots -h
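If you want to post-process the roots yourself, the output lines shown above are easy to read back in. The following sketch assumes each line has exactly two whitespace-separated fields, and that the number after the last underscore is the token's position in the sentence. Both are inferred from the sample output rather than from a format specification, and Root and parse_root_line are hypothetical names, not part of catgram.

```python
from typing import NamedTuple


class Root(NamedTuple):
    """Root node of one sentence (hypothetical representation)."""
    word: str
    index: int
    category: str


def parse_root_line(line: str) -> Root:
    """Parse a line such as 'will_8 S[dcl]' into its three parts."""
    token, category = line.split()
    # Split on the last underscore so that words which themselves
    # contain underscores are kept intact.
    word, _, index = token.rpartition("_")
    return Root(word, int(index), category)


sample_output = ["will_8 S[dcl]", "is_3 S[dcl]"]
roots = [parse_root_line(line) for line in sample_output]
for root in roots:
    print(root.word, root.index, root.category)
```

The sketch does not handle sentences for which no root is available, so check how those appear in your own output before relying on it.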
TODO
- Tests
- Ability to directly evaluate CCG .auto files
  - This would re-implement the functionality of the generate program from C&C (or the more directly-integrated version of this process in Java C&C) so that new parsers wouldn't need to go back to C&C to do the evaluations
- Examples for basic usage (for CategoryTree and TermGraph)
  - For now, take a look at dependencies.py for examples of how to use CategoryTree. If examples are even slightly of interest to you, please submit a GitHub issue asking for them as doing so will help motivate me to add them!
- Other tools that might be useful?
- Evaluation scripts (e.g., for evaluating statistical parsers)
- Visualization tools (CCG dependency graphs, LCG term graphs; outputs to SVG, LaTeX...)
License
Unless otherwise stated, all files in this package are subject to the below copyright and license. The main exception is candc_ignore.py, which is derived from the original C&C package and thus covered by the C&C System Licence Agreement. The code therein is reproduced with permission for inclusion in this package.
Copyright 2023 Aditya Bhargava
Licensed under the Apache License, Version 2.0 (the "License"); you may not use the files in this repository except in compliance with the License. You may obtain a copy of the License at https://www.apache.org/licenses/LICENSE-2.0 or in the LICENSE file in this repository.
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.