A GUI for treebank annotation

Annotald

Annotald is a program for annotating parsed corpora in the Penn Treebank format. For more information on the format (as instantiated by the Penn Parsed Corpora of Historical English), see the documentation by Beatrice Santorini. Annotald was originally written by Anton Ingason as part of the Icelandic Parsed Historical Corpus project. It is currently being developed by him along with Jana Beck and Aaron Ecay.

Obtaining Annotald

The central location for Annotald development is GitHub, where you can view or download the program’s source code. The latest release is available as a Python package; install it with the command pip install annotald. (Further information about installation is available in the user’s manual.)

Using Annotald

The Annotald user’s manual can be found online. For developers, there is also automatically generated API documentation.

License

Annotald is available under the terms of the GNU General Public License (GPL) version 3 or (at your option) any later version. Please see the LICENSE file included with the source code for more information.

Funding Sources

Annotald development has been funded by:

  • Icelandic Research Fund (RANNÍS), grant #090662011: “Viable Language Technology beyond English – Icelandic as a Test Case”

  • The research funds of Anthony Kroch at the University of Pennsylvania.

News

Release 1.1.2

A bugfix release. Changes:

  • Fix overapplication of case in context menu. (Thanks to Joel for report)

  • Fix crash when time log db is corrupt. (Thanks to Sandra for report)

  • Fixes in formatting of documentation. (Thanks to Beatrice for report)

  • Various code cleanups.

Release 1.1.1

A hotfix release. Changes:

  • Fix the height of the context menu (thanks to Jana for reporting)

  • Fix the interaction of the context menu and case tags. Case is now factored out of context menu calculations, just like numerical indices (thanks to Joel for reporting)

  • Fix calculation of the set of alternatives for the context menu (thanks to Joel for reporting)

The user’s manual also acquired an improved section on installation and remote access.

Release 1.1

Changes:

  • Annotald is now tested on Python 2.6+ and 3.3+, and officially supports only these versions of Python

  • Annotald is now distributed through PyPI, the Python Package Index

  • Many bugs fixed

Release 1.0

This is the first release since 12.03. The version numbering scheme has changed.

Significant changes in this version:

  • A user’s manual was written

  • Significant under-the-hood changes to allow the editing of large files in Annotald without overly taxing the system CPU or RAM

  • A structural search feature was added

  • The case-related functions in the context menu were made portable

  • A comprehensive time-logging facility was added

  • The facility to display only a certain number of trees, instead of a whole file at once, was added

  • A metadata editor for working with the deep format was added (the rest of the support for this format is not yet implemented)

  • A python settings file was added, in addition to the javascript settings file

  • The facility to add custom CSS rules via a file was added

  • Significant changes of interest to developers: a developer’s manual was written, and test suites for the javascript and python code were added

Release 12.03

This is the first release since 11.12.

Potentially backwards-incompatible changes:

  • The handling of dash tags has been overhauled. Annotald now has three separate lists of allowable dash tags: one list for dash tags on word-level labels, one for dash tags on clausal nodes (IP and CP), and one for dash tags on non-clausal non-leaf nodes. Refer to the settings.js file distributed with Annotald to see how to configure these options.

  • Annotald is now licensed under the GPL, version 3 or higher.

Other changes:

  • Added support for validation queries. Use the command-line option -v <path> to the annotald script to specify a validation script. Click the “Validate” button in the annotald interface to invoke the script. The script should read trees on standard input, and write (possibly modified) trees to standard output. The output of the script will replace the content of the annotald page. By convention, the script should add the dash tag -FLAG to nodes that are considered errors. The “next error” button will scroll the document to the next occurrence of FLAG. The fixError function is available for user keybindings, and removes the -FLAG from the selected node. The -FLAG tag is automatically removed by Annotald on save. NOTE: the specifics of this interface are expected to change in future versions.

  • Added a comment editor. Press ‘l’ with a comment selected to pop up a text box to edit the text of the comment. Spaces in the original text are converted to underscores in the tree representation. A comment is defined as a CODE node whose text is enclosed in curly braces {}, and the first part of the text inside the braces is one of “COM:”, “TODO:”, or “MAN:”. The three types of comment can be toggled between, using the buttons at the bottom left of the dialog box.

  • Added time-logging support. Annotald will write a “timelog.txt” file in the working directory, recording when the program is started or stopped and when the file is saved. Jana Beck’s (as yet unreleased) CorpusReader tool can be used to calculate parsing time and words-per-hour statistics.

  • Added a facility to edit CorpusSearch .out files. These files have extraneous comments added by CS. Give the -o command-line flag to the annotald program, and the comments will be removed so that Annotald can successfully parse the trees.

  • Annotald successfully runs on systems which have Python 3 as the “python” command. This relies on the existence of Python 2.x as the “python2” command.

  • Added support for clitic traces. When creating a movement trace with the leafBefore and leafAfter functions, if the original phrase has the dash tag -CL, the trace inserted will be *CL*.

  • Annotald now colors IP-level nodes and the topmost “document” node differently.

  • Bug fixes.
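
The validation-script contract described in this list (trees on standard input, possibly modified trees on standard output, -FLAG added to nodes considered errors) can be sketched in Python roughly as follows. The rule applied here, flagging any node whose label is exactly XXX, is purely hypothetical, as are the names SUSPECT_LABELS, flag_suspect_nodes and main; a real script would apply whatever checks suit the corpus. Since the interface is expected to change, treat this as a sketch under those assumptions rather than a reference implementation.

```python
"""Hypothetical validation script: add -FLAG to nodes with suspect labels."""
import re
import sys

# Assumption: labels treated as errors. Purely illustrative.
SUSPECT_LABELS = ("XXX",)


def flag_suspect_nodes(text, labels=SUSPECT_LABELS):
    """Return text with -FLAG appended to every node whose label is in labels."""
    alternatives = "|".join(re.escape(label) for label in labels)
    # An opening bracket, then the exact label, then whitespace.
    # Labels that already carry dash tags (including -FLAG) are left alone.
    pattern = re.compile(r"\((" + alternatives + r")(?=\s)")
    return pattern.sub(lambda m: "(" + m.group(1) + "-FLAG", text)


def main(in_stream=sys.stdin, out_stream=sys.stdout):
    """Read trees from in_stream and write the flagged trees to out_stream."""
    out_stream.write(flag_suspect_nodes(in_stream.read()))
```

Adding a call to main() at the end of the file would turn this into a script that could be given to Annotald with the -v <path> option.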

Release 11.12

Changes:

  • Various bugs fixed

  • Support for ID and METADATA nodes, as sisters of the clause root. (Currently, nodes other than ID and METADATA will not work.)

  • Change how the coloring is applied to clause roots. Call styleIpNodes() in settings.js after setting the ipnodes variable.

  • Add mechanism to hide certain tags from view; see settings.js for details.

  • Added mousewheel support; use shift+wheel-up/-down to move through the tree, sisterwise

  • Limit undo history to 15 steps. This caps the memory used by Annotald, which could otherwise grow very large.

  • Allow (optional) specification of the port on the command line: annotald -p <number> <optional settings file> <.psd file>. This allows multiple instances of Annotald to be running at once (each on a different port)
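
The effect of capping the undo history can be illustrated with a bounded stack. Annotald's own undo code is not reproduced here; the sketch below is a Python illustration of the general idea only, and the class name BoundedUndoHistory and its methods are hypothetical.

```python
from collections import deque


class BoundedUndoHistory(object):
    """Keep at most `limit` snapshots; the oldest is dropped when full."""

    def __init__(self, limit=15):
        # A deque with maxlen discards items from the far end automatically.
        self._snapshots = deque(maxlen=limit)

    def record(self, state):
        """Save a snapshot of the state before a change is made."""
        self._snapshots.append(state)

    def undo(self):
        """Return the most recent snapshot, or None if none remain."""
        if not self._snapshots:
            return None
        return self._snapshots.pop()

    def __len__(self):
        return len(self._snapshots)
```

With a limit of 15, recording a sixteenth snapshot silently discards the first, so memory use stays bounded however long the editing session runs.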

Release 11.11

Changes:

  • Proper Unicode support on OS X and Linux

  • Remove dependency on a particular charset in parsed files

  • Code cleanup (see hacking.txt for instructions/style guide)

  • Add support for lemmata in (POS word-lemma) format

  • Speed up the moving of nodes in some cases

  • Add a notification message when save completes successfully

  • Add an “exit” button, which kills the Annotald server and closes the browser window. Exit will fail if there are unsaved changes

  • Change behavior of mouse click selection. Previously, the following sequence:
      1) Click a node
      2) Change the node’s label with a keyboard command
      3) Click another node to select it
    resulted in the just-clicked node being made the selection endpoint, which can be surprising. Now, in order to make a secondary selection, the two mouse clicks must immediately follow each other, without any intervening keystrokes.

  • Allow context-sensitive label switching commands. See the included settings.js file for an example

  • (Experimental) Add a CSS class to each node in the tree corresponding to its syntactic label. This facilitates the specification of additional CSS rules (for an example, see the settings file)

  • Keybindings can now be specified with control and shift modifier keys (though not both together). The second argument (action to be taken) for a binding can now be an arbitrary javascript function; the third argument is the argument (singular for now) to be passed to the function.
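
As an illustration of the (POS word-lemma) leaf format mentioned in this list, the following Python sketch splits such a leaf into its parts. It is not Annotald's own code: the function name split_leaf is hypothetical, and the sketch assumes that the lemma is whatever follows the last hyphen in the leaf text.

```python
import re

# A leaf of the form "(POS text)", where neither part contains
# brackets or whitespace.
LEAF = re.compile(r"^\(([^()\s]+)\s+([^()\s]+)\)$")


def split_leaf(leaf):
    """Split "(POS word-lemma)" into a (pos, word, lemma) tuple.

    Assumption: the lemma follows the last hyphen. A leaf without a
    usable hyphen is returned whole, with lemma set to None.
    """
    match = LEAF.match(leaf.strip())
    if match is None:
        raise ValueError("not a leaf node: %r" % (leaf,))
    pos, text = match.groups()
    word, hyphen, lemma = text.rpartition("-")
    if not hyphen or not word:
        return pos, text, None
    return pos, word, lemma
```

Under this assumption a leaf such as (N dogs-dog) yields the word dogs and the lemma dog. Leaves whose text contains a hyphen for other reasons, such as traces carrying a numeric index, would need separate handling.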

IcePaHC version

Initial version
