treetools - tree processing

treetools is a collection of tools for processing treebank trees. It contains algorithms for tree manipulation (such as removal of crossing branches), tree analysis, and grammar extraction.

treetools has been developed at the Department for Computational Linguistics at the Institute for Language and Information at the University of Düsseldorf, Germany (see http://phil.hhu.de/beyond-cfg). The project is sponsored by Deutsche Forschungsgemeinschaft (DFG). It is maintained by Wolfgang Maier.

Author: Wolfgang Maier <maierw@hhu.de>. Contributions: Kilian Gebhardt.

Installation

Requirements:

  • Python 3.11+

From PyPI

To install the latest release from the Python package index, type::

    pip install treetools

Development Installation

To set up a development environment, first install uv. On macOS with Homebrew::

    brew install uv

Or using pip::

    pip install uv

Then clone the repository and sync dependencies::

    git clone https://github.com/wmaier/treetools.git
    cd treetools
    uv sync

Running

Syntax

To run treetools, type::

    treetools-cli [subcommand] [parameters] [options]

Available subcommands are:

  • transform: Process treebank trees. Run transformations and convert between different formats.
  • grammar: Extract grammars for different parsers from treebanks.
  • treeanalysis: Analyze properties of treebank trees, such as gap degree.
  • transitions: Extract transition sequences as used by transition-based parsers.
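
To illustrate the kind of property the treeanalysis subcommand reports, here is a minimal Python sketch of the gap degree of a single node. It assumes the node is given as the set of terminal positions it dominates. The function name ``gap_degree`` is hypothetical, and this is not the treetools implementation.

```python
def gap_degree(positions):
    """Count the gaps in a node's yield.

    positions: terminal positions (integers) dominated by the node.
    A gap is a break between two consecutive dominated positions.
    """
    ordered = sorted(set(positions))
    return sum(1 for left, right in zip(ordered, ordered[1:]) if right - left > 1)


# A continuous constituent covering words 1-3 has no gap.
print(gap_degree([1, 2, 3]))  # 0
# A discontinuous constituent covering words 1, 2 and 5 has one gap.
print(gap_degree([1, 2, 5]))  # 1
```

The gap degree of a whole tree is then the maximum of this value over all its nodes.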

To see the available parameters for a subcommand, type::

    treetools-cli [subcommand] --help

To get verbose help on available transformation algorithms, available options, etc., type::

    treetools-cli [subcommand] --usage

Examples

To attach the punctuation in TIGER and remove its crossing branches while converting it from TigerXML to the export format, type::

    treetools-cli transform tiger.xml tiger.continuous.export --trans root_attach negra_mark_heads boyd_split raising --src-format tigerxml --dest-format export

To extract the bare sentences (one per line) from a treebank in bracketed format, such as the Penn Treebank, type::

    treetools-cli transform treebank.brackets treebank.terminals --src-format brackets --dest-format terminals
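
To show what this conversion amounts to, the following Python sketch pulls the words out of a single bracketed tree. It is only an illustration under simple assumptions (one tree per string, every terminal written as ``(TAG word)``). The function name ``terminals`` is hypothetical, and this is not how treetools itself reads the format.

```python
import re


def terminals(bracketed):
    """Return the words of one bracketed tree, left to right."""
    # A terminal is the second token of an innermost bracket pair: (TAG word)
    return re.findall(r"\(\s*[^\s()]+\s+([^\s()]+)\s*\)", bracketed)


tree = "(S (NP (DT The) (NN cat)) (VP (VBZ sleeps)) (. .))"
print(" ".join(terminals(tree)))  # The cat sleeps .
```

This sketch keeps every leaf it finds, so trace elements in the Penn Treebank would show up in the output as well.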

To delete the traces and co-indexation from the Penn Treebank, type::

    treetools-cli transform ptb ptb.notrace --transform ptb_transform --src-format brackets --dest-format brackets

To extract a left-to-right binarized LCFRS with v1/h2 markovization in rparse format from an export-format treebank, type::

    treetools-cli grammar input_treebank output_grammar leftright --dest-format rcg --markov v:1 h:2
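
As a rough picture of what left-to-right binarization with horizontal markovization does, the Python sketch below binarizes a single continuous rule. It is a simplification under stated assumptions: it handles plain context-free rules rather than LCFRS rules, the ``binarize`` function and the ``VP|<V-NP>`` label scheme are made up for illustration, and the output is not in rparse format. With v:1, labels carry no ancestor context, so only the horizontal context ``h`` appears here.

```python
def binarize(lhs, rhs, h=2):
    """Left-to-right binarization of one rule lhs -> rhs.

    h is the number of already generated siblings that the label of
    each newly introduced node remembers (horizontal markovization).
    """
    if len(rhs) <= 2:
        return [(lhs, tuple(rhs))]
    rules = []
    current = lhs
    for i in range(len(rhs) - 2):
        # Remember at most the last h siblings generated so far.
        context = rhs[max(0, i + 1 - h): i + 1]
        new = f"{lhs}|<{'-'.join(context)}>"
        rules.append((current, (rhs[i], new)))
        current = new
    # The last introduced node rewrites to the final two children.
    rules.append((current, tuple(rhs[-2:])))
    return rules


for left, right in binarize("VP", ["V", "NP", "PP", "ADV"], h=2):
    print(left, "->", " ".join(right))
```

A smaller ``h`` makes more rules share the same intermediate labels, which yields a smaller and more general grammar.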

Development

Running Tests

To run tests with the development environment, type::

    uv run pytest

Installing New Packages

To add a new package to your development environment, type::

    uv add <package-name>

For development-only dependencies (like testing tools), use::

    uv add --dev <package-name>

This will update both pyproject.toml and uv.lock automatically.

License

The code is released under the GNU General Public License (GPL), version 3.0 or later. The license text can be found at http://www.gnu.org/licenses/gpl-3.0.
