High-performance toolkit for querying linguistic dependency parses

Treesearch

Pattern matching for dependency treebanks.

⚠️ Early Stage: This project is under active development. The API and query language will change as we refine the design.

Overview

Treesearch finds syntactic patterns in dependency-parsed corpora. It reads treebanks in CoNLL-U format and returns every sentence that matches a specified structural pattern. It is designed for corpus linguistics research on large treebanks, and searches over multiple files are processed in parallel automatically.

Installation

From PyPI

Requires Python 3.12+.

pip install treesearch-ud

# Optional: Install with visualization support (displaCy)
pip install treesearch-ud[viz]

From Source

Requires Python 3.12+ and a Rust toolchain.

# Clone repository
git clone https://github.com/rmalouf/treesearch
cd treesearch

# Install with uv (recommended)
uv pip install -e .

# Or install maturin with pip and build with it
pip install maturin
maturin develop

Quick Example

Find passive constructions in an English treebank:

import treesearch

# Compile a pattern that matches passive constructions
pattern = treesearch.compile_query("""
    MATCH {
        V [upos="VERB"];
        Aux [lemma="be"];
        V -[aux:pass]-> Aux;
    }
""")

# Search a single file
for tree, match in treesearch.search("corpus.conllu", pattern):
    verb = tree.word(match["V"])
    print(f"{verb.form}: {tree.sentence_text}")

Search multiple files with automatic parallel processing:

# Glob pattern for multiple files
for tree, match in treesearch.search("data/*.conllu", pattern):
    verb = tree.word(match["V"])
    print(f"{verb.form}: {tree.sentence_text}")

# Or use the object-oriented API
treebank = treesearch.load("data/*.conllu")
for tree, match in treebank.search(pattern):
    verb = tree.word(match["V"])
    print(f"{verb.form}: {tree.sentence_text}")
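
Because each result is a (tree, match) pair, aggregating over a search is ordinary Python. The following is a minimal sketch that relies only on the calls shown above (tree.word(...) and .form). The helper name count_forms is hypothetical and is not part of Treesearch.

```python
from collections import Counter


def count_forms(results, name="V"):
    """Count lowercased surface forms of the node bound to `name`.

    `results` is any iterable of (tree, match) pairs, such as the one
    returned by treesearch.search(...) in the examples above.
    This helper is illustrative and not part of Treesearch.
    """
    counts = Counter()
    for tree, match in results:
        word = tree.word(match[name])
        counts[word.form.lower()] += 1
    return counts
```

Assuming the search call behaves as in the examples above, count_forms(treesearch.search("data/*.conllu", pattern)).most_common(10) would list the ten most frequent passive verb forms.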

Pattern Language

Patterns specify structural constraints on dependency trees:

MATCH {
    Verb [upos="VERB" & lemma="help"];
    Obj [upos="NOUN"];
    Verb -[obj]-> Obj;
}

Node constraints: upos, xpos, lemma, form, deprel, feats.* (morphological features), misc.* (miscellaneous features)

Edge constraints: -> (child), -[label]-> (labeled edge), !-> (negative), !-[label]-> (negative labeled edge)
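
To make the edge operators concrete, the sketch below evaluates them against a toy tree. It assumes that X -[label]-> Y means Y is a child of X with that relation, that X -> Y ignores the relation, and that the negative forms hold when no such edge exists. It illustrates the semantics only and is not how Treesearch is implemented; Token and has_edge are hypothetical names.

```python
from typing import NamedTuple


class Token(NamedTuple):
    id: int      # 1-based position in the sentence
    form: str
    head: int    # id of the parent token, 0 for the root
    deprel: str  # relation to the parent


def has_edge(tokens, parent_id, child_id, label=None):
    """True if child_id is a child of parent_id, optionally with `label`."""
    child = tokens[child_id - 1]
    if child.head != parent_id:
        return False
    return label is None or child.deprel == label


# Toy parse of "She helped the student"
sent = [
    Token(1, "She", 2, "nsubj"),
    Token(2, "helped", 0, "root"),
    Token(3, "the", 4, "det"),
    Token(4, "student", 2, "obj"),
]

print(has_edge(sent, 2, 4, "obj"))      # helped -[obj]-> student
print(has_edge(sent, 2, 4))             # helped -> student
print(not has_edge(sent, 2, 1, "obj"))  # helped !-[obj]-> She
```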

Precedence: < (immediately precedes), << (precedes)
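
The two precedence operators differ only in adjacency. A small sketch on 1-based token positions, assuming A < B holds when B is the very next token and A << B holds when A occurs anywhere before B (the function names are hypothetical):

```python
def immediately_precedes(a: int, b: int) -> bool:
    """A < B: the token at position a is directly followed by the one at b."""
    return b - a == 1


def precedes(a: int, b: int) -> bool:
    """A << B: the token at position a occurs somewhere before the one at b."""
    return a < b


# In "She helped the student", "helped" is token 2 and "student" is token 4.
print(immediately_precedes(2, 4))  # not adjacent
print(precedes(2, 4))              # "helped" comes earlier in the sentence
```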

EXCEPT blocks: Reject matches where a condition is true (negative existential)

OPTIONAL blocks: Extend matches with additional bindings if possible
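
The effect of the two block types on a set of matches can be sketched with plain dictionaries of bindings. This is an illustration of the descriptions above, not Treesearch's implementation. The function names are hypothetical, and producing one result per possible OPTIONAL extension is an assumption.

```python
def apply_except(matches, violates):
    """Drop every match for which the EXCEPT condition holds."""
    return [m for m in matches if not violates(m)]


def apply_optional(matches, extend):
    """Add extra bindings where possible; keep the match unchanged otherwise.

    `extend(m)` returns a list of dicts with additional bindings for match m.
    """
    out = []
    for m in matches:
        extras = extend(m)
        if extras:
            out.extend({**m, **e} for e in extras)
        else:
            out.append(m)
    return out


base = [{"V": 2}, {"V": 5}]
# EXCEPT: reject the match whose verb is token 5.
kept = apply_except(base, lambda m: m["V"] == 5)
# OPTIONAL: only the verb at token 2 has an object to bind.
extended = apply_optional(base, lambda m: [{"Obj": 4}] if m["V"] == 2 else [])
```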

Data Format

Reads treebanks in CoNLL-U format. Supports plain text (.conllu) and gzip-compressed files (.conllu.gz) with automatic decompression.
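
For readers unfamiliar with the format, the following standard-library sketch shows what reading plain and gzip-compressed CoNLL-U involves: comment lines start with #, each token is a row of ten tab-separated columns, and a blank line ends a sentence. It is not Treesearch's reader, and open_conllu and read_sentences are hypothetical names.

```python
import gzip
from pathlib import Path


def open_conllu(path):
    """Open a .conllu or .conllu.gz file as UTF-8 text."""
    path = Path(path)
    if path.suffix == ".gz":
        return gzip.open(path, "rt", encoding="utf-8")
    return open(path, "rt", encoding="utf-8")


def read_sentences(path):
    """Yield each sentence as a list of tab-split token rows.

    Comment lines (starting with '#') are skipped, and a blank line
    ends the current sentence.
    """
    rows = []
    with open_conllu(path) as handle:
        for line in handle:
            line = line.rstrip("\n")
            if not line:
                if rows:
                    yield rows
                    rows = []
            elif not line.startswith("#"):
                rows.append(line.split("\t"))
    if rows:  # the file did not end with a blank line
        yield rows
```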

Documentation

License

MIT

Citation

If you use Treesearch in your research, please cite:

@software{treesearch,
  author = {Malouf, Robert},
  title = {Treesearch: Pattern matching for dependency treebanks},
  year = {2025},
  url = {https://github.com/rmalouf/treesearch}
}
