High-performance toolkit for querying linguistic dependency parses

Project description

Treesearch

Pattern matching for dependency treebanks.

⚠️ Early Stage: This project is under active development. The API and query language will change as we refine the design.

Overview

Treesearch finds syntactic patterns in dependency-parsed corpora. It reads treebanks in CoNLL-U format and returns all sentences matching a specified structural pattern. It is designed for corpus linguistics research on large treebanks, and searches that span multiple files are processed in parallel automatically.

Installation

From PyPI

Requires Python 3.12+.

pip install treesearch-ud

From Source

Requires Python 3.12+ and a Rust toolchain.

# Clone repository
git clone https://github.com/rmalouf/treesearch
cd treesearch

# Install with uv (recommended)
uv pip install -e .

# Or with pip
pip install maturin
maturin develop

Quick Example

Find passive constructions in an English treebank:

import treesearch

# Compile a pattern for passive voice
pattern = treesearch.compile_query("""
    MATCH {
        V [upos="VERB"];
        Aux [lemma="be"];
        V -[aux:pass]-> Aux;
    }
""")

# Search a single file
for tree, match in treesearch.search("corpus.conllu", pattern):
    verb = tree.word(match["V"])
    print(f"{verb.form}: {tree.sentence_text}")

Search multiple files with automatic parallel processing:

# Glob pattern for multiple files
for tree, match in treesearch.search("data/*.conllu", pattern):
    verb = tree.word(match["V"])
    print(f"{verb.form}: {tree.sentence_text}")

# Or use the object-oriented API
treebank = treesearch.load("data/*.conllu")
for tree, match in treebank.search(pattern):
    verb = tree.word(match["V"])
    print(f"{verb.form}: {tree.sentence_text}")
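
Because every result is a (tree, match) pair, aggregating over a search is ordinary Python. The following is a minimal sketch that relies only on the calls shown above (tree.word, match["V"], and the form attribute); count_verb_forms is a hypothetical helper for illustration, not part of Treesearch:

```python
from collections import Counter


def count_verb_forms(results, node="V"):
    """Tally the lowercased surface forms bound to `node` across (tree, match) pairs."""
    counts = Counter()
    for tree, match in results:
        word = tree.word(match[node])
        counts[word.form.lower()] += 1
    return counts


# Intended usage (needs treesearch installed and a corpus on disk):
# counts = count_verb_forms(treesearch.search("data/*.conllu", pattern))
# print(counts.most_common(10))
```

The helper accepts any iterable of pairs, so it should work the same way with treesearch.search and with treebank.search.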

Pattern Language

Patterns specify structural constraints on dependency trees:

MATCH {
    Verb [upos="VERB" & lemma="help"];
    Obj [upos="NOUN"];
    Verb -[obj]-> Obj;
}
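
When many similar patterns are needed (for example, one per lemma), the query text can be generated rather than written by hand. This sketch uses only the syntax shown in the pattern above; build_query is a hypothetical helper, not a Treesearch function, and it assumes attribute values contain no double quotes, since quoting rules are not described here:

```python
def build_query(nodes, edges):
    """Render a MATCH block from node constraints and labeled edges.

    nodes: dict mapping a variable name to a dict of attribute -> value
    edges: list of (head, label, dependent) tuples
    """
    lines = ["MATCH {"]
    for name, attrs in nodes.items():
        # Join several attribute tests on one node with "&"
        constraint = " & ".join(f'{key}="{value}"' for key, value in attrs.items())
        lines.append(f"    {name} [{constraint}];")
    for head, label, dependent in edges:
        lines.append(f"    {head} -[{label}]-> {dependent};")
    lines.append("}")
    return "\n".join(lines)


query = build_query(
    nodes={"Verb": {"upos": "VERB", "lemma": "help"}, "Obj": {"upos": "NOUN"}},
    edges=[("Verb", "obj", "Obj")],
)
print(query)  # reproduces the pattern shown above
```

The resulting string can then be passed to treesearch.compile_query like any hand-written pattern.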

Node constraints: upos, xpos, lemma, form, deprel, feats.* (morphological features), misc.* (miscellaneous features)

Edge constraints: -> (child), -[label]-> (labeled edge), !-> (negative), !-[label]-> (negative labeled edge)

Precedence: < (immediately precedes), << (precedes)

EXCEPT blocks: Reject matches where a condition is true (negative existential)

OPTIONAL blocks: Extend matches with additional bindings if possible
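
To make the intended meaning of these constraints concrete, here is a brute-force illustration in plain Python over a single toy sentence. It does not use Treesearch and is not how the library is implemented. The token layout, the helper names, and the choice to bind distinct variables to distinct tokens are all assumptions made for this sketch:

```python
from itertools import permutations

# Toy parse of "She quickly helped the children".
# Each token is (form, upos, head index or None for the root, deprel).
TOKENS = [
    ("She", "PRON", 2, "nsubj"),
    ("quickly", "ADV", 2, "advmod"),
    ("helped", "VERB", None, "root"),
    ("the", "DET", 4, "det"),
    ("children", "NOUN", 2, "obj"),
]


def upos(tag):
    """Node constraint: the token's universal POS tag equals `tag`."""
    return lambda token: token[1] == tag


def has_edge(head, dep, label=None):
    """Edge constraint: `dep` is a child of `head`, optionally via `label`."""
    parent, deprel = TOKENS[dep][2], TOKENS[dep][3]
    return parent == head and (label is None or deprel == label)


def children(head, label):
    """Indices of all children of `head` attached with `label`."""
    return [i for i in range(len(TOKENS)) if has_edge(head, i, label)]


def find(nodes, constraints=()):
    """Try every assignment of distinct tokens to variables; keep consistent ones."""
    names = list(nodes)
    for combo in permutations(range(len(TOKENS)), len(names)):
        binding = dict(zip(names, combo))
        if all(nodes[n](TOKENS[i]) for n, i in binding.items()) and all(
            check(binding) for check in constraints
        ):
            yield binding


def obj_edge(b):
    return has_edge(b["V"], b["O"], "obj")


# V -[obj]-> O : labeled edge
labeled = list(find({"V": upos("VERB"), "O": upos("NOUN")}, [obj_edge]))

# V !-> D : no edge from the verb to the determiner
negative = list(find({"V": upos("VERB"), "D": upos("DET")},
                     [lambda b: not has_edge(b["V"], b["D"])]))

# S << V (precedes) versus S < V (immediately precedes)
subj_verb = {"S": upos("PRON"), "V": upos("VERB")}
precedes = list(find(subj_verb, [lambda b: b["S"] < b["V"]]))
adjacent = list(find(subj_verb, [lambda b: b["S"] + 1 == b["V"]]))

# EXCEPT: drop matches whose verb has any advmod child
excepted = [b for b in labeled if not children(b["V"], "advmod")]

# OPTIONAL: add a binding for the object's determiner when one exists,
# otherwise keep the match unchanged
optional = []
for b in labeled:
    dets = children(b["O"], "det")
    if dets:
        optional.extend({**b, "D": d} for d in dets)
    else:
        optional.append(b)

for name in ("labeled", "negative", "precedes", "adjacent", "excepted", "optional"):
    print(name, globals()[name])
```

In this sentence the subject precedes the verb but is not adjacent to it, so the << test matches and the < test does not; the EXCEPT filter removes the only match because "helped" has an adverbial modifier.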

Data Format

Reads treebanks in CoNLL-U format. Supports plain text (.conllu) and gzip-compressed files (.conllu.gz) with automatic decompression.
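
For readers unfamiliar with the format, the sketch below shows what reading CoNLL-U involves: ten tab-separated columns per token, comment lines starting with #, and a blank line after each sentence. It is a standard-library illustration only, not Treesearch's own reader; read_conllu is a hypothetical name, and multiword-token and empty-node rows (IDs such as 1-2 or 1.1) are kept as ordinary rows:

```python
import gzip
from pathlib import Path

# Column names defined by the CoNLL-U format, in order
FIELDS = ("id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc")


def read_conllu(path):
    """Yield each sentence as a list of token dicts from a .conllu or .conllu.gz file."""
    path = Path(path)
    # Pick the opener from the file extension so gzip files are decompressed on the fly
    opener = gzip.open if path.suffix == ".gz" else open
    sentence = []
    with opener(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.rstrip("\n")
            if not line:
                # A blank line ends the current sentence
                if sentence:
                    yield sentence
                    sentence = []
            elif not line.startswith("#"):
                # Skip comment/metadata lines; split token rows on tabs
                sentence.append(dict(zip(FIELDS, line.split("\t"))))
    if sentence:
        # Handle a file that lacks the final blank line
        yield sentence
```

All values are left as strings, including id and head, so callers convert them as needed.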

Documentation

License

MIT

Citation

If you use Treesearch in your research, please cite:

@software{treesearch,
  author = {Malouf, Robert},
  title = {Treesearch: Pattern matching for dependency treebanks},
  year = {2025},
  url = {https://github.com/rmalouf/treesearch}
}

Download files

Source Distribution

treesearch_ud-0.1.0.tar.gz (173.0 kB view details)

Uploaded Source

Built Distributions

treesearch_ud-0.1.0-cp312-abi3-win_amd64.whl (440.5 kB view details)

Uploaded CPython 3.12+ Windows x86-64

treesearch_ud-0.1.0-cp312-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.2 MB view details)

Uploaded CPython 3.12+ manylinux: glibc 2.17+ x86-64

treesearch_ud-0.1.0-cp312-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (2.9 MB view details)

Uploaded CPython 3.12+ manylinux: glibc 2.17+ ARM64

treesearch_ud-0.1.0-cp312-abi3-macosx_11_0_arm64.whl (504.4 kB view details)

Uploaded CPython 3.12+ macOS 11.0+ ARM64

treesearch_ud-0.1.0-cp312-abi3-macosx_10_12_x86_64.whl (543.5 kB view details)

Uploaded CPython 3.12+ macOS 10.12+ x86-64

File details

Details for the file treesearch_ud-0.1.0.tar.gz.

File metadata

  • Download URL: treesearch_ud-0.1.0.tar.gz
  • Size: 173.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: maturin/1.10.2

File hashes

Hashes for treesearch_ud-0.1.0.tar.gz
Algorithm Hash digest
SHA256 a66c6cbe05f658d38b212458ee28a2dcd9de525f825a7e7dd4e2d03ba41118c3
MD5 02973623e802fc1697b2bc02518ace1c
BLAKE2b-256 ca879e1fbd01e18448d43aa3cb94e895eb59a2b857133064baff59394c3c38a9

See more details on using hashes here.

File details

Details for the file treesearch_ud-0.1.0-cp312-abi3-win_amd64.whl.

File hashes

Hashes for treesearch_ud-0.1.0-cp312-abi3-win_amd64.whl
Algorithm Hash digest
SHA256 c92e13b2c1d4d55fe5f379c8f023aa6275f10407cb77abc18a6af1195cf4b4eb
MD5 9161b0d9ff1e6b7b1dca1eedf2c3505a
BLAKE2b-256 cc75a6ff3317f652c1730d0f76a48e7f36d5764053a5fe7007ce171ed6c63114

File details

Details for the file treesearch_ud-0.1.0-cp312-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for treesearch_ud-0.1.0-cp312-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 037ab6bee9e92c13dca5506dc9bd3ea2c80c4f141ea5e39e51167b78115d59d7
MD5 4e0840faff89d46dd76c0fba70e3d92c
BLAKE2b-256 4392993921eabfad5f8770a0f824b4b14d6531eeddd1a861eb91e438842d89ab

File details

Details for the file treesearch_ud-0.1.0-cp312-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File hashes

Hashes for treesearch_ud-0.1.0-cp312-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 b8ec3b88551a6fc3168ba7f66f692d23685f08347d0405234f997af98f54406c
MD5 265d7160f8cead064c35746b91ed17be
BLAKE2b-256 e3beec560a7dd912c426b3576d6a68e4d60e7bb4aaba99928f57844d629fb7ff

File details

Details for the file treesearch_ud-0.1.0-cp312-abi3-macosx_11_0_arm64.whl.

File hashes

Hashes for treesearch_ud-0.1.0-cp312-abi3-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 817f6aa31c53c39731f4c1dfb1b656564ccb3832c84aadaac23ac9ba26c76be8
MD5 772a7cf313b8216230a5cd697f58ede1
BLAKE2b-256 3e58da2ad8ae58cbfd25f47dd44153dd3776ca39a5ad52265fcffb5a604c8504

File details

Details for the file treesearch_ud-0.1.0-cp312-abi3-macosx_10_12_x86_64.whl.

File hashes

Hashes for treesearch_ud-0.1.0-cp312-abi3-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 34c92beb44bcf89eaf6c15e12ed72df0ac9ad29622706039c95c7317770a82b7
MD5 1a52fa9754d7939e331640eb670686a6
BLAKE2b-256 2903f9eafb791ff10fe2c226971598aab82688ee5e8d085a4e3d61081c8f5f35
