High-performance toolkit for querying linguistic dependency parses
Treesearch
Pattern matching for dependency treebanks.
⚠️ Early Stage: This project is under active development. The API and query language will change as we refine the design.
Overview
Treesearch finds syntactic patterns in dependency-parsed corpora. It reads treebanks in CoNLL-U format and returns all sentences matching a specified structural pattern. It is designed for corpus linguistics research on large treebanks, and searches that span multiple files are processed in parallel automatically.
Installation
From PyPI
Requires Python 3.12+.
pip install treesearch-ud
From Source
Requires Python 3.12+ and Rust toolchain.
# Clone repository
git clone https://github.com/rmalouf/treesearch
cd treesearch
# Install with uv (recommended)
uv pip install -e .
# Or with pip
pip install maturin
maturin develop
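After either installation route, a quick sanity check can confirm that the interpreter meets the stated Python 3.12+ requirement and that a module named `treesearch` is visible to it. The sketch below uses only the standard library; the helper names `python_is_supported` and `treesearch_is_installed` are hypothetical and not part of Treesearch.

```python
import importlib.util
import sys

MIN_PYTHON = (3, 12)  # minimum version stated in the installation notes


def python_is_supported(version=None):
    """Return True if the given (or running) interpreter meets the minimum."""
    version = sys.version_info if version is None else version
    return tuple(version[:2]) >= MIN_PYTHON


def treesearch_is_installed():
    """Return True if a module named 'treesearch' can be found, without importing it."""
    return importlib.util.find_spec("treesearch") is not None


if __name__ == "__main__":
    print("Python version OK:", python_is_supported())
    print("treesearch importable:", treesearch_is_installed())
```

If the second line reports False, the package was most likely installed into a different environment than the one running the check.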
Quick Example
Find passive constructions in an English treebank:
import treesearch
# Parse a pattern for passive voice
pattern = treesearch.compile_query("""
    MATCH {
        V [upos="VERB"];
        Aux [lemma="be"];
        V -[aux:pass]-> Aux;
    }
""")
# Search a single file
for tree, match in treesearch.search("corpus.conllu", pattern):
    verb = tree.word(match["V"])
    print(f"{verb.form}: {tree.sentence_text}")
Search multiple files with automatic parallel processing:
# Glob pattern for multiple files
for tree, match in treesearch.search("data/*.conllu", pattern):
    verb = tree.word(match["V"])
    print(f"{verb.form}: {tree.sentence_text}")
# Or use the object-oriented API
treebank = treesearch.load("data/*.conllu")
for tree, match in treebank.search(pattern):
    verb = tree.word(match["V"])
    print(f"{verb.form}: {tree.sentence_text}")
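A common next step is to tally which word forms fill a pattern variable rather than print every hit. The sketch below relies only on the calls shown in the examples above (`tree.word(match[name])` and `.form`). The helper `count_forms` is hypothetical, and the stand-in objects at the bottom are not real Treesearch objects; they exist only so the sketch runs without a corpus, and they assume that a match value is whatever `tree.word` accepts.

```python
from collections import Counter
from types import SimpleNamespace


def count_forms(results, name):
    """Tally the lowercased word forms bound to pattern variable `name`.

    `results` is any iterable of (tree, match) pairs, such as the one
    treesearch.search() yields in the examples above.
    """
    counts = Counter()
    for tree, match in results:
        counts[tree.word(match[name]).form.lower()] += 1
    return counts


# Stand-in data (not real Treesearch objects) so the sketch runs on its own.
def _fake_tree(forms):
    words = [SimpleNamespace(form=f) for f in forms]
    return SimpleNamespace(word=lambda i: words[i])


demo_results = [
    (_fake_tree(["It", "was", "written"]), {"V": 2}),
    (_fake_tree(["They", "were", "Written"]), {"V": 2}),
    (_fake_tree(["He", "was", "seen"]), {"V": 2}),
]
demo_counts = count_forms(demo_results, "V")
print(demo_counts.most_common())
```

With real data, the same helper could be fed directly from a search, for example `count_forms(treesearch.search("data/*.conllu", pattern), "V")`.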
Pattern Language
Patterns specify structural constraints on dependency trees:
MATCH {
    Verb [upos="VERB" & lemma="help"];
    Obj [upos="NOUN"];
    Verb -[obj]-> Obj;
}
- Node constraints: upos, xpos, lemma, form, deprel, feats.* (morphological features), misc.* (miscellaneous features)
- Edge constraints: -> (child), -[label]-> (labeled edge), !-> (negative), !-[label]-> (negative labeled edge)
- Precedence: < (immediately precedes), << (precedes)
- EXCEPT blocks: Reject matches where a condition is true (negative existential)
- OPTIONAL blocks: Extend matches with additional bindings if possible
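To make the semantics concrete, here is a brute-force sketch in plain Python of what the `Verb -[obj]-> Obj` pattern above asks for, together with the two precedence relations. It is an illustration only and says nothing about how Treesearch implements matching; the token dictionary layout and the names `find_verb_obj` and `precedes` are assumptions made for this example.

```python
# A toy sentence: "She helps children", with 1-based head indices (0 = root).
SENTENCE = [
    {"id": 1, "form": "She", "lemma": "she", "upos": "PRON", "head": 2, "deprel": "nsubj"},
    {"id": 2, "form": "helps", "lemma": "help", "upos": "VERB", "head": 0, "deprel": "root"},
    {"id": 3, "form": "children", "lemma": "child", "upos": "NOUN", "head": 2, "deprel": "obj"},
]


def find_verb_obj(tokens):
    """Yield {"Verb": id, "Obj": id} bindings for
    Verb [upos="VERB" & lemma="help"]; Obj [upos="NOUN"]; Verb -[obj]-> Obj;
    """
    for verb in tokens:
        # Node constraints on Verb.
        if verb["upos"] != "VERB" or verb["lemma"] != "help":
            continue
        for obj in tokens:
            # Node constraint on Obj plus the edge constraint:
            # Obj is a child of Verb with relation "obj".
            if obj["upos"] == "NOUN" and obj["head"] == verb["id"] and obj["deprel"] == "obj":
                yield {"Verb": verb["id"], "Obj": obj["id"]}


def precedes(a, b, immediately=False):
    """Word-order check: `<<` means a comes before b; `<` also requires adjacency."""
    if immediately:
        return b["id"] - a["id"] == 1
    return a["id"] < b["id"]


matches = list(find_verb_obj(SENTENCE))
print(matches)
```

Each binding maps a pattern variable to one token, which mirrors how `match["V"]` is used to look up a word in the Quick Example.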
Data Format
Treesearch reads treebanks in CoNLL-U format, either as plain text (.conllu) or gzip-compressed (.conllu.gz). Compressed files are decompressed automatically.
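For readers unfamiliar with the format, the sketch below shows one way to read a CoNLL-U file by hand with the standard library, choosing between plain and gzip input by file suffix. It is not Treesearch's reader; `read_conllu` and `FIELDS` are hypothetical names, and the function keeps every column as a raw string, including multiword-token and empty-node lines.

```python
import gzip
from pathlib import Path

# The ten tab-separated columns of a CoNLL-U token line.
FIELDS = ("id", "form", "lemma", "upos", "xpos", "feats", "head", "deprel", "deps", "misc")


def read_conllu(path):
    """Yield each sentence as a list of token dicts from a .conllu or .conllu.gz file."""
    path = Path(path)
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt", encoding="utf-8") as handle:
        tokens = []
        for line in handle:
            line = line.rstrip("\n")
            if not line.strip():  # a blank line ends the sentence
                if tokens:
                    yield tokens
                tokens = []
            elif not line.startswith("#"):  # skip sentence-level comments
                tokens.append(dict(zip(FIELDS, line.split("\t"))))
        if tokens:  # the file did not end with a blank line
            yield tokens
```

Because the function is a generator, sentences are produced one at a time and a large treebank never has to fit in memory at once.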
Documentation
- API.md - Complete Python API reference
- GitHub repository - Source code and issue tracker
License
MIT
Citation
If you use Treesearch in your research, please cite:
@software{treesearch,
author = {Malouf, Robert},
title = {Treesearch: Pattern matching for dependency treebanks},
year = {2025},
url = {https://github.com/rmalouf/treesearch}
}
Download files
File details
Details for the file treesearch_ud-0.1.0.tar.gz.
File metadata
- Download URL: treesearch_ud-0.1.0.tar.gz
- Upload date:
- Size: 173.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: maturin/1.10.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a66c6cbe05f658d38b212458ee28a2dcd9de525f825a7e7dd4e2d03ba41118c3 |
| MD5 | 02973623e802fc1697b2bc02518ace1c |
| BLAKE2b-256 | ca879e1fbd01e18448d43aa3cb94e895eb59a2b857133064baff59394c3c38a9 |
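The published digests can be used to check a downloaded file before installing it. The sketch below computes a SHA256 digest with the standard library and compares it with an expected value; `sha256_of` and `matches_published` are hypothetical helper names, not part of Treesearch or PyPI tooling.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path, expected_hex):
    """Compare a file's digest with a published one, ignoring case and whitespace."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For the source distribution, this would be called as `matches_published("treesearch_ud-0.1.0.tar.gz", ...)` with the SHA256 value listed for that file.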
File details
Details for the file treesearch_ud-0.1.0-cp312-abi3-win_amd64.whl.
File metadata
- Download URL: treesearch_ud-0.1.0-cp312-abi3-win_amd64.whl
- Upload date:
- Size: 440.5 kB
- Tags: CPython 3.12+, Windows x86-64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: maturin/1.10.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c92e13b2c1d4d55fe5f379c8f023aa6275f10407cb77abc18a6af1195cf4b4eb |
| MD5 | 9161b0d9ff1e6b7b1dca1eedf2c3505a |
| BLAKE2b-256 | cc75a6ff3317f652c1730d0f76a48e7f36d5764053a5fe7007ce171ed6c63114 |
File details
Details for the file treesearch_ud-0.1.0-cp312-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.
File metadata
- Download URL: treesearch_ud-0.1.0-cp312-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
- Upload date:
- Size: 3.2 MB
- Tags: CPython 3.12+, manylinux: glibc 2.17+ x86-64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: maturin/1.10.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 037ab6bee9e92c13dca5506dc9bd3ea2c80c4f141ea5e39e51167b78115d59d7 |
| MD5 | 4e0840faff89d46dd76c0fba70e3d92c |
| BLAKE2b-256 | 4392993921eabfad5f8770a0f824b4b14d6531eeddd1a861eb91e438842d89ab |
File details
Details for the file treesearch_ud-0.1.0-cp312-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.
File metadata
- Download URL: treesearch_ud-0.1.0-cp312-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
- Upload date:
- Size: 2.9 MB
- Tags: CPython 3.12+, manylinux: glibc 2.17+ ARM64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: maturin/1.10.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b8ec3b88551a6fc3168ba7f66f692d23685f08347d0405234f997af98f54406c |
| MD5 | 265d7160f8cead064c35746b91ed17be |
| BLAKE2b-256 | e3beec560a7dd912c426b3576d6a68e4d60e7bb4aaba99928f57844d629fb7ff |
File details
Details for the file treesearch_ud-0.1.0-cp312-abi3-macosx_11_0_arm64.whl.
File metadata
- Download URL: treesearch_ud-0.1.0-cp312-abi3-macosx_11_0_arm64.whl
- Upload date:
- Size: 504.4 kB
- Tags: CPython 3.12+, macOS 11.0+ ARM64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: maturin/1.10.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 817f6aa31c53c39731f4c1dfb1b656564ccb3832c84aadaac23ac9ba26c76be8 |
| MD5 | 772a7cf313b8216230a5cd697f58ede1 |
| BLAKE2b-256 | 3e58da2ad8ae58cbfd25f47dd44153dd3776ca39a5ad52265fcffb5a604c8504 |
File details
Details for the file treesearch_ud-0.1.0-cp312-abi3-macosx_10_12_x86_64.whl.
File metadata
- Download URL: treesearch_ud-0.1.0-cp312-abi3-macosx_10_12_x86_64.whl
- Upload date:
- Size: 543.5 kB
- Tags: CPython 3.12+, macOS 10.12+ x86-64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: maturin/1.10.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 34c92beb44bcf89eaf6c15e12ed72df0ac9ad29622706039c95c7317770a82b7 |
| MD5 | 1a52fa9754d7939e331640eb670686a6 |
| BLAKE2b-256 | 2903f9eafb791ff10fe2c226971598aab82688ee5e8d085a4e3d61081c8f5f35 |