A Python parser for BlackLab Corpus Query Language
A full-coverage Python parser for the BlackLab Corpus Query Language (BCQL) that converts query strings into a Pydantic v2 AST (Abstract Syntax Tree) with lossless round-trip reconstruction and structured error reporting.
To get started, you can check out:
- A Quickstart guide
- bcql_py and BCQL general guides
- The full API reference
- Python code examples
- A Gradio demo
Features
- Complete BCQL coverage: token queries, sequences, repetitions, spans, lookarounds, captures, global constraints, relations, alignments, and built-in functions.
- Immutable Pydantic v2 AST: every node is a frozen BaseModel subclass with a node_type discriminator, making inspection and pattern matching straightforward.
- Lossless BCQL round-trip: to_bcql() reproduces the original query (preserving shorthand forms, quote characters, sensitivity flags, etc.).
- Position-aware syntax errors: BCQLSyntaxError carries the original query, the 0-based offset, and a caret-annotated message: ready to forward to a user or LLM.
- Optional semantic validation: a CorpusSpec describes which annotations, span tags, alignment fields, and dependency relations your corpus supports. Pass it as parse(query, spec=spec) to catch typos and unsupported features before they reach the corpus. See the tagset validation guide.
- Zero runtime dependencies beyond Pydantic.
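To make the ideas in the feature list concrete, here is a minimal sketch of an immutable AST with a node_type discriminator, a lossless round trip, and a simple annotation check. It is not the bcql_py implementation: it uses standard-library frozen dataclasses instead of Pydantic, and the names TokenNode, SequenceNode, and unknown_annotations are hypothetical, chosen only for illustration.

```python
from dataclasses import FrozenInstanceError, dataclass
from typing import Literal


@dataclass(frozen=True)
class TokenNode:
    """Hypothetical node for a token query such as [word="man"]."""

    annotation: str
    value: str
    quote: str = '"'  # remembered so the round trip keeps the original quote character
    node_type: Literal["token"] = "token"

    def to_bcql(self) -> str:
        return f"[{self.annotation}={self.quote}{self.value}{self.quote}]"


@dataclass(frozen=True)
class SequenceNode:
    """Hypothetical node for a sequence of token queries."""

    children: tuple[TokenNode, ...]
    node_type: Literal["sequence"] = "sequence"

    def to_bcql(self) -> str:
        return " ".join(child.to_bcql() for child in self.children)


def unknown_annotations(node, allowed: frozenset[str]) -> list[str]:
    """Collect annotation names that the (assumed) corpus does not define."""
    # Dispatch on the discriminator instead of using isinstance checks.
    if node.node_type == "token":
        return [] if node.annotation in allowed else [node.annotation]
    if node.node_type == "sequence":
        return [name for child in node.children for name in unknown_annotations(child, allowed)]
    raise ValueError(f"Unknown node type: {node.node_type}")


query_ast = SequenceNode(children=(TokenNode("lemma", "see"), TokenNode("pos", "NOUN", quote="'")))

print(query_ast.to_bcql())  # [lemma="see"] [pos='NOUN']
print(unknown_annotations(query_ast, frozenset({"word", "lemma"})))  # ['pos']

try:
    query_ast.children[0].value = "saw"  # frozen nodes reject mutation
except FrozenInstanceError:
    print("nodes are immutable")
```

Storing the quote character on the node is what lets the round trip reproduce the input exactly; a parser that normalised quotes at parse time could not do that.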
Installation
pip install bcql_py
Or with uv:
uv add bcql_py
Try the demo
A small Gradio app under app/
lets you paste a BCQL query, pick or build a CorpusSpec, and inspect parse +
validation results. The hosted demo runs on Hugging Face Spaces at
BramVanroy/bcql_py_validation.
To run it locally:
uv sync --group app
uv run python app/app.py
Development
Clone and set up the project:
git clone https://github.com/BramVanroy/bcql_py.git
cd bcql_py
uv sync --dev
Enable pre-commit hooks:
uv run pre-commit install
After installation, hooks run automatically on every git commit.
We do style checking with ruff and type checking with mypy.
You can also run them manually across the whole repo:
uv run pre-commit run --all-files
To work on documentation locally:
make serve-docs
This rebuilds a fresh local mike preview before serving it, so you do not end
up testing stale versioned docs. By default it serves a local 0.3.0 version
and latest alias from a temporary docs branch. You can override those values
when needed, for example:
DOCS_VERSION=0.4.0 DOCS_SOURCE_REF=v0.4.0 make serve-docs
Open both /latest/ and /<version>/ while testing. If you are checking the
GitHub source links as well, set DOCS_SOURCE_REF to the release tag you want
to emulate.
You can (and should) run the tests before pushing to the remote, although a GitHub workflow will also run them on push. To run them locally:
make test
ANTLR to generate the needed tools
BlackLab uses ANTLR to generate its Java parser and lexer from a g4 grammar file. We could generate Python files the same way. However, after trying it out, I found the generated files obfuscated and unclear, and I am not fond of requiring an extra external (Java-based) library. That is not a slight to ANTLR; I am simply not familiar with the tool, and I am sure it is incredibly powerful and useful if you know how to use it. To keep a clearer view of this library, I therefore strive for a Python-native implementation that is true to the spec. It's also just a fun project that I do not wish to "automate away" (though I might regret that later). At a later time (TODO) I might implement functionality to cross-validate our implementation against the generated ANTLR parser and lexer. For now I will be satisfied with high-coverage testing. In case of doubt I have followed the Bcql.g4 file.
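For readers wondering what a Python-native approach looks like in practice, the following is a minimal sketch of a hand-written scanner and parser for a tiny BCQL-like subset (sequences of [name="value"] tokens) with position-aware, caret-annotated errors. It is not the bcql_py code and covers almost none of the real grammar; SketchSyntaxError, scan, and parse_sequence are hypothetical names used only here.

```python
import re
from dataclasses import dataclass


class SketchSyntaxError(Exception):
    """Hypothetical error carrying the query, a 0-based offset, and a caret line."""

    def __init__(self, message: str, query: str, offset: int) -> None:
        self.query = query
        self.offset = offset
        caret_line = " " * offset + "^"
        super().__init__(f"{message} at offset {offset}\n{query}\n{caret_line}")


@dataclass(frozen=True)
class Token:
    kind: str
    text: str
    offset: int


TOKEN_SPEC = [
    ("LBRACKET", r"\["),
    ("RBRACKET", r"\]"),
    ("EQUALS", r"="),
    ("STRING", r"\"[^\"]*\"|'[^']*'"),
    ("NAME", r"[A-Za-z_]\w*"),
    ("WS", r"\s+"),
]
TOKEN_PATTERN = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def scan(query: str) -> list[Token]:
    """Split the query into tokens, remembering where each one starts."""
    tokens = []
    position = 0
    while position < len(query):
        match = TOKEN_PATTERN.match(query, position)
        if match is None:
            raise SketchSyntaxError(f"Unexpected character {query[position]!r}", query, position)
        if match.lastgroup != "WS":  # whitespace only separates tokens
            tokens.append(Token(match.lastgroup, match.group(), position))
        position = match.end()
    return tokens


def parse_sequence(query: str) -> list[tuple[str, str]]:
    """Parse zero or more [name="value"] tokens into (name, value) pairs."""
    tokens = scan(query)
    index = 0

    def expect(kind: str) -> Token:
        nonlocal index
        if index >= len(tokens):
            raise SketchSyntaxError(f"Expected {kind} but reached end of query", query, len(query))
        token = tokens[index]
        if token.kind != kind:
            raise SketchSyntaxError(f"Expected {kind} but found {token.text!r}", query, token.offset)
        index += 1
        return token

    pairs = []
    while index < len(tokens):
        expect("LBRACKET")
        name = expect("NAME")
        expect("EQUALS")
        value = expect("STRING")
        expect("RBRACKET")
        pairs.append((name.text, value.text[1:-1]))  # strip the surrounding quotes
    return pairs


print(parse_sequence('[lemma="see"] [pos="NOUN"]'))  # [('lemma', 'see'), ('pos', 'NOUN')]

try:
    parse_sequence('[lemma "see"]')  # the "=" is missing
except SketchSyntaxError as error:
    print(error)
```

Because every token records its own offset, the parser can point at the exact character where the input stopped matching the grammar, which is the property that makes such errors easy to forward to a user.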
If you'd like to try the ANTLR route yourself, you can try it as follows:
- Install requirements (not included in our pyproject.toml file, you'll need to download these yourself!)
  uv pip install requests antlr4-tools antlr4-python3-runtime
- Download the BlackLab G4 definition from GitHub. You can optionally specify a --branch or --tag, defaults to --branch dev.
  uv run python scripts/get_bcql_g4.py  # Saved to parser/Bcql.g4
  cd parser/
- Run ANTLR (you can update -v to the latest version if needed)
  antlr4 -v 4.13.2 -Dlanguage=Python3 Bcql.g4
Acknowledgments
- BlackLab
- Robert Nystrom's guide on "Crafting Interpreters", specifically the part on "Scanning". Token types and error handling in bcql_py are heavily inspired by his work.
- Jamis Buck's blog post on recursive descent parsers
- Berkeley course notes on BNF
File details
Details for the file bcql_py-0.3.2.tar.gz.
File metadata
- Download URL: bcql_py-0.3.2.tar.gz
- Upload date:
- Size: 252.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: uv/0.11.18
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4ecc718398c3bd8de123dc3d43720476777ae57d113809568bf2591728d7d727 |
| MD5 | ac18de80ee6d150797d92eb5ea863301 |
| BLAKE2b-256 | bcc48ed00e7c46ec2e3638cafaf8844c4f020eb9d66c2d1141c40e5a8c73aeaa |
File details
Details for the file bcql_py-0.3.2-py3-none-any.whl.
File metadata
- Download URL: bcql_py-0.3.2-py3-none-any.whl
- Upload date:
- Size: 64.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: uv/0.11.18
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 15f36d93dc677de44ae5f42af0bbf32e9824e2d536b6dbf442454a540b0e1c8f |
| MD5 | d640b23f1049fa9e9f074355491310a6 |
| BLAKE2b-256 | 9de615019fb539f509c1359e92a95ba4f625481cd01a56e897db3c3fc40c954e |