A Python parser for BlackLab Corpus Query Language

A full-coverage Python parser for the BlackLab Corpus Query Language (BCQL) that converts query strings into a Pydantic v2 AST (Abstract Syntax Tree) with lossless round-trip reconstruction and structured error reporting.

Features

  • Complete BCQL coverage: token queries, sequences, repetitions, spans, lookarounds, captures, global constraints, relations, alignments, and built-in functions.
  • Immutable Pydantic v2 AST: every node is a frozen BaseModel subclass with a node_type discriminator, making inspection and pattern matching straightforward.
  • Lossless BCQL round-trip: to_bcql() reproduces the original query (preserving shorthand forms, quote characters, sensitivity flags, etc.).
  • Position-aware syntax errors: BCQLSyntaxError carries the original query, the 0-based offset, and a caret-annotated message, ready to forward to a user or LLM.
  • Optional semantic validation: a CorpusSpec describes which annotations, span tags, alignment fields, and dependency relations your corpus supports. Pass it as parse(query, spec=spec) to catch typos and unsupported features before they reach the corpus. See the tagset validation guide.
  • Zero runtime dependencies beyond Pydantic.
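
To picture what a caret-annotated message at a 0-based offset looks like, here is a minimal standard-library sketch. It does not use or reproduce bcql_py's own code: caret_message is a hypothetical helper, and the example query, offset, and reason are made up for illustration. It also assumes a single-line query without tab characters.

```python
def caret_message(query: str, offset: int, reason: str) -> str:
    """Build a caret-annotated message for a 0-based offset into `query`.

    Hypothetical helper for illustration only; not part of bcql_py.
    """
    # Allow offset == len(query) so "unexpected end of input" can be marked too.
    if not 0 <= offset <= len(query):
        raise ValueError("offset must lie within the query or just past its end")
    # Pad with spaces so the caret lands under the character at `offset`.
    pointer = " " * offset + "^"
    return f"{reason} at offset {offset}\n{query}\n{pointer}"


# Made-up example: a token query whose opening bracket is never closed.
print(caret_message('[word="man"', 0, "unclosed '['"))
```

Because the message keeps the query and the pointer on separate lines, it can be shown to a user or passed to an LLM without further formatting.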

Installation

pip install bcql_py

Or with uv:

uv add bcql_py

Try the demo

A small Gradio app under app/ lets you paste a BCQL query, pick or build a CorpusSpec, and inspect parse + validation results. The hosted demo runs on Hugging Face Spaces at BramVanroy/bcql_py_validation.

To run it locally:

uv sync --group app
uv run python app/app.py

Development

Clone and set up the project:

git clone https://github.com/BramVanroy/bcql_py.git
cd bcql_py
uv sync --dev

Enable pre-commit hooks:

uv run pre-commit install

After installation, hooks run automatically on every git commit. We do style checking with ruff and type checking with mypy. You can also run them manually across the whole repo:

uv run pre-commit run --all-files

To work on documentation locally:

make docs

You should run the tests before pushing to the remote, although a GitHub workflow will run them on push anyway. To run them locally:

make test

ANTLR to generate the needed tools

BlackLab uses ANTLR to generate the parser/lexer in Java based on a g4 file. We could similarly generate Python files. However, after trying it out, I find the files obfuscated and unclear and I'm not fond of requiring an extra external (Java-based) library. That is not a slight to ANTLR; I am simply not familiar with the tool: I am sure it is incredibly powerful and useful if you know how to use it. To keep a clearer view of this library I therefore strive to make a Python-native implementation that is true to spec. It's also just a fun project that I do not wish to "automate away" (though I might regret that later). At a later time (TODO) I might implement functionality to cross-validate our implementation with the generated ANTLR parser and lexer. For now I will be satisfied with high coverage testing. In case of doubt I have followed the Bcql.g4 file.

If you'd like to try the ANTLR route yourself, proceed as follows:

  1. Install the requirements (they are not included in our pyproject.toml file, so you'll need to install them yourself!)

    uv pip install requests antlr4-tools antlr4-python3-runtime
    
  2. Download the BlackLab G4 definition from GitHub. You can optionally specify a --branch or --tag; the default is --branch dev.

    uv run python scripts/get_bcql_g4.py
    # Saved to parser/Bcql.g4
    cd parser/
    
  3. Run ANTLR (you can update -v to the latest version if needed)

    antlr4 -v 4.13.2 -Dlanguage=Python3 Bcql.g4
    
