LexCQL Query Grammar and Parser

LexCQL for Python

A query parser for LexCQL, the query language for lexical resources in the CLARIN Federated Content Search (FCS).

Installation

Install from PyPI:

python3 -m pip install lexcql-parser

Or install from source:

git clone https://github.com/Querela/lexcql-python.git
cd lexcql-python
uv build

# built package
python3 -m pip install dist/lexcql_parser-<version>-py3-none-any.whl
# or
python3 -m pip install dist/lexcql_parser-<version>.tar.gz

# for local development
python3 -m pip install -e .

Usage

The high-level interface lexcql.parser.QueryParser wraps the ANTLR4 parse tree in a simplified query node tree that is easier to work with. The package also exposes a simple parsing function, lexcql.parse(input: str, enableSourceLocations: bool = True) -> lexcql.parser.QueryNode:

import lexcql

## parsing a valid query into a query node tree
# the query string (named `query` to avoid shadowing the built-in `input`)
query = "Banane Or lemma =/lang=eng apple"
# parse into a QueryNode tree
node = lexcql.parse(query)
# print the stringified tree
print(str(node))

## handling possibly invalid queries
query = "broken query"
try:
    lexcql.parse(query)
except lexcql.QueryParserException as ex:
    print(f"Error: {ex}")
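When many user-supplied queries need to be parsed, the try/except pattern above can be folded into a small helper that returns either the result or the error text. The following is only a sketch: try_parse is a hypothetical helper, not part of lexcql, and it takes the parse function and exception type as arguments so that it runs without the package installed. With the package available, one would presumably pass lexcql.parse and lexcql.QueryParserException.

```python
from typing import Any, Callable, Optional, Tuple, Type


def try_parse(
    query: str,
    parse: Callable[[str], Any],
    error_type: Type[Exception] = Exception,
) -> Tuple[Optional[Any], Optional[str]]:
    """Hypothetical helper: return (result, None) on success or (None, message) on failure."""
    try:
        return parse(query), None
    except error_type as ex:
        # Only the expected error type is caught; anything else propagates.
        return None, str(ex)


# Stand-in demonstration with int()/ValueError so the sketch is self-contained.
# Assumed real-world call: try_parse(q, lexcql.parse, lexcql.QueryParserException)
for text in ["42", "not a number"]:
    result, error = try_parse(text, int, ValueError)
    if error is None:
        print(f"ok: {result!r}")
    else:
        print(f"failed: {error}")
```

Returning a tuple instead of raising keeps batch loops flat; callers that prefer exceptions can keep using the try/except form directly.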

You can also use the lower-level ANTLR4 framework to parse the query string yourself. A handy wrapper is provided as lexcql.antlr_parse(input: str) -> LexParser.QueryContext.

from antlr4 import CommonTokenStream, InputStream
from lexcql.parser import LexLexer, LexParser

# the query string (named `query` to avoid shadowing the built-in `input`)
query = "example"
input_stream = InputStream(query)
lexer = LexLexer(input_stream)
stream = CommonTokenStream(lexer)
parser = LexParser(stream)
tree: LexParser.QueryContext = parser.query()

Parsed queries can also be checked for conformance to the specification.

from lexcql import QueryParser
from lexcql.validation import LexCQLValidatorV0_3, SpecificationValidationError

parser = QueryParser(enableSourceLocations=True)

query = """Banane"""
node = parser.parse(query)
validator = LexCQLValidatorV0_3()
validator.validate(node, query=query)
assert len(validator.errors) == 0  # no errors

# or to raise an error on first violation
query = """post = NOUN"""
node = parser.parse(query)
validator = LexCQLValidatorV0_3(raise_at_first_violation=True)
validator.validate(node, query=query)  # raises SpecificationValidationError

A convenience function is provided as lexcql.validate(query: str):

from lexcql import validate

# simple boolean returns
validate("lemma = apple")  # => True
validate("lemmas = apple")  # => False ("lemmas" is unknown field name)
validate("lemma =")  # => False (parse error, missing search term)

# or with list of errors
error = validate("post = NOUN", return_errors=True)[0]  # has one error
assert error.message == "Unknown index 'post'!"
# the offending fragment is the full query
assert error.fragment == "post = NOUN"
assert error.position.start == 0
assert error.position.stop == 11
assert error.type == "validation-error"
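The position information on a validation error can be used to point at the offending part of a query. The sketch below is illustrative only: Span and underline are hypothetical names, not part of lexcql, and the end-exclusive reading of stop is an assumption inferred from the example above, where stop == 11 equals the length of "post = NOUN".

```python
from dataclasses import dataclass


@dataclass
class Span:
    """Hypothetical stand-in for an error position.

    Assumption: start is inclusive and stop is exclusive.
    """

    start: int
    stop: int


def underline(query: str, span: Span, message: str) -> str:
    """Return the query, a caret line marking the span, and the message."""
    # Mark at least one character, even for an empty span.
    width = max(span.stop - span.start, 1)
    marker = " " * span.start + "^" * width
    return f"{query}\n{marker}\n{message}"


# Values taken from the validation example above.
print(underline("post = NOUN", Span(0, 11), "Unknown index 'post'!"))
```

With a real error object, the same function could be called with error.position, error.message and the original query string, provided the position semantics match the assumption stated above.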

Development

Fetch (or update) grammar files:

git clone https://github.com/clarin-eric/fcs-ql.git
cp fcs-ql/src/main/antlr4/eu/clarin/sru/fcs/qlparser/lex/*.g4 src/lexcql/

(Re-)Generate Python parser code:

# setup environment
uv sync --extra antlr
# NOTE: you can activate the environment (if you do not want to prefix everything with `uv run`)
# NOTE: `uv` does not play nicely with `pyenv` - if you use `pyenv`, sourcing does NOT work!
source .venv/bin/activate

cd src/lexcql
uv run antlr4 -Dlanguage=Python3 *.g4 -listener -visitor

Run style checks:

# setup environment
uv sync --extra style

uv run isort --check --diff .
uv run black --check .
uv run flake8 . --show-source --statistics

uv run mypy src

Run tests:

# setup environment
uv sync --extra test

uv run pytest
# to see output and run a specific test file
uv run pytest -v -rP tests/validation/test_validation.py

Run checks before publishing:

# setup environment
uv sync --extra build

# build the package
uv build
# run metadata check
uv run twine check --strict dist/*
# (manual) check of package contents
tar tvf dist/lexcql_parser-*.tar.gz
