
Pure Python parser for Elasticsearch ES|QL queries


pyesql

A standalone Python parser for ES|QL (Elasticsearch Query Language) that produces an AST for testing and validation.

No Java, no ANTLR runtime. Pure Python.

Installation

pip install esql-py

Or with uv:

uv add esql-py

Development setup

Clone the repo and install with dev dependencies:

git clone https://github.com/br0k3ns0und/pyesql
cd pyesql
uv sync --extra dev

Quick start

from pyesql import parse

query = parse("FROM logs-* | WHERE @timestamp > NOW() - 1d | LIMIT 100")

query.source          # FromCommand(indices=[IndexPattern(index='logs-*', ...)])
query.pipes           # [WhereCommand(...), LimitCommand(...)]

Inspecting the AST

from pyesql import parse
from pyesql.walker import find_all
from pyesql.ast import FunctionCall

query = parse("FROM logs | STATS count = COUNT(*), avg_ms = AVG(response_time) BY host")
calls = find_all(query, FunctionCall)
print([c.name for c in calls])   # ['COUNT', 'AVG']

Walking with a Visitor

from pyesql import parse
from pyesql.visitor import Visitor
from pyesql.ast import FieldRef, Comparison

class AuditVisitor(Visitor):
    def __init__(self):
        self.comparisons = []

    def visit_Comparison(self, node):
        self.comparisons.append((str(node.left), node.operator, node.right))
        self.generic_visit(node)

v = AuditVisitor()
v.visit(parse("FROM idx | WHERE status == 200 AND host != \"bad\""))
print(v.comparisons)
# [(FieldRef(name=...), '==', IntegerLiteral(value=200)), ...]
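The `visit_Comparison` hook above is selected by node class name. A minimal, self-contained sketch of how this kind of name-based dispatch typically works — illustrative only, with stand-in `MiniVisitor` and `Comparison` classes rather than pyesql's real ones:

```python
class MiniVisitor:
    """Dispatch to visit_<ClassName> if defined, else fall back to generic_visit."""
    def visit(self, node):
        method = getattr(self, f"visit_{type(node).__name__}", self.generic_visit)
        return method(node)

    def generic_visit(self, node):
        # Recurse into child objects stored as attributes (simplified node model).
        for value in vars(node).values():
            if hasattr(value, "__dict__"):
                self.visit(value)

class Comparison:
    def __init__(self, left, operator, right):
        self.left, self.operator, self.right = left, operator, right

class Collector(MiniVisitor):
    def __init__(self):
        self.seen = []
    def visit_Comparison(self, node):
        self.seen.append(node.operator)
        self.generic_visit(node)

c = Collector()
c.visit(Comparison("status", "==", 200))
print(c.seen)  # ['==']
```

The `getattr` lookup is why method names must match node class names exactly.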

Transforming the AST

from pyesql.visitor import Transformer
from pyesql.ast import LimitCommand, IntegerLiteral

class CapLimit(Transformer):
    """Ensure LIMIT never exceeds 1000."""
    def visit_LimitCommand(self, node):
        if isinstance(node.count, IntegerLiteral) and node.count.value > 1000:
            node.count.value = 1000
        return node
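Unlike a `Visitor`, each `Transformer` method returns the (possibly replaced) node. A self-contained sketch of the pattern with stand-in node classes — an assumed simplified model, not pyesql's actual implementation:

```python
class IntegerLiteral:
    def __init__(self, value):
        self.value = value

class LimitCommand:
    def __init__(self, count):
        self.count = count

class MiniTransformer:
    """Dispatch to visit_<ClassName>; unhandled nodes pass through unchanged."""
    def visit(self, node):
        method = getattr(self, f"visit_{type(node).__name__}", lambda n: n)
        return method(node)

class CapLimit(MiniTransformer):
    """Clamp any LIMIT above 1000 down to 1000."""
    def visit_LimitCommand(self, node):
        if isinstance(node.count, IntegerLiteral) and node.count.value > 1000:
            node.count = IntegerLiteral(1000)  # replace the child node
        return node

cmd = CapLimit().visit(LimitCommand(IntegerLiteral(5000)))
print(cmd.count.value)  # 1000
```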

Schema validation

pyesql can validate field references and literal types against a data schema. Schemas are loaded from flat JSON, nested JSON, or an Elasticsearch index mapping.

Loading a schema

from pyesql.schema import Schema

# Flat JSON
schema = Schema.from_dict({
    "process.pid":   "integer",
    "process.name":  "keyword",
    "host.name":     "keyword",
    "host.ip":       "ip",
    "@timestamp":    "date",
    "bytes":         "double",
    "active":        "boolean",
})

# Nested JSON (equivalent)
schema = Schema.from_dict({
    "process": {"pid": "integer", "name": "keyword"},
    "host":    {"name": "keyword", "ip": "ip"},
    "@timestamp": "date",
    "bytes":      "double",
    "active":     "boolean",
})

# Elasticsearch index mapping
schema = Schema.from_elasticsearch_mapping(mapping_dict)
# Accepts: full GET /<index>/_mapping response, the "mappings" block,
# or a raw "properties" block. Multi-index responses are merged.

field_type = schema.get_field_type("process.pid")  # "integer"
field_type = schema.get_field_type("missing")       # None
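For the nested form to be equivalent to the flat form, nested keys have to resolve to the same dotted field paths. A minimal sketch of that flattening — illustrative only, not pyesql's internal code:

```python
def flatten(mapping, prefix=""):
    """Flatten a nested schema dict into dotted field-path -> type entries."""
    flat = {}
    for key, value in mapping.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))  # recurse into sub-objects
        else:
            flat[path] = value
    return flat

nested = {"process": {"pid": "integer", "name": "keyword"}, "@timestamp": "date"}
print(flatten(nested))
# {'process.pid': 'integer', 'process.name': 'keyword', '@timestamp': 'date'}
```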

Validating a query

Pass schema= to parse() to validate inline:

from pyesql import parse
from pyesql.validator import SchemaValidationError

try:
    query = parse(
        'FROM logs | WHERE process.pid == "not-an-int"',
        schema=schema,
    )
except SchemaValidationError as e:
    print(e)
    # Schema validation failed with 1 error(s):
    #   - Type mismatch: field 'process.pid' is 'integer' but compared to a string literal
    for issue in e.issues:
        print(issue.field, issue.message)

Or validate separately using SchemaValidator:

from pyesql import parse
from pyesql.validator import SchemaValidator, SchemaValidationError

query = parse("FROM logs | WHERE ghost_field == 1 AND process.pid == 1")

validator = SchemaValidator(schema)
try:
    issues = validator.validate(query)
except SchemaValidationError as e:
    # All errors are collected before raising — not just the first
    for issue in e.issues:
        print(issue)
    # Unknown field 'ghost_field' (field: 'ghost_field')
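The collect-everything-then-raise behaviour is easy to sketch in isolation. The names below (`ValidationError`, `validate_fields`) are hypothetical stand-ins, not pyesql's API:

```python
class ValidationError(Exception):
    """Carries the full list of issues, not just the first one hit."""
    def __init__(self, issues):
        super().__init__(f"validation failed with {len(issues)} error(s)")
        self.issues = issues

def validate_fields(fields, schema):
    # Accumulate every unknown field before raising once at the end.
    issues = [f"Unknown field '{f}'" for f in fields if f not in schema]
    if issues:
        raise ValidationError(issues)

try:
    validate_fields(["ghost_field", "process.pid", "bogus"],
                    {"process.pid": "integer"})
except ValidationError as e:
    caught = e.issues
    print(caught)  # ["Unknown field 'ghost_field'", "Unknown field 'bogus'"]
```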

Strictness levels

Both on_unknown (unknown field names) and on_type_mismatch (wrong literal type) are independently configurable:

| Level | Behaviour |
|---|---|
| "error" | Collect and raise SchemaValidationError (default when a schema is provided) |
| "warn" | Emit SchemaValidationWarning via the warnings module |
| "silent" | Ignore (default when no schema is provided) |

import warnings
from pyesql.validator import SchemaValidationWarning

# Downgrade unknown fields to warnings, keep type mismatches as errors
query = parse(
    "FROM logs | WHERE unknown_field == 1",
    schema=schema,
    on_unknown="warn",
)

# Catch warnings programmatically
with warnings.catch_warnings(record=True) as w:
    warnings.simplefilter("always")
    query = parse("FROM logs | WHERE unknown == 1", schema=schema, on_unknown="warn")
    print(w[0].category)  # <class 'SchemaValidationWarning'>

Computed fields

Fields introduced by EVAL, STATS, INLINESTATS, RENAME, DISSECT, GROK, and COMPLETION are automatically excluded from schema checks — they don't originate from the source index.

For GROK, both %{PATTERN:field} and (?<field>...) named-capture styles are recognised. For COMPLETION, the target field (e.g. COMPLETION result = ...) is tracked.
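Recognising both GROK capture styles can be sketched with two regexes — one for `%{PATTERN:field}` (with an optional type suffix) and one for `(?<field>...)` named captures. A hypothetical helper, not pyesql's implementation:

```python
import re

def grok_fields(pattern):
    """Collect target field names from %{SYNTAX:field[:type]} and (?<field>...) captures."""
    fields = set(re.findall(r"%\{\w+:(\w+)(?::\w+)?\}", pattern))
    fields |= set(re.findall(r"\(\?<(\w+)>", pattern))
    return fields

print(grok_fields("%{IP:client} %{WORD:method} (?<status>\\d{3})"))
# {'client', 'method', 'status'} (set order may vary)
```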

# "doubled" and "count" are computed — not flagged as unknown
query = parse(
    "FROM logs "
    "| EVAL doubled = bytes * 2 "
    "| STATS count = COUNT(*) BY host.name "
    "| SORT count DESC",
    schema=schema,
)

# Inspect computed fields directly
from pyesql import collect_computed_fields

computed = collect_computed_fields(query)
print(computed)  # frozenset({'doubled', 'count'})

Validating against an Elasticsearch beats schema

import gzip, json
from pyesql.schema import Schema

with gzip.open("beats_schemas/v9.3.1.json.gz") as f:
    mapping = json.load(f)

schema = Schema.from_elasticsearch_mapping(mapping)

query = parse(
    "FROM logs-* | WHERE process.pid == 1 AND host.name == \"web-01\"",
    schema=schema,
)

CLI

# Print parse tree
pyesql parse "FROM logs | WHERE status == 200 | LIMIT 10"

# JSON output
pyesql parse --format json "FROM logs | STATS COUNT(*) BY host"

# From file
pyesql parse -f my_query.esql

# From stdin
echo "FROM logs | LIMIT 5" | pyesql parse -

Supported commands

| Category | Commands |
|---|---|
| Source | FROM, ROW, SHOW INFO, TS, PROMQL |
| Filtering | WHERE |
| Projection | KEEP, DROP, RENAME |
| Computation | EVAL |
| Aggregation | STATS, INLINESTATS |
| Sorting/Paging | SORT, LIMIT |
| Text parsing | DISSECT, GROK |
| Enrichment | ENRICH |
| Joins | JOIN, LEFT JOIN, RIGHT JOIN, FULL JOIN, LOOKUP |
| Multi-valued | MV_EXPAND |
| Branching | FORK, FUSE |
| ML / AI | COMPLETION, RERANK, CHANGE_POINT, SAMPLE |
| Utilities | URI_PARTS, REGISTERED_DOMAIN, METRICS_INFO, TS_INFO |
| Config | SET |

Error handling

EsqlSyntaxError is raised for both bad tokens (lexer) and structural problems (parser). It exposes line, col, and text attributes for precise reporting.
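With line and col available, a caller can build a caret-style report pointing at the offending column. `caret_report` below is a hypothetical helper, assuming 1-based line and column numbers:

```python
def caret_report(query, line, col, message):
    """Render a three-line error: message, source line, caret under the column."""
    src_line = query.splitlines()[line - 1]
    pointer = " " * (col - 1) + "^"
    return f"{message} at line {line}, col {col}\n{src_line}\n{pointer}"

report = caret_report('FROM logs | WHERE message == "unclosed', 1, 29,
                      "Unterminated string literal")
print(report)
```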

Unterminated string literal (lexer-level):

from pyesql import parse
from pyesql.errors import EsqlSyntaxError

try:
    parse('FROM logs | WHERE message == "unclosed')
except EsqlSyntaxError as e:
    print(e)
    # Unterminated string literal at line 1, col 29 near '"unclosed'
    print(e.line, e.col)  # 1 29

Missing expression after command keyword (parser-level):

try:
    parse("FROM logs | WHERE")
except EsqlSyntaxError as e:
    print(e)
    # Expected expression, got 'EOF' ('') at line 1, col 18

Running tests

uv run pytest

Or via the Makefile:

make test        # run tests
make lint        # ruff check
make coverage    # tests with coverage report

Package layout

pyesql/
├── __init__.py    public API: parse(), walk(), find_all(), Visitor, Schema, ...
├── ast.py         all AST node dataclasses
├── lexer.py       tokenizer (hand-written, no dependencies)
├── parser.py      recursive-descent parser
├── visitor.py     Visitor and Transformer base classes
├── walker.py      walk(), find_all(), find_first(), filter_nodes()
├── schema.py      Schema (from_dict, from_elasticsearch_mapping)
├── validator.py   SchemaValidator, ValidationIssue, collect_computed_fields
├── errors.py      EsqlSyntaxError, EsqlParseError, EsqlSchemaError
├── cli.py         `pyesql` command-line tool
tests/
├── test_parser.py
└── test_schema.py

Note on compatibility

Built against Elasticsearch commit de1d740c6270076b1162cbc72bff7820d7c338e2. ES|QL is changing rapidly, so updates to track it will be sporadic at best.
