Cypher frontend compiler that lowers Neo4j-like Cypher into SQLGlot AST

CypherGlot

CypherGlot is the Cypher frontend compiler for the HumemAI stack.

✨ What CypherGlot is

It takes Neo4j-like Cypher, enforces an explicit admitted subset, normalizes the accepted shape, and lowers it into SQLGlot-backed output that another runtime can plan and execute.

The compatibility target is Neo4j-valid first: admitted queries should ideally be valid on Neo4j unchanged, while other Cypher runtimes such as ArcadeDB and Ladybug may still require small compatibility rewrites around that same subset.

raw Cypher string
→ parse
→ validate admitted subset
→ normalize
→ graph-relational IR
→ backend-aware lowering
→ SQLGlot AST or SQL-backed program
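
As a rough illustration of why the stages are kept separate, here is a toy pipeline written with a regular expression and the standard library only. It is not CypherGlot's implementation: the function names (parse, validate, normalize, lower, compile_toy), the NodeRead class, and the single admitted shape are all hypothetical, it collapses the IR and lowering stages into one step, and it emits a SQL string rather than a SQLGlot AST.

```python
import re
from dataclasses import dataclass

# Toy admitted subset: exactly `MATCH (alias:Label) RETURN alias.prop`.
_SHAPE = re.compile(r"MATCH \((\w+):(\w+)\) RETURN (\w+)\.(\w+)")


@dataclass(frozen=True)
class NodeRead:
    """Hypothetical stand-in for a normalized statement object."""

    alias: str
    node_type: str
    prop: str


def parse(text: str) -> tuple:
    # Collapse whitespace, then require the whole input to match the toy grammar.
    match = _SHAPE.fullmatch(" ".join(text.split()))
    if match is None:
        raise ValueError("not parseable by the toy grammar")
    return match.groups()


def validate(parts: tuple) -> tuple:
    # Reject shapes that parse but fall outside the admitted subset.
    alias, _label, returned_alias, _prop = parts
    if returned_alias != alias:
        raise ValueError(f"unbound variable {returned_alias!r}")
    return parts


def normalize(parts: tuple) -> NodeRead:
    alias, label, _returned_alias, prop = parts
    return NodeRead(alias=alias, node_type=label, prop=prop)


def lower(stmt: NodeRead) -> str:
    # Table name follows the cg_node_<type> naming shown in the schema contract.
    table = f"cg_node_{stmt.node_type.lower()}"
    return f"SELECT {stmt.alias}.{stmt.prop} FROM {table} AS {stmt.alias}"


def compile_toy(text: str) -> str:
    return lower(normalize(validate(parse(text))))


print(compile_toy("MATCH (u:User) RETURN u.name"))
```

Because each stage only consumes the previous stage's output, a rejected query fails before any SQL is produced, which is the property the real pipeline relies on at a much larger scale.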

CypherGlot is intentionally compiler-only.

  • It parses and lowers Cypher.
  • It does not execute SQL.
  • It does not own storage.
  • It does not execute vector search.

🎯 What it is for

  • a reusable Cypher frontend compiler
  • a stable boundary between Cypher parsing and host-runtime execution
  • SQLGlot-backed output for embedded runtimes such as HumemDB

Current backend direction

CypherGlot aims to support multiple SQL dialects on an equal footing, using a backend-neutral IR followed by backend-aware lowering.

  • the intended compiler path is now Cypher AST -> normalize -> graph-relational IR -> backend-aware lowering -> SQLGlot AST/program -> SQL
  • SQLite has an executable lowering path through the shared IR
  • DuckDB now has an explicit lowering path from the same shared architecture; it is still held to the strict support criterion in the last bullet
  • PostgreSQL is part of the same IR-based backend path
  • dialect=... rendering support remains useful for string output experiments and host integration work, but rendering alone is still not a portability guarantee
  • a backend counts as supported only when admitted Cypher shapes execute correctly against that backend's schema and runtime contract

HumemDB is the main reference host runtime for the current SQLite-backed execution contract.

Graph-to-table schema contract

CypherGlot’s output is schema-aware. If you want to execute its compiled SQL, your runtime needs to provide the graph-to-table layout that the compiler expects.

CypherGlot uses a generated type-aware schema contract.

The target contract is:

  • one table per node type
  • one table per edge type
  • typed property columns instead of one catch-all properties blob
  • explicit from_id and to_id foreign keys on edge tables
  • traversal-oriented indexes on generated edge tables

For a graph schema with node types User and Company, and an edge type WORKS_AT(User -> Company), the target SQLite contract looks like:

PRAGMA foreign_keys = ON;
PRAGMA journal_mode = WAL;
PRAGMA synchronous = NORMAL;

CREATE TABLE cg_node_user (
  id INTEGER PRIMARY KEY,
  name TEXT NOT NULL,
  age INTEGER
) STRICT;

CREATE TABLE cg_node_company (
  id INTEGER PRIMARY KEY,
  name TEXT NOT NULL
) STRICT;

CREATE TABLE cg_edge_works_at (
  id INTEGER PRIMARY KEY,
  from_id INTEGER NOT NULL,
  to_id INTEGER NOT NULL,
  since INTEGER,
  FOREIGN KEY (from_id) REFERENCES cg_node_user(id) ON DELETE CASCADE,
  FOREIGN KEY (to_id) REFERENCES cg_node_company(id) ON DELETE CASCADE
) STRICT;

CREATE INDEX idx_cg_edge_works_at_from_id ON cg_edge_works_at(from_id);
CREATE INDEX idx_cg_edge_works_at_to_id ON cg_edge_works_at(to_id);
CREATE INDEX idx_cg_edge_works_at_from_to ON cg_edge_works_at(from_id, to_id);
CREATE INDEX idx_cg_edge_works_at_to_from ON cg_edge_works_at(to_id, from_id);

The four indexes on cg_edge_works_at above are the recommended baseline: one per endpoint column, plus both composite orderings of (from_id, to_id).

CypherGlot's schema contract is the generated type-aware layout rather than a generic nodes / edges / node_labels family.

That contract is intentionally performance-oriented and currently assumes one stored node type per physical node table row rather than native multi-label nodes. This is a storage-contract tradeoff, not a claim about Cypher semantics; broader label membership would require a different schema path.
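
To make the contract concrete, the sketch below builds the User / Company / WORKS_AT layout in an in-memory SQLite database and runs a one-hop join over it. The join is hand-written to show the kind of query this layout is designed for; it is not actual CypherGlot output. STRICT is omitted so the sketch also runs on SQLite builds older than 3.37, and only two of the four baseline indexes are created to keep it short.

```python
import sqlite3

# DDL adapted from the contract above (STRICT dropped for older SQLite builds).
DDL = """
CREATE TABLE cg_node_user (id INTEGER PRIMARY KEY, name TEXT NOT NULL, age INTEGER);
CREATE TABLE cg_node_company (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE cg_edge_works_at (
  id INTEGER PRIMARY KEY,
  from_id INTEGER NOT NULL,
  to_id INTEGER NOT NULL,
  since INTEGER,
  FOREIGN KEY (from_id) REFERENCES cg_node_user(id) ON DELETE CASCADE,
  FOREIGN KEY (to_id) REFERENCES cg_node_company(id) ON DELETE CASCADE
);
CREATE INDEX idx_cg_edge_works_at_from_id ON cg_edge_works_at(from_id);
CREATE INDEX idx_cg_edge_works_at_to_id ON cg_edge_works_at(to_id);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this on every connection
conn.executescript(DDL)
conn.execute("INSERT INTO cg_node_user (id, name, age) VALUES (1, 'Alice', 30)")
conn.execute("INSERT INTO cg_node_company (id, name) VALUES (10, 'Acme')")
conn.execute("INSERT INTO cg_edge_works_at (from_id, to_id, since) VALUES (1, 10, 2020)")

# Hand-written join in the spirit of:
#   MATCH (u:User)-[r:WORKS_AT]->(c:Company) RETURN u.name, c.name, r.since
ONE_HOP = """
SELECT u.name, c.name, r.since
FROM cg_node_user AS u
JOIN cg_edge_works_at AS r ON r.from_id = u.id
JOIN cg_node_company AS c ON c.id = r.to_id
"""
rows = conn.execute(ONE_HOP).fetchall()
print(rows)

# ON DELETE CASCADE removes the edge when one of its endpoint nodes is deleted.
conn.execute("DELETE FROM cg_node_user WHERE id = 1")
remaining_edges = conn.execute("SELECT count(*) FROM cg_edge_works_at").fetchone()[0]
print(remaining_edges)
```

Each node type maps to exactly one table here, which is the single-type-per-row tradeoff described above: the join never has to consult a separate label table.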

See the dedicated guide in the documentation for the full schema contract, column semantics, and indexing notes.

✅ Current status

CypherGlot targets a strong onboarding-oriented, read-heavy Neo4j subset with narrow write flows and bounded traversal support, not broad Neo4j parity.

Neo4j is the reference Cypher engine for this admitted surface. CypherGlot's SQL backends execute that subset through compilation, while direct Cypher runtimes outside Neo4j may still need light query adaptation in a few compatibility-path cases.

The public surface covers:

  • parsing through the vendored openCypher grammar
  • admitted-subset validation
  • normalization into repo-owned statement objects
  • compilation of admitted single-statement shapes into one SQLGlot Expression
  • compilation of admitted multi-step write shapes into a small SQL-backed program
  • thin SQL rendering helpers over the compiled output

The most useful admitted families are:

  • MATCH ... RETURN
  • narrow standalone OPTIONAL MATCH ... RETURN
  • narrow MATCH ... WITH ... RETURN
  • narrow standalone UNWIND ... RETURN
  • standalone CREATE
  • MATCH ... SET
  • MATCH ... DELETE
  • narrow MATCH ... CREATE relationship writes
  • grouped count(...), count(*), sum(...), avg(...), min(...), and max(...)
  • common scalar, predicate, graph-introspection, string, numeric, conversion, and narrow multi-argument computed projections over already admitted inputs

Vector-aware CALL db.index.vector.queryNodes(...) shapes are validated and normalized for host runtimes, but they are not compiled into SQLGlot output yet.

Benchmark evidence

Current checked-in benchmark artifacts live under scripts/benchmarks/results/. The current repo evidence set includes:

  • a checked-in compiler summary Markdown artifact
  • a checked-in large runtime matrix summary Markdown artifact across the current 11 backend/index paths
  • a checked-in repeated schema-shape summary Markdown artifact across the small, medium, and large presets

See the benchmark guide for methodology, result interpretation, and the important caveat that SQLite, DuckDB, and PostgreSQL are compile-plus-execute paths while Neo4j, ArcadeDB Embedded, and LadybugDB are direct Cypher runtimes.

Public API at a glance

The stable entrypoints are:

  • parse_cypher_text(text)
  • validate_cypher_text(text)
  • normalize_cypher_text(text)
  • graph_schema_from_text(text)
  • schema_ddl_from_text(text, backend)
  • to_sqlglot_ast(text)
  • to_sqlglot_program(text)
  • to_sql(text, dialect=...)
  • render_cypher_program_text(text, dialect=...)

Lower-level compile_*, normalize_*, and render_compiled_* helpers remain available for implementation-facing use.

Schema definition surface

CypherGlot now also accepts a small graph-native schema-definition surface above the raw GraphSchema(...) Python API. That lets hosts define graph types in graph terms and lower them through the same generated backend DDL path.

import cypherglot

schema = cypherglot.graph_schema_from_text(
  """
  CREATE NODE User (name STRING NOT NULL, age INTEGER);
  CREATE NODE Company (name STRING NOT NULL);
  CREATE EDGE WORKS_AT FROM User TO Company (since INTEGER);
  """
)

ddl = cypherglot.schema_ddl_from_text(
  """
  CREATE NODE User (name STRING NOT NULL, age INTEGER);
  CREATE NODE Company (name STRING NOT NULL);
  CREATE EDGE WORKS_AT FROM User TO Company (since INTEGER);
  CREATE INDEX user_name_idx ON NODE User(name);
  """,
  backend="sqlite",
)

CREATE INDEX is admitted only for workload-specific property indexes on typed node or edge properties. Baseline edge traversal indexes are still generated automatically and should not be re-declared through this surface.
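
A workload-specific property index pays off when queries filter on that property. The sketch below uses only the standard library sqlite3 module and assumes, without confirmation from the generated DDL, that the user_name_idx declaration lowers to a plain CREATE INDEX on cg_node_user(name). It then asks SQLite for its query plan to check that an equality filter on name uses the index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cg_node_user (id INTEGER PRIMARY KEY, name TEXT NOT NULL, age INTEGER)"
)
# Assumed lowering of: CREATE INDEX user_name_idx ON NODE User(name);
conn.execute("CREATE INDEX user_name_idx ON cg_node_user(name)")

plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT age FROM cg_node_user WHERE name = ?",
    ("Alice",),
).fetchall()

# The human-readable plan text is the last column of each row.
plan_details = [row[-1] for row in plan]
print(plan_details)
```

The exact plan wording varies between SQLite versions, so checking for the index name is more robust than matching the whole string.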

Logging

CypherGlot uses the standard library logging module.

  • it stays silent by default
  • it does not configure the root logger
  • it installs a NullHandler on the cypherglot package logger so library use does not emit warnings or force host logging policy

When a host runtime wants compiler diagnostics, enable DEBUG on the cypherglot logger:

import logging

logging.basicConfig(level=logging.INFO)
logging.getLogger("cypherglot").setLevel(logging.DEBUG)

Current level semantics:

  • DEBUG: parse, validate, normalize, compile, and render pipeline events, including schema-layout and dialect decisions at public entrypoints
  • INFO: reserved for explicit high-value lifecycle events; CypherGlot does not currently emit routine INFO logs
  • WARNING: reserved for degraded or compatibility-path behavior
  • ERROR: reserved for internal failures rather than ordinary admitted-subset rejection

Ordinary validation rejection remains an exception path, not an ERROR log.
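
If a host prefers not to touch the root logger at all, it can attach a handler directly to the cypherglot logger instead. The sketch below uses only the standard library and does not import CypherGlot: the first two statements mimic the library-side NullHandler setup described above, and the debug call on the hypothetical child logger cypherglot.demo stands in for a real compiler event.

```python
import io
import logging

# Library side, as described above: a NullHandler on the package logger.
pkg_logger = logging.getLogger("cypherglot")
pkg_logger.addHandler(logging.NullHandler())

# Host side: send only cypherglot diagnostics to a dedicated stream.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s %(name)s: %(message)s"))
pkg_logger.addHandler(handler)
pkg_logger.setLevel(logging.DEBUG)

# Stand-in for a pipeline event; the child logger inherits the DEBUG level.
logging.getLogger("cypherglot.demo").debug("normalize: %s", "MATCH ... RETURN")

captured = stream.getvalue()
print(captured, end="")
```

Scoping the handler to the package logger keeps compiler diagnostics separate from the rest of the host's log stream, and removing the handler restores the silent default.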

🔗 Documentation

The admitted language boundary is documented in the docs site and kept honest by regression tests.

🧠 What is supported today

The supported surface and the main admitted query families are the ones listed under "Current status" above. That list is intentionally a practical mainstream single-hop subset for onboarding, not a full Cypher compatibility claim.

⚡ Quick examples

Parse and validate one admitted read:

import cypherglot

text = "MATCH (u:User) WHERE u.name = $name RETURN u.name ORDER BY u.name LIMIT 1"

parsed = cypherglot.parse_cypher_text(text)
assert not parsed.has_errors

cypherglot.validate_cypher_text(text)
normalized = cypherglot.normalize_cypher_text(text)

print(type(normalized).__name__)

Compile a single-statement read to SQLGlot AST or SQL text:

import cypherglot

query = "MATCH (u:User) WHERE u.name = $name RETURN u.name ORDER BY u.name LIMIT 1"

# An admitted single-statement shape compiles to one SQLGlot Expression.
expression = cypherglot.to_sqlglot_ast(query)
print(expression.sql())

# Or render SQL text directly from the same Cypher.
sql = cypherglot.to_sql(query)
print(sql)

Compile a multi-step write shape to a SQL-backed program:

import cypherglot

write = "MATCH (x:Begin) CREATE (x)-[:TYPE]->(:End {name: 'finish'})"

# A multi-step write compiles to a small SQL-backed program, not one expression.
program = cypherglot.to_sqlglot_program(write)
rendered = cypherglot.render_cypher_program_text(write)

print(type(program).__name__)
print(rendered.steps[0])

Install

CypherGlot supports Python 3.10 and newer.

Install from PyPI:

uv pip install cypherglot

Install from source in editable mode:

uv pip install -e .

Development

Set up the local environment:

uv sync --group test --group docs

Run the tests:

uv run pytest

Run the PostgreSQL runtime suite against a disposable local container:

scripts/dev/run_postgresql_runtime_docker.sh

Check the generated frontend state:

scripts/dev/regenerate_cypher_frontend_docker.sh --check

Build the docs locally:

uv run mkdocs build --strict

📦 Packaging

CypherGlot is a pure Python package today. It ships compiler code and generated frontend artifacts, but it does not ship a database runtime or platform-specific service layer.

The public package version comes from Git tags through Hatch VCS.

📄 License

CypherGlot is licensed under MIT. See LICENSE.
