Cypher frontend compiler that lowers Neo4j-like Cypher into SQLGlot AST
CypherGlot
CypherGlot is the Cypher frontend compiler for the HumemAI stack.
✨ What CypherGlot is
It takes Neo4j-like Cypher, enforces an explicit admitted subset, normalizes the accepted shape, and lowers it into SQLGlot-backed output that another runtime can plan and execute.
The compatibility target is Neo4j-valid first: admitted queries should ideally be valid on Neo4j unchanged, while other Cypher runtimes such as ArcadeDB and Ladybug may still require small compatibility rewrites around that same subset.
raw Cypher string
→ parse
→ validate admitted subset
→ normalize
→ graph-relational IR
→ backend-aware lowering
→ SQLGlot AST or SQL-backed program
CypherGlot is intentionally compiler-only.
- It parses and lowers Cypher.
- It does not execute SQL.
- It does not own storage.
- It does not execute vector search.
🎯 What it is for
- a reusable Cypher frontend compiler
- a stable boundary between Cypher parsing and host-runtime execution
- SQLGlot-backed output for embedded runtimes such as HumemDB
Current backend direction
CypherGlot targets equal multi-dialect SQL support through a backend-neutral IR plus backend-aware lowering.
- the intended compiler path is now Cypher AST -> normalize -> graph-relational IR -> backend-aware lowering -> SQLGlot AST/program -> SQL
- SQLite has an executable lowering path through the shared IR
- DuckDB now has an explicit lowering path from the same shared architecture; support claims remain strict
- PostgreSQL is part of the same IR-based backend path
- dialect=... rendering support remains useful for string output experiments and host integration work, but rendering alone is still not a portability guarantee
- a backend counts as supported only when admitted Cypher shapes execute correctly against that backend's schema and runtime contract
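The difference between "renders" and "executes" can be illustrated with a small check. The sketch below is not part of CypherGlot: `executes_on_sqlite` is a hypothetical helper, the SQL strings are hand-written stand-ins rather than compiler output, and the check only shows that a statement runs, not that its results are correct. A real support claim would also need result comparison against the reference engine.

```python
import sqlite3


def executes_on_sqlite(schema_ddl, sql, params=()):
    """Return True if `sql` runs against a fresh in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(schema_ddl)
        conn.execute(sql, params).fetchall()
        return True
    except sqlite3.Error:
        # Any SQLite-level failure (syntax, missing function, ...) counts as
        # "does not execute on this backend".
        return False
    finally:
        conn.close()


# Minimal stand-in schema; only the table name follows the README example.
SCHEMA = "CREATE TABLE cg_node_user (id INTEGER PRIMARY KEY, name TEXT NOT NULL);"

# Plain SQL that SQLite accepts.
portable = "SELECT name FROM cg_node_user WHERE name = ? ORDER BY name LIMIT 1"
# Uses string_to_array, a PostgreSQL function that SQLite does not provide.
postgres_only = "SELECT string_to_array(name, ',') FROM cg_node_user"

results = {
    "portable": executes_on_sqlite(SCHEMA, portable, ("Ada",)),
    "postgres_only": executes_on_sqlite(SCHEMA, postgres_only),
}
print(results)
```

Both strings are well-formed SQL text for some dialect, but only the first one runs on SQLite, which is why rendered output by itself says little about backend support.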
HumemDB is the main reference host runtime for the current SQLite-backed execution contract.
Graph-to-table schema contract
CypherGlot’s output is schema-aware. If you want to execute its compiled SQL, your runtime needs to provide the graph-to-table layout that the compiler expects.
CypherGlot uses a generated type-aware schema contract.
The target contract is:
- one table per node type
- one table per edge type
- typed property columns instead of one catch-all properties blob
- explicit from_id and to_id foreign keys on edge tables
- traversal-oriented indexes on generated edge tables
For a graph schema with node types User and Company, and an edge type
WORKS_AT(User -> Company), the target SQLite contract looks like:
PRAGMA foreign_keys = ON;
PRAGMA journal_mode = WAL;
PRAGMA synchronous = NORMAL;
CREATE TABLE cg_node_user (
id INTEGER PRIMARY KEY,
name TEXT NOT NULL,
age INTEGER
) STRICT;
CREATE TABLE cg_node_company (
id INTEGER PRIMARY KEY,
name TEXT NOT NULL
) STRICT;
CREATE TABLE cg_edge_works_at (
id INTEGER PRIMARY KEY,
from_id INTEGER NOT NULL,
to_id INTEGER NOT NULL,
since INTEGER,
FOREIGN KEY (from_id) REFERENCES cg_node_user(id) ON DELETE CASCADE,
FOREIGN KEY (to_id) REFERENCES cg_node_company(id) ON DELETE CASCADE
) STRICT;
CREATE INDEX idx_cg_edge_works_at_from_id ON cg_edge_works_at(from_id);
CREATE INDEX idx_cg_edge_works_at_to_id ON cg_edge_works_at(to_id);
CREATE INDEX idx_cg_edge_works_at_from_to ON cg_edge_works_at(from_id, to_id);
CREATE INDEX idx_cg_edge_works_at_to_from ON cg_edge_works_at(to_id, from_id);
Recommended baseline indexes:
CREATE INDEX idx_cg_edge_works_at_from_id ON cg_edge_works_at(from_id);
CREATE INDEX idx_cg_edge_works_at_to_id ON cg_edge_works_at(to_id);
CREATE INDEX idx_cg_edge_works_at_from_to ON cg_edge_works_at(from_id, to_id);
CREATE INDEX idx_cg_edge_works_at_to_from ON cg_edge_works_at(to_id, from_id);
CypherGlot's schema contract is the generated type-aware layout rather than a
generic nodes / edges / node_labels family.
That contract is intentionally performance-oriented and currently assumes one stored node type per physical node table row rather than native multi-label nodes. This is a storage-contract tradeoff, not a claim about Cypher semantics; broader label membership would require a different schema path.
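To make the layout concrete, the following sketch builds the example tables in an in-memory SQLite database and runs a single-hop join over them. It rests on several assumptions: the SELECT is hand-written for illustration and is not CypherGlot output, the sample rows are made up, STRICT is omitted so the script also runs on SQLite builds older than 3.37, and only two of the four baseline indexes are created to keep it short.

```python
import sqlite3

# DDL mirrors the example contract; STRICT is dropped for older SQLite builds.
DDL = """
CREATE TABLE cg_node_user (
  id INTEGER PRIMARY KEY,
  name TEXT NOT NULL,
  age INTEGER
);
CREATE TABLE cg_node_company (
  id INTEGER PRIMARY KEY,
  name TEXT NOT NULL
);
CREATE TABLE cg_edge_works_at (
  id INTEGER PRIMARY KEY,
  from_id INTEGER NOT NULL,
  to_id INTEGER NOT NULL,
  since INTEGER,
  FOREIGN KEY (from_id) REFERENCES cg_node_user(id) ON DELETE CASCADE,
  FOREIGN KEY (to_id) REFERENCES cg_node_company(id) ON DELETE CASCADE
);
CREATE INDEX idx_cg_edge_works_at_from_id ON cg_edge_works_at(from_id);
CREATE INDEX idx_cg_edge_works_at_to_id ON cg_edge_works_at(to_id);
"""

# Hand-written SQL, NOT compiler output: roughly the join that
# MATCH (u:User)-[r:WORKS_AT]->(c:Company) RETURN u.name, c.name, r.since
# needs against this layout.
TRAVERSAL_SQL = """
SELECT u.name, c.name, r.since
FROM cg_edge_works_at AS r
JOIN cg_node_user AS u ON u.id = r.from_id
JOIN cg_node_company AS c ON c.id = r.to_id
ORDER BY u.name
"""


def run_demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(DDL)
    # Made-up sample data for the illustration.
    conn.execute("INSERT INTO cg_node_user (id, name, age) VALUES (1, 'Ada', 36)")
    conn.execute("INSERT INTO cg_node_company (id, name) VALUES (10, 'Initech')")
    conn.execute(
        "INSERT INTO cg_edge_works_at (from_id, to_id, since) VALUES (1, 10, 2020)"
    )
    rows = conn.execute(TRAVERSAL_SQL).fetchall()
    # Deleting the user removes its edge row through ON DELETE CASCADE.
    conn.execute("DELETE FROM cg_node_user WHERE id = 1")
    remaining = conn.execute("SELECT COUNT(*) FROM cg_edge_works_at").fetchone()[0]
    conn.close()
    return rows, remaining


rows, remaining_edges = run_demo()
print(rows, remaining_edges)
```

The cascade step shows why the contract asks for foreign keys to be enabled: edge rows cannot outlive the node rows they point at.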
See the dedicated guide in the docs for the full schema contract, column semantics, and indexing notes.
✅ Current status
CypherGlot targets a strong onboarding-oriented, read-heavy Neo4j subset with narrow write flows and bounded traversal support, not broad Neo4j parity.
Neo4j is the reference Cypher engine for this admitted surface. CypherGlot's SQL backends execute that subset through compilation, while direct Cypher runtimes outside Neo4j may still need light query adaptation in a few compatibility-path cases.
The public surface covers:
- parsing through the vendored openCypher grammar
- admitted-subset validation
- normalization into repo-owned statement objects
- compilation of admitted single-statement shapes into one SQLGlot Expression
- compilation of admitted multi-step write shapes into a small SQL-backed program
- thin SQL rendering helpers over the compiled output
The most useful admitted families are:
- MATCH ... RETURN
- narrow standalone OPTIONAL MATCH ... RETURN
- narrow MATCH ... WITH ... RETURN
- narrow standalone UNWIND ... RETURN
- standalone CREATE
- MATCH ... SET
- MATCH ... DELETE
- narrow MATCH ... CREATE relationship writes
- grouped count(...), count(*), sum(...), avg(...), min(...), and max(...)
- common scalar, predicate, graph-introspection, string, numeric, conversion, and narrow multi-argument computed projections over already admitted inputs
Vector-aware CALL db.index.vector.queryNodes(...) shapes are validated and
normalized for host runtimes, but they are not compiled into SQLGlot output yet.
Benchmark evidence
Current checked-in benchmark artifacts live under scripts/benchmarks/results/.
The current repo evidence set includes:
- a checked-in compiler summary Markdown artifact
- a checked-in large runtime matrix summary Markdown artifact across the current 11 backend/index paths
- a checked-in repeated schema-shape summary Markdown artifact across the small, medium, and large presets
See the benchmark guide for methodology, result interpretation, and the important caveat that SQLite, DuckDB, and PostgreSQL are compile-plus-execute paths while Neo4j, ArcadeDB Embedded, and LadybugDB are direct Cypher runtimes.
Public API at a glance
The stable entrypoints are:
- parse_cypher_text(text)
- validate_cypher_text(text)
- normalize_cypher_text(text)
- graph_schema_from_text(text)
- schema_ddl_from_text(text, backend)
- to_sqlglot_ast(text)
- to_sqlglot_program(text)
- to_sql(text, dialect=...)
- render_cypher_program_text(text, dialect=...)
Lower-level compile_*, normalize_*, and render_compiled_* helpers remain
available for implementation-facing use.
Schema definition surface
CypherGlot now also accepts a small graph-native schema-definition surface above
the raw GraphSchema(...) Python API. That lets hosts define graph types in
graph terms and lower them through the same generated backend DDL path.
import cypherglot
schema = cypherglot.graph_schema_from_text(
"""
CREATE NODE User (name STRING NOT NULL, age INTEGER);
CREATE NODE Company (name STRING NOT NULL);
CREATE EDGE WORKS_AT FROM User TO Company (since INTEGER);
"""
)
ddl = cypherglot.schema_ddl_from_text(
"""
CREATE NODE User (name STRING NOT NULL, age INTEGER);
CREATE NODE Company (name STRING NOT NULL);
CREATE EDGE WORKS_AT FROM User TO Company (since INTEGER);
CREATE INDEX user_name_idx ON NODE User(name);
""",
backend="sqlite",
)
CREATE INDEX is admitted only for workload-specific property indexes on typed
node or edge properties. Baseline edge traversal indexes are still generated
automatically and should not be re-declared through this surface.
Logging
CypherGlot uses the standard library logging module.
- it stays silent by default
- it does not configure the root logger
- it installs a NullHandler on the cypherglot package logger so library use does not emit warnings or force host logging policy
When a host runtime wants compiler diagnostics, enable DEBUG on the
cypherglot logger:
import logging
logging.basicConfig(level=logging.INFO)
logging.getLogger("cypherglot").setLevel(logging.DEBUG)
Current level semantics:
- DEBUG: parse, validate, normalize, compile, and render pipeline events, including schema-layout and dialect decisions at public entrypoints
- INFO: reserved for explicit high-value lifecycle events; CypherGlot does not currently emit routine INFO logs
- WARNING: reserved for degraded or compatibility-path behavior
- ERROR: reserved for internal failures rather than ordinary admitted-subset rejection
Ordinary validation rejection remains an exception path, not an ERROR log.
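A host that wants compiler diagnostics without changing the root logger can attach a handler to the cypherglot logger directly. The sketch below uses only the standard library; `attach_cypherglot_debug_handler` is a hypothetical helper name, and the record it captures is a stand-in emitted by the sketch itself, not a real CypherGlot message.

```python
import io
import logging


def attach_cypherglot_debug_handler(stream):
    """Route DEBUG records from the "cypherglot" logger tree to `stream`."""
    logger = logging.getLogger("cypherglot")
    handler = logging.StreamHandler(stream)
    handler.setLevel(logging.DEBUG)
    handler.setFormatter(logging.Formatter("%(levelname)s %(name)s: %(message)s"))
    logger.addHandler(handler)
    logger.setLevel(logging.DEBUG)
    return handler


buffer = io.StringIO()
handler = attach_cypherglot_debug_handler(buffer)

# Stand-in record from a child logger; it inherits the DEBUG level and
# propagates up to the handler attached on "cypherglot".
logging.getLogger("cypherglot.demo").debug("stand-in pipeline event")

captured = buffer.getvalue()
# Detach the handler again so the sketch leaves no logging state behind.
logging.getLogger("cypherglot").removeHandler(handler)
print(captured, end="")
```

Scoping the handler to one logger keeps other libraries at the host's existing verbosity, which fits the library's policy of not configuring the root logger itself.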
🔗 Documentation
The admitted language boundary is documented in the docs site and kept honest by regression tests.
🧠 What is supported today
- parsing through the vendored openCypher grammar
- admitted-subset validation
- normalization into repo-owned statement objects
- compilation of admitted single-statement shapes into one SQLGlot Expression
- compilation of admitted multi-step write shapes into a small SQL-backed program
- thin SQL rendering helpers over the compiled output
Main admitted query families today:
- MATCH ... RETURN
- narrow standalone OPTIONAL MATCH ... RETURN
- narrow MATCH ... WITH ... RETURN
- narrow standalone UNWIND ... RETURN
- standalone CREATE
- MATCH ... SET
- MATCH ... DELETE
- narrow MATCH ... CREATE relationship writes
- grouped count(...), count(*), sum(...), avg(...), min(...), and max(...)
- common scalar, predicate, graph-introspection, string, numeric, conversion, and narrow multi-argument computed projections over already admitted inputs
That is intentionally a practical mainstream single-hop subset for onboarding, not a full Cypher compatibility claim.
⚡ Quick examples
Parse and validate one admitted read:
import cypherglot
text = "MATCH (u:User) WHERE u.name = $name RETURN u.name ORDER BY u.name LIMIT 1"
parsed = cypherglot.parse_cypher_text(text)
assert not parsed.has_errors
cypherglot.validate_cypher_text(text)
normalized = cypherglot.normalize_cypher_text(text)
print(type(normalized).__name__)
Compile a single-statement read to SQLGlot AST or SQL text:
expression = cypherglot.to_sqlglot_ast(
"MATCH (u:User) WHERE u.name = $name RETURN u.name ORDER BY u.name LIMIT 1"
)
print(expression.sql())
sql = cypherglot.to_sql(
"MATCH (u:User) WHERE u.name = $name RETURN u.name ORDER BY u.name LIMIT 1"
)
print(sql)
Compile a multi-step write shape to a SQL-backed program:
program = cypherglot.to_sqlglot_program(
"MATCH (x:Begin) CREATE (x)-[:TYPE]->(:End {name: 'finish'})"
)
rendered = cypherglot.render_cypher_program_text(
"MATCH (x:Begin) CREATE (x)-[:TYPE]->(:End {name: 'finish'})"
)
print(type(program).__name__)
print(rendered.steps[0])
Install
CypherGlot supports Python 3.10 and newer.
Install from PyPI:
uv pip install cypherglot
Install from source in editable mode:
uv pip install -e .
Development
Set up the local environment:
uv sync --group test --group docs
Run the tests:
uv run pytest
Run the PostgreSQL runtime suite against a disposable local container:
scripts/dev/run_postgresql_runtime_docker.sh
Check the generated frontend state:
scripts/dev/regenerate_cypher_frontend_docker.sh --check
Build the docs locally:
uv run mkdocs build --strict
🔗 Quick links
- Docs: docs.humem.ai/cypherglot
- Repository: github.com/humemai/cypherglot
- Issues: github.com/humemai/cypherglot/issues
📦 Packaging
CypherGlot is a pure Python package today. It ships compiler code and generated frontend artifacts, but it does not ship a database runtime or platform-specific service layer.
The public package version comes from Git tags through Hatch VCS.
📄 License
CypherGlot is licensed under MIT. See LICENSE.