Transpile Graph Query Language (openCypher) to Recursive SQL (Databricks)

These details have not been verified by PyPI

Project description

gsql2rsql

Query your Delta Tables as a Graph

No need for a separate graph database. Write intuitive OpenCypher queries, get Databricks SQL automatically.

!!! success "Why Databricks?" Databricks provides tables designed for massive scale, enabling efficient storage and querying of tens of billions of triples with features like time travel No ETL or migration needed—just query your data lake as a graph. Recently, Databricks released support for recursive queries, unlocking the use of SQL warehouses for graph-type queries.

Why gsql2rsql?

Challenge	Solution
Graph queries require complex SQL with `WITH RECURSIVE`	Write 5 lines of Cypher instead
Need to maintain a separate graph database	Query Delta Lake directly
LLM-generated complex SQL is hard to audit	Human-readable Cypher + deterministic transpilation (optionally pass to LLM for final optimization)
Scaling to tens of billions of triples is costly in graph DBs	Delta Lake stores billions of triples efficiently, with Spark scalability

See It in Action

pip install gsql2rsql

from gsql2rsql import GraphContext

# Point to your existing Delta tables - no migration needed
graph = GraphContext(
    nodes_table="catalog.fraud.nodes",
    edges_table="catalog.fraud.edges",
)

# Write graph queries with familiar Cypher syntax
sql = graph.transpile("""
    MATCH path = (origin:Person {id: 12345})-[:TRANSACTION*1..4]->(dest:Person)
    WHERE dest.risk_score > 0.8
    RETURN dest.id, dest.name, dest.risk_score, length(path) AS depth
    ORDER BY depth, dest.risk_score DESC
    LIMIT 100
""")

print(sql)  # Production-ready Databricks SQL

5 lines of Cypher → optimized Databricks SQL with recursive CTEs

??? example "Click to see the generated SQL"

```sql
WITH RECURSIVE
  paths_1 AS (
    -- Base case: direct edges (depth = 1)
    SELECT
      e.src AS start_node,
      e.dst AS end_node,
      1 AS depth,
      ARRAY(e.src, e.dst) AS path,
      ARRAY(NAMED_STRUCT('src', e.src, 'dst', e.dst, 'amount', e.amount, 'timestamp', e.timestamp)) AS path_edges,
      ARRAY(e.src) AS visited
    FROM catalog.fraud.edges e
    JOIN catalog.fraud.nodes src ON src.id = e.src
    WHERE (relationship_type = 'TRANSACTION') AND (src.id) = (12345)

    UNION ALL

    -- Recursive case: extend paths
    SELECT
      p.start_node,
      e.dst AS end_node,
      p.depth + 1 AS depth,
      CONCAT(p.path, ARRAY(e.dst)) AS path,
      ARRAY_APPEND(p.path_edges, NAMED_STRUCT('src', e.src, 'dst', e.dst, 'amount', e.amount, 'timestamp', e.timestamp)) AS path_edges,
      CONCAT(p.visited, ARRAY(e.src)) AS visited
    FROM paths_1 p
    JOIN catalog.fraud.edges e
      ON p.end_node = e.src
    WHERE p.depth < 4
      AND NOT ARRAY_CONTAINS(p.visited, e.dst)
      AND (relationship_type = 'TRANSACTION')
  )
SELECT
   _gsql2rsql_dest_id AS id
  ,_gsql2rsql_dest_name AS name
  ,_gsql2rsql_dest_risk_score AS risk_score
  ,(SIZE(_gsql2rsql_path_id) - 1) AS depth
FROM (
  SELECT
     sink.id AS _gsql2rsql_dest_id
    ,sink.name AS _gsql2rsql_dest_name
    ,sink.risk_score AS _gsql2rsql_dest_risk_score
    ,source.id AS _gsql2rsql_origin_id
    ,p.path AS _gsql2rsql_path_id
    ,p.path_edges AS _gsql2rsql_path_edges
  FROM paths_1 p
  JOIN catalog.fraud.nodes sink ON sink.id = p.end_node
  JOIN catalog.fraud.nodes source ON source.id = p.start_node
  WHERE p.depth >= 1 AND p.depth <= 4 AND (sink.risk_score) > (0.8)
) AS _proj
ORDER BY depth ASC, _gsql2rsql_dest_risk_score DESC
LIMIT 100
```

!!! warning "Not for OLTP (obviously) or end-user queries" This transpiler is for internal analytics and exploration (data science, engineering, analysis). It obviously makes no sense for OLTP ! If you plan to expose transpiled queries to end users, be careful: implement validation, rate limiting, and security. Use common sense.

Real-World Examples

=== "Fraud Detection"

```cypher
-- Find fraud rings: accounts connected through suspicious transactions
MATCH (a:Account)-[:TRANSFER*2..4]->(b:Account)
WHERE a.flagged = true AND b.flagged = true
RETURN DISTINCT a.id, b.id, length(path) AS hops
```

[See more fraud detection queries →](examples/fraud.md)

=== "Credit Analysis"

```cypher
-- Analyze credit exposure through guarantor chains
MATCH path = (borrower:Customer)-[:GUARANTEES*1..3]->(guarantor:Customer)
WHERE borrower.credit_score < 600
RETURN borrower.id, COLLECT(guarantor.id) AS chain
```

[See more credit analysis queries →](examples/credit.md)

=== "Social Network"

```cypher
-- Friends of friends who work at tech companies
MATCH (me:Person {id: 123})-[:KNOWS*1..2]->(friend)-[:WORKS_AT]->(c:Company)
WHERE c.industry = 'Technology'
RETURN DISTINCT friend.name, c.name
```

[See all feature examples →](examples/features.md)

That's it! No schema boilerplate, no complex setup.

Full User Guide →

Low-Level API (Without GraphContext)

For advanced use cases or non-Triple-Store schemas, use the components directly:

from gsql2rsql import OpenCypherParser, LogicalPlan, SQLRenderer
from gsql2rsql.common.schema import NodeSchema, EdgeSchema, EntityProperty
from gsql2rsql.renderer.schema_provider import SimpleSQLSchemaProvider, SQLTableDescriptor

# 1. Define schema (SimpleSQLSchemaProvider)
schema = SimpleSQLSchemaProvider()

person = NodeSchema(
    name="Person",
    node_id_property=EntityProperty("id", int),
    properties=[EntityProperty("name", str)],
)
schema.add_node(
    person,
    SQLTableDescriptor(table_name="people", node_id_columns=["id"]),
)

knows = EdgeSchema(
    name="KNOWS",
    source_node_id="Person",
    sink_node_id="Person",
)
schema.add_edge(
    knows,
    SQLTableDescriptor(table_name="friendships"),
)

# 2. Transpile
parser = OpenCypherParser()
ast = parser.parse("MATCH (p:Person)-[:KNOWS]->(f:Person) RETURN p.name, f.name")
plan = LogicalPlan.process_query_tree(ast, schema)
plan.resolve(original_query="...")

renderer = SQLRenderer(db_schema_provider=schema)
sql = renderer.render_plan(plan)

API Reference →

Key Features

Feature	Description
Variable-length paths	`[:REL*1..5]` via `WITH RECURSIVE`
Cycle detection	Automatic `ARRAY_CONTAINS` checks
Path functions	`length(path)`, `nodes(path)`, `relationships(path)`
No-label nodes	`(a)-[:REL]->(b:Label)` matches any node type for `a`
Inline filters	`(n:Person {id: 123})` pushes predicates to source
Undirected edges	`(a)-[:KNOWS]-(b)` via optimized UNION ALL
Aggregations	COUNT, SUM, AVG, COLLECT, etc.
Type safety	Schema validation before SQL generation

Architecture

gsql2rsql uses a 4-phase pipeline for correctness:

OpenCypher → Parser → Planner → Resolver → Renderer → SQL

Parser: Cypher → AST (syntax only, no schema)
Planner: AST → Logical operators (semantics)
Resolver: Validate columns & types against schema
Renderer: Operators → Databricks SQL

This separation ensures each phase has clear responsibilities and can be tested independently.

Architecture details →

Documentation

Section	Description
User Guide	Getting started, GraphContext, schema setup
Examples	69 complete queries with generated SQL
Architecture	How the transpiler works

Project Status

!!! info "Research Project" Contributions welcome!

License

MIT License - see LICENSE

--8<-- "inspiration.md"

Project details

These details have not been verified by PyPI

Release history Release notifications | RSS feed

0.10.0

Feb 24, 2026

0.10.0.dev202603200846 pre-release

Mar 20, 2026

0.9.7

Feb 22, 2026

0.9.6

Feb 18, 2026

0.9.5

Feb 18, 2026

0.9.4

Feb 7, 2026

0.9.3

Feb 3, 2026

0.9.2

Feb 3, 2026

0.9.1

Feb 3, 2026

0.9.0

Feb 2, 2026

0.8.2

Jan 31, 2026

0.8.1

Jan 31, 2026

0.8.0

Jan 26, 2026

0.7.3

Jan 24, 2026

This version

0.7.2

Jan 24, 2026

0.7.1

Jan 24, 2026

0.7.0

Jan 24, 2026

0.6.0

Jan 22, 2026

0.5.0

Jan 21, 2026

0.4.1

Jan 21, 2026

0.4.0

Jan 21, 2026

0.3.0

Jan 21, 2026

0.2.0

Jan 20, 2026

0.1.7

Jan 20, 2026

0.1.6

Jan 20, 2026

0.1.5

Jan 20, 2026

0.1.4

Jan 20, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

gsql2rsql-0.7.2.tar.gz (2.5 MB view details)

Uploaded Jan 24, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

gsql2rsql-0.7.2-py3-none-any.whl (273.0 kB view details)

Uploaded Jan 24, 2026 Python 3

File details

Details for the file gsql2rsql-0.7.2.tar.gz.

File metadata

Download URL: gsql2rsql-0.7.2.tar.gz
Upload date: Jan 24, 2026
Size: 2.5 MB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for gsql2rsql-0.7.2.tar.gz
Algorithm	Hash digest
SHA256	`2878a6c7ae9076e5f9875eed3ebb2cc2cbfacdfaead323dcb9922b5799f8224f`
MD5	`94aabb8edfc00c9b14209bf6a72ba716`
BLAKE2b-256	`2fd3b501aa82c9a6f99ee72a6d9dce9fa9a5a05630d297aca2b3ce809d052c28`

See more details on using hashes here.

Provenance

The following attestation bundles were made for gsql2rsql-0.7.2.tar.gz:

Publisher: release.yml on devmessias/gsql2rsql

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: gsql2rsql-0.7.2.tar.gz
- Subject digest: 2878a6c7ae9076e5f9875eed3ebb2cc2cbfacdfaead323dcb9922b5799f8224f
- Sigstore transparency entry: 850156740
- Sigstore integration time: Jan 24, 2026
Source repository:
- Permalink: devmessias/gsql2rsql@f79f1b9f367d9b8e04990f373b04ba9e6faa5b9c
- Branch / Tag: refs/heads/main
- Owner: https://github.com/devmessias
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@f79f1b9f367d9b8e04990f373b04ba9e6faa5b9c
- Trigger Event: push

File details

Details for the file gsql2rsql-0.7.2-py3-none-any.whl.

File metadata

Download URL: gsql2rsql-0.7.2-py3-none-any.whl
Upload date: Jan 24, 2026
Size: 273.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for gsql2rsql-0.7.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`2e059d71cac5fb535f492417fc5d17b123e3276f8209f6776099775b5cdb404d`
MD5	`044973728f0ce4872989659c5598e2b1`
BLAKE2b-256	`9f79b8b3a59f0ddae402a92864b836533541f403e37c61e5b1f204ccc23fc6dd`

See more details on using hashes here.

Provenance

The following attestation bundles were made for gsql2rsql-0.7.2-py3-none-any.whl:

Publisher: release.yml on devmessias/gsql2rsql

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: gsql2rsql-0.7.2-py3-none-any.whl
- Subject digest: 2e059d71cac5fb535f492417fc5d17b123e3276f8209f6776099775b5cdb404d
- Sigstore transparency entry: 850156742
- Sigstore integration time: Jan 24, 2026
Source repository:
- Permalink: devmessias/gsql2rsql@f79f1b9f367d9b8e04990f373b04ba9e6faa5b9c
- Branch / Tag: refs/heads/main
- Owner: https://github.com/devmessias
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@f79f1b9f367d9b8e04990f373b04ba9e6faa5b9c
- Trigger Event: push

gsql2rsql 0.7.2

Navigation

Verified details

Maintainers

Unverified details

Meta

Classifiers

Project description

gsql2rsql

Why gsql2rsql?

See It in Action

Real-World Examples

Low-Level API (Without GraphContext)

Key Features

Architecture

Documentation

Project Status

License

Project details

Verified details

Maintainers

Unverified details

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

Provenance

File details

File metadata

File hashes

Provenance