
graph2sql


Graph-based schema analysis for text-to-SQL.

Build a schema graph (tables and fields as nodes, relationships as edges), rank the most relevant nodes for a natural language question using Personalized PageRank, and extract structured context ready to pass to any LLM for SQL generation.

No LLM included. Bring your own model.


How it works

flowchart LR
    A["🗣️ Natural language\nquestion"] --> C
    B["🗄️ Schema graph\ntables · columns · relationships"] --> C
    C["Personalized\nPageRank"] --> D["Ranked subgraph\ntop-k nodes + 1-hop neighbours"]
    D --> E["Structured\ncontext"]
    E --> F["Your LLM"]
    F --> G["✅ SQL"]

    style C fill:#6366f1,color:#fff,stroke:none
    style F fill:#10b981,color:#fff,stroke:none
    style G fill:#f59e0b,color:#fff,stroke:none

Instead of dumping the entire schema into an LLM prompt, graph2sql identifies which tables and relationships are most relevant to the question — reducing noise and improving SQL accuracy.
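The ranking step can be sketched in plain Python. This is an illustrative power-iteration Personalized PageRank over the quick-start schema, not the library's internal implementation — edges are treated as undirected here, which is an assumption:

```python
# Toy schema graph mirroring the quick-start example.
edges = [("orders", "users"), ("order_items", "orders"), ("order_items", "products")]
nodes = ["users", "orders", "products", "order_items"]

# Assumption: edges are treated as undirected for ranking.
neighbours = {n: set() for n in nodes}
for a, b in edges:
    neighbours[a].add(b)
    neighbours[b].add(a)

def personalized_pagerank(seeds, alpha=0.85, iters=50):
    # Personalization vector: restart probability mass only on seed nodes
    # (the nodes whose labels matched the query).
    p = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    r = dict(p)
    for _ in range(iters):
        nxt = {}
        for n in nodes:
            # Mass flowing in from neighbours, plus teleport back to seeds.
            inflow = sum(r[m] / len(neighbours[m]) for m in neighbours[n])
            nxt[n] = alpha * inflow + (1 - alpha) * p[n]
        r = nxt
    return r

# Seed on "orders", as if the query mentioned it. Structurally close tables
# (order_items, users) inherit score; distant ones (products) rank lower.
scores = personalized_pagerank({"orders"})
ranked = sorted(scores, key=scores.get, reverse=True)
```

The seed node keeps the highest score, and tables one hop away outrank those two hops away — which is exactly why the extracted subgraph stays small and relevant.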


Install

pip install graph2sql

Or from source:

git clone https://github.com/jw-open/graph2sql
cd graph2sql
pip install -e ".[dev]"

Quick start

from graph2sql import SchemaGraph

# Build the schema graph
graph = SchemaGraph()

graph.add_node("users",       "users",       content="id INT, name VARCHAR, email VARCHAR, country VARCHAR")
graph.add_node("orders",      "orders",      content="id INT, customer_id INT, total DECIMAL, created_at TIMESTAMP")
graph.add_node("products",    "products",    content="id INT, name VARCHAR, price DECIMAL, category VARCHAR")
graph.add_node("order_items", "order_items", content="id INT, order_id INT, product_id INT, quantity INT")

graph.add_edge("orders",      "users",    "belongs_to")
graph.add_edge("order_items", "orders",   "belongs_to")
graph.add_edge("order_items", "products", "references")

# Rank nodes for a natural language question
context = graph.rank("total revenue by customer last month", k=3)

# Pass context to your LLM
print(context)
# {
#   "nodes": [
#     {"label": "orders", "content": "...", "score": 0.312, ...},
#     {"label": "users",  "content": "...", "score": 0.198, ...},
#     ...
#   ],
#   "edges": [
#     {"from": "orders", "to": "users", "label": "belongs_to"},
#     ...
#   ]
# }

Load from an existing dict

graph = SchemaGraph.from_dict({
    "nodes": [
        {"id": "users",  "label": "users",  "content": "id, name, email"},
        {"id": "orders", "label": "orders", "content": "id, customer_id, total"},
    ],
    "edges": [
        {"from": "orders", "to": "users", "label": "belongs_to"}
    ]
})

context = graph.rank("how many orders per user")

Graph schema

Node

Field Type Required Description
id str yes Unique identifier
label str yes Table or column name — used for query token matching
content str no Column definitions, constraints, DDL, or notes
attributes dict no Typed metadata — see conventions below

Node attribute conventions

Use attributes to describe what a node represents. The algorithm also checks attribute string values for query token matching (useful for aliases).

# Table node
graph.add_node("orders", "orders",
    content="id INT PK, customer_id INT FK, total DECIMAL, created_at TIMESTAMP",
    attributes={
        "type": "table",
        "database": "mysql",          # mysql | postgres | sqlite | mongodb | etc.
        "schema": "public",           # schema/namespace if applicable
        "alias": "transactions",      # alternative names matched against queries
        "primary_key": "id",
    }
)

# Column node
graph.add_node("orders.total", "total",
    content="DECIMAL(10,2) — order grand total including tax",
    attributes={
        "type": "column",
        "table": "orders",
        "data_type": "DECIMAL",
        "nullable": "false",
        "alias": "revenue amount",    # matched against queries
    }
)

# View or virtual table
graph.add_node("monthly_revenue", "monthly_revenue",
    content="SELECT DATE_TRUNC('month', created_at), SUM(total) FROM orders GROUP BY 1",
    attributes={
        "type": "view",
        "database": "postgres",
    }
)

Recognised type values (convention, not enforced):

Value Meaning
"table" A physical database table
"column" A column within a table
"view" A database view or virtual table
"index" An index definition
"schema" A database schema/namespace grouping

Recognised database values (convention, not enforced): "mysql", "postgres", "sqlite", "mongodb", "bigquery", "snowflake", "redshift", "mssql", "oracle"

Any additional attributes are valid — they are stored as-is and passed through to the LLM context.

Edge

Field Type Description
"from" str Source node id
"to" str Target node id
"label" str Relationship type — see conventions below

Edge attributes

Edges support an optional attributes dict for richer relationship metadata:

# 1:N — one user has many orders
graph.add_edge("orders", "users", "belongs_to")
# or with attributes:
{
    "from": "orders",
    "to": "users",
    "label": "belongs_to",
    "attributes": {
        "cardinality": "many_to_one",   # one_to_one | one_to_many | many_to_one | many_to_many
        "on_delete": "CASCADE",
        "nullable": "false",
        "join": "orders.customer_id = users.id",   # hint for LLM SQL generation
    }
}

# M:N — orders ↔ products via order_items
{
    "from": "orders",
    "to": "products",
    "label": "many_to_many",
    "attributes": {
        "cardinality": "many_to_many",
        "via": "order_items",           # junction table
        "join": "orders.id = order_items.order_id AND order_items.product_id = products.id",
    }
}

cardinality values:

Value Meaning
"one_to_one" 1:1
"one_to_many" 1:N (parent → children)
"many_to_one" N:1 (child → parent, most FK relationships)
"many_to_many" M:N (requires a junction table)

Edge label conventions:

Label Meaning
"foreign_key" Standard FK relationship between tables
"belongs_to" Child → parent (N:1)
"has_many" Parent → children (1:N)
"many_to_many" M:N relationship (use attributes.via for junction table)
"one_to_one" 1:1 relationship
"column_of" Column node → its parent table
"references" Looser reference between any two nodes
"related_to" Semantic relationship (no strict FK)

The join attribute is especially useful — it gives the LLM the exact JOIN condition to use rather than inferring it.
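One way to put those hints to work (a sketch — the key names follow the edge examples above, but collecting them into the prompt is up to you):

```python
# Edge dicts as defined in the Edge attributes examples above.
edges = [
    {"from": "orders", "to": "users", "label": "belongs_to",
     "attributes": {"join": "orders.customer_id = users.id"}},
    {"from": "order_items", "to": "products", "label": "references",
     "attributes": {}},
]

# Collect explicit JOIN conditions; edges without a hint are skipped.
join_hints = [
    e["attributes"]["join"]
    for e in edges
    if e.get("attributes", {}).get("join")
]
```

Appending `join_hints` to the prompt gives the model concrete ON clauses instead of leaving it to guess column pairings.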


API reference

SchemaGraph

SchemaGraph()
SchemaGraph.from_dict(graph: dict) -> SchemaGraph
graph.add_node(id, label, content=None, attributes=None) -> SchemaGraph
graph.add_edge(from_id, to_id, label) -> SchemaGraph
graph.rank(query, k=3, alpha=0.85) -> dict
graph.to_dict() -> dict

personalized_page_rank

Low-level function used internally by SchemaGraph.rank().

from graph2sql import personalized_page_rank

result = personalized_page_rank(
    query="revenue by customer",
    graph={"nodes": [...], "edges": [...]},
    alpha=0.85,   # damping factor
    k=3,          # top-k seed nodes
)

Run the example

python examples/ecommerce.py

Run tests

pip install -e ".[dev]"
pytest tests/

Design principles

  • No LLM dependency — pure Python + numpy. Works with any model or no model at all.
  • No database connection required — pass schema definitions as strings in content.
  • Bring your own LLM — rank() returns a plain dict you can serialize and inject into any prompt.
  • Decoupled from infra — no FastAPI, MongoDB, Redis, or cloud dependencies.

Known limitations

Token matching is exact. The PPR algorithm matches query words against node labels using exact token overlap — it does not perform stemming, fuzzy matching, or semantic similarity.

For example, a query containing "customer" will not match a node labeled "users".

Workaround: use attributes for aliases.

Add alternative names as attribute values on the node. The algorithm also checks all attribute values for token matches:

graph.add_node(
    "users",
    "users",
    content="id, name, email",
    attributes={
        "alias": "customers",
        "also_known_as": "clients members",
    }
)

# Now "customers" and "clients" in a query will match this node
context = graph.rank("total revenue by customers")

Supported attribute patterns:

  • alias — primary alternative name
  • also_known_as — space-separated synonyms
  • related_to — domain terms associated with this table
  • associated_with — any custom terms relevant to queries

Any string attribute value is tokenized and matched — the key name is not significant.
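A sketch of how that matching behaves, assuming lowercase whitespace tokenization (the exact tokenizer is an implementation detail):

```python
def tokens(text):
    # Assumption: lowercase, whitespace-split, exact tokens only.
    return set(text.lower().split())

node = {
    "label": "users",
    "attributes": {"alias": "customers", "also_known_as": "clients members"},
}

def matches(query, node):
    # Query tokens are checked against the label and every attribute value.
    query_tokens = tokens(query)
    candidates = tokens(node["label"])
    for value in node["attributes"].values():
        candidates |= tokens(value)
    return bool(query_tokens & candidates)

hit = matches("total revenue by customers", node)   # matched via the alias
miss = matches("total revenue by customer", node)   # no stemming: "customer" != "customers"
```

Note the second case: because matching is exact, singular/plural variants must be listed explicitly among the aliases.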


License

MIT
