A lean GraphRAG library using Postgres/pgvector as the sole database
🏗️ Early version: the interface may change quickly. Star the repo to follow updates as we work through the roadmap.
postgres-graph-rag 🐘🕸️
High-Precision GraphRAG. Native to PostgreSQL.
Most RAG systems are Flatlanders. They use vector similarity to find related text, but they are fundamentally blind to relationships. If you ask your RAG "How is Person A connected to Project B through their shared dependencies?", standard vector search fails because the answer isn't in a single chunk—it’s in the links between them.
Postgres Graph RAG bridges this reasoning gap by turning your existing PostgreSQL database into a structured knowledge engine.
Why this exists:
- Infrastructure Nightmare: Building "Smart RAG" usually means adding a Graph DB (Neo4j) to your stack. Now you have a distributed systems nightmare: keeping your Relational DB, Vector DB, and Graph DB in sync.
- Flatland Problem: Vector similarity is just probabilistic matching. It doesn't understand hierarchy, causality, or directed relationships (e.g., "A leads B" vs "B leads A").
- Batch Bottleneck: Existing GraphRAG research (like Microsoft's) is batch-heavy and token-expensive. It can't handle real-time, incremental updates.
Postgres-Native Solution:
This library is built for Postgres Maximalists. It leverages the engine you already trust to do the heavy lifting:
- Recursive Retrieval: Instead of expensive LLM-agent loops, we use SQL Recursive CTEs to perform multi-hop reasoning. It’s deterministic, 10x faster, and handles "neighbor-of-neighbor" walks natively.
- Atomic Consistency: Vectors, nodes, and relationships live in one ACID-compliant engine. One transaction. Zero sync lag.
- Forever Schema: Using JSONB metadata and a namespaced design, the schema is migration-proof. You can evolve your graph's logic without ever running ALTER TABLE.
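To make the multi-hop idea concrete, here is a minimal sketch of a depth-bounded recursive CTE. It runs against an in-memory SQLite database purely so that it works without a server. The `edges` table, its columns, and the sample rows are assumptions for illustration, not the library's actual schema or query. The `WITH RECURSIVE` shape should carry over to PostgreSQL with minor changes, such as driver-specific placeholders and type casts.

```python
import sqlite3

# Hypothetical edge table; the library's real schema is not shown in this README.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE edges (source TEXT, target TEXT, relation TEXT);
    INSERT INTO edges VALUES
        ('Person A', 'Library X', 'maintains'),
        ('Library X', 'Project B', 'dependency_of'),
        ('Project B', 'Team C', 'owned_by');
""")

# Seed the walk with the start node at depth 0, then follow outgoing
# edges until the hop limit is reached. UNION drops duplicate rows, and
# the depth bound guarantees termination even if the graph has cycles.
MULTI_HOP = """
WITH RECURSIVE walk(node, depth) AS (
    SELECT ?, 0
    UNION
    SELECT e.target, w.depth + 1
    FROM walk AS w
    JOIN edges AS e ON e.source = w.node
    WHERE w.depth < ?
)
SELECT node, MIN(depth) AS hops
FROM walk
GROUP BY node
ORDER BY hops, node;
"""

def neighbors_within(start: str, hops: int) -> list[tuple[str, int]]:
    """Return every node reachable from `start` in at most `hops` steps."""
    return conn.execute(MULTI_HOP, (start, hops)).fetchall()

reachable = neighbors_within("Person A", 2)
print(reachable)
```

With two hops the walk reaches Project B through Library X but stops before Team C, which is three hops away.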
Core Philosophy
- Infrastructure: Postgres is the only database (via pgvector).
- Intelligence: Hosted SLMs (GPT-5.2 or Gemini 2.5) for extraction. Freely configurable.
- Simplicity: Native Async Python + SQL.
- Scalability: High-performance connection pooling and namespace-aware design (Multi-tenancy).
Installation
pip install postgres-graph-rag
Getting Started (Interactive & Frictionless)
The library is designed to be interactive-friendly. You can instantiate it normally and use await at the top level in Notebooks or REPLs. The database connection pool is initialized lazily upon the first request.
from postgres_graph_rag import PostgresGraphRAG

# 1. Simple Instantiation
rag = PostgresGraphRAG(
    postgres_url="postgresql://user:password@localhost:5432/dbname",
    openai_api_key="sk-..."  # Or use google_api_key
)

async def quick_start():
    # 2. Setup (Creates tables and pgvector extension if missing)
    await rag.setup()
    # 3. Add Knowledge (Atomic upserts with automatic entity resolution)
    await rag.add_texts(
        "Johny Srouji leads the hardware team at Apple.",
        namespace="apple_research"
    )
    # 4. Hybrid Query (Vector Search + Recursive Graph Traversal)
    context = await rag.query(
        "Who is leading the hardware efforts?",
        namespace="apple_research",
        hops=2
    )
    print(context)
    # 5. Cleanup (Closes the connection pool)
    await rag.close()

# In a notebook or async REPL: await quick_start()
# In a plain script: import asyncio; asyncio.run(quick_start())
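The lazy pool initialization mentioned above can be pictured roughly as follows. This is a generic sketch of the pattern, not the library's code: `LazyPool` and `fake_create_pool` are hypothetical names, and the fake factory stands in for whatever call really opens the Postgres connection pool.

```python
import asyncio

class LazyPool:
    """Creates the underlying pool on first use instead of at construction."""

    def __init__(self, factory):
        self._factory = factory      # async callable that builds the pool
        self._pool = None
        self._lock = asyncio.Lock()  # serializes the first initialization

    async def get(self):
        if self._pool is None:
            async with self._lock:
                # Re-check inside the lock: another task may have
                # finished creating the pool while this one waited.
                if self._pool is None:
                    self._pool = await self._factory()
        return self._pool

calls = 0

async def fake_create_pool():
    """Hypothetical stand-in for opening a real connection pool."""
    global calls
    calls += 1
    await asyncio.sleep(0)  # yield control, as real network I/O would
    return object()

async def main():
    lazy = LazyPool(fake_create_pool)  # nothing is opened yet
    # Five concurrent first requests should still create only one pool.
    return await asyncio.gather(*(lazy.get() for _ in range(5)))

pools = asyncio.run(main())
print(calls, len(pools))
```

The double check around the lock is what keeps concurrent first requests from each opening their own pool.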
Advanced Usage & Modes
1. The Production Way: Async Context Manager
For applications (like FastAPI or background workers), use the async with pattern to ensure the connection pool is always closed correctly, even if errors occur.
async with PostgresGraphRAG(postgres_url=DSN, openai_api_key=KEY) as rag:
    await rag.add_texts("The M4 chip uses ARM architecture.")
    # No need to call rag.close(), it happens automatically!
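For readers unfamiliar with the mechanics, the guarantee comes from Python's async context manager protocol. The sketch below uses a hypothetical `ManagedClient` class, not the library's own implementation, to show that `__aexit__` runs and closes the resource even when the body raises.

```python
import asyncio

class ManagedClient:
    """Hypothetical resource illustrating the async context manager protocol."""

    def __init__(self):
        self.closed = False

    async def close(self):
        self.closed = True

    async def __aenter__(self):
        return self

    async def __aexit__(self, exc_type, exc, tb):
        # Runs on normal exit and on exceptions alike.
        await self.close()
        return False  # do not swallow the exception

async def main():
    client = ManagedClient()
    try:
        async with client:
            raise RuntimeError("ingestion failed")
    except RuntimeError:
        pass  # the error still propagates to the caller
    return client.closed

closed_after_error = asyncio.run(main())
print(closed_after_error)
```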
2. Custom Chunking (Inversion of Control)
Don't like the default character splitter? Inject your own. You can pass any callable that takes a string and returns a list of strings.
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Create your favorite chunker
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)

# Inject it into the library
rag = PostgresGraphRAG(
    postgres_url=DSN,
    openai_api_key=KEY,
    chunker=splitter.split_text  # Just pass the method
)
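Because the only contract is "string in, list of strings out", a chunker does not need any third-party dependency. Below is a small sketch of a paragraph-based chunker. The function name and the `max_chars` parameter are illustrative assumptions; `functools.partial` fixes the extra argument so the result still takes a single string.

```python
from functools import partial

def paragraph_chunker(text: str, max_chars: int = 500) -> list[str]:
    """Group blank-line-separated paragraphs into chunks of at most
    max_chars characters. A single oversized paragraph is kept whole."""
    chunks: list[str] = []
    current = ""
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue
        candidate = f"{current}\n\n{para}" if current else para
        if not current or len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = para
    if current:
        chunks.append(current)
    return chunks

# Bind the size limit so the callable takes only the text.
chunker = partial(paragraph_chunker, max_chars=40)

sample = "First paragraph.\n\nSecond paragraph.\n\n" + "x" * 50
chunks = chunker(sample)
print(chunks)

# Then pass it in the same way as the splitter above:
# rag = PostgresGraphRAG(postgres_url=DSN, openai_api_key=KEY, chunker=chunker)
```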
3. Custom Provider Configuration
You can control exactly which models are used for extraction and embeddings.
from postgres_graph_rag.models import ProviderConfig

custom_config: ProviderConfig = {
    "extraction_model": "gpt-5-nano-2025-08-07",
    "embedding_model": "text-embedding-3-large",
    "dimension": 3072  # Must match the model's output
}
rag = PostgresGraphRAG(..., config=custom_config)
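A mismatched dimension usually only surfaces when the first vector is inserted, so it can help to check the config up front. The following is a sketch under stated assumptions: `ProviderConfigSketch` and `check_dimension` are hypothetical helpers, not part of the library, and the lookup table lists only the default output sizes of two OpenAI embedding models.

```python
from typing import TypedDict

class ProviderConfigSketch(TypedDict):
    """Hypothetical local mirror of the config keys shown above."""
    extraction_model: str
    embedding_model: str
    dimension: int

# Default output sizes of these OpenAI embedding models.
DEFAULT_DIMS = {
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
}

def check_dimension(config: ProviderConfigSketch) -> None:
    """Raise ValueError if a known model is paired with the wrong dimension.
    Unknown models are skipped rather than rejected."""
    expected = DEFAULT_DIMS.get(config["embedding_model"])
    if expected is not None and expected != config["dimension"]:
        raise ValueError(
            f"{config['embedding_model']} outputs {expected} dimensions "
            f"by default, but the config says {config['dimension']}"
        )

good: ProviderConfigSketch = {
    "extraction_model": "gpt-5-nano-2025-08-07",
    "embedding_model": "text-embedding-3-large",
    "dimension": 3072,
}
check_dimension(good)  # passes silently

bad = dict(good, dimension=1536)
try:
    check_dimension(bad)
    mismatch_detected = False
except ValueError:
    mismatch_detected = True
print(mismatch_detected)
```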
4. Multi-Tenancy (Namespacing)
Isolate data for different users or projects within the same database tables.
# User A's private graph
await rag.add_texts("My secret key is 123.", namespace="user_a")
# User B's private graph
await rag.add_texts("My secret key is 999.", namespace="user_b")
# Queries are strictly isolated
res = await rag.query("What is my key?", namespace="user_a") # Returns 123
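Conceptually, this kind of isolation comes from scoping every read by namespace. The sketch below shows the idea with an in-memory SQLite table so it runs anywhere. The `chunks` table and the `fetch_chunks` helper are assumptions for illustration and do not reflect the library's actual tables or queries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: every row carries the namespace it belongs to.
conn.execute(
    "CREATE TABLE chunks (namespace TEXT NOT NULL, content TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO chunks VALUES (?, ?)",
    [
        ("user_a", "My secret key is 123."),
        ("user_b", "My secret key is 999."),
    ],
)

def fetch_chunks(namespace: str) -> list[str]:
    """Return only the rows stored under the given namespace."""
    rows = conn.execute(
        "SELECT content FROM chunks WHERE namespace = ? ORDER BY rowid",
        (namespace,),
    ).fetchall()
    return [content for (content,) in rows]

user_a_chunks = fetch_chunks("user_a")
print(user_a_chunks)
```

Since both tenants share the same table, isolation depends on the namespace filter being applied to every query.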
🗺️ Roadmap & Future Vision
This project follows the "Postgres Maximalism" philosophy: Stop building new infrastructure and start using the full power of the database you already own.
✅ Phase 1: Foundation (Current Release)
- Postgres-Native Schema: Migration-proof design using JSONB and namespacing.
- Recursive Reasoning: Multi-hop graph traversal implemented via SQL Recursive CTEs.
- Incremental Ingestion: Atomic upserts for nodes and edges (no expensive batch rebuilds).
- Hosted SLM Extraction: Native support for OpenAI and Google Gemini for high-speed, low-cost tagging.
- Async Architecture: Production-ready with high-performance connection pooling.
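As an illustration of what an incremental, atomic upsert can look like, here is a minimal sketch using `INSERT ... ON CONFLICT DO UPDATE`. It uses SQLite (3.24 or newer) so it runs without a server; PostgreSQL accepts the same clause. The `nodes` table and its `mentions` counter are hypothetical and not taken from the library's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical node table keyed by (namespace, name).
conn.execute("""
    CREATE TABLE nodes (
        namespace TEXT NOT NULL,
        name      TEXT NOT NULL,
        mentions  INTEGER NOT NULL DEFAULT 1,
        PRIMARY KEY (namespace, name)
    )
""")

# Insert a new node, or bump the counter if it already exists.
UPSERT = """
INSERT INTO nodes (namespace, name) VALUES (?, ?)
ON CONFLICT (namespace, name)
DO UPDATE SET mentions = nodes.mentions + 1
"""

def ingest(namespace: str, entities: list[str]) -> None:
    """Upsert all entities in one transaction: all rows commit or none do."""
    with conn:
        conn.executemany(UPSERT, [(namespace, e) for e in entities])

ingest("apple_research", ["Apple", "Johny Srouji"])
ingest("apple_research", ["Apple"])  # second mention, no rebuild needed

rows = conn.execute(
    "SELECT name, mentions FROM nodes ORDER BY name"
).fetchall()
print(rows)
```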
🏗️ Phase 2: High-Precision Retrieval (Next Up)
- Hybrid Search (BM25 + Vector): Integrate Postgres Full-Text Search with pgvector. Keyword precision meets semantic depth.
- Advanced Entity Resolution (ER): Automatic merging of similar entities (e.g., "Elon" and "Elon Musk") using pg_trgm fuzzy matching and vector distance during ingestion.
- Relationship Scoring: Dynamic edge weighting based on mention frequency and extraction confidence scores.
- Metadata Pruning: Filter graph paths based on metadata (e.g., "Only traverse relationships from documents updated in the last 90 days").
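To give a feel for the fuzzy-matching part of the planned entity resolution, here is a pure-Python approximation of trigram similarity in the style of pg_trgm. It is a sketch only: it splits on whitespace, whereas pg_trgm treats any non-alphanumeric character as a separator, and it says nothing about how the library will combine this signal with vector distance.

```python
def trigrams(text: str) -> set[str]:
    """Lowercase each word, pad it with two leading spaces and one
    trailing space, and collect every 3-character window."""
    grams: set[str] = set()
    for word in text.lower().split():
        padded = f"  {word} "
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams

def similarity(a: str, b: str) -> float:
    """Shared trigrams divided by total distinct trigrams (Jaccard index)."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)

score = similarity("Elon", "Elon Musk")
print(score)  # 0.5
```

A score of 0.5 is above pg_trgm's default similarity threshold of 0.3, so this pair would count as a candidate match on string similarity alone.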
📊 Phase 3: Global Intelligence & Scaling
- SQL-Native Community Detection: Implement clustering algorithms directly in SQL to identify thematic communities (a lightweight alternative to Leiden).
- Global Summarization: Automated summary generation for clusters to answer "What are the key trends across these 10,000 documents?".
- Graph Observability: Built-in tracing to visualize the "Reasoning Path"—showing exactly why the agent connected Node A to Node C.
🛡️ Phase 4: Enterprise & Agentic Features
- Agent Identity & Auth (Signal #1): Integration with Postgres Row-Level Security (RLS) to ensure agents only traverse paths the specific user is authorized to see.
- MCP Server Support: Native Model Context Protocol implementation so pg-graph-rag can be used as a direct tool in Claude, Cursor, and other IDEs.
- Telemetry & Usage Analytics: Per-namespace token tracking and latency monitoring for cost-conscious scaling.
💡 User-Driven Priorities
We prioritize features that reduce Operational Overhead. If you need a feature that further consolidates the "Standard Stack" (Vector + Graph + Relational) into Postgres, open an issue!
Launch Status: 🚀 MVP is live. Focus is now on Hybrid Search and Automatic Entity Resolution.
Development
If you want to contribute or run the tests locally:
# Clone the repo and sync dependencies
uv sync --extra test
# Run tests
uv run pytest