A lean GraphRAG library using Postgres/pgvector as the sole database

🏗️ Early version. The interface may change quickly. Star the repo to stay updated as we work through the roadmap.

postgres-graph-rag 🐘🕸️

High-Precision GraphRAG. Native to PostgreSQL.

Most RAG systems are Flatlanders. They use vector similarity to find related text, but they are fundamentally blind to relationships. If you ask your RAG "How is Person A connected to Project B through their shared dependencies?", standard vector search fails because the answer isn't in a single chunk—it’s in the links between them.

Postgres Graph RAG bridges this reasoning gap by turning your existing PostgreSQL database into a structured knowledge engine.

Why this exists:

Infrastructure Nightmare: Building "Smart RAG" usually means adding a Graph DB (Neo4j) to your stack. Now you have a distributed systems nightmare: keeping your Relational DB, Vector DB, and Graph DB in sync.
Flatland Problem: Vector similarity is just probabilistic matching. It doesn't understand hierarchy, causality, or directed relationships (e.g., "A leads B" vs "B leads A").
Batch Bottleneck: Existing GraphRAG research (like Microsoft's) is batch-heavy and token-expensive. It can't handle real-time, incremental updates.
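
To make the Flatland Problem concrete, here is a minimal sketch using a deliberately naive bag-of-words vector (an assumption for illustration; learned embedding models do encode some word order, but reversed statements still tend to land close together). The function name and the edge tuples are hypothetical and not part of this library.

```python
import math
from collections import Counter


def bag_of_words_cosine(a: str, b: str) -> float:
    """Cosine similarity over raw token counts; word order is ignored."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[token] * vb[token] for token in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


# The two sentences contain the same tokens, so their vectors coincide.
score = bag_of_words_cosine("A leads B", "B leads A")

# Stored as directed edges, the same two facts remain distinct.
edges_forward = {("A", "leads", "B")}
edges_reverse = {("B", "leads", "A")}
```

The similarity score cannot separate the two statements, while the explicit (source, relation, target) triples can.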

Postgres-Native Solution:

This library is built for Postgres Maximalists. It leverages the engine you already trust to do the heavy lifting:

Recursive Retrieval: Instead of expensive LLM-agent loops, we use SQL Recursive CTEs to perform multi-hop reasoning. It’s deterministic, 10x faster, and handles "neighbor-of-neighbor" walks natively.
Atomic Consistency: Vectors, nodes, and relationships live in one ACID-compliant engine. One transaction. Zero sync lag.
Forever Schema: Using JSONB metadata and a namespaced design, the schema is migration-proof. You can evolve your graph's logic without ever running ALTER TABLE.
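
The multi-hop walk described above can be sketched with a recursive CTE. To keep the example runnable without a database server, it uses Python's built-in sqlite3 module as a stand-in for Postgres (both support WITH RECURSIVE). The edges table, its columns, and the neighbors_within helper are assumptions made for illustration, not the library's actual schema or API.

```python
import sqlite3


def neighbors_within(conn: sqlite3.Connection, start: str, hops: int) -> dict:
    """Return {node: depth} for nodes reachable from `start` in at most `hops` directed edges."""
    rows = conn.execute(
        """
        WITH RECURSIVE walk(node, depth) AS (
            SELECT ?, 0                      -- anchor: the starting node
            UNION                            -- UNION drops duplicate (node, depth) rows
            SELECT e.dst, w.depth + 1        -- step: follow one outgoing edge
            FROM walk AS w
            JOIN edges AS e ON e.src = w.node
            WHERE w.depth < ?                -- the depth bound guarantees termination
        )
        SELECT node, MIN(depth) FROM walk GROUP BY node
        """,
        (start, hops),
    ).fetchall()
    return dict(rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edges (src TEXT, dst TEXT, relation TEXT)")
conn.executemany(
    "INSERT INTO edges VALUES (?, ?, ?)",
    [
        ("Person A", "Library X", "maintains"),
        ("Library X", "Project B", "dependency_of"),
        ("Project B", "Team C", "owned_by"),
    ],
)

reachable = neighbors_within(conn, "Person A", hops=2)
```

With hops=2 the walk reaches Project B through Library X but stops before Team C, which is three edges away. The same query shape carries over to Postgres with its own parameter placeholders.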

Core Philosophy

Infrastructure: Postgres is the only database (via pgvector).
Intelligence: Hosted SLMs (GPT-5.2 or Gemini 2.5) for extraction. Freely configurable.
Simplicity: Native Async Python + SQL.
Scalability: High-performance connection pooling and namespace-aware design (Multi-tenancy).

Installation

pip install postgres-graph-rag

Getting Started (Interactive & Frictionless)

The library is designed to be interactive-friendly. You can instantiate it normally and use await at the top level in Notebooks or REPLs. The database connection pool is initialized lazily upon the first request.

from postgres_graph_rag import PostgresGraphRAG

# 1. Simple Instantiation
rag = PostgresGraphRAG(
    postgres_url="postgresql://user:password@localhost:5432/dbname",
    openai_api_key="sk-..." # Or use google_api_key
)

async def quick_start():
    # 2. Setup (Creates tables and pgvector extension if missing)
    await rag.setup()

    # 3. Add Knowledge (Atomic upserts with automatic entity resolution)
    await rag.add_texts(
        "Johny Srouji leads the hardware team at Apple.", 
        namespace="apple_research"
    )

    # 4. Hybrid Query (Vector Search + Recursive Graph Traversal)
    context = await rag.query(
        "Who is leading the hardware efforts?", 
        namespace="apple_research",
        hops=2
    )
    print(context)
    
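    # 5. Cleanup (Closes the connection pool)
    await rag.close()

# In a notebook or async REPL, run it with: await quick_start()
# In a plain script, run it with: import asyncio; asyncio.run(quick_start())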

Advanced Usage & Modes

1. The Production Way: Async Context Manager

For applications (like FastAPI or background workers), use the async with pattern to ensure the connection pool is always closed correctly, even if errors occur.

async with PostgresGraphRAG(postgres_url=DSN, openai_api_key=KEY) as rag:
    await rag.add_texts("The M4 chip uses ARM architecture.")
    # No need to call rag.close(), it happens automatically!
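
For readers curious how lazy pool creation and guaranteed cleanup fit together, here is a minimal sketch of the pattern. LazyClient and FakePool are hypothetical stand-ins written for this example; they are not the library's classes, and the real implementation may differ.

```python
import asyncio


class FakePool:
    """Stand-in for a database connection pool."""

    def __init__(self) -> None:
        self.closed = False

    async def close(self) -> None:
        self.closed = True


class LazyClient:
    """Creates its pool on first use and closes it when the async with block exits."""

    def __init__(self) -> None:
        self._pool = None  # nothing is opened at construction time

    async def _get_pool(self) -> FakePool:
        if self._pool is None:
            self._pool = FakePool()
        return self._pool

    async def add_texts(self, text: str) -> int:
        await self._get_pool()  # the first call triggers pool creation
        return len(text)

    async def close(self) -> None:
        if self._pool is not None:
            await self._pool.close()

    async def __aenter__(self) -> "LazyClient":
        return self

    async def __aexit__(self, exc_type, exc, tb) -> bool:
        await self.close()
        return False  # do not swallow the exception


async def demo() -> bool:
    client = LazyClient()
    try:
        async with client:
            await client.add_texts("The M4 chip uses ARM architecture.")
            raise RuntimeError("simulated failure")
    except RuntimeError:
        pass
    return client._pool.closed


pool_closed = asyncio.run(demo())
```

Even though the body raises, __aexit__ still runs, so the pool ends up closed.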

2. Custom Chunking (Inversion of Control)

Don't like the default character splitter? Inject your own. You can pass any callable that takes a string and returns a list of strings.

from langchain_text_splitters import RecursiveCharacterTextSplitter

# Create your favorite chunker
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)

# Inject it into the library
rag = PostgresGraphRAG(
    postgres_url=DSN,
    openai_api_key=KEY,
    chunker=splitter.split_text # Just pass the method
)
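
Since the only stated contract is a callable that takes a string and returns a list of strings, a plain function works as well. The sketch below is a hypothetical sentence-based chunker; the name sentence_chunker and the max_chars parameter are illustrative choices, not part of the library.

```python
import re
from typing import List


def sentence_chunker(text: str, max_chars: int = 200) -> List[str]:
    """Pack whole sentences into chunks of at most `max_chars` characters.

    A single sentence longer than `max_chars` is kept intact as its own chunk.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)  # current chunk is full, start a new one
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


chunks = sentence_chunker("One. Two. Three.", max_chars=9)
```

It would then be passed the same way as the splitter above, for example chunker=sentence_chunker, or wrapped in functools.partial to set max_chars.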

3. Custom Provider Configuration

You can control exactly which models are used for extraction and embeddings.

from postgres_graph_rag.models import ProviderConfig

custom_config: ProviderConfig = {
    "extraction_model": "gpt-5-nano-2025-08-07",
    "embedding_model": "text-embedding-3-large",
    "dimension": 3072 # Must match the model's output
}

rag = PostgresGraphRAG(..., config=custom_config)
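
Because a mismatched dimension only tends to surface when vectors are written, it can help to check the config up front. The helper below is a hypothetical sketch, not something the library provides; the table lists the default output sizes of two OpenAI embedding models and leaves unknown models unchecked.

```python
from typing import Dict, Optional

# Default output sizes for two OpenAI embedding models.
KNOWN_DIMENSIONS: Dict[str, int] = {
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
}


def check_dimension(config: dict) -> dict:
    """Raise ValueError if the configured dimension contradicts a known model size."""
    model = config["embedding_model"]
    expected: Optional[int] = KNOWN_DIMENSIONS.get(model)
    if expected is not None and config["dimension"] != expected:
        raise ValueError(
            f"{model} produces {expected}-dimensional vectors, "
            f"but the config says {config['dimension']}"
        )
    return config


checked = check_dimension(
    {
        "extraction_model": "gpt-5-nano-2025-08-07",
        "embedding_model": "text-embedding-3-large",
        "dimension": 3072,
    }
)
```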

4. Multi-Tenancy (Namespacing)

Isolate data for different users or projects within the same database tables.

# User A's private graph
await rag.add_texts("My secret key is 123.", namespace="user_a")

# User B's private graph
await rag.add_texts("My secret key is 999.", namespace="user_b")

# Queries are strictly isolated
res = await rag.query("What is my key?", namespace="user_a") # Returns 123
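
One common way to get this kind of isolation inside shared tables is a namespace column that every query filters on. The sketch below illustrates the idea with sqlite3 as a stand-in for Postgres; the chunks table and the chunks_for helper are assumptions for illustration and may not match the library's schema.

```python
import sqlite3
from typing import List

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE chunks (namespace TEXT NOT NULL, content TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO chunks VALUES (?, ?)",
    [
        ("user_a", "My secret key is 123."),
        ("user_b", "My secret key is 999."),
    ],
)


def chunks_for(conn: sqlite3.Connection, namespace: str) -> List[str]:
    """Return only the rows stored under the given namespace."""
    rows = conn.execute(
        "SELECT content FROM chunks WHERE namespace = ? ORDER BY rowid",
        (namespace,),
    ).fetchall()
    return [content for (content,) in rows]


user_a_chunks = chunks_for(conn, "user_a")
```

Both tenants share one table, yet a lookup for user_a never sees user_b's rows because the filter is part of every query.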

🗺️ Roadmap & Future Vision

This project follows the "Postgres Maximalism" philosophy: Stop building new infrastructure and start using the full power of the database you already own.

✅ Phase 1: Foundation (Current Release)

Postgres-Native Schema: Migration-proof design using JSONB and namespacing.
Recursive Reasoning: Multi-hop graph traversal implemented via SQL Recursive CTEs.
Incremental Ingestion: Atomic upserts for nodes and edges (no expensive batch rebuilds).
Hosted SLM Extraction: Native support for OpenAI and Google Gemini for high-speed, low-cost tagging.
Async Architecture: Production-ready with high-performance connection pooling.
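
The incremental ingestion item can be illustrated with an upsert inside a single transaction. This sketch again uses sqlite3 as a stand-in for Postgres (both accept INSERT ... ON CONFLICT). The nodes and edges tables, the mentions counter, and upsert_triple are hypothetical and chosen only for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE nodes (
        namespace TEXT NOT NULL,
        name      TEXT NOT NULL,
        mentions  INTEGER NOT NULL DEFAULT 1,
        PRIMARY KEY (namespace, name)
    )
    """
)
conn.execute(
    """
    CREATE TABLE edges (
        namespace TEXT NOT NULL,
        src       TEXT NOT NULL,
        relation  TEXT NOT NULL,
        dst       TEXT NOT NULL,
        PRIMARY KEY (namespace, src, relation, dst)
    )
    """
)


def upsert_triple(conn, namespace: str, src: str, relation: str, dst: str) -> None:
    """Insert or update both nodes and the edge; all statements commit together or not at all."""
    with conn:  # commits on success, rolls back if any statement raises
        for name in (src, dst):
            conn.execute(
                "INSERT INTO nodes (namespace, name) VALUES (?, ?) "
                "ON CONFLICT (namespace, name) DO UPDATE SET mentions = mentions + 1",
                (namespace, name),
            )
        conn.execute(
            "INSERT INTO edges (namespace, src, relation, dst) VALUES (?, ?, ?, ?) "
            "ON CONFLICT (namespace, src, relation, dst) DO NOTHING",
            (namespace, src, relation, dst),
        )


# Ingesting the same fact twice updates existing rows instead of duplicating them.
for _ in range(2):
    upsert_triple(conn, "apple_research", "Johny Srouji", "leads", "hardware team")

node_rows = conn.execute("SELECT name, mentions FROM nodes ORDER BY name").fetchall()
edge_count = conn.execute("SELECT COUNT(*) FROM edges").fetchone()[0]
```

New facts can therefore be added one at a time without rebuilding the graph.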

🏗️ Phase 2: High-Precision Retrieval (Next Up)

Hybrid Search (BM25 + Vector): Integrate Postgres Full-Text Search with pgvector. Keyword precision meets semantic depth.
Advanced Entity Resolution (ER): Automatic merging of similar entities (e.g., "Elon" and "Elon Musk") using pg_trgm fuzzy matching and vector distance during ingestion.
Relationship Scoring: Dynamic edge weighting based on mention frequency and extraction confidence scores.
Metadata Pruning: Filter graph paths based on metadata (e.g., "Only traverse relationships from documents updated in the last 90 days").
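
As a rough illustration of the fuzzy matching behind the planned entity resolution, here is a pure-Python approximation of trigram similarity. It imitates the padding scheme documented for pg_trgm (two spaces before each word, one after) but is an independent sketch, so its scores are not guaranteed to match Postgres exactly. The function names are hypothetical.

```python
import re
from typing import Set


def trigrams(text: str) -> Set[str]:
    """Collect the set of 3-character windows over each padded, lowercased word."""
    grams: Set[str] = set()
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = f"  {word} "
        grams.update(padded[i : i + 3] for i in range(len(padded) - 2))
    return grams


def trigram_similarity(a: str, b: str) -> float:
    """Shared trigrams divided by all distinct trigrams (Jaccard index)."""
    ta, tb = trigrams(a), trigrams(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


score = trigram_similarity("Elon", "Elon Musk")
```

"Elon" contributes five trigrams and "Elon Musk" ten, five of them shared, so the score is 0.5. A merge rule would compare such a score against a threshold, ideally combined with vector distance as the roadmap item suggests.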

📊 Phase 3: Global Intelligence & Scaling

SQL-Native Community Detection: Implement clustering algorithms directly in SQL to identify thematic communities (a lightweight alternative to Leiden).
Global Summarization: Automated summary generation for clusters to answer "What are the key trends across these 10,000 documents?".
Graph Observability: Built-in tracing to visualize the "Reasoning Path"—showing exactly why the agent connected Node A to Node C.

🛡️ Phase 4: Enterprise & Agentic Features

Agent Identity & Auth (Signal #1): Integration with Postgres Row-Level Security (RLS) to ensure agents only traverse paths the specific user is authorized to see.
MCP Server Support: Native Model Context Protocol implementation so postgres-graph-rag can be used as a direct tool in Claude, Cursor, and other IDEs.
Telemetry & Usage Analytics: Per-namespace token tracking and latency monitoring for cost-conscious scaling.

💡 User-Driven Priorities

We prioritize features that reduce Operational Overhead. If you need a feature that further consolidates the "Standard Stack" (Vector + Graph + Relational) into Postgres, open an issue!

Launch Status: 🚀 MVP is live. Focus is now on Hybrid Search and Automatic Entity Resolution.

Development

If you want to contribute or run the tests locally:

# Clone the repo and sync dependencies
git clone https://github.com/h4gen/postgres-graph-rag.git
cd postgres-graph-rag
uv sync --extra test

# Run tests
uv run pytest

Project details

Maintainers

HagenHoferichter

Release history

0.1.2 (this version): Dec 22, 2025
0.1.1: Dec 21, 2025
0.1.0: Dec 21, 2025

Download files

Source Distribution

postgres_graph_rag-0.1.2.tar.gz (156.6 kB)

Uploaded Dec 22, 2025 Source

Built Distribution

postgres_graph_rag-0.1.2-py3-none-any.whl (14.1 kB)

Uploaded Dec 22, 2025 Python 3

File details

Details for the file postgres_graph_rag-0.1.2.tar.gz.

File metadata

Download URL: postgres_graph_rag-0.1.2.tar.gz
Upload date: Dec 22, 2025
Size: 156.6 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for postgres_graph_rag-0.1.2.tar.gz
Algorithm	Hash digest
SHA256	`95a778c991e8392577e53ee9bf6c110afc4abb4b52dc0aad1bc577325936912c`
MD5	`6cad5764a114c4199158d58abcf3876b`
BLAKE2b-256	`4c62951bf241fd9b758e63a093d29e126673422a9d7a4bea67566368ff9214bc`

Provenance

The following attestation bundles were made for postgres_graph_rag-0.1.2.tar.gz:

Publisher: publish.yml on h4gen/postgres-graph-rag

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: postgres_graph_rag-0.1.2.tar.gz
- Subject digest: 95a778c991e8392577e53ee9bf6c110afc4abb4b52dc0aad1bc577325936912c
- Sigstore transparency entry: 775573272
- Sigstore integration time: Dec 22, 2025
Source repository:
- Permalink: h4gen/postgres-graph-rag@6fdda4ae7cbf96f0605e05def086a6cd1aaa8093
- Branch / Tag: refs/tags/v0.1.2
- Owner: https://github.com/h4gen
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@6fdda4ae7cbf96f0605e05def086a6cd1aaa8093
- Trigger Event: release

File details

Details for the file postgres_graph_rag-0.1.2-py3-none-any.whl.

File metadata

Download URL: postgres_graph_rag-0.1.2-py3-none-any.whl
Upload date: Dec 22, 2025
Size: 14.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for postgres_graph_rag-0.1.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`a9a144a0ed3e3e2e38720ab21605832f3968484ab97e3995218176cb5378d1f7`
MD5	`b766572f8011203942a886a1141dcdca`
BLAKE2b-256	`95cd46ee9ee47f4868dc7665a239dbd93deeb3a4812bd50e5f19411bd3ef950a`

Provenance

The following attestation bundles were made for postgres_graph_rag-0.1.2-py3-none-any.whl:

Publisher: publish.yml on h4gen/postgres-graph-rag

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: postgres_graph_rag-0.1.2-py3-none-any.whl
- Subject digest: a9a144a0ed3e3e2e38720ab21605832f3968484ab97e3995218176cb5378d1f7
- Sigstore transparency entry: 775573275
- Sigstore integration time: Dec 22, 2025
Source repository:
- Permalink: h4gen/postgres-graph-rag@6fdda4ae7cbf96f0605e05def086a6cd1aaa8093
- Branch / Tag: refs/tags/v0.1.2
- Owner: https://github.com/h4gen
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@6fdda4ae7cbf96f0605e05def086a6cd1aaa8093
- Trigger Event: release
