
Corbin

Corbin is a graph-aware retrieval system for Notion-backed technical knowledge bases.

It turns structured Notion content into a retrieval-ready mirror with documents, chunks, metadata, aliases, and typed relationships, then exposes that knowledge through an API that can power search, grounded answers, and ChatGPT tool use.

Why Corbin

Most note systems are pleasant to write in but weak at retrieval once the knowledge base grows. Corbin keeps Notion as the authoring layer and builds a retrieval layer that is deterministic, inspectable, and easy to evolve.

The goal is not generic semantic search alone. The goal is to answer questions using:

  • chunk embeddings
  • exact identifiers
  • metadata filters
  • typed relationships
  • freshness and verification state
  • provenance back to the source note

Core idea

Corbin treats each Notion page as a source record that can become:

  • a document
  • one or more chunks
  • one or more graph edges
  • optional aliases and extracted entities

That makes it possible to combine semantic retrieval with structural expansion. A chunk about a CUDA fix can lead to the host it applies to, the service it affects, and the playbook that verifies it.
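The record model above can be sketched with plain dataclasses. This is an illustrative shape, not Corbin's actual schema; all field and relation names here are hypothetical:

```python
from dataclasses import dataclass, field


@dataclass
class Edge:
    # A typed relationship between two pages, e.g. "applies_to" or "verified_by".
    source_id: str
    target_id: str
    relation: str


@dataclass
class SourceRecord:
    # One Notion page, mirrored into retrieval-ready parts.
    page_id: str
    title: str
    document: str  # flattened block content
    chunks: list[str] = field(default_factory=list)
    edges: list[Edge] = field(default_factory=list)
    aliases: list[str] = field(default_factory=list)


record = SourceRecord(
    page_id="abc123",
    title="CUDA driver fix",
    document="Steps to repair the CUDA driver after a kernel upgrade.",
    edges=[Edge("abc123", "host-gpu01", "applies_to")],
    aliases=["cuda fix", "driver repair"],
)
print(record.edges[0].relation)  # -> applies_to
```

Keeping edges and aliases alongside the chunks is what lets a semantic hit fan out into structural neighbors later.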

Architecture

Corbin is split into a few clear layers:

  1. Notion source layer
    Pull canonical databases, page metadata, and recursive block content.

  2. Sync and normalization layer
    Flatten Notion blocks into clean text, extract metadata, normalize relation properties, and compute content hashes.

  3. Indexing layer
    Chunk documents by structure, enrich chunks with compact headers, generate embeddings, and upsert into PostgreSQL with pgvector.

  4. Retrieval layer
    Run hybrid search across semantic similarity, full-text search, metadata filters, and graph expansion.

  5. Orchestration API
    Expose search and answer endpoints through FastAPI.

  6. Chat integration layer
    Present Corbin as tools through MCP so ChatGPT can call into the knowledge base directly.
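The content hashes computed in the sync layer exist so that unchanged pages can be skipped instead of re-embedded. A minimal stdlib sketch of that change check, with hypothetical function names:

```python
import hashlib


def content_hash(text: str) -> str:
    # Stable hash of the flattened, normalized page text; two syncs of an
    # unchanged page produce the same digest.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def needs_reindex(page_id: str, new_text: str, stored: dict[str, str]) -> bool:
    # Re-chunk, re-embed, and upsert only when the content actually changed.
    new_hash = content_hash(new_text)
    if stored.get(page_id) == new_hash:
        return False
    stored[page_id] = new_hash
    return True


hashes: dict[str, str] = {}
print(needs_reindex("p1", "GPU host playbook", hashes))  # first sync -> True
print(needs_reindex("p1", "GPU host playbook", hashes))  # unchanged -> False
```

In the real pipeline the stored hashes would live in the PostgreSQL mirror rather than an in-memory dict, but the decision logic is the same.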

Planned stack

  • Python 3.12
  • FastAPI
  • PostgreSQL
  • pgvector
  • SQLAlchemy
  • Alembic
  • Pydantic
  • HTTPX
  • Notion API
  • uv
  • Docker Compose
  • MCP server for ChatGPT integration

Retrieval model

Corbin is designed around hybrid retrieval rather than embedding-only search. A query can be analyzed into intent, entities, and constraints, then resolved through several channels:

  • semantic chunk search
  • PostgreSQL full-text search
  • exact and fuzzy alias matching
  • metadata filters such as host, project, or status
  • graph expansion from related nodes and edges

The final answer should prefer verified, host-specific, and current documentation whenever possible.
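The README does not specify how the channels are merged, so as one plausible sketch: reciprocal rank fusion is a common way to combine ranked lists from heterogeneous retrievers without calibrating their scores against each other. Chunk ids and channel names below are illustrative:

```python
from collections import defaultdict


def rrf_merge(channels: dict[str, list[str]], k: int = 60) -> list[str]:
    # Reciprocal rank fusion: each channel contributes 1 / (k + rank) for
    # every chunk id it returns, so chunks ranked by several channels rise.
    scores: dict[str, float] = defaultdict(float)
    for ranked_ids in channels.values():
        for rank, chunk_id in enumerate(ranked_ids, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


merged = rrf_merge({
    "semantic":  ["c2", "c7", "c1"],
    "full_text": ["c7", "c3"],
    "alias":     ["c7"],
})
print(merged[0])  # -> c7, the only chunk found by all three channels
```

Preferences like "verified, host-specific, and current" would then apply as a rerank or boost on top of the fused list, rather than inside any single channel.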

Example use cases

  • Find the exact playbook for rebuilding a service on a specific host.
  • Explain how a component, machine, and script are related.
  • Retrieve the most relevant troubleshooting note, then expand to nearby docs.
  • Answer a question in ChatGPT using private internal knowledge instead of generic recall.
  • Surface stale notes that need verification after infra changes.
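The "expand to nearby docs" case above amounts to a bounded traversal over typed edges. A stdlib sketch of that expansion, with an illustrative toy graph (node and relation names are made up for the example):

```python
from collections import deque


def expand(seed: str,
           edges: dict[str, list[tuple[str, str]]],
           max_hops: int = 2) -> set[str]:
    # Bounded breadth-first search: collect every node reachable from the
    # seed within max_hops, following (relation, neighbor) edges.
    seen = {seed}
    frontier = deque([(seed, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue
        for _relation, neighbor in edges.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return seen - {seed}


graph = {
    "cuda-fix":      [("applies_to", "host-gpu01")],
    "host-gpu01":    [("runs", "inference-svc")],
    "inference-svc": [("verified_by", "playbook-7")],
}
print(sorted(expand("cuda-fix", graph)))  # -> ['host-gpu01', 'inference-svc']
```

The hop bound is what keeps expansion from dragging in the whole knowledge base; a real traversal would likely also filter by relation type.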

Initial project layout

corbin/
├── pyproject.toml
├── README.md
├── .env.example
├── configs/
│   ├── app.yaml
│   ├── notion.yaml
│   ├── retrieval.yaml
│   └── chunking.yaml
├── src/
│   └── corbin/
│       ├── notion/
│       │   ├── client.py
│       │   ├── sync.py
│       │   ├── blocks.py
│       │   └── normalize.py
│       ├── indexing/
│       │   ├── chunker.py
│       │   ├── embed.py
│       │   ├── extract.py
│       │   └── upsert.py
│       ├── graph/
│       │   ├── entities.py
│       │   ├── relations.py
│       │   └── traversal.py
│       ├── retrieval/
│       │   ├── analyze.py
│       │   ├── hybrid.py
│       │   ├── rerank.py
│       │   └── answer.py
│       ├── db/
│       │   ├── models.py
│       │   ├── session.py
│       │   └── migrations/
│       ├── api/
│       │   └── main.py
│       └── app/
│           └── mcp_server.py
└── tests/

First milestones

Phase 1

Sync one or two Notion databases into PostgreSQL.

Phase 2

Chunk content and add embeddings.

Phase 3

Capture relation properties as graph edges.

Phase 4

Expose retrieval through FastAPI.

Phase 5

Connect ChatGPT through MCP tools.

Design principles

  • Notion stays the authoring layer.
  • PostgreSQL is the retrieval mirror.
  • Retrieval must be inspectable and testable.
  • Chunking should follow structure before token count.
  • Relations are first-class signals, not just metadata.
  • Answers should always preserve provenance.
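"Structure before token count" can be made concrete with a small sketch: one chunk per section by default, falling back to a size budget only when a single section is oversized. This uses a word budget and plain heading/body pairs as stand-ins for Notion block structure:

```python
def chunk(sections: list[tuple[str, str]], budget: int = 120) -> list[str]:
    # Structure first: each section becomes one chunk, prefixed with its
    # heading as a compact header. Only a section that exceeds the budget
    # on its own gets split by size.
    chunks: list[str] = []
    for heading, body in sections:
        words = body.split()
        if len(words) <= budget:
            chunks.append(f"{heading}\n{body}")
        else:
            for i in range(0, len(words), budget):
                part = " ".join(words[i:i + budget])
                chunks.append(f"{heading}\n{part}")
    return chunks


doc = [
    ("Rebuild steps", "stop the service " * 2),   # 6 words -> one chunk
    ("Notes", "word " * 300),                      # 300 words -> three chunks
]
print(len(chunk(doc)))  # -> 4
```

Repeating the heading on every split part is one simple form of the chunk-header enrichment mentioned in the indexing layer: each chunk stays interpretable on its own.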

Status

Early scaffold. The first version focuses on reliable sync, clean normalization, and grounded retrieval before adding richer answer synthesis and write-back workflows.
