Knowledge graph + MCP tool server for LLM agents — 3rd-gen retrieval, graph-aware multi-turn exploration, structured data tools, DB→ontology one-liner.
Synaptic Memory
Zero API calls at index time. Zero infra. Zero lock-in. A knowledge graph + MCP tool server for LLM agents, with hybrid retrieval, CDC-based live database sync, and Korean FTS built in.
5-minute start
```bash
pip install "synaptic-memory[sqlite,korean,vector]"
python examples/quickstart.py
```
That command ingests examples/data/products.csv
into a SQLite-backed graph and runs three searches — all without calling
any LLM at indexing time. Full source: examples/quickstart.py.
Two calls to build a graph
```python
import asyncio

from synaptic import SynapticGraph

async def main():
    # Any data → knowledge graph (CSV, JSONL, directory)
    graph = await SynapticGraph.from_data("./my_data/")

    # Or directly from a database — SQLite / PostgreSQL / MySQL / Oracle / MSSQL
    graph = await SynapticGraph.from_database(
        "postgresql://user:pass@host:5432/dbname"
    )

    # Live database? Use CDC mode and only re-read what changed.
    graph = await SynapticGraph.from_database(
        "postgresql://user:pass@host:5432/dbname",
        db="knowledge.db",
        mode="cdc",  # deterministic node IDs + sync state recorded
    )
    result = await graph.sync_from_database(
        "postgresql://user:pass@host:5432/dbname"
    )
    print(result.added, result.updated, result.deleted)

    # Or bring your own chunker (LangChain, Unstructured, custom OCR, ...)
    chunks = my_parser.split("manual.pdf")
    graph = await SynapticGraph.from_chunks(chunks)

    # Search
    result = await graph.search("my question", engine="evidence")

asyncio.run(main())
```
That's it. Synaptic auto-detects the file format or DB schema, generates an ontology profile, ingests, indexes, and builds FK edges.
Live database sync (CDC) — `mode="cdc"` enables incremental updates: tables with an `updated_at`-style column are read with a watermark filter; the rest fall back to per-row content hashing. Deletes are detected via a TEMP TABLE LEFT JOIN, and FK rewires re-link the corresponding RELATED edges. Search results are identical to a full reload (locked in by a regression test). Supports SQLite, PostgreSQL, MySQL/MariaDB.
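For tables without a watermark column, the hashing fallback amounts to comparing stored per-row hashes against a fresh read. A minimal sketch of that idea (the `row_hash` / `diff_rows` helpers are illustrative, not Synaptic's internals):

```python
import hashlib

def row_hash(row: dict) -> str:
    """Stable content hash for one row (illustrative helper)."""
    canon = "|".join(f"{k}={row[k]}" for k in sorted(row))
    return hashlib.sha256(canon.encode("utf-8")).hexdigest()

def diff_rows(old_state: dict, new_rows: dict):
    """Compare stored {pk: hash} state against freshly read {pk: row}."""
    new_hashes = {pk: row_hash(r) for pk, r in new_rows.items()}
    added = [pk for pk in new_hashes if pk not in old_state]
    updated = [pk for pk, h in new_hashes.items()
               if pk in old_state and old_state[pk] != h]
    deleted = [pk for pk in old_state if pk not in new_hashes]
    return added, updated, deleted

state = {"1": row_hash({"id": 1, "name": "widget"})}
rows = {"1": {"id": 1, "name": "widget v2"},   # changed content
        "2": {"id": 2, "name": "gadget"}}       # new row
print(diff_rows(state, rows))  # (['2'], ['1'], [])
```

In CDC mode the `deleted` side is what the TEMP TABLE LEFT JOIN computes inside the database instead of in Python.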
Office files (PDF/DOCX/PPTX/XLSX/HWP) are supported through the optional `xgen-doc2chunk` package. Install with `pip install synaptic-memory[docs]` or use `from_chunks()` with your own parser.
What it does
```
Your data (CSV, JSONL, PDF/DOCX/PPTX/XLSX/HWP, SQL database)
  ↓ auto-detect format / auto-discover DB schema + FKs
  ↓ DocumentIngester (text) / TableIngester / DbIngester
  ↓
Knowledge Graph
  ├─ Documents:  Category → Document → Chunk
  └─ Structured: table rows as ENTITY nodes + RELATED edges (FKs)
  ↓
36 MCP tools → LLM agent explores via graph-aware multi-turn tool use
```
Two jobs, nothing else:
- Build the graph well — cheap extraction, no LLM at index time
- Give the LLM good tools — the agent decides what to search
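For structured data, the "build the graph well" half is mostly mechanical. A sketch of the row-to-node mapping, under assumed node/edge shapes (`rows_to_graph` and the dict layout are illustrative, not Synaptic's actual schema):

```python
def rows_to_graph(table: str, rows: list, fks: dict):
    """Map table rows to ENTITY nodes and FK columns to RELATED edges.

    `fks` maps a column name to its target table,
    e.g. {"category_id": "categories"}.
    """
    nodes, edges = [], []
    for row in rows:
        node_id = f"{table}:{row['id']}"
        nodes.append({"id": node_id, "kind": "ENTITY", "props": row})
        for col, target in fks.items():
            if row.get(col) is not None:
                edges.append({"src": node_id, "type": "RELATED",
                              "dst": f"{target}:{row[col]}"})
    return nodes, edges

nodes, edges = rows_to_graph(
    "products",
    [{"id": 1, "name": "lamp", "category_id": 7}],
    {"category_id": "categories"},
)
print(edges[0])  # {'src': 'products:1', 'type': 'RELATED', 'dst': 'categories:7'}
```

Deterministic IDs like `products:1` are what makes CDC re-syncs idempotent: re-ingesting the same row updates the same node instead of creating a duplicate.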
Install
```bash
# Recommended — covers every example in this README
pip install "synaptic-memory[sqlite,korean,vector,mcp]"

# Or everything, including Postgres / Kuzu / Qdrant / MinIO
pip install "synaptic-memory[all]"
```
Pick-your-own extras
```bash
pip install synaptic-memory              # Core (zero deps, in-memory only)
pip install synaptic-memory[sqlite]      # + SQLite FTS5 backend
pip install synaptic-memory[korean]      # + Kiwi morphological analyzer
pip install synaptic-memory[vector]      # + usearch HNSW index
pip install synaptic-memory[mcp]         # + MCP server for Claude
pip install synaptic-memory[embedding]   # + aiohttp for embedding APIs
pip install synaptic-memory[reranker]    # + flashrank cross-encoder
pip install synaptic-memory[postgresql]  # + asyncpg + pgvector
pip install synaptic-memory[docs]        # + xgen-doc2chunk (PDF/DOCX/PPTX/XLSX/HWP)
```
Quick Start
Option A: Two lines (easiest)
```python
import asyncio

from synaptic import SynapticGraph

async def main():
    # CSV file
    graph = await SynapticGraph.from_data("products.csv")

    # JSONL documents
    graph = await SynapticGraph.from_data("documents.jsonl")

    # Entire directory (scans all CSV/JSONL)
    graph = await SynapticGraph.from_data("./my_corpus/")

    # With embedding (optional, improves semantic search)
    graph = await SynapticGraph.from_data(
        "./my_corpus/",
        embed_url="http://localhost:11434/v1",
    )

    # Search
    result = await graph.search("my question", engine="evidence")
    for activated in result.nodes[:5]:
        print(activated.node.title, activated.activation)

asyncio.run(main())
```
Option B: MCP server (Claude Desktop / Code)
```bash
synaptic-mcp --db my_graph.db
synaptic-mcp --db my_graph.db --embed-url http://localhost:11434/v1
```
Claude can now call 36 tools to explore your graph — search, ingest new files into the graph mid-conversation, and sync from a live database without dropping to a CLI.
A ready-to-paste claude_desktop_config.json snippet is in
examples/mcp_claude_desktop.json.
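If the snippet isn't at hand, an MCP stdio server entry generally takes this shape (the server name, path, and db filename below are placeholders, not copied from the shipped example):

```json
{
  "mcpServers": {
    "synaptic": {
      "command": "synaptic-mcp",
      "args": ["--db", "/absolute/path/to/my_graph.db"]
    }
  }
}
```

Use an absolute path for `--db`: Claude Desktop launches MCP servers from its own working directory, so relative paths usually won't resolve.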
Option BX: LangChain retriever (drop-in)
```bash
pip install "synaptic-memory[sqlite,korean,vector,langchain]"
```
```python
import asyncio

from synaptic import SynapticGraph
from synaptic.integrations.langchain import SynapticRetriever

async def main():
    graph = await SynapticGraph.from_data("./docs/")
    retriever = SynapticRetriever(graph=graph, k=5, engine="evidence")

    docs = await retriever.ainvoke("my question")
    for doc in docs:
        print(doc.page_content[:80], " ", doc.metadata["score"])

asyncio.run(main())
```
Runnable example: examples/langchain_retriever.py.
Each hit becomes a LangChain Document with the node id, title,
score, and any structured properties in metadata — works unmodified
in RetrievalQA chains, agents, and RAG graphs.
Option C: Full control
```python
import asyncio

from synaptic.backends.sqlite_graph import SqliteGraphBackend
from synaptic.extensions.domain_profile import DomainProfile
from synaptic.extensions.document_ingester import DocumentIngester, JsonlDocumentSource

async def main():
    profile = DomainProfile.load("my_profile.toml")
    backend = SqliteGraphBackend("graph.db")
    await backend.connect()

    source = JsonlDocumentSource("docs.jsonl", "chunks.jsonl")
    ingester = DocumentIngester(profile=profile, backend=backend)
    await ingester.ingest(source)

asyncio.run(main())
```
Indexing cost comparison
| Approach | LLM at indexing | Trade-off |
|---|---|---|
| GraphRAG-style (MS GraphRAG, Cognee, Graphiti) | LLM extracts entities + relations + community summaries | Highest recall on narrative corpora, but every new document costs LLM tokens |
| LightRAG-style | LLM deferred to query time | Less index cost, but each query pays |
| Synaptic | None. Structural + statistical signals only (FK, NEXT_CHUNK, phrase DF hubs, MENTIONS) | Cheapest, deterministic, but won't synthesize new relations on its own |
No LLM at indexing. The graph is a search index, not a knowledge base. If you need LLM-synthesized summaries on top of the graph, layer them with your own agent — Synaptic gives you the primitives and leaves the synthesis choice to you.
v0.15.0: pass `engine="evidence"` to `graph.search()` to use the hybrid pipeline (BM25 + HNSW + PPR + cross-encoder + MMR). The default flips to `"evidence"` in v0.16.0 and the legacy engine is removed in v0.17.0. A migration guide will ship with v0.16.0.
Agent Tools (36 total)
Text search tools
| Tool | Purpose |
|---|---|
| `deep_search` | **Recommended.** Search → expand → read documents in ONE call |
| `compare_search` | Auto-decompose multi-topic queries, search in parallel |
| `knowledge_search` | Core semantic search (routes through EvidenceSearch in v0.14.2+) |
| `agent_search` | FTS + vector hybrid search with intent routing |
| `expand` | 1-hop graph neighbours |
| `get_document` | Full document with query-relevant chunks |
| `search_exact` | Literal substring match for IDs/codes |
| `follow` | Walk a specific edge type |
Structured data tools
| Tool | Purpose |
|---|---|
| `filter_nodes` | Property filter (`>=`, `<=`, `contains`) — returns `{total, showing}` for accurate counting |
| `aggregate_nodes` | GROUP BY + COUNT/SUM/AVG/MAX/MIN with optional WHERE pre-filter |
| `join_related` | FK-based related record lookup — walks RELATED edges (O(degree)) |
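The `aggregate_nodes` semantics can be modeled in plain Python (illustrative only; the real tool runs against graph node properties, and the function names here are hypothetical):

```python
from collections import defaultdict

def aggregate(rows, group_by, agg="count", field=None, where=None):
    """GROUP BY + COUNT/SUM/AVG with an optional WHERE pre-filter."""
    if where:
        rows = [r for r in rows if where(r)]   # WHERE runs before grouping
    groups = defaultdict(list)
    for r in rows:
        groups[r[group_by]].append(r)
    out = {}
    for key, members in groups.items():
        if agg == "count":
            out[key] = len(members)
        elif agg == "sum":
            out[key] = sum(m[field] for m in members)
        elif agg == "avg":
            out[key] = sum(m[field] for m in members) / len(members)
    return out

rows = [
    {"brand": "a", "price": 10},
    {"brand": "a", "price": 30},
    {"brand": "b", "price": 5},
]
print(aggregate(rows, "brand"))                      # {'a': 2, 'b': 1}
print(aggregate(rows, "brand", "sum", "price",
                where=lambda r: r["price"] >= 10))   # {'a': 40}
```

Doing the count/sum inside the tool rather than making the LLM count rows from raw search hits is what makes "how many products per brand?" answerable without hallucinated totals.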
Ingest / CDC tools (v0.14.0+)
Mid-conversation ingestion so Claude can teach itself new material without leaving the chat.
| Tool | Purpose |
|---|---|
| `knowledge_add_document` | Ingest a long-text document with automatic sentence-boundary chunking |
| `knowledge_add_table` | Ingest structured rows → ENTITY nodes + FK edges |
| `knowledge_add_chunks` | BYO-chunker path for pre-split content |
| `knowledge_ingest_path` | Ingest a CSV / JSONL / text file from the local filesystem |
| `knowledge_remove` | Delete a single node with edge cascade |
| `knowledge_sync_from_database` | Incremental sync from a live database (CDC) |
| `knowledge_backfill` | Repair graphs missing embeddings or phrase hubs (v0.14.4+) |
Navigation tools
| Tool | Purpose |
|---|---|
| `list_categories` | Category list with document counts |
| `count` | Structural count by kind/category/year |
| `session_info` | Multi-turn session state |
All tools return `{ data, hints, session }`. The `SearchSession` tracks seen nodes across turns so the agent never re-reads the same chunk.
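The cross-turn dedup behaviour can be modeled in a few lines (an illustrative sketch, not the actual SearchSession class):

```python
class SessionDedup:
    """Track node IDs already shown to the agent; return only unseen ones."""

    def __init__(self):
        self.seen = set()

    def filter_new(self, node_ids):
        fresh = [n for n in node_ids if n not in self.seen]
        self.seen.update(fresh)          # remember what this turn surfaced
        return fresh

s = SessionDedup()
print(s.filter_new(["doc:1", "doc:2"]))  # ['doc:1', 'doc:2']  (turn 1)
print(s.filter_new(["doc:2", "doc:3"]))  # ['doc:3']           (turn 2)
```

The payoff is token economy: on a five-turn exploration, results that overlap with earlier turns cost zero context instead of being re-serialized each time.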
Retrieval Pipeline
```
Query
  ↓ Kiwi morphological analysis (Korean) or regex (other)
  ↓ BM25 FTS + title 3x boost + substring fallback
  ↓ Vector search (usearch HNSW, optional)
  ↓ Vector PRF (pseudo relevance feedback, 2-pass)
  ↓ PPR graph discovery (personalized pagerank)
  ↓ GraphExpander (1-hop: category siblings, chunk-next, entity mentions)
  ↓ HybridReranker (lexical + semantic + graph + structural + authority + temporal)
  ↓ MaxP document aggregation (coverage bonus)
  ↓ Cross-encoder reranker (bge-reranker-v2-m3 via TEI, optional)
  ↓ EvidenceAggregator (MMR diversity + per-doc cap + category coverage)
Result
```
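The final MMR diversity step is the standard greedy trade-off between relevance and redundancy. A generic textbook implementation (not the EvidenceAggregator's exact scoring):

```python
def mmr(candidates, sim, k=3, lam=0.7):
    """Greedy Maximal Marginal Relevance over (id, relevance) pairs.

    `sim(a, b)` returns pairwise similarity in [0, 1]. Each round picks the
    item maximizing  lam * relevance - (1 - lam) * max_sim_to_selected.
    """
    selected = []
    pool = dict(candidates)
    while pool and len(selected) < k:
        best = max(
            pool,
            key=lambda c: lam * pool[c]
            - (1 - lam) * max((sim(c, s) for s in selected), default=0.0),
        )
        selected.append(best)
        del pool[best]
    return selected

# Toy similarity: chunks from the same document count as near-duplicates.
sim = lambda a, b: 1.0 if a.split(":")[0] == b.split(":")[0] else 0.0
cands = [("docA:1", 0.9), ("docA:2", 0.85), ("docB:1", 0.6)]
print(mmr(cands, sim, k=2))  # ['docA:1', 'docB:1']
```

Note how `docA:2` is skipped despite its higher raw score: the redundancy penalty against the already-selected `docA:1` is exactly what the per-doc cap and category-coverage terms generalize.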
Benchmarks
Reproducible FTS-only baseline (seconds on a laptop, no GPU)
```bash
pip install "synaptic-memory[korean]"
python examples/benchmark_allganize.py
```
Output (deterministic, v0.16.0):
```
Dataset              Corpus  Queries    MRR   R@10      Hit  Time
--------------------------------------------------------------------------------
Allganize RAG-ko        200      200  0.947  1.000  200/200  9.3s
Allganize RAG-Eval      300      300  0.911  0.950  285/300  5.9s
```
This is the embedder-free baseline (EvidenceSearch pipeline: BM25 +
PPR + MMR, no vector index, no cross-encoder). Full source:
examples/benchmark_allganize.py.
Data source: allganize/RAG-Evaluation-Dataset-KO.
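MRR and Hit@10 here follow the standard definitions. A small reference implementation of the scoring, assuming one gold document per query (the function name is ours, not the benchmark script's):

```python
def mrr_and_hits(ranked_ids, gold_ids, k=10):
    """Mean reciprocal rank and Hit@k, one gold document per query.

    ranked_ids: list of ranked doc-id lists, one per query.
    gold_ids:   the correct doc id for each query.
    """
    rr_sum, hits = 0.0, 0
    for ranked, gold in zip(ranked_ids, gold_ids):
        for rank, doc in enumerate(ranked, start=1):
            if doc == gold:
                rr_sum += 1.0 / rank   # reciprocal rank of the first hit
                if rank <= k:
                    hits += 1          # counts toward Hit@k
                break
    return rr_sum / len(gold_ids), hits

runs = [["d3", "d1", "d9"],   # gold d1 found at rank 2 -> RR 0.5
        ["d7", "d2"]]         # gold d7 found at rank 1 -> RR 1.0
gold = ["d1", "d7"]
print(mrr_and_hits(runs, gold))  # (0.75, 2)
```

So an MRR of 0.947 over 200 queries means the gold document sits at rank 1 for the vast majority of them.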
v0.16.0 — engine default flipped to `"evidence"`. Combined with the v0.15.1 query-mode Kiwi improvement, FTS-only Korean retrieval moved from Allganize RAG-ko MRR 0.621 (v0.15.0) to 0.947 (v0.16.0) without any embedder or reranker. English (HotPotQA-24): 0.727 → 0.875. Full ablation: `examples/ablation/run_ablation.py`. Reproducibility under streaming ingest improves in lockstep: top-1 rank invariance rose from 54.5% to 100%, bit-wise-identical top-10 from 51.5% to 96%, with exactly zero MRR drift.
Head-to-head vs Mem0 / Cognee / HippoRAG2
A runnable harness (BEIR-style corpus, same MRR / R@10 scoring code for every system):
```bash
# Synaptic only (no API keys, ~2 s)
python examples/benchmark_vs_competitors/run_comparison.py --only synaptic

# Adapters for Mem0, Cognee, HippoRAG2 ship in-tree — add them to
# --only when the respective packages are installed and API keys are set
python examples/benchmark_vs_competitors/run_comparison.py --only synaptic,mem0 --subset 10
```
See examples/benchmark_vs_competitors/README.md for fairness caveats. Competitor self-reported numbers (Mem0 LoCoMo 91.6, HippoRAG2 MuSiQue F1 51.9, the Zep 84→58 correction incident, etc.) are catalogued with sources in docs/comparison/published_numbers.md.
Embedder-free full-dataset summary (v0.16.0)
Run via python examples/ablation/run_ablation.py:
| Dataset | Lang | Queries | MRR | Hit @ 10 |
|---|---|---|---|---|
| Allganize RAG-ko | ko | 200 | 0.947 | 200/200 |
| Allganize RAG-Eval | ko | 300 | 0.911 | 285/300 |
| AutoRAG KO | ko | 114 | 0.906 | 114/114 |
| PublicHealthQA KO | ko | 77 | 0.546 | 64/77 |
| HotPotQA-24 EN | en | 24 | 0.875 | 24/24 |
English multi-hop standard benchmarks (v0.16.0, subset)
Run via:
```bash
pip install "synaptic-memory[eval]"   # adds `datasets` for HuggingFace download
python examples/ablation/download_benchmarks.py
python examples/ablation/run_tier1_benchmarks.py --subset 500
```
Adds HotPotQA-dev (66 k corpus), MuSiQue-Ans-dev (21 k), and 2WikiMultihopQA-dev (57 k) — the three retrieval corpora the HippoRAG / GraphRAG line of research uses for head-to-head comparison. Numbers go in docs/comparison/synaptic_results.md.
Full pipeline (embedder + reranker) — pre-v0.16.0 measurements
The numbers below predate the v0.16.0 engine flip. They were
measured with the EvidenceSearch pipeline plus an embedder
(Ollama qwen3-embedding:4b) and a cross-encoder reranker
(TEI bge-reranker-v2-m3), which is why they match or beat the
embedder-free numbers above on private corpora. Reproducing these
requires a GPU-backed embedder and reranker — see
eval/run_all.py. A v0.16.0 rerun is scheduled
after the Home-server rebuild.
| Dataset | Type | Nodes | MRR | Hit |
|---|---|---|---|---|
| KRRA Easy | Korean documents (private) | 19,720 | 0.967 | 20/20 |
| KRRA Hard | Korean documents (private) | 19,720 | 1.000 | 15/15 |
| X2BEE Easy | PostgreSQL e-commerce (private) | 19,843 | 1.000 | 20/20 |
| assort Easy | Fashion CSV (private) | 13,909 | 0.867 | 13/15 |
| HotPotQA-24 | English multi-hop (public subset) | 226 | 0.964 | 24/24 |
HotPotQA-24 is a 24-question subset. A full HotPotQA-dev (7,405 q) run is planned for v0.16.1 — we won't claim parity with published numbers until then.
Multi-turn agent (GPT-4o-mini, 5 turns max)
| Dataset | Result |
|---|---|
| KRRA Hard agent | 10-13/15 (67-87%) |
| X2BEE Hard agent | 17/19 (89%) |
| assort Hard agent | 12/15 (80%) |
Structured data queries (filter / aggregate / FK join / count) work end-to-end through graph-aware tools.
Architecture
```
SynapticGraph.from_data("./data/")   ← Easy API
  ↓
Auto-detect → DomainProfile → Ingest → Index
  ↓
StorageBackend (Protocol)
  ├── MemoryBackend       (testing)
  ├── SqliteGraphBackend  (recommended, FTS5 + HNSW)
  ├── KuzuBackend         (embedded Cypher)
  ├── PostgreSQLBackend   (pgvector)
  └── CompositeBackend    (mix backends)
  ↓
Retrieval pipeline (BM25 + vector + PRF + PPR + reranker + MMR)
  ↓
Agent tools (36) → MCP server → LLM agent
```
Backends
| Backend | Vector Search | Scale | Use Case |
|---|---|---|---|
| `MemoryBackend` | cosine | ~10K | Testing |
| `SqliteGraphBackend` | usearch HNSW | ~100K | Default |
| `KuzuBackend` | HNSW | ~10M | Graph-heavy |
| `PostgreSQLBackend` | pgvector | ~1M | Production |
| `CompositeBackend` | Qdrant | Unlimited | Scale-out |
Optional Extras
| Extra | What it adds |
|---|---|
| `korean` | Kiwi morphological analyzer for Korean FTS |
| `vector` | usearch HNSW index (100x faster vector search) |
| `embedding` | aiohttp for embedding API calls |
| `mcp` | MCP server for Claude Desktop/Code |
| `sqlite` | aiosqlite backend |
| `docs` | xgen-doc2chunk for PDF/DOCX/PPTX/XLSX/HWP loading |
Documentation
| Doc | What it is |
|---|---|
| docs/GUIDE.md | Friendly intro — what/why/how, zero jargon (Korean) |
| docs/TUTORIAL.en.md | 30-minute hands-on walkthrough (English) |
| docs/TUTORIAL.md | 30-minute hands-on walkthrough (Korean) |
| docs/CONCEPTS.md | 3rd-gen GraphRAG + pipeline internals |
| docs/ARCHITECTURE.md | Original neural-inspired design |
| docs/COMPARISON.md | vs GraphRAG / LightRAG / LazyGraphRAG |
| docs/comparison/synaptic_results.md | Reproducible Synaptic numbers with provenance |
| docs/comparison/published_numbers.md | Competitor self-reported numbers (with sources) |
| docs/paper/draft.md | arXiv preprint draft — Streaming Retrieval with Top-K Invariance |
| docs/paper/theorem.md | Formal theorem + proof sketch |
| docs/ROADMAP.md | Future plans |
Dev
```bash
uv sync --extra dev --extra sqlite --extra mcp
uv run pytest tests/ -q    # 809+ tests
uv run ruff check --fix
```
License
MIT