
mcp-neo4j-lexical-graph

MCP server for creating rich lexical graphs from PDF documents in Neo4j. Designed for Neo4j sales engineers to quickly build PDF-to-graph and GraphRAG agent chatbot POCs.

Supports four parsing strategies (PyMuPDF, Docling, page-image, VLM block ordering), pluggable chunking, document versioning, VLM-based description generation, and vector/fulltext search with Neo4j 2026.01 native VECTOR type and document-name prefiltering.

Graph Model

```mermaid
graph LR
    Doc[Document] -->|HAS_PAGE| Page
    Doc -->|HAS_ELEMENT| Img[Image]
    Doc -->|HAS_ELEMENT| Tbl[Table]
    Doc -->|HAS_SECTION| Sec[Section]
    Sec -->|HAS_SUBSECTION| Sec
    Chunk -->|PART_OF| Doc
    Chunk -->|NEXT_CHUNK| Chunk
    Chunk -->|HAS_ELEMENT| Img
    Chunk -->|HAS_ELEMENT| Tbl
    Page -->|NEXT_PAGE| Page
```

Node types depend on the parse mode used. See Parse Modes below.
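
Once a graph is loaded, the model above can be queried directly with Cypher. A minimal sketch of an ordered-chunk traversal, embedded in Python for use with the official neo4j driver — the property names (`d.name`, `c.text`) are illustrative assumptions, not confirmed by this README:

```python
# Illustrative Cypher for the graph model above: fetch one document's chunks
# in reading order by following PART_OF and the NEXT_CHUNK chain.
# Property names (name, text) are assumptions; check the actual schema.
CHUNKS_IN_ORDER = """
MATCH (c:Chunk)-[:PART_OF]->(d:Document {name: $doc_name})
WHERE NOT ()-[:NEXT_CHUNK]->(c)          // head of the chunk chain
MATCH path = (c)-[:NEXT_CHUNK*0..]->(last)
WHERE NOT (last)-[:NEXT_CHUNK]->()
RETURN [n IN nodes(path) | n.text] AS texts
"""

def fetch_chunks(session, doc_name: str) -> list[str]:
    """Run the query with a neo4j driver session; return chunk texts in order."""
    record = session.run(CHUNKS_IN_ORDER, doc_name=doc_name).single()
    return record["texts"] if record else []
```

With a real driver you would call `fetch_chunks(driver.session(database="neo4j"), "my-doc")`.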

Parse Modes

| Mode | Nodes created | Best for |
|------|---------------|----------|
| pymupdf | Document, Chunk, Image, Table | General-purpose text + visual extraction |
| docling | Document, Page, Element, Section (then Chunk via chunking tool) | Complex layouts, section-aware chunking |
| page_image | Document, Page | Slides/presentations for VLM-based extraction |
| vlm_blocks | Document, Page, Element, Section (then Chunk via chunking tool) | Experimental. Complex layouts without the docling dependency (uses a VLM API). Prefer docling for production use. |

Quick Start

```shell
cd mcp-neo4j-lexical-graph
uv sync
```

Cursor MCP Configuration

Add to your .cursor/mcp.json:

```json
{
  "mcpServers": {
    "neo4j-lexical-graph": {
      "command": "uv",
      "args": [
        "--directory",
        "/path/to/mcp-neo4j-lexical-graph",
        "run",
        "mcp-neo4j-lexical-graph"
      ],
      "env": {
        "NEO4J_URI": "bolt://localhost:7687",
        "NEO4J_USERNAME": "neo4j",
        "NEO4J_PASSWORD": "your-password",
        "NEO4J_DATABASE": "neo4j",
        "EMBEDDING_MODEL": "text-embedding-3-small",
        "EXTRACTION_MODEL": "gpt-5-mini",
        "OPENAI_API_KEY": "sk-..."
      }
    }
  }
}
```

Tools

Tools must be called in a specific order — which tools to call depends on the parse mode and document type. See the workflow table below.

Workflow Order

| # | Tool | pymupdf | docling | page_image | vlm_blocks | Notes |
|---|------|---------|---------|------------|------------|-------|
| 1 | create_lexical_graph | ✓ | ✓ | ✓ | ✓ | Always first. Async — returns job_id. |
| 2 | check_processing_status | ✓ | ✓ | ✓ | ✓ | Poll until complete after any async op. |
| 3 | cancel_job | opt | opt | opt | opt | Only if aborting a running job. |
| 4 | chunk_lexical_graph | ✗ | ✓ | ✓ | ✓ | Required for docling/vlm_blocks/page_image. Integrated into create for pymupdf. |
| 5 | list_documents | ✓ | ✓ | ✓ | ✓ | Confirm ingestion, get document IDs. |
| 6 | verify_lexical_graph | opt | opt | ✗ never | opt | Single-doc spot-check only. Never for page_image (base64 flood). |
| 7 | assign_section_hierarchy | ✗ | opt | ✗ | opt | For structured docs with nested sections. Uses EXTRACTION_MODEL. |
| 8 | generate_chunk_descriptions | recommended¹ | recommended¹ | required | recommended¹ | VLM descriptions for Image/Table/Page nodes. Required before embed_chunks for page_image. |
| 9 | embed_chunks | ✓ | ✓ | ✓ | ✓ | Synchronous. Call with no parameters — auto-detects textDescription. |
| 10 | set_active_version | opt | opt | opt | opt | Only when re-ingesting a document. |
| 11 | clean_inactive | opt | opt | opt | opt | After set_active_version, to remove old versions. |
| 12 | delete_document | opt | opt | opt | opt | Destructive — removes document + all children. |

¹ Recommended when extract_images=True or extract_tables=True (pymupdf) or when the document contains images/tables (docling/vlm_blocks). Without descriptions, Image/Table nodes are invisible to semantic search.
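
The async pattern in steps 1–2 can be sketched as a generic poll loop. The `call_tool` callable and the `"status"` field values here are illustrative assumptions, not the server's exact MCP response schema:

```python
import time

def wait_for_job(call_tool, job_id: str, interval: float = 2.0,
                 timeout: float = 600.0) -> dict:
    """Poll check_processing_status until a job finishes or times out.

    `call_tool(name, args)` stands in for whatever MCP client you use;
    the terminal status values are assumptions based on this README.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = call_tool("check_processing_status", {"job_id": job_id})
        if status.get("status") in ("complete", "failed", "cancelled"):
            return status
        time.sleep(interval)
    raise TimeoutError(f"job {job_id} still running after {timeout}s")
```

With a real client: call create_lexical_graph first, pass the returned job_id to `wait_for_job`, then proceed to chunking and embedding.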

Tool Reference

| Tool | Description |
|------|-------------|
| create_lexical_graph | Parse PDF(s) and create the graph (async, returns job_id). max_parallel=0 auto-detects worker count from RAM/CPU. |
| check_processing_status | Monitor background job progress. |
| cancel_job | Cancel a running background job (optional cleanup of partial data). |
| chunk_lexical_graph | Create Chunk nodes from Elements (4 strategies: token_window, structured, by_section, by_page). |
| list_documents | Inventory of documents with version and chunk-count info. |
| verify_lexical_graph | Structural checks + Markdown reconstruction (single-doc only). |
| assign_section_hierarchy | LLM-based section level assignment; rebuilds HAS_SUBSECTION and updates sectionContext on chunks. Omit document_id to run all active documents in parallel. |
| generate_chunk_descriptions | VLM descriptions for Image/Table/Page nodes, stored as textDescription. Omit document_id to run for all active documents. |
| embed_chunks | Vector embeddings + fulltext index. Auto-detects textDescription for unified Table/Image/text embedding. |
| set_active_version | Activate a specific document/chunk version. |
| clean_inactive | Delete inactive document versions and chunk sets. |
| delete_document | Remove a document version with cascade (pages, elements, sections, chunks). |
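
The token_window strategy named above is a standard sliding-window chunker. A rough sketch of the idea — whitespace tokenization and the parameter names are simplifications for illustration, not the server's implementation:

```python
def token_window_chunks(text: str, window: int = 200, overlap: int = 50) -> list[str]:
    """Split text into overlapping windows of `window` tokens.

    Whitespace tokenization stands in for a real tokenizer here.
    Consecutive windows share `overlap` tokens of context.
    """
    tokens = text.split()
    if not tokens:
        return []
    step = max(window - overlap, 1)
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + window]))
        if start + window >= len(tokens):
            break  # last window already covers the tail
    return chunks
```

For example, `token_window_chunks("a b c d e f", window=4, overlap=2)` yields `["a b c d", "c d e f"]`.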

Environment Variables

| Variable | Required | Default | Description |
|----------|----------|---------|-------------|
| NEO4J_URI | Yes | bolt://localhost:7687 | Neo4j connection URI |
| NEO4J_USERNAME | Yes | neo4j | Neo4j username |
| NEO4J_PASSWORD | Yes | - | Neo4j password |
| NEO4J_DATABASE | No | neo4j | Database name |
| EMBEDDING_MODEL | No | text-embedding-3-small | Default embedding model (any LiteLLM provider) |
| EXTRACTION_MODEL | No | gpt-5-mini | LLM/VLM for section hierarchy and description generation |
| OPENAI_API_KEY | Depends | - | Required when using OpenAI models for embedding or extraction. Other providers use their own key (e.g. ANTHROPIC_API_KEY, AZURE_API_KEY); see the LiteLLM docs. |
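
The defaults above can be mirrored in a small settings loader. A sketch — the variable names and defaults come from the table; the loader itself is not part of this package:

```python
import os

def load_settings(env=os.environ) -> dict:
    """Read the documented environment variables, applying the table's
    defaults. NEO4J_PASSWORD has no default and must be set."""
    settings = {
        "uri": env.get("NEO4J_URI", "bolt://localhost:7687"),
        "username": env.get("NEO4J_USERNAME", "neo4j"),
        "password": env.get("NEO4J_PASSWORD"),
        "database": env.get("NEO4J_DATABASE", "neo4j"),
        "embedding_model": env.get("EMBEDDING_MODEL", "text-embedding-3-small"),
        "extraction_model": env.get("EXTRACTION_MODEL", "gpt-5-mini"),
    }
    if not settings["password"]:
        raise ValueError("NEO4J_PASSWORD is required")
    return settings
```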

Requirements

  • Neo4j 2026.01+ (native VECTOR type, vector search with filters)
  • Python 3.10+
  • API key for your embedding provider (OpenAI, Azure, Cohere, Voyage, Ollama, etc.)
  • API key for VLM if using vlm_blocks mode, generate_chunk_descriptions, or assign_section_hierarchy

Project details

Distribution files for version 0.2.0:

  • Source distribution: mcp_neo4j_lexical_graph-0.2.0.tar.gz (346.7 kB)
  • Built distribution: mcp_neo4j_lexical_graph-0.2.0-py3-none-any.whl (80.8 kB, py3-none-any)
