# MCP Neo4j Entity Graph Server

MCP server for extracting entities and relationships from graph nodes using LLM structured output, creating entity graphs directly in Neo4j. Supports 100+ LLM providers via LiteLLM (OpenAI, Anthropic, Google, Azure, Bedrock, Ollama, etc.).
## Features

- **Dual extraction pipeline**: text-only (LLM) and visual (VLM) extraction, auto-routed per chunk
- **Grammar-enforced structured output**: Pydantic models are used as `response_format`, so the LLM cannot violate the schema
- **Async background processing**: long extractions run in the background with job tracking
- **Multi-provider LLM support**: use any LLM via LiteLLM (OpenAI, Claude, Gemini, etc.)
- **Schema-driven**: define the entity types and relationships to extract
- **Provenance tracking**: `EXTRACTED_FROM` relationships link entities to their source chunks
- **High parallelism**: configurable concurrency (text: up to 50, VLM: up to 50)
- **Batched writes**: optimized Neo4j writes with a configurable batch size
- **Incremental**: only processes nodes without a prior extraction (unless `force=true`)
- **Multi-pass ready**: architecture supports entity-only, relationship-only, and corrective passes (v2)
## Tools

### convert_schema

Converts data model output from the Data Modeling MCP into a Pydantic extraction schema.
Parameters:

| Parameter | Required | Description |
|---|---|---|
| `modeling_output` | Yes | JSON output from the Data Modeling MCP server |
| `output_path` | Yes | Path to save the Pydantic `.py` file (e.g. `/path/to/schema.py`) |
Output:

- `{output_path}`: strongly-typed Pydantic models used as `response_format` for LLM structured extraction

The generated `.py` file can be customized before running extraction:

- Add `Literal` types to constrain categorical fields (phase, status, therapeutic area, ...)
- Add `@field_validator` for normalization (strip legal suffixes, resolve aliases, ...)
### extract_entities

Extracts entities and relationships from graph nodes using an LLM. Returns immediately with a job ID.

The tool auto-detects chunk types and routes accordingly:

- Text chunks (`type="text"`): sent to the LLM with text only
- Image/table chunks (with `imageBase64`): sent to the VLM with text + image
- Page nodes (`:Page` label with `imageBase64`): sent to the VLM with text + page image
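The routing rules above boil down to one check on the node's properties. A minimal sketch of that decision (the function name `route_chunk` and the plain-dict node shape are illustrative assumptions; the property names mirror the list above):

```python
def route_chunk(node: dict) -> str:
    """Pick the extraction pipeline for a node, mirroring the routing rules above."""
    if node.get("imageBase64"):
        return "vlm"   # image/table chunks and Page nodes carry a rendered image
    return "text"      # plain text chunks need no vision model

print(route_chunk({"type": "text", "text": "Aspirin treats headaches."}))  # text
print(route_chunk({"type": "table", "imageBase64": "iVBOR..."}))          # vlm
```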
Parameters:

| Parameter | Default | Description |
|---|---|---|
| `schema` | required | Path to the Pydantic `.py` file generated by `convert_schema` |
| `source_label` | `"Chunk"` | Label of source nodes (`Chunk` or `Page`) |
| `force` | `false` | Re-extract all nodes (ignore existing `EXTRACTED_FROM`) |
| `text_parallel` | `20` | Max concurrent text extractions |
| `vlm_parallel` | `5` | Max concurrent VLM extractions |
| `batch_size` | `10` | Chunks to batch before writing to Neo4j |
| `model` | env var | LLM model (defaults to `EXTRACTION_MODEL`) |
| `pass_type` | `"full"` | `full`, `entities_only`, `relationships_only`, `corrective` |
| `pass_number` | `1` | Pass number for multi-pass extraction |
### check_extraction_status

Monitor background extraction jobs.

| Parameter | Default | Description |
|---|---|---|
| `job_id` | None | Specific job to check. If omitted, returns all jobs. |
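A client would typically poll this tool until the job reaches a terminal state. A minimal sketch of such a polling loop, with a simulated status callable standing in for the MCP call (the terminal status names and the shape of the status dict are assumptions):

```python
import time

def poll_until_done(check, interval=0.01, timeout=5.0):
    """Call `check()` repeatedly until it reports a terminal status."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = check()
        if status["status"] in ("completed", "failed", "cancelled"):
            return status
        time.sleep(interval)
    raise TimeoutError("extraction job did not finish in time")

# Simulated job that completes on the third poll
states = iter([
    {"status": "extracting"},
    {"status": "extracting"},
    {"status": "completed"},
])
print(poll_until_done(lambda: next(states))["status"])  # completed
```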
### cancel_extraction

Cancel a running extraction job.

| Parameter | Required | Description |
|---|---|---|
| `job_id` | Yes | Job ID to cancel |
Quick Start
# 1. Convert schema from Data Modeling MCP
convert_schema(
modeling_output='{"nodes": [...], "relationships": [...]}',
output_path="data_models/my_schema.py"
)
# Creates: my_schema.py (Pydantic models, ready for customization)
# 2. (Optional) Open my_schema.py and add Literal constraints / field_validators
# 3. Extract entities (runs in background)
extract_entities(
schema="data_models/my_schema.py",
)
# Returns: {"job_id": "abc123", "status": "started", ...}
# 4. Check progress
check_extraction_status(job_id="abc123")
# Returns: {"status": "extracting", "chunks_completed": 45, ...}
# 5. Re-run extraction (incremental — only unprocessed nodes)
extract_entities(schema="data_models/my_schema.py")
# 6. Force full re-extraction (e.g. after schema changes)
extract_entities(schema="data_models/my_schema.py", force=True)
## Generated Pydantic Models

`convert_schema` generates a `.py` file with strongly-typed Pydantic models. The `ExtractionOutput` class is sent to the LLM as `response_format`, meaning the LLM output is grammar-constrained: it literally cannot produce values outside the schema.

```python
from typing import ClassVar, Optional

from pydantic import BaseModel, Field, field_validator

class DrugEntity(BaseModel):
    _node_label: ClassVar[str] = "Drug"
    _key_property: ClassVar[str] = "name"

    name: str = Field(..., description="Drug name")
    dose: Optional[str] = Field(default=None, description="Dosage")

    @field_validator("name", mode="before")
    @classmethod
    def _normalize_name(cls, v):
        if isinstance(v, str):
            return v.strip()
        return v

class TreatsRel(BaseModel):
    _relationship_type: ClassVar[str] = "TREATS"

    drug_name: str = Field(..., description="Drug name")
    disease_name: str = Field(..., description="Disease name")

class ExtractionOutput(BaseModel):
    drugs: list[DrugEntity] = Field(default_factory=list)
    treats: list[TreatsRel] = Field(default_factory=list)
```
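Because `ExtractionOutput` is an ordinary Pydantic model, the raw JSON a structured-output LLM returns can be validated directly with `model_validate_json`. A minimal, self-contained sketch (the sample payload is invented for illustration):

```python
from typing import ClassVar, Optional

from pydantic import BaseModel, Field

class DrugEntity(BaseModel):
    _node_label: ClassVar[str] = "Drug"
    name: str = Field(..., description="Drug name")
    dose: Optional[str] = Field(default=None, description="Dosage")

class ExtractionOutput(BaseModel):
    drugs: list[DrugEntity] = Field(default_factory=list)

# A JSON payload shaped like a structured-output LLM response
raw = '{"drugs": [{"name": "Aspirin", "dose": "100 mg"}]}'
out = ExtractionOutput.model_validate_json(raw)
print(out.drugs[0].name, out.drugs[0].dose)  # Aspirin 100 mg
```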
## Customizing the Schema

After running `convert_schema`, open the `.py` file and add constraints before extraction.

### Literal constraints (normalize categorical fields)

```python
from typing import Literal

class ClinicalProgramEntity(BaseModel):
    # Forces the LLM to pick from these exact values — no more "Phase III" vs "Phase 3"
    phase: Optional[Literal["Phase 1", "Phase 2", "Phase 3", "Registration", "Approved"]] = Field(
        default=None,
        description="Clinical phase — map Phase I→Phase 1, Phase II→Phase 2, Phase III→Phase 3",
    )
```
### Field validators (normalize entity keys to avoid duplicates)

```python
import re

_LEGAL_SUFFIX_RE = re.compile(r",?\s*(Inc\.?|Ltd\.?|AG|SE|GmbH|Pharmaceuticals?)\s*$", re.IGNORECASE)

class CompanyEntity(BaseModel):
    @field_validator("name", mode="before")
    @classmethod
    def _normalize(cls, v):
        if isinstance(v, str):
            v = v.strip()
            while True:
                cleaned = _LEGAL_SUFFIX_RE.sub("", v).strip()
                if cleaned == v:
                    break
                v = cleaned
        return v
```
**Important**: Apply the same normalization to the corresponding relationship field (e.g. `company_name` in `DevelopsRel`) so that Neo4j `MERGE` keys always match.
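One way to keep the keys in sync is to move the normalizer into a module-level function and attach it to both models. A minimal sketch, assuming a `DevelopsRel` with a `company_name` field as mentioned above (the `asset_name` field and `normalize_company` helper are illustrative):

```python
import re

from pydantic import BaseModel, field_validator

_LEGAL_SUFFIX_RE = re.compile(r",?\s*(Inc\.?|Ltd\.?|AG|SE|GmbH|Pharmaceuticals?)\s*$", re.IGNORECASE)

def normalize_company(v):
    """Strip legal suffixes repeatedly until the name is stable."""
    if isinstance(v, str):
        v = v.strip()
        while True:
            cleaned = _LEGAL_SUFFIX_RE.sub("", v).strip()
            if cleaned == v:
                break
            v = cleaned
    return v

class CompanyEntity(BaseModel):
    name: str

    @field_validator("name", mode="before")
    @classmethod
    def _norm(cls, v):
        return normalize_company(v)

class DevelopsRel(BaseModel):
    company_name: str
    asset_name: str

    @field_validator("company_name", mode="before")
    @classmethod
    def _norm(cls, v):
        return normalize_company(v)

# Both sides now normalize identically, so MERGE keys line up
print(CompanyEntity(name="Acme Pharma GmbH").name)                                 # Acme Pharma
print(DevelopsRel(company_name="Acme Pharma GmbH", asset_name="X-1").company_name)  # Acme Pharma
```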
## Environment Variables

| Variable | Default | Description |
|---|---|---|
| `NEO4J_URI` | `bolt://localhost:7687` | Neo4j connection URI |
| `NEO4J_USERNAME` | `neo4j` | Neo4j username |
| `NEO4J_PASSWORD` | (required) | Neo4j password |
| `NEO4J_DATABASE` | `neo4j` | Neo4j database name |
| `EXTRACTION_MODEL` | `gpt-5-mini` | Default LLM model for extraction |
| `OPENAI_API_KEY` | - | Required for OpenAI models |
## Usage with Cursor

Add to your `~/.cursor/mcp.json`:

```json
{
  "mcpServers": {
    "neo4j-entity-graph": {
      "command": "uv",
      "args": [
        "--directory", "/path/to/mcp-neo4j-entity-graph",
        "run", "mcp-neo4j-entity-graph"
      ],
      "env": {
        "NEO4J_URI": "neo4j://127.0.0.1:7687",
        "NEO4J_USERNAME": "neo4j",
        "NEO4J_PASSWORD": "your-password",
        "OPENAI_API_KEY": "your-api-key",
        "EXTRACTION_MODEL": "gpt-5-mini"
      }
    }
  }
}
```
## Performance

Tested on pharma pipeline PDFs with `gpt-5-mini`:

| Mode | Concurrency | Time | Entities | Relationships |
|---|---|---|---|---|
| Text-only | 50 | 107s | 1,584 | 1,257 |
| VLM (page images) | 50 | 114s | 1,597 | 1,378 |
## Architecture

```
server.py           - MCP tools (convert_schema, extract_entities, check/cancel)
job_manager.py      - Async job tracking, progress, cancellation
base_extractor.py   - Shared: prompts, parsing, Pydantic model loading
text_extractor.py   - Text-only LLM extraction (high parallelism)
vlm_extractor.py    - Vision+text VLM extraction (configurable parallelism)
schema_generator.py - Pydantic model code generation from data model
models.py           - Internal types (ExtractionSchema, ClassifiedChunk, etc.)
```
## Graph Schema

After extraction, your Neo4j database will contain:

```
(:Entity)-[:EXTRACTED_FROM]->(:Chunk)
(:Entity)-[relationship]->(:Entity)
```

Example query:

```cypher
MATCH (e)-[:EXTRACTED_FROM]->(c:Chunk)-[:PART_OF]->(d:Document {name: "my-doc"})
RETURN labels(e)[0] AS type, count(e) AS count
ORDER BY count DESC
```