# MCP Neo4j Entity Graph Server

MCP server for extracting entities from text chunks and creating entity graphs in Neo4j. Supports 100+ LLM providers via LiteLLM (OpenAI, Anthropic, Google, Azure, Bedrock, Ollama, etc.).
## Features
- Multi-provider LLM support: Use any LLM via LiteLLM (OpenAI, Claude, Gemini, etc.)
- Structured output: Uses JSON schema for reliable entity extraction
- Direct graph creation: Entities created directly in Neo4j (no intermediate files)
- Schema-driven: Define what entities/relationships to extract
- Provenance tracking: EXTRACTED_FROM relationships link entities to source nodes
- High parallelism: Default 20 concurrent extractions (configurable)
- Batched writes: Optimized Neo4j writes (batch every 10 chunks by default)
- Incremental: Only processes nodes without prior extraction (unless force=true)
- Key normalization: Entity keys are normalized (lowercase) for better matching
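The key-normalization feature above can be sketched roughly as follows. The `normalize_key` helper here is illustrative, not the server's actual implementation:

```python
def normalize_key(value: str) -> str:
    """Lowercase and collapse whitespace so variant spellings of the
    same entity (e.g. "Aspirin", " aspirin ") merge into one node."""
    return " ".join(value.lower().split())

key = normalize_key("  Aspirin ")  # -> "aspirin"
```

Merging on the normalized key (e.g. with a Cypher `MERGE`) is what lets mentions from different chunks resolve to a single entity node.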
## Supported Models

Models must support structured output (JSON schema). Tested models include:

| Provider | Models |
|---|---|
| OpenAI | gpt-5, gpt-5-mini, gpt-5-nano, gpt-4o, gpt-4o-mini |
| Anthropic | claude-sonnet-4-20250514, claude-3-5-sonnet-20241022 |
| Google Gemini | gemini/gemini-2.5-pro, gemini/gemini-2.5-flash, gemini/gemini-1.5-pro |
| Azure OpenAI | azure/gpt-4o, azure/gpt-4o-mini |
| AWS Bedrock | bedrock/anthropic.claude-3-5-sonnet-20241022-v2:0 |

Note: If a model doesn't support structured output, you'll get a clear error message with suggestions.
## Tools

### extract_entities_from_graph

Extracts entities from source nodes and creates the entity graph directly in Neo4j.
Parameters:

| Parameter | Default | Description |
|---|---|---|
| schema_json | (required) | Path to JSON schema file or inline JSON string |
| source_label | "Chunk" | Label of source nodes to extract from |
| source_text_property | "text" | Property containing text to extract from |
| force | false | If true, reprocess all nodes |
| parallel | 20 | Concurrent extractions (reduce to 5-10 if hitting rate limits) |
| batch_size | 10 | Chunks to batch before writing to Neo4j |
| model | env var | LLM model to use (from EXTRACTION_MODEL env) |
Workflow:

1. Queries all nodes with the specified label that have no prior extraction:
   `MATCH (n:{source_label}) WHERE NOT (n)<-[:EXTRACTED_FROM]-()`
2. Extracts entities using the LLM with structured output (in parallel)
3. Batches results and writes them to Neo4j (optimized transactions)
4. Creates EXTRACTED_FROM relationships for provenance
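The batching step in the workflow above can be illustrated with a small sketch. The `batched` helper is illustrative, not part of the server's API:

```python
from itertools import islice

def batched(items, size):
    """Yield successive lists of at most `size` items."""
    it = iter(items)
    while batch := list(islice(it, size)):
        yield batch

# With the default batch_size=10, 25 extracted chunks would be
# written to Neo4j as three transactions of 10, 10, and 5 results.
batches = list(batched(range(25), 10))
```

Grouping writes this way trades a little write latency for far fewer Neo4j transactions, while still surfacing progress after every batch.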
Examples:

```python
# Extract from Chunk nodes with default model
extract_entities_from_graph(schema_json="/path/to/schema.json")

# Use a specific model
extract_entities_from_graph(
    schema_json="/path/to/schema.json",
    model="claude-sonnet-4-20250514"
)

# Reduce parallelism if hitting rate limits
extract_entities_from_graph(
    schema_json="/path/to/schema.json",
    parallel=5
)

# Extract from Page nodes
extract_entities_from_graph(
    schema_json="/path/to/schema.json",
    source_label="Page",
    source_text_property="content"
)

# Force re-extraction of all nodes
extract_entities_from_graph(schema_json="/path/to/schema.json", force=True)
```
### convert_schema

Converts data model output from the Data Modeling MCP to the extraction schema format.

Parameters:

- modeling_output: JSON output from the Data Modeling MCP server
- output_path: Path to save the extraction schema JSON file

Outputs:

- {output_path} - Extraction schema JSON
- {output_path}.py - Generated Pydantic model with normalization validators
## Schema Format

```json
{
  "entity_types": [
    {
      "label": "Medication",
      "description": "A pharmaceutical drug or medication",
      "key_property": "name",
      "properties": [
        {"name": "medicationClass", "type": "STRING", "description": "Drug class"}
      ]
    }
  ],
  "relationship_types": [
    {
      "type": "TREATS",
      "description": "Drug treats a condition",
      "source_entity": "Medication",
      "target_entity": "MedicalCondition"
    }
  ]
}
```
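To give a feel for how a schema like this drives structured output, here is a rough sketch of turning one entity type into a JSON-schema object that an LLM's structured-output mode could enforce. The helper name and type mapping are illustrative, not the server's actual code:

```python
# Map extraction-schema property types to JSON-schema types (assumed mapping).
TYPE_MAP = {"STRING": "string", "INTEGER": "integer", "FLOAT": "number", "BOOLEAN": "boolean"}

def entity_type_to_json_schema(entity_type: dict) -> dict:
    """Build a JSON-schema object for one entity type from the extraction schema."""
    props = {entity_type["key_property"]: {"type": "string"}}
    for p in entity_type.get("properties", []):
        props[p["name"]] = {
            "type": TYPE_MAP.get(p["type"], "string"),
            "description": p.get("description", ""),
        }
    return {
        "type": "object",
        "properties": props,
        "required": [entity_type["key_property"]],  # key property must be present
    }

medication = {
    "label": "Medication",
    "key_property": "name",
    "properties": [{"name": "medicationClass", "type": "STRING", "description": "Drug class"}],
}
schema = entity_type_to_json_schema(medication)
```

Requiring the key property is what makes every extracted entity mergeable into the graph.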
## Environment Variables

| Variable | Default | Description |
|---|---|---|
| NEO4J_URI | bolt://localhost:7687 | Neo4j connection URI |
| NEO4J_USERNAME | neo4j | Neo4j username |
| NEO4J_PASSWORD | (required) | Neo4j password |
| NEO4J_DATABASE | neo4j | Neo4j database name |
| EXTRACTION_MODEL | gpt-5-mini | Default LLM model for extraction |
| OPENAI_API_KEY | - | Required for OpenAI models |
| ANTHROPIC_API_KEY | - | Required for Anthropic models |
| GEMINI_API_KEY | - | Required for Google Gemini models |
## LLM Provider Configuration

LiteLLM supports 100+ providers. Set the appropriate API key for your provider:

### OpenAI (default)

```bash
export OPENAI_API_KEY="sk-..."
export EXTRACTION_MODEL="gpt-5-mini"  # or gpt-4o-mini, gpt-4o
```

### Anthropic Claude

```bash
export ANTHROPIC_API_KEY="sk-ant-..."
export EXTRACTION_MODEL="claude-sonnet-4-20250514"
```

### Google Gemini

```bash
export GEMINI_API_KEY="..."
export EXTRACTION_MODEL="gemini/gemini-2.5-pro"
```

### Azure OpenAI

```bash
export AZURE_API_KEY="..."
export AZURE_API_BASE="https://your-resource.openai.azure.com/"
export AZURE_API_VERSION="2024-02-15-preview"
export EXTRACTION_MODEL="azure/your-deployment-name"
```

### AWS Bedrock

```bash
export AWS_ACCESS_KEY_ID="..."
export AWS_SECRET_ACCESS_KEY="..."
export AWS_REGION_NAME="us-east-1"
export EXTRACTION_MODEL="bedrock/anthropic.claude-3-5-sonnet-20241022-v2:0"
```

### Local Models (Ollama)

```bash
export EXTRACTION_MODEL="ollama/llama3.1"
# Note: Local models may not support structured output
```

See the LiteLLM docs for all providers.
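LiteLLM routes on the model-string prefix, so switching providers only changes the arguments you pass; the call shape stays the same. A sketch of assembling structured-output kwargs for `litellm.completion()` follows. The helper name is illustrative, and the `response_format` payload assumes LiteLLM's OpenAI-style JSON-schema mode:

```python
def build_completion_kwargs(model: str, text: str, json_schema: dict) -> dict:
    """Assemble keyword arguments one could pass to litellm.completion()."""
    return {
        # Prefix selects the provider: "gpt-5-mini", "gemini/gemini-2.5-pro",
        # "azure/your-deployment-name", "ollama/llama3.1", ...
        "model": model,
        "messages": [
            {"role": "system", "content": "Extract entities as JSON matching the schema."},
            {"role": "user", "content": text},
        ],
        "response_format": {
            "type": "json_schema",
            "json_schema": {"name": "entities", "schema": json_schema, "strict": True},
        },
    }

kwargs = build_completion_kwargs(
    "gpt-5-mini",
    "Aspirin treats headaches.",
    {"type": "object", "properties": {}},
)
```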
## Usage with Cursor

Add to your ~/.cursor/mcp.json:

```json
{
  "mcpServers": {
    "neo4j-entity-graph": {
      "command": "uv",
      "args": ["--directory", "/path/to/mcp-neo4j-entity-graph", "run", "mcp-neo4j-entity-graph"],
      "env": {
        "NEO4J_URI": "bolt://localhost:7687",
        "NEO4J_USERNAME": "neo4j",
        "NEO4J_PASSWORD": "your-password",
        "OPENAI_API_KEY": "your-api-key",
        "EXTRACTION_MODEL": "gpt-5-mini"
      }
    }
  }
}
```
## Rate Limits & Performance

### Parallelism

The default parallelism is 20 concurrent extractions, optimized for fast processing. However, this may exceed the rate limits of some providers.

If you see rate limit errors, reduce the parallel parameter:

```python
# For rate-limited accounts
extract_entities_from_graph(
    schema_json="/path/to/schema.json",
    parallel=5  # Reduce from the default 20
)
```
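The concurrency cap behaves like a semaphore over the per-chunk LLM calls; a minimal asyncio sketch, with a stand-in `extract_one` in place of a real LLM call:

```python
import asyncio

async def extract_all(chunks, parallel=20):
    """Run extractions concurrently, never more than `parallel` at once."""
    sem = asyncio.Semaphore(parallel)

    async def extract_one(chunk):
        async with sem:  # blocks while `parallel` extractions are in flight
            await asyncio.sleep(0)  # stand-in for the actual LLM call
            return {"chunk": chunk, "entities": []}

    # gather preserves input order regardless of completion order
    return await asyncio.gather(*(extract_one(c) for c in chunks))

results = asyncio.run(extract_all(["c1", "c2", "c3"], parallel=2))
```

Lowering `parallel` simply shrinks the semaphore, spreading requests out so provider rate limits are not tripped.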
### Batch Size

Extractions are batched before writing to Neo4j (default: 10 chunks per batch). This reduces the number of Neo4j transactions while maintaining progress visibility.

```python
# Larger batches for better Neo4j performance
extract_entities_from_graph(
    schema_json="/path/to/schema.json",
    batch_size=20
)
```
## Usage Example

```python
# 1. Convert the schema from the Data Modeling MCP
convert_schema(
    modeling_output='{"nodes": [...], "relationships": [...]}',
    output_path="/path/to/schema.json"
)
# Creates: schema.json + schema.py (Pydantic model)

# 2. Extract entities from all Chunk nodes
extract_entities_from_graph(
    schema_json="/path/to/schema.json"
)
# Defaults: parallel=20, batch_size=10, model=gpt-5-mini

# 3. Use a different model
extract_entities_from_graph(
    schema_json="/path/to/schema.json",
    model="claude-sonnet-4-20250514",
    parallel=10  # Claude may have stricter rate limits
)
```
## Graph Schema

After extraction, your Neo4j database will contain:

```
(:Entity)-[:EXTRACTED_FROM]->(:Chunk)
(:Entity)-[relationship]->(:Entity)
```

Example query to explore extracted entities:

```cypher
// Find all entities extracted from a document
MATCH (e)-[:EXTRACTED_FROM]->(c:Chunk)-[:PART_OF]->(d:Document {name: "my-document"})
RETURN labels(e)[0] AS type, count(e) AS count
ORDER BY count DESC
```