
MCP Neo4j Entity Graph Server

Python 3.10+ · License: MIT

MCP server for extracting entities and relationships from graph nodes using LLM structured output, creating entity graphs directly in Neo4j.

Supports 100+ LLM providers via LiteLLM (OpenAI, Anthropic, Google, Azure, Bedrock, Ollama, etc.)

Features

  • Dual extraction pipeline: Text-only (LLM) and visual (VLM) extraction auto-routed per chunk
  • Grammar-enforced structured output: Pydantic models used as response_format — the LLM cannot violate the schema
  • Async background processing: Long extractions run in background with job tracking
  • Multi-provider LLM support: Use any LLM via LiteLLM (OpenAI, Claude, Gemini, etc.)
  • Schema-driven: Define entity types and relationships to extract
  • Provenance tracking: EXTRACTED_FROM relationships link entities to source chunks
  • High parallelism: Configurable concurrency (text: up to 50, VLM: up to 50)
  • Batched writes: Optimized Neo4j writes (configurable batch size)
  • Incremental: Only processes nodes without prior extraction (unless force=true)
  • Multi-pass ready: Architecture supports entity-only, relationship-only, and corrective passes (v2)

Tools

convert_schema

Converts data model output from the Data Modeling MCP to a Pydantic extraction schema.

Parameters:

Parameter        Required  Description
modeling_output  Yes       JSON output from the Data Modeling MCP server
output_path      Yes       Path to save the Pydantic .py file (e.g. /path/to/schema.py)

Output:

  • {output_path} — Strongly-typed Pydantic models used as response_format for LLM structured extraction

The .py file can be customized before running extraction:

  • Add Literal types to constrain categorical fields (phase, status, therapeutic area...)
  • Add @field_validator for normalization (strip legal suffixes, resolve aliases...)

extract_entities

Extracts entities and relationships from graph nodes using LLM. Returns immediately with a job ID.

The tool auto-detects chunk types and routes accordingly:

  • Text chunks (type="text"): sent to LLM with text only
  • Image/Table chunks (with imageBase64): sent to VLM with text + image
  • Page nodes (:Page label with imageBase64): sent to VLM with text + page image
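The routing rule above can be sketched in a few lines. This is illustrative only, not the server's implementation; the property names (`type`, `imageBase64`) follow the bullets above:

```python
def route_chunk(node: dict) -> str:
    """Return which extraction pipeline a node would be sent to."""
    if node.get("imageBase64"):
        # Image/table chunks and :Page nodes carry an image payload,
        # so they go to the vision model with text + image.
        return "vlm"
    # Plain text chunks (and anything without an image) use the text-only LLM.
    return "text"

print(route_chunk({"type": "text", "text": "Some paragraph"}))       # text
print(route_chunk({"type": "table", "imageBase64": "iVBORw0..."}))   # vlm
```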

Parameters:

Parameter      Default   Description
schema         required  Path to the Pydantic .py file generated by convert_schema
source_label   "Chunk"   Label of source nodes (Chunk or Page)
force          false     Re-extract all nodes (ignore existing EXTRACTED_FROM)
text_parallel  20        Max concurrent text extractions
vlm_parallel   5         Max concurrent VLM extractions
batch_size     10        Chunks to batch before writing to Neo4j
model          env var   LLM model (defaults to EXTRACTION_MODEL)
pass_type      "full"    One of full, entities_only, relationships_only, corrective
pass_number    1         Pass number for multi-pass extraction

check_extraction_status

Monitor background extraction jobs.

Parameter  Default  Description
job_id     None     Specific job to check. If omitted, returns all jobs.

cancel_extraction

Cancel a running extraction job.

Parameter  Required  Description
job_id     Yes       Job ID to cancel

Quick Start

# 1. Convert schema from Data Modeling MCP
convert_schema(
    modeling_output='{"nodes": [...], "relationships": [...]}',
    output_path="data_models/my_schema.py"
)
# Creates: my_schema.py (Pydantic models, ready for customization)

# 2. (Optional) Open my_schema.py and add Literal constraints / field_validators

# 3. Extract entities (runs in background)
extract_entities(
    schema="data_models/my_schema.py",
)
# Returns: {"job_id": "abc123", "status": "started", ...}

# 4. Check progress
check_extraction_status(job_id="abc123")
# Returns: {"status": "extracting", "chunks_completed": 45, ...}

# 5. Re-run extraction (incremental — only unprocessed nodes)
extract_entities(schema="data_models/my_schema.py")

# 6. Force full re-extraction (e.g. after schema changes)
extract_entities(schema="data_models/my_schema.py", force=True)

Generated Pydantic Models

convert_schema generates a .py file with strongly-typed Pydantic models. The ExtractionOutput class is sent to the LLM as response_format, meaning the LLM output is grammar-constrained — it literally cannot produce values outside the schema.

from typing import ClassVar, Optional
from pydantic import BaseModel, Field, field_validator

class DrugEntity(BaseModel):
    _node_label: ClassVar[str] = "Drug"
    _key_property: ClassVar[str] = "name"

    name: str = Field(..., description="Drug name")
    dose: Optional[str] = Field(default=None, description="Dosage")

    @field_validator("name", mode="before")
    @classmethod
    def _normalize_name(cls, v):
        if isinstance(v, str):
            return v.strip()
        return v

class TreatsRel(BaseModel):
    _relationship_type: ClassVar[str] = "TREATS"
    drug_name: str = Field(..., description="Drug name")
    disease_name: str = Field(..., description="Disease name")

class ExtractionOutput(BaseModel):
    drugs: list[DrugEntity] = Field(default_factory=list)
    treats: list[TreatsRel] = Field(default_factory=list)
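As a sketch of what the extractor gets back: because ExtractionOutput is the response_format, the raw completion is guaranteed to parse into these lists. The payload below is hypothetical (drug and disease names are invented) and is handled here with the standard library only:

```python
import json

# Hypothetical LLM output; it matches ExtractionOutput because the schema
# was enforced as response_format. All names are invented for illustration.
raw = '''{
  "drugs": [{"name": "Examplixumab", "dose": "10 mg"}],
  "treats": [{"drug_name": "Examplixumab", "disease_name": "Examplitis"}]
}'''

payload = json.loads(raw)

# Entities become MERGE targets keyed on _key_property ("name");
# relationships connect the two key values.
for drug in payload["drugs"]:
    print("MERGE (:Drug {name: %r})" % drug["name"])
for rel in payload["treats"]:
    print("(%s)-[:TREATS]->(%s)" % (rel["drug_name"], rel["disease_name"]))
```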

Customizing the Schema

After running convert_schema, open the .py file and add constraints before extraction:

Literal constraints (normalize categorical fields)

from typing import Literal, Optional
from pydantic import BaseModel, Field

class ClinicalProgramEntity(BaseModel):
    # Forces the LLM to pick from these exact values — no more "Phase III" vs "Phase 3"
    phase: Optional[Literal["Phase 1", "Phase 2", "Phase 3", "Registration", "Approved"]] = Field(
        default=None,
        description="Clinical phase — map Phase I→Phase 1, Phase II→Phase 2, Phase III→Phase 3"
    )
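Because the Literal becomes part of the response_format grammar, the allowed values form a closed set. A quick stdlib check shows what the model can and cannot emit:

```python
from typing import Literal, get_args

# Same closed set of values as in the Literal annotation above.
Phase = Literal["Phase 1", "Phase 2", "Phase 3", "Registration", "Approved"]

allowed = set(get_args(Phase))
print("Phase 3" in allowed)    # True: a legal value
print("Phase III" in allowed)  # False: only reachable if the prompt maps it first
```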

Field validators (normalize entity keys to avoid duplicates)

import re

from pydantic import BaseModel, Field, field_validator

_LEGAL_SUFFIX_RE = re.compile(r",?\s*(Inc\.?|Ltd\.?|AG|SE|GmbH|Pharmaceuticals?)\s*$", re.IGNORECASE)

class CompanyEntity(BaseModel):
    name: str = Field(..., description="Company name")

    @field_validator("name", mode="before")
    @classmethod
    def _normalize(cls, v):
        if isinstance(v, str):
            v = v.strip()
            while True:
                cleaned = _LEGAL_SUFFIX_RE.sub("", v).strip()
                if cleaned == v:
                    break
                v = cleaned
        return v

Important: Apply the same normalization to the corresponding relationship field (e.g. company_name in DevelopsRel) so that Neo4j MERGE keys always match.
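A stdlib-only sketch of that symmetry check (company names are invented): the entity key and the relationship-side value must normalize to the same string, otherwise MERGE creates a second, disconnected node:

```python
import re

# Same suffix-stripping rule as the validator above.
_LEGAL_SUFFIX_RE = re.compile(
    r",?\s*(Inc\.?|Ltd\.?|AG|SE|GmbH|Pharmaceuticals?)\s*$", re.IGNORECASE
)

def normalize_company(v: str) -> str:
    """Strip legal suffixes repeatedly until the name is stable."""
    v = v.strip()
    while True:
        cleaned = _LEGAL_SUFFIX_RE.sub("", v).strip()
        if cleaned == v:
            return v
        v = cleaned

# Entity key and relationship-side key normalize to the same MERGE key:
entity_key = normalize_company("Acme Pharmaceuticals, Inc.")
rel_key = normalize_company("Acme Pharmaceuticals Inc")
print(entity_key, rel_key)  # Acme Acme
```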

Environment Variables

Variable          Default                Description
NEO4J_URI         bolt://localhost:7687  Neo4j connection URI
NEO4J_USERNAME    neo4j                  Neo4j username
NEO4J_PASSWORD    (required)             Neo4j password
NEO4J_DATABASE    neo4j                  Neo4j database name
EXTRACTION_MODEL  gpt-5-mini             Default LLM model for extraction
OPENAI_API_KEY    -                      Required for OpenAI models

Usage with Cursor

Add to your ~/.cursor/mcp.json:

{
  "mcpServers": {
    "neo4j-entity-graph": {
      "command": "uv",
      "args": [
        "--directory", "/path/to/mcp-neo4j-entity-graph",
        "run", "mcp-neo4j-entity-graph"
      ],
      "env": {
        "NEO4J_URI": "neo4j://127.0.0.1:7687",
        "NEO4J_USERNAME": "neo4j",
        "NEO4J_PASSWORD": "your-password",
        "OPENAI_API_KEY": "your-api-key",
        "EXTRACTION_MODEL": "gpt-5-mini"
      }
    }
  }
}

Performance

Tested on pharma pipeline PDFs with gpt-5-mini:

Mode               Concurrency  Time  Entities  Relationships
Text-only          50           107s  1,584     1,257
VLM (page images)  50           114s  1,597     1,378

Architecture

server.py           - MCP tools (convert_schema, extract_entities, check/cancel)
job_manager.py      - Async job tracking, progress, cancellation
base_extractor.py   - Shared: prompts, parsing, Pydantic model loading
text_extractor.py   - Text-only LLM extraction (high parallelism)
vlm_extractor.py    - Vision+text VLM extraction (configurable parallelism)
schema_generator.py - Pydantic model code generation from data model
models.py           - Internal types (ExtractionSchema, ClassifiedChunk, etc.)

Graph Schema

After extraction, your Neo4j database will contain:

(:Entity)-[:EXTRACTED_FROM]->(:Chunk)
(:Entity)-[relationship]->(:Entity)

Example query:

MATCH (e)-[:EXTRACTED_FROM]->(c:Chunk)-[:PART_OF]->(d:Document {name: "my-doc"})
RETURN labels(e)[0] AS type, count(e) AS count
ORDER BY count DESC
