Transform PDF documents into structured knowledge graphs with citation provenance
MalimGraph
From documents to knowledge graphs.
Agentic knowledge graph plugin for Claude Code, Claude Desktop, and Codex.
Extract entities, build graphs, chunk for RAG, render HTML, and load into Neo4j or pgvector,
all orchestrated by Claude using its own intelligence. The plugin needs no ANTHROPIC_API_KEY; only the standalone CLI does.
Install
pip install malimgraph
claude mcp add malimgraph -- malimgraph-plugin
Then just ask Claude naturally:
"Extract a knowledge graph from report.pdf"
"Chunk annual_report.pdf for RAG and store in pgvector"
"Full pipeline on this document"
How It Works
You: "Extract a knowledge graph from report.pdf"
│
▼
Claude calls read_pdf("report.pdf")
│ ← returns page text + rule entities (dates, amounts, emails…)
│
▼
Claude analyzes text ← uses YOUR Claude subscription, no extra API key
│ identifies: Organizations, People, Regulations, Events…
│ maps: relationships with verbatim source_text evidence
│
▼
Claude calls save_knowledge_graph(entities, relationships, output_format="all")
│ ← builds KnowledgeGraph, saves files
▼
./output/
├── knowledge_graph.json ← full graph with provenance
├── knowledge_graph.cypher ← Neo4j import
└── knowledge_graph.sql ← Apache AGE import
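The flow above hinges on every relationship carrying verbatim evidence and pointing at entities that exist. As a rough sketch of what a pre-save sanity check could look like, the snippet below validates a payload before it is handed to save_knowledge_graph. The relationship field names (`source`, `target`, `type`), the `r_` id prefix, the sample data, and the `check_provenance` helper are all assumptions for illustration; they are not taken from the plugin's documented schema.

```python
from typing import Dict, List


def check_provenance(entities: List[Dict], relationships: List[Dict]) -> List[str]:
    """Return human-readable problems; an empty list means the payload looks consistent."""
    known_ids = {e["id"] for e in entities}
    problems = []
    # Every entity and relationship should quote the document verbatim.
    for item in entities + relationships:
        if not str(item.get("source_text", "")).strip():
            problems.append(f"{item.get('id', '?')}: missing source_text")
    # Every relationship endpoint should refer to a known entity id.
    for rel in relationships:
        for end in ("source", "target"):
            if rel.get(end) not in known_ids:
                problems.append(f"{rel.get('id', '?')}: unknown {end} {rel.get(end)!r}")
    return problems


# Hypothetical sample payload (illustrative values only).
entities = [
    {"id": "e_0001", "label": "Acme Corp", "type": "Organization",
     "source_text": "Acme Corp filed its annual report.", "source_pages": [1]},
    {"id": "e_0002", "label": "Jane Doe", "type": "Person",
     "source_text": "Jane Doe, CEO of Acme Corp", "source_pages": [2]},
]
relationships = [
    {"id": "r_0001", "source": "e_0002", "target": "e_0001", "type": "LEADS",
     "source_text": "Jane Doe, CEO of Acme Corp", "source_pages": [2]},
]

print(check_provenance(entities, relationships))  # prints []
```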
Skill Triggers
Say these phrases to activate built-in workflows:
| Phrase | Workflow | Tools |
|---|---|---|
| "knowledge graph" / "extract entities" | $pdf-to-graph | read_pdf → save_knowledge_graph |
| "chunk for RAG" / "vector search" / "pgvector" | $pdf-to-rag | chunk_document → embed_and_store_chunks |
| "full pipeline" / "extract and embed" | Full pipeline | All tools in sequence |
| "load into Neo4j" / "Cypher query" | $graph-query | manage_graph_db |
| "render HTML" / "browsable document" | $document-html | render_document_html |
Available Tools
| Tool | Description |
|---|---|
| read_pdf | Parse PDF → page text + rule entities. First step of any KG workflow. |
| save_knowledge_graph | Accept Claude-extracted entities/relationships → save .json/.cypher/.sql |
| chunk_document | Token-aware overlapping chunks with heading context for RAG |
| render_document_html | Structured HTML with page anchors, entity annotations, TOC + search |
| manage_graph_db | Load, query, and manage graphs in Neo4j or Apache AGE |
| embed_and_store_chunks | Embed chunks into PostgreSQL pgvector (OpenAI / Voyage / local) |
| list_workflows | List all available workflows, triggers, and tool sequences |
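To make "token-aware overlapping chunks with heading context" concrete, here is a minimal sketch of the general technique. It is not the chunk_document implementation: it treats whitespace-separated words as tokens in place of a real tokenizer, and the function name, parameters, and output keys are assumptions chosen for illustration.

```python
from typing import Dict, List


def chunk_text(text: str, heading: str = "", max_tokens: int = 200,
               overlap: int = 40) -> List[Dict[str, object]]:
    """Split text into windows of at most max_tokens words that share `overlap` words."""
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must be in [0, max_tokens)")
    tokens = text.split()          # assumption: whitespace words stand in for tokens
    step = max_tokens - overlap    # how far the window advances each time
    chunks: List[Dict[str, object]] = []
    start = 0
    while start < len(tokens):
        window = tokens[start:start + max_tokens]
        chunks.append({
            "heading": heading,            # heading context travels with every chunk
            "start_token": start,
            "text": " ".join(window),
        })
        if start + max_tokens >= len(tokens):
            break                          # the last window already reached the end
        start += step
    return chunks
```

With `max_tokens=4` and `overlap=1`, each chunk begins with the last word of the previous one, so a sentence that straddles a boundary is still retrievable from at least one chunk.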
Runtimes
| Runtime | Install |
|---|---|
| Claude Code | claude mcp add malimgraph -- malimgraph-plugin |
| Claude Desktop | See config below |
| Codex / OpenAI Agents | See AGENTS.md for function schemas |
| Any MCP runtime | {"command": "malimgraph-plugin"} |
Claude Desktop (claude_desktop_config.json):
{
"mcpServers": {
"malimgraph": {
"command": "malimgraph-plugin"
}
}
}
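If claude_desktop_config.json already lists other MCP servers, the entry above has to be merged in, not pasted over the file. The sketch below shows one way to do that with the standard library. The helper name is hypothetical, the caller must supply the config path (its location varies by platform and is not stated here), and the function assumes any existing file contains valid JSON.

```python
import json
from pathlib import Path


def add_malimgraph_server(config_path: Path) -> dict:
    """Add the malimgraph entry to an MCP config file, keeping existing settings."""
    if config_path.exists():
        config = json.loads(config_path.read_text(encoding="utf-8"))
    else:
        config = {}
    # Create the mcpServers section if it is missing, then add or overwrite our entry.
    servers = config.setdefault("mcpServers", {})
    servers["malimgraph"] = {"command": "malimgraph-plugin"}
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

Calling `add_malimgraph_server(Path("path/to/claude_desktop_config.json"))` leaves other servers and unrelated keys untouched.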
CLI (standalone, requires ANTHROPIC_API_KEY)
export ANTHROPIC_API_KEY=sk-ant-...
# Full pipeline
malimgraph extract --input report.pdf --output ./output/ --format all
malimgraph chunk --input report.pdf --output ./chunks/
malimgraph render --input report.pdf --output document.html
# pgvector
export PGVECTOR_URI="postgresql://user:pass@localhost:5432/mydb"
export OPENAI_API_KEY=sk-...
malimgraph vector load --input ./chunks/chunks.json
# Graph database
malimgraph db load --input ./output/knowledge_graph.json \
--target neo4j --uri bolt://localhost:7687 --user neo4j --password secret
malimgraph db query --target neo4j --uri bolt://localhost:7687 \
--query "MATCH (n:Organization) RETURN n.label, n.source_pages LIMIT 10"
Installation Options
pip install malimgraph # core
pip install "malimgraph[neo4j]" # + Neo4j driver
pip install "malimgraph[pgvector,openai]" # + pgvector + OpenAI embeddings
pip install "malimgraph[pgvector,voyage]" # + pgvector + Voyage AI
pip install "malimgraph[pgvector,local]" # + pgvector + local CPU embeddings
pip install "malimgraph[all]" # everything
Output Schema
Every entity and relationship carries full citation provenance:
| Field | Description |
|---|---|
| id | Stable hash: e_ + MD5(type:label)[:8] |
| label | Canonical entity name |
| type | Organization / Person / Location / Regulation / … |
| source_text | Verbatim quote from the document |
| source_pages | PDF page numbers |
| confidence | high / medium / low |
| extraction_method | rule / llm / hybrid |
| citations[] | All supporting quotes with page refs |
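The id formula in the table, e_ + MD5(type:label)[:8], can be read as the following sketch. It assumes UTF-8 encoding, a hexadecimal digest, and no normalization of the type or label (no lowercasing or whitespace trimming); the source does not confirm those details, so ids computed this way may not match the plugin's output byte for byte.

```python
import hashlib


def entity_id(entity_type: str, label: str) -> str:
    """Derive a stable id from the entity type and label, per the schema's formula."""
    # Assumption: the hashed string is "<type>:<label>" encoded as UTF-8.
    digest = hashlib.md5(f"{entity_type}:{label}".encode("utf-8")).hexdigest()
    # Keep the first 8 hex characters and add the "e_" prefix.
    return "e_" + digest[:8]
```

Because the id depends only on type and label, the same entity mentioned on different pages maps to one node, while `("Person", "Jordan")` and `("Location", "Jordan")` stay distinct.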
pgvector Embedding Providers
| Provider | Default model | Dimension | Requires |
|---|---|---|---|
| openai | text-embedding-3-small | 1536-d | OPENAI_API_KEY |
| voyage | voyage-3-large | 1024-d | VOYAGE_API_KEY |
| local | all-MiniLM-L6-v2 | 384-d | none (CPU) |
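The dimension matters because a pgvector column is declared with a fixed size, so vectors from different providers cannot share one column. The sketch below derives a table definition from the provider table and checks a vector's length before insertion. The table name, column names, and helper functions are hypothetical; only the provider-to-dimension mapping comes from the table above, and the schema the plugin actually creates may differ.

```python
from typing import Sequence

# Default dimensions per provider, copied from the table above.
EMBEDDING_DIMS = {"openai": 1536, "voyage": 1024, "local": 384}


def chunks_table_ddl(provider: str, table: str = "doc_chunks") -> str:
    """Build SQL for a chunk table whose vector column matches the provider."""
    if provider not in EMBEDDING_DIMS:
        raise ValueError(f"unknown provider: {provider!r}")
    if not table.isidentifier():
        raise ValueError(f"unsafe table name: {table!r}")
    dim = EMBEDDING_DIMS[provider]
    return (
        "CREATE EXTENSION IF NOT EXISTS vector;\n"
        f"CREATE TABLE IF NOT EXISTS {table} (\n"
        "    id BIGSERIAL PRIMARY KEY,\n"
        "    content TEXT NOT NULL,\n"
        f"    embedding vector({dim})\n"
        ");"
    )


def check_embedding(provider: str, embedding: Sequence[float]) -> None:
    """Raise ValueError if the vector length does not match the provider's dimension."""
    expected = EMBEDDING_DIMS[provider]
    if len(embedding) != expected:
        raise ValueError(f"{provider} expects {expected} values, got {len(embedding)}")
```

Switching providers on an existing table therefore means re-embedding the chunks into a column of the new size.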
Database Setup
# Neo4j
docker run -p 7474:7474 -p 7687:7687 \
-e NEO4J_AUTH=neo4j/yourpassword neo4j:latest
# Apache AGE
docker run -p 5432:5432 -e POSTGRES_PASSWORD=secret apache/age:latest
# pgvector (maps the same host port 5432 as the Apache AGE container; change one -p mapping to run both)
docker run -p 5432:5432 -e POSTGRES_PASSWORD=secret pgvector/pgvector:pg17
See docs/database-setup.md for full guides.
Contributing
git clone https://github.com/malim-ai-labs/malim-graph-plugin
cd malim-graph-plugin
pip install -e ".[dev]"
make test
make lint
Credits
Built by Malim AI Labs — AI-powered knowledge infrastructure for Southeast Asia.
Malim AI Labs Social Enterprise (003827047-U) · Kuala Lumpur, Malaysia
License
MIT — see LICENSE