
Transform PDF documents into structured knowledge graphs with citation provenance


MalimGraph

███╗   ███╗ █████╗ ██╗     ██╗███╗   ███╗ ██████╗ ██████╗  █████╗ ██████╗ ██╗  ██╗
████╗ ████║██╔══██╗██║     ██║████╗ ████║██╔════╝ ██╔══██╗██╔══██╗██╔══██╗██║  ██║
██╔████╔██║███████║██║     ██║██╔████╔██║██║  ███╗██████╔╝███████║██████╔╝███████║
██║╚██╔╝██║██╔══██║██║     ██║██║╚██╔╝██║██║   ██║██╔══██╗██╔══██║██╔═══╝ ██╔══██║
██║ ╚═╝ ██║██║  ██║███████╗██║██║ ╚═╝ ██║╚██████╔╝██║  ██║██║  ██║██║     ██║  ██║
╚═╝     ╚═╝╚═╝  ╚═╝╚══════╝╚═╝╚═╝     ╚═╝ ╚═════╝ ╚═╝  ╚═╝╚═╝  ╚═╝╚═╝     ╚═╝  ╚═╝

PyPI version · License: MIT · Python 3.10+ · MCP Compatible · CI

From documents to knowledge graphs.

Agentic knowledge graph plugin for Claude Code, Claude Desktop, and Codex. Extract entities, build graphs, chunk for RAG, render HTML, and load into Neo4j or pgvector. Claude orchestrates every step and performs the entity extraction itself, so the plugin needs no ANTHROPIC_API_KEY; only the standalone CLI does.


Install

pip install malimgraph
claude mcp add malimgraph -- malimgraph-plugin

Then just ask Claude naturally:

"Extract a knowledge graph from report.pdf"
"Chunk annual_report.pdf for RAG and store in pgvector"
"Full pipeline on this document"


How It Works

You: "Extract a knowledge graph from report.pdf"
        │
        ▼
Claude calls  read_pdf("report.pdf")
        │     ← returns page text + rule entities (dates, amounts, emails…)
        │
        ▼
Claude analyzes text   ← uses YOUR Claude subscription, no extra API key
        │     identifies: Organizations, People, Regulations, Events…
        │     maps: relationships with verbatim source_text evidence
        │
        ▼
Claude calls  save_knowledge_graph(entities, relationships, output_format="all")
        │     ← builds KnowledgeGraph, saves files
        ▼
  ./output/
    ├── knowledge_graph.json    ← full graph with provenance
    ├── knowledge_graph.cypher  ← Neo4j import
    └── knowledge_graph.sql     ← Apache AGE import
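
The "rule entities" that read_pdf returns are described above only as dates, amounts, emails and similar. As a rough illustration of what rule-based extraction means in this flow, the sketch below uses regular expressions. The patterns, the function name extract_rule_entities, and the output shape are assumptions made for illustration; they are not malimgraph's actual rules.

```python
import re

# Hypothetical patterns; malimgraph's real rule set is not documented here.
RULE_PATTERNS = {
    "Email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "Date": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
    "Amount": re.compile(r"(?:USD|MYR|\$)\s?\d[\d,]*(?:\.\d+)?"),
}


def extract_rule_entities(page_text, page_number):
    """Return one dict per regex match, tagged with its type and source page."""
    entities = []
    for entity_type, pattern in RULE_PATTERNS.items():
        for match in pattern.finditer(page_text):
            entities.append({
                "label": match.group(0),
                "type": entity_type,
                "source_text": match.group(0),  # verbatim span from the page
                "source_pages": [page_number],
                "extraction_method": "rule",
            })
    return entities


sample = "Contact info@example.com by 2024-03-31 regarding the MYR 1,200.50 invoice."
rule_entities = extract_rule_entities(sample, page_number=1)
for entity in rule_entities:
    print(entity["type"], entity["label"])
```

Entities of this kind need no model call, which is presumably why they come back from read_pdf directly, leaving Claude to handle the types that need judgment (Organizations, People, Regulations, Events).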

Skill Triggers

Say these phrases to activate built-in workflows:

| Phrase | Workflow | Tools |
|---|---|---|
| "knowledge graph" / "extract entities" | $pdf-to-graph | read_pdf, save_knowledge_graph |
| "chunk for RAG" / "vector search" / "pgvector" | $pdf-to-rag | chunk_document, embed_and_store_chunks |
| "full pipeline" / "extract and embed" | Full pipeline | All tools in sequence |
| "load into Neo4j" / "Cypher query" | $graph-query | manage_graph_db |
| "render HTML" / "browsable document" | $document-html | render_document_html |

Available Tools

| Tool | Description |
|---|---|
| read_pdf | Parse PDF → page text + rule entities. First step of any KG workflow. |
| save_knowledge_graph | Accept Claude-extracted entities/relationships → save .json/.cypher/.sql |
| chunk_document | Token-aware overlapping chunks with heading context for RAG |
| render_document_html | Structured HTML with page anchors, entity annotations, TOC + search |
| manage_graph_db | Load, query, and manage graphs in Neo4j or Apache AGE |
| embed_and_store_chunks | Embed chunks into PostgreSQL pgvector (OpenAI / Voyage / local) |
| list_workflows | List all available workflows, triggers, and tool sequences |
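
To make "overlapping chunks with heading context" concrete, here is a minimal sketch of the general technique. It is not the chunk_document implementation: a whitespace split stands in for a real tokenizer, and the function names, parameters, and output shape are hypothetical.

```python
def chunk_tokens(tokens, chunk_size, overlap):
    """Split a token list into windows of chunk_size that share `overlap` tokens."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reaches the end of the text
    return chunks


def chunk_section(heading, body, chunk_size=8, overlap=2):
    """Attach the section heading to every chunk so each one stands alone."""
    words = body.split()  # assumption: whitespace split instead of a tokenizer
    return [
        {"heading": heading, "text": " ".join(window)}
        for window in chunk_tokens(words, chunk_size, overlap)
    ]


chunks = chunk_section(
    "3. Risk Factors",
    "one two three four five six seven eight nine ten eleven twelve",
    chunk_size=5,
    overlap=2,
)
for chunk in chunks:
    print(chunk["heading"], "|", chunk["text"])
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, and the repeated heading gives a retriever some context even when a chunk is read in isolation.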

Runtimes

| Runtime | Install |
|---|---|
| Claude Code | claude mcp add malimgraph -- malimgraph-plugin |
| Claude Desktop | See config below |
| Codex / OpenAI Agents | See AGENTS.md for function schemas |
| Any MCP runtime | {"command": "malimgraph-plugin"} |

Claude Desktop (claude_desktop_config.json):

{
  "mcpServers": {
    "malimgraph": {
      "command": "malimgraph-plugin"
    }
  }
}
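
If claude_desktop_config.json already lists other MCP servers, the malimgraph entry has to be merged into the existing "mcpServers" object rather than replacing it. The sketch below shows that merge on an in-memory dict. The helper name add_mcp_server and the "other-tool" entry are made up for illustration, and reading or writing the file itself is left out because its location depends on the platform.

```python
import json


def add_mcp_server(config, name, command):
    """Return a copy of a Claude Desktop config dict with one server entry added."""
    updated = dict(config)
    servers = dict(updated.get("mcpServers", {}))  # copy, keep existing servers
    servers[name] = {"command": command}
    updated["mcpServers"] = servers
    return updated


# Hypothetical pre-existing config with one unrelated server.
existing = {"mcpServers": {"other-tool": {"command": "other-tool-server"}}}
merged = add_mcp_server(existing, "malimgraph", "malimgraph-plugin")
print(json.dumps(merged, indent=2))
```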

CLI (standalone, requires ANTHROPIC_API_KEY)

export ANTHROPIC_API_KEY=sk-ant-...

# Full pipeline
malimgraph extract --input report.pdf --output ./output/ --format all
malimgraph chunk --input report.pdf --output ./chunks/
malimgraph render --input report.pdf --output document.html

# pgvector
export PGVECTOR_URI="postgresql://user:pass@localhost:5432/mydb"
export OPENAI_API_KEY=sk-...
malimgraph vector load --input ./chunks/chunks.json

# Graph database
malimgraph db load --input ./output/knowledge_graph.json \
  --target neo4j --uri bolt://localhost:7687 --user neo4j --password secret
malimgraph db query --target neo4j --uri bolt://localhost:7687 \
  --query "MATCH (n:Organization) RETURN n.label, n.source_pages LIMIT 10"

Installation Options

pip install malimgraph                    # core
pip install "malimgraph[neo4j]"           # + Neo4j driver
pip install "malimgraph[pgvector,openai]" # + pgvector + OpenAI embeddings
pip install "malimgraph[pgvector,voyage]" # + pgvector + Voyage AI
pip install "malimgraph[pgvector,local]"  # + pgvector + local CPU embeddings
pip install "malimgraph[all]"             # everything

Output Schema

Every entity and relationship carries full citation provenance:

| Field | Description |
|---|---|
| id | Stable hash: e_ + MD5(type:label)[:8] |
| label | Canonical entity name |
| type | Organization / Person / Location / Regulation / … |
| source_text | Verbatim quote from the document |
| source_pages | PDF page numbers |
| confidence | high / medium / low |
| extraction_method | rule / llm / hybrid |
| citations[] | All supporting quotes with page refs |
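
The id formula in the table can be sketched in a few lines. This follows the stated recipe, e_ plus the first 8 characters of MD5(type:label), but the details are assumptions: UTF-8 encoding, a hexadecimal digest, and no normalization of case or whitespace. The function name entity_id is hypothetical, and malimgraph may compute the value differently.

```python
import hashlib


def entity_id(entity_type, label):
    """Build an ID as 'e_' plus the first 8 hex chars of MD5('type:label')."""
    # Assumption: UTF-8 encoding, hex digest, and no case folding or
    # whitespace normalization; the real implementation may differ.
    digest = hashlib.md5(f"{entity_type}:{label}".encode("utf-8")).hexdigest()
    return "e_" + digest[:8]


first = entity_id("Organization", "Malim AI Labs")
second = entity_id("Organization", "Malim AI Labs")
other = entity_id("Person", "Malim AI Labs")
print(first, second, other)
```

Because the ID depends only on type and label, the same entity gets the same ID on every run, which is presumably what "stable" refers to. The same label under a different type produces a different ID.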

pgvector Embedding Providers

| Provider | Default model | Dimensions | Requires |
|---|---|---|---|
| openai | text-embedding-3-small | 1536 | OPENAI_API_KEY |
| voyage | voyage-3-large | 1024 | VOYAGE_API_KEY |
| local | all-MiniLM-L6-v2 | 384 | none (CPU) |
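
A pgvector column is declared with a fixed dimension, so the choice of provider determines the column size, and vectors from one provider will not fit a table created for another. The sketch below copies the models and dimensions from the table and shows the consequence. The table name, column names, and helper functions are hypothetical; the schema malimgraph actually creates is not shown here.

```python
# Values copied from the provider table; table and column names are hypothetical.
PROVIDERS = {
    "openai": {"model": "text-embedding-3-small", "dimension": 1536},
    "voyage": {"model": "voyage-3-large", "dimension": 1024},
    "local": {"model": "all-MiniLM-L6-v2", "dimension": 384},
}


def create_table_sql(provider, table="chunks"):
    """Return DDL for a pgvector table sized to the provider's embeddings."""
    dimension = PROVIDERS[provider]["dimension"]
    return (
        f"CREATE TABLE IF NOT EXISTS {table} ("
        "id bigserial PRIMARY KEY, "
        "content text, "
        f"embedding vector({dimension}))"
    )


def check_embedding(provider, vector):
    """Raise if a vector's length does not match the provider's dimension."""
    expected = PROVIDERS[provider]["dimension"]
    if len(vector) != expected:
        raise ValueError(f"{provider} expects {expected} values, got {len(vector)}")


ddl = create_table_sql("local")
check_embedding("local", [0.0] * 384)  # passes silently
print(ddl)
```

In practice this means switching providers after loading data calls for re-embedding into a new table or column.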

Database Setup

# Neo4j
docker run -p 7474:7474 -p 7687:7687 \
  -e NEO4J_AUTH=neo4j/yourpassword neo4j:latest

# Apache AGE
docker run -p 5432:5432 -e POSTGRES_PASSWORD=secret apache/age:latest

# pgvector (binds host port 5432, like the Apache AGE container above; change one mapping to run both)
docker run -p 5432:5432 -e POSTGRES_PASSWORD=secret pgvector/pgvector:pg17

See docs/database-setup.md for full guides.


Contributing

git clone https://github.com/malim-ai-labs/malim-graph-plugin
cd malim-graph-plugin
pip install -e ".[dev]"
make test
make lint

Credits

Built by Malim AI Labs — AI-powered knowledge infrastructure for Southeast Asia.

Malim AI Labs Social Enterprise (003827047-U) · Kuala Lumpur, Malaysia


License

MIT — see LICENSE


Source Distribution

malimgraph-0.1.6.tar.gz (35.8 kB)

Uploaded Source

Built Distribution


malimgraph-0.1.6-py3-none-any.whl (48.4 kB)

Uploaded Python 3

File details

Details for the file malimgraph-0.1.6.tar.gz.

File metadata

  • Download URL: malimgraph-0.1.6.tar.gz
  • Upload date:
  • Size: 35.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for malimgraph-0.1.6.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | e267aa71e1ca3f5c2e8eab0c987580baebca20fb3f56c333dd64c5cee66caa72 |
| MD5 | 875906ff0e0fd0391f0ade0fd229152b |
| BLAKE2b-256 | 50a9092b996e862417b80234d568424dc7462337942779cbea2b5d91a05f59a6 |


Provenance

The following attestation bundles were made for malimgraph-0.1.6.tar.gz:

Publisher: publish.yml on malim-ai-labs/malim-graph-plugin

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file malimgraph-0.1.6-py3-none-any.whl.

File metadata

  • Download URL: malimgraph-0.1.6-py3-none-any.whl
  • Upload date:
  • Size: 48.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for malimgraph-0.1.6-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | fdc3aaf504564464af6cac9240da0eea30b1e327b2c4c3464dbe6361cfba610e |
| MD5 | 9ad574ada312d0b79238cadb8f6c2395 |
| BLAKE2b-256 | 9a06d9fedec9f715e9fe34493d2ad504db07fec3d2b31beb9fa92b870fa579cd |


Provenance

The following attestation bundles were made for malimgraph-0.1.6-py3-none-any.whl:

Publisher: publish.yml on malim-ai-labs/malim-graph-plugin

