Transform PDF documents into structured knowledge graphs with citation provenance

MalimGraph

███╗   ███╗ █████╗ ██╗     ██╗███╗   ███╗ ██████╗ ██████╗  █████╗ ██████╗ ██╗  ██╗
████╗ ████║██╔══██╗██║     ██║████╗ ████║██╔════╝ ██╔══██╗██╔══██╗██╔══██╗██║  ██║
██╔████╔██║███████║██║     ██║██╔████╔██║██║  ███╗██████╔╝███████║██████╔╝███████║
██║╚██╔╝██║██╔══██║██║     ██║██║╚██╔╝██║██║   ██║██╔══██╗██╔══██║██╔═══╝ ██╔══██║
██║ ╚═╝ ██║██║  ██║███████╗██║██║ ╚═╝ ██║╚██████╔╝██║  ██║██║  ██║██║     ██║  ██║
╚═╝     ╚═╝╚═╝  ╚═╝╚══════╝╚═╝╚═╝     ╚═╝ ╚═════╝ ╚═╝  ╚═╝╚═╝  ╚═╝╚═╝     ╚═╝  ╚═╝

PyPI version · License: MIT · Python 3.10+ · MCP Compatible · CI

From documents to knowledge graphs.

Agentic knowledge graph plugin for Claude Code, Claude Desktop, and Codex. Extract entities, build graphs, chunk for RAG, render HTML, and load into Neo4j or pgvector, all orchestrated by Claude itself. In plugin mode no ANTHROPIC_API_KEY is required; only the standalone CLI needs one.


Install

pip install malimgraph
claude mcp add malimgraph -- malimgraph-plugin

Then just ask Claude naturally:

"Extract a knowledge graph from report.pdf"
"Chunk annual_report.pdf for RAG and store in pgvector"
"Full pipeline on this document"


How It Works

You: "Extract a knowledge graph from report.pdf"
        │
        ▼
Claude calls  read_pdf("report.pdf")
        │     ← returns page text + rule entities (dates, amounts, emails…)
        │
        ▼
Claude analyzes text   ← uses YOUR Claude subscription, no extra API key
        │     identifies: Organizations, People, Regulations, Events…
        │     maps: relationships with verbatim source_text evidence
        │
        ▼
Claude calls  save_knowledge_graph(entities, relationships, output_format="all")
        │     ← builds KnowledgeGraph, saves files
        ▼
  ./output/
    ├── knowledge_graph.json    ← full graph with provenance
    ├── knowledge_graph.cypher  ← Neo4j import
    └── knowledge_graph.sql     ← Apache AGE import
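
To make the hand-off in the diagram more concrete, here is a minimal sketch of what the entities and relationships passed to save_knowledge_graph might look like, plus a sanity check worth running before saving. The entity field names follow the Output Schema table; the relationship keys (source, target, type), the placeholder ids, and every sample value are assumptions for illustration, not the plugin's documented schema.

```python
# Hypothetical payload shapes. Only the entity field names come from the
# Output Schema table; relationship keys and all values are illustrative.
entities = [
    {
        "id": "e_aaaaaaaa",  # placeholder id, not a real hash
        "label": "Acme Holdings",
        "type": "Organization",
        "source_text": "Acme Holdings reported revenue of $12M.",
        "source_pages": [3],
        "confidence": "high",
        "extraction_method": "llm",
    },
    {
        "id": "e_bbbbbbbb",  # placeholder id, not a real hash
        "label": "Jane Tan",
        "type": "Person",
        "source_text": "Jane Tan, chief executive of Acme Holdings",
        "source_pages": [3],
        "confidence": "high",
        "extraction_method": "llm",
    },
]

relationships = [
    {
        "source": "e_aaaaaaaa",
        "target": "e_bbbbbbbb",
        "type": "EMPLOYS",
        "source_text": "Jane Tan, chief executive of Acme Holdings",
        "source_pages": [3],
        "confidence": "medium",
    }
]


def check_payload(entities, relationships):
    """Return a list of problems: dangling endpoints or missing evidence."""
    known_ids = {entity["id"] for entity in entities}
    problems = []
    for index, rel in enumerate(relationships):
        for end in ("source", "target"):
            if rel[end] not in known_ids:
                problems.append(f"relationship {index}: unknown {end} {rel[end]!r}")
        if not rel.get("source_text", "").strip():
            problems.append(f"relationship {index}: missing source_text")
    return problems
```

A check like this catches relationships that point at entities missing from the payload, or that arrive without the verbatim quote that provenance depends on.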

Skill Triggers

Say these phrases to activate built-in workflows:

| Phrase | Workflow | Tools |
| --- | --- | --- |
| "knowledge graph" / "extract entities" | $pdf-to-graph | read_pdf, save_knowledge_graph |
| "chunk for RAG" / "vector search" / "pgvector" | $pdf-to-rag | chunk_document, embed_and_store_chunks |
| "full pipeline" / "extract and embed" | Full pipeline | All tools in sequence |
| "load into Neo4j" / "Cypher query" | $graph-query | manage_graph_db |
| "render HTML" / "browsable document" | $document-html | render_document_html |

Available Tools

| Tool | Description |
| --- | --- |
| read_pdf | Parse PDF → page text + rule entities. First step of any KG workflow. |
| save_knowledge_graph | Accept Claude-extracted entities/relationships → save .json/.cypher/.sql |
| chunk_document | Token-aware overlapping chunks with heading context for RAG |
| render_document_html | Structured HTML with page anchors, entity annotations, TOC + search |
| manage_graph_db | Load, query, and manage graphs in Neo4j or Apache AGE |
| embed_and_store_chunks | Embed chunks into PostgreSQL pgvector (OpenAI / Voyage / local) |
| list_workflows | List all available workflows, triggers, and tool sequences |
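
The idea behind "overlapping chunks with heading context" can be sketched in a few lines. This is not the chunk_document implementation: whitespace splitting stands in for a real tokenizer, and the function names, default sizes, and output keys are all assumptions chosen for illustration.

```python
def chunk_tokens(tokens, size, overlap):
    """Split tokens into windows of `size` that share `overlap` tokens."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # the last window already reaches the end of the input
    return chunks


def chunk_section(heading, text, size=200, overlap=40):
    """Chunk one section and attach its heading to every chunk."""
    tokens = text.split()  # stand-in for a real tokenizer
    return [
        {"heading": heading, "text": " ".join(window)}
        for window in chunk_tokens(tokens, size, overlap)
    ]
```

Repeating the heading on each chunk means a retrieved chunk still says which section it came from, and the overlap keeps a sentence that straddles a boundary intact in at least one chunk.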

Runtimes

| Runtime | Install |
| --- | --- |
| Claude Code | claude mcp add malimgraph -- malimgraph-plugin |
| Claude Desktop | See config below |
| Codex / OpenAI Agents | See AGENTS.md for function schemas |
| Any MCP runtime | {"command": "malimgraph-plugin"} |

Claude Desktop (claude_desktop_config.json):

{
  "mcpServers": {
    "malimgraph": {
      "command": "malimgraph-plugin"
    }
  }
}
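
If claude_desktop_config.json already lists other MCP servers, the malimgraph entry should be merged in rather than pasted over the file. A minimal sketch of that merge, working on an in-memory dict; the helper name and the "other-tool" entry are hypothetical:

```python
import json


def add_mcp_server(config, name, command):
    """Return a copy of a Claude Desktop config dict with one server entry added."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers[name] = {"command": command}
    merged["mcpServers"] = servers
    return merged


# "other-tool" is a made-up existing entry, used only to show it survives the merge.
existing = {"mcpServers": {"other-tool": {"command": "other-tool-server"}}}
updated = add_mcp_server(existing, "malimgraph", "malimgraph-plugin")
print(json.dumps(updated, indent=2))
```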

CLI (standalone, requires ANTHROPIC_API_KEY)

export ANTHROPIC_API_KEY=sk-ant-...

# Full pipeline
malimgraph extract --input report.pdf --output ./output/ --format all
malimgraph chunk --input report.pdf --output ./chunks/
malimgraph render --input report.pdf --output document.html

# pgvector
export PGVECTOR_URI="postgresql://user:pass@localhost:5432/mydb"
export OPENAI_API_KEY=sk-...
malimgraph vector load --input ./chunks/chunks.json

# Graph database
malimgraph db load --input ./output/knowledge_graph.json \
  --target neo4j --uri bolt://localhost:7687 --user neo4j --password secret
malimgraph db query --target neo4j --uri bolt://localhost:7687 \
  --query "MATCH (n:Organization) RETURN n.label, n.source_pages LIMIT 10"
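
Once a graph is loaded, the same query can also be issued from Python. The sketch below assumes the official neo4j Python driver (5.x, which provides execute_query) is installed and a server is reachable; it is independent of malimgraph itself, and the function name is hypothetical. The node properties label and source_pages are taken from the CLI query above.

```python
# Assumes the official `neo4j` driver (5.x) and a running server.
QUERY = (
    "MATCH (n:Organization) "
    "RETURN n.label AS label, n.source_pages AS pages LIMIT 10"
)


def fetch_organizations(uri, user, password):
    """Run the sample query and return (label, pages) tuples."""
    # Imported lazily so this module still loads when the driver is absent.
    from neo4j import GraphDatabase

    with GraphDatabase.driver(uri, auth=(user, password)) as driver:
        records, _summary, _keys = driver.execute_query(QUERY)
        return [(record["label"], record["pages"]) for record in records]


# Example call, mirroring the CLI flags above:
# fetch_organizations("bolt://localhost:7687", "neo4j", "secret")
```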

Installation Options

pip install malimgraph                    # core
pip install "malimgraph[neo4j]"           # + Neo4j driver
pip install "malimgraph[pgvector,openai]" # + pgvector + OpenAI embeddings
pip install "malimgraph[pgvector,voyage]" # + pgvector + Voyage AI
pip install "malimgraph[pgvector,local]"  # + pgvector + local CPU embeddings
pip install "malimgraph[all]"             # everything

Output Schema

Every entity and relationship carries full citation provenance:

| Field | Description |
| --- | --- |
| id | Stable hash: e_ + MD5(type:label)[:8] |
| label | Canonical entity name |
| type | Organization / Person / Location / Regulation / … |
| source_text | Verbatim quote from the document |
| source_pages | PDF page numbers |
| confidence | high / medium / low |
| extraction_method | rule / llm / hybrid |
| citations[] | All supporting quotes with page refs |
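
The id rule in the table can be read as the following sketch. It assumes the hash input is the string "type:label" encoded as UTF-8, that the hex digest is what gets truncated, and that no case folding or whitespace normalization is applied first; the source does not confirm any of these details.

```python
import hashlib


def entity_id(entity_type, label):
    """Build an id as e_ + first 8 hex chars of MD5("type:label")."""
    key = f"{entity_type}:{label}"
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()  # assumed: hex, UTF-8
    return "e_" + digest[:8]
```

Because the id depends only on type and label, the same entity extracted from different pages, or in separate runs, maps to the same node, while "Acme" as an Organization and "Acme" as a Location stay distinct.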

pgvector Embedding Providers

| Provider | Default model | Dimensions | Requires |
| --- | --- | --- | --- |
| openai | text-embedding-3-small | 1536 | OPENAI_API_KEY |
| voyage | voyage-3-large | 1024 | VOYAGE_API_KEY |
| local | all-MiniLM-L6-v2 | 384 | none (CPU) |
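
A pgvector column is declared with a fixed dimension, so vectors from one provider cannot be inserted into a column sized for another. The sketch below shows a guard one might add before inserting; the dimensions come from the table above, while the helper names and the column name embedding are assumptions rather than what embed_and_store_chunks does internally.

```python
# Dimensions of each provider's default model, per the table above.
EXPECTED_DIMS = {"openai": 1536, "voyage": 1024, "local": 384}


def check_embedding(provider, vector):
    """Raise if a vector's length does not match the provider's default model."""
    expected = EXPECTED_DIMS[provider]
    if len(vector) != expected:
        raise ValueError(
            f"{provider} embeddings should have {expected} dimensions, got {len(vector)}"
        )
    return vector


def vector_column_ddl(provider, column="embedding"):
    """Return a pgvector column definition sized for the provider (hypothetical name)."""
    return f"{column} vector({EXPECTED_DIMS[provider]})"
```

In practice this means that switching providers on an existing table generally requires a new column or table and re-embedding the chunks.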

Database Setup

# Neo4j
docker run -p 7474:7474 -p 7687:7687 \
  -e NEO4J_AUTH=neo4j/yourpassword neo4j:latest

# Apache AGE
docker run -p 5432:5432 -e POSTGRES_PASSWORD=secret apache/age:latest

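# pgvector (binds the same host port as the Apache AGE container above;
# to run both at once, remap one of them, e.g. -p 5433:5432)
docker run -p 5432:5432 -e POSTGRES_PASSWORD=secret pgvector/pgvector:pg17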

See docs/database-setup.md for full guides.


Contributing

git clone https://github.com/malim-ai-labs/malim-graph-plugin
cd malim-graph-plugin
pip install -e ".[dev]"
make test
make lint

Credits

Built by Malim AI Labs — AI-powered knowledge infrastructure for Southeast Asia.

Malim AI Labs Social Enterprise (003827047-U) · Kuala Lumpur, Malaysia


License

MIT — see LICENSE
