Transform PDF documents into structured knowledge graphs with citation provenance

MalimGraph

███╗   ███╗ █████╗ ██╗     ██╗███╗   ███╗ ██████╗ ██████╗  █████╗ ██████╗ ██╗  ██╗
████╗ ████║██╔══██╗██║     ██║████╗ ████║██╔════╝ ██╔══██╗██╔══██╗██╔══██╗██║  ██║
██╔████╔██║███████║██║     ██║██╔████╔██║██║  ███╗██████╔╝███████║██████╔╝███████║
██║╚██╔╝██║██╔══██║██║     ██║██║╚██╔╝██║██║   ██║██╔══██╗██╔══██║██╔═══╝ ██╔══██║
██║ ╚═╝ ██║██║  ██║███████╗██║██║ ╚═╝ ██║╚██████╔╝██║  ██║██║  ██║██║     ██║  ██║
╚═╝     ╚═╝╚═╝  ╚═╝╚══════╝╚═╝╚═╝     ╚═╝ ╚═════╝ ╚═╝  ╚═╝╚═╝  ╚═╝╚═╝     ╚═╝  ╚═╝

License: MIT · Python 3.10+ · MCP compatible

From documents to knowledge graphs.

Agentic knowledge graph plugin for Claude Code, Claude Desktop, and Codex. It extracts entities, builds graphs, chunks documents for RAG, renders HTML, and loads results into Neo4j or pgvector, with Claude itself doing the extraction and orchestration. No ANTHROPIC_API_KEY is required when running as a plugin; only the standalone CLI needs one.


Install

pip install malimgraph
claude mcp add malimgraph -- malimgraph-plugin

Then just ask Claude naturally:

"Extract a knowledge graph from report.pdf"
"Chunk annual_report.pdf for RAG and store in pgvector"
"Full pipeline on this document"


How It Works

You: "Extract a knowledge graph from report.pdf"
        │
        ▼
Claude calls  read_pdf("report.pdf")
        │     ← returns page text + rule entities (dates, amounts, emails…)
        │
        ▼
Claude analyzes text   ← uses YOUR Claude subscription, no extra API key
        │     identifies: Organizations, People, Regulations, Events…
        │     maps: relationships with verbatim source_text evidence
        │
        ▼
Claude calls  save_knowledge_graph(entities, relationships, output_format="all")
        │     ← builds KnowledgeGraph, saves files
        ▼
  ./output/
    ├── knowledge_graph.json    ← full graph with provenance
    ├── knowledge_graph.cypher  ← Neo4j import
    └── knowledge_graph.sql     ← Apache AGE import
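
To make the last step of the flow concrete, here is a minimal sketch of how an entities/relationships payload could be turned into Cypher MERGE statements like those a .cypher export might contain. The payload shape, the relationship keys (source, target, type), the sample data, and the to_cypher helper are all assumptions for illustration; the statements malimgraph actually writes may differ.

```python
import json


def q(value: str) -> str:
    """Quote a string as a double-quoted Cypher literal (JSON escaping is compatible)."""
    return json.dumps(value)


def safe_name(name: str) -> str:
    """Labels and relationship types cannot be parameterized, so validate them."""
    if not name.isidentifier():
        raise ValueError(f"unsafe label or relationship type: {name!r}")
    return name


def to_cypher(entities: list[dict], relationships: list[dict]) -> str:
    lines = []
    for e in entities:
        lines.append(
            f"MERGE (n:{safe_name(e['type'])} {{id: {q(e['id'])}}}) "
            f"SET n.label = {q(e['label'])}, "
            f"n.source_text = {q(e['source_text'])}, "
            f"n.source_pages = {json.dumps(e['source_pages'])};"
        )
    for r in relationships:
        lines.append(
            f"MATCH (a {{id: {q(r['source'])}}}), (b {{id: {q(r['target'])}}}) "
            f"MERGE (a)-[x:{safe_name(r['type'])}]->(b) "
            f"SET x.source_text = {q(r['source_text'])};"
        )
    return "\n".join(lines)


# Made-up sample data; the ids are placeholders, not real hashes.
entities = [
    {"id": "e_aaaa0001", "label": "Acme Bank", "type": "Organization",
     "source_text": "Acme Bank issued the circular.", "source_pages": [3]},
    {"id": "e_aaaa0002", "label": "Circular 12", "type": "Regulation",
     "source_text": "Circular 12 takes effect in March.", "source_pages": [3, 4]},
]
relationships = [
    {"source": "e_aaaa0001", "target": "e_aaaa0002", "type": "ISSUED",
     "source_text": 'Acme Bank issued "Circular 12".'},
]

print(to_cypher(entities, relationships))
```

Keeping source_text on both nodes and edges is what lets a later query trace any fact back to a verbatim quote and page.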

Skill Triggers

Say these phrases to activate built-in workflows:

Phrase                                          Workflow        Tools
"knowledge graph" / "extract entities"          $pdf-to-graph   read_pdf, save_knowledge_graph
"chunk for RAG" / "vector search" / "pgvector"  $pdf-to-rag     chunk_document, embed_and_store_chunks
"full pipeline" / "extract and embed"           Full pipeline   All tools in sequence
"load into Neo4j" / "Cypher query"              $graph-query    manage_graph_db
"render HTML" / "browsable document"            $document-html  render_document_html

Available Tools

Tool                    Description
read_pdf                Parse PDF → page text + rule entities. First step of any KG workflow.
save_knowledge_graph    Accept Claude-extracted entities/relationships → save .json/.cypher/.sql
chunk_document          Token-aware overlapping chunks with heading context for RAG
render_document_html    Structured HTML with page anchors, entity annotations, TOC + search
manage_graph_db         Load, query, and manage graphs in Neo4j or Apache AGE
embed_and_store_chunks  Embed chunks into PostgreSQL pgvector (OpenAI / Voyage / local)
list_workflows          List all available workflows, triggers, and tool sequences
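
For readers unfamiliar with overlapping chunks, the following sketch shows the general technique behind a tool like chunk_document. It is not the tool's implementation: the chunk_text function, its parameters, the output keys, and the whitespace split used in place of a real tokenizer are all assumptions.

```python
def chunk_text(text: str, heading: str, size: int = 200, overlap: int = 40) -> list[dict]:
    """Split text into windows of `size` tokens that share `overlap` tokens."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    tokens = text.split()  # stand-in for a model-specific tokenizer
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        window = tokens[start:start + size]
        chunks.append({
            "heading": heading,      # heading context travels with every chunk
            "start_token": start,
            "text": " ".join(window),
        })
        if start + size >= len(tokens):
            break  # the last window already reaches the end of the text
    return chunks


sample = " ".join(f"w{i}" for i in range(10))
for chunk in chunk_text(sample, heading="Introduction", size=4, overlap=2):
    print(chunk["start_token"], chunk["text"])
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, which tends to help retrieval.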

Runtimes

Runtime                Install
Claude Code            claude mcp add malimgraph -- malimgraph-plugin
Claude Desktop         See config below
Codex / OpenAI Agents  See AGENTS.md for function schemas
Any MCP runtime        {"command": "malimgraph-plugin"}

Claude Desktop (claude_desktop_config.json):

{
  "mcpServers": {
    "malimgraph": {
      "command": "malimgraph-plugin"
    }
  }
}
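
If claude_desktop_config.json already lists other servers, the entry needs to be merged in, not pasted over the file. A minimal sketch of that merge follows; the helper names are hypothetical, and the config path is left as a parameter because its location depends on the operating system.

```python
import json
from pathlib import Path


def add_malimgraph_server(config: dict) -> dict:
    """Return a copy of the config with the malimgraph entry added or replaced."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers["malimgraph"] = {"command": "malimgraph-plugin"}
    merged["mcpServers"] = servers
    return merged


def update_config_file(path: Path) -> None:
    """Read the config if it exists, merge the entry, and write it back."""
    raw = path.read_text(encoding="utf-8") if path.exists() else ""
    config = json.loads(raw) if raw.strip() else {}
    path.write_text(json.dumps(add_malimgraph_server(config), indent=2) + "\n",
                    encoding="utf-8")


existing = {"mcpServers": {"other": {"command": "other-server"}}}
print(json.dumps(add_malimgraph_server(existing), indent=2))
```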

CLI (standalone, requires ANTHROPIC_API_KEY)

export ANTHROPIC_API_KEY=sk-ant-...

# Full pipeline
malimgraph extract --input report.pdf --output ./output/ --format all
malimgraph chunk --input report.pdf --output ./chunks/
malimgraph render --input report.pdf --output document.html

# pgvector
export PGVECTOR_URI="postgresql://user:pass@localhost:5432/mydb"
export OPENAI_API_KEY=sk-...
malimgraph vector load --input ./chunks/chunks.json

# Graph database
malimgraph db load --input ./output/knowledge_graph.json \
  --target neo4j --uri bolt://localhost:7687 --user neo4j --password secret
malimgraph db query --target neo4j --uri bolt://localhost:7687 \
  --query "MATCH (n:Organization) RETURN n.label, n.source_pages LIMIT 10"

Installation Options

pip install malimgraph                    # core
pip install "malimgraph[neo4j]"           # + Neo4j driver
pip install "malimgraph[pgvector,openai]" # + pgvector + OpenAI embeddings
pip install "malimgraph[pgvector,voyage]" # + pgvector + Voyage AI
pip install "malimgraph[pgvector,local]"  # + pgvector + local CPU embeddings
pip install "malimgraph[all]"             # everything

Output Schema

Every entity and relationship carries full citation provenance:

Field              Description
id                 Stable hash: e_ + MD5(type:label)[:8]
label              Canonical entity name
type               Organization / Person / Location / Regulation / …
source_text        Verbatim quote from the document
source_pages       PDF page numbers
confidence         high / medium / low
extraction_method  rule / llm / hybrid
citations[]        All supporting quotes with page refs
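
The id rule in the table can be sketched in a few lines. This is one reading of "e_ + MD5(type:label)[:8]", assuming UTF-8 encoding, a lowercase hex digest, and no normalization of type or label; the Entity dataclass, its defaults, and the sample values are hypothetical.

```python
import hashlib
from dataclasses import dataclass, field


def entity_id(entity_type: str, label: str) -> str:
    """Derive a stable id from the entity type and label (assumed hex digest)."""
    key = f"{entity_type}:{label}".encode("utf-8")
    digest = hashlib.md5(key, usedforsecurity=False).hexdigest()
    return "e_" + digest[:8]


@dataclass
class Entity:
    label: str
    type: str
    source_text: str
    source_pages: list[int]
    confidence: str = "medium"        # high / medium / low
    extraction_method: str = "llm"    # rule / llm / hybrid
    citations: list[dict] = field(default_factory=list)
    id: str = field(init=False)

    def __post_init__(self) -> None:
        self.id = entity_id(self.type, self.label)


# Made-up example entity.
org = Entity(
    label="Acme Bank",
    type="Organization",
    source_text="Acme Bank issued the circular.",
    source_pages=[3],
)
print(org.id)
```

Because the id depends only on type and label, the same entity extracted from different pages or documents maps to the same node.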

pgvector Embedding Providers

Provider  Default model           Dimension  Requires
openai    text-embedding-3-small  1536-d     OPENAI_API_KEY
voyage    voyage-3-large          1024-d     VOYAGE_API_KEY
local     all-MiniLM-L6-v2        384-d      none (CPU)
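
The dimension matters because a pgvector column is declared with a fixed size, so switching provider usually means a different column or table. The sketch below encodes the table above and derives a matching column definition; the function names, the chunks table name, and its columns are assumptions, not malimgraph's actual schema.

```python
import os

# Values copied from the provider table above.
PROVIDERS = {
    "openai": {"model": "text-embedding-3-small", "dim": 1536, "env": "OPENAI_API_KEY"},
    "voyage": {"model": "voyage-3-large", "dim": 1024, "env": "VOYAGE_API_KEY"},
    "local": {"model": "all-MiniLM-L6-v2", "dim": 384, "env": None},
}


def resolve_provider(name: str, environ=os.environ) -> dict:
    """Look up a provider and check that its API key, if any, is set."""
    if name not in PROVIDERS:
        raise ValueError(f"unknown provider: {name!r}")
    spec = PROVIDERS[name]
    if spec["env"] and not environ.get(spec["env"]):
        raise RuntimeError(f"{spec['env']} is not set")
    return spec


def create_table_sql(name: str, table: str = "chunks") -> str:
    """Build a CREATE TABLE statement whose vector size matches the provider."""
    dim = PROVIDERS[name]["dim"]
    return (
        f"CREATE TABLE IF NOT EXISTS {table} "
        f"(id bigserial PRIMARY KEY, content text, embedding vector({dim}));"
    )


print(resolve_provider("local", {})["model"])
print(create_table_sql("openai"))
```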

Database Setup

# Neo4j
docker run -p 7474:7474 -p 7687:7687 \
  -e NEO4J_AUTH=neo4j/yourpassword neo4j:latest

# Apache AGE
docker run -p 5432:5432 -e POSTGRES_PASSWORD=secret apache/age:latest

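# pgvector (binds the same host port 5432 as the Apache AGE container above;
# remap one of them, e.g. -p 5433:5432, if you run both)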
docker run -p 5432:5432 -e POSTGRES_PASSWORD=secret pgvector/pgvector:pg17

See docs/database-setup.md for full guides.


Contributing

git clone https://github.com/malim-ai-labs/malim-graph-plugin
cd malim-graph-plugin
pip install -e ".[dev]"
make test
make lint

Credits

Built by Malim AI Labs — AI-powered knowledge infrastructure for Southeast Asia.

Malim AI Labs Social Enterprise (003827047-U) · Kuala Lumpur, Malaysia


License

MIT — see LICENSE
