asset-aware-mcp

🏥 Medical RAG with Asset-Aware MCP - Precise PDF asset retrieval (tables, figures, sections) and Knowledge Graph for AI Agents.

🎯 Why Asset-Aware MCP?

A common misconception is that an AI agent can read image files straight from your computer. It cannot: handing the model a file path does not give it access to the image itself.

| Method | Can AI analyze image content? | Description |
|--------|-------------------------------|-------------|
| ❌ Provide PNG path | No | AI cannot access the local file system |
| ✅ Asset-Aware MCP | Yes | Retrieves Base64 via MCP, allowing AI vision to understand directly |

Real-world Effect

# After retrieving the image via MCP, the AI can analyze it directly:

User: What is this figure about?

AI: This is the architecture diagram for Scaled Dot-Product Attention:
    1. Inputs: Q (Query), K (Key), V (Value)
    2. MatMul of Q and K
    3. Scale (1/√dₖ)
    4. Optional Mask (for decoder)
    5. SoftMax normalization
    6. Final MatMul with V to get the output

This is the value of Asset-Aware MCP - enabling AI Agents to truly "see" and understand charts and tables in your PDF literature.
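
To make the Base64 step concrete, here is a minimal sketch of what a client might do with a figure payload once it arrives as text. It assumes the figure is a Base64-encoded PNG; `decode_figure` is a hypothetical helper (not part of asset-aware-mcp), and the payload is built locally from stand-in bytes rather than fetched from the server.

```python
import base64

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # first 8 bytes of every PNG file


def decode_figure(figure_b64: str) -> bytes:
    """Decode a Base64 string and check that it looks like a PNG."""
    raw = base64.b64decode(figure_b64, validate=True)
    if not raw.startswith(PNG_SIGNATURE):
        raise ValueError("payload is not a PNG image")
    return raw


# Stand-in payload: a PNG signature plus dummy bytes, encoded the way
# binary image data has to be encoded before it can travel as text.
fake_png = PNG_SIGNATURE + b"dummy-image-bytes"
figure_b64 = base64.b64encode(fake_png).decode("ascii")

image_bytes = decode_figure(figure_b64)
print(len(image_bytes), "bytes decoded")
```

The point of the round trip is that the model receives the image bytes themselves, not a path it has no way to open.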


✨ Features

  • 📄 Asset-Aware ETL - PDF → Markdown with dual-engine PDF parsing:
    • PyMuPDF (default) - Fast extraction (~50MB)
    • Marker (optional, use_marker=True) - High-precision structured parsing with blocks.json (bbox/coordinates)
  • 🧭 Section Navigation - Dynamic section hierarchy tree with 4 tools (browse, search, detail, and block extraction) that works at any heading depth.
  • 🔄 Async Job Pipeline - Supports asynchronous task processing and progress tracking for large documents.
  • 🗺️ Document Manifest - Provides a structured "map" of the document for precise data access by Agents.
  • 🧠 LightRAG Integration - Knowledge Graph + Vector Index, supporting cross-document comparison and reasoning.
  • 📊 A2T (Anything to Table) - Automatically orchestrates information extracted by Agents into professional Excel tables, supporting CRUD, drafting, and token-efficient resumption.
  • VS Code Management Extension - Graphical interface for monitoring server status, ingested documents, and A2T tables/drafts with one-click Excel export.
  • 🔌 MCP Server - Exposes tools and resources to Copilot/Claude via FastMCP.
  • 🏥 Medical Research Focus - Optimized for medical literature, supporting Base64 image transmission for Vision AI analysis.

🏗️ Architecture

┌─────────────────────────────────────────────────────────┐
│                    AI Agent (Copilot)                   │
└─────────────────────┬───────────────────────────────────┘
                      │ MCP Protocol (Tools & Resources)
┌─────────────────────▼───────────────────────────────────┐
│            MCP Server (Modular Presentation)            │
│  ┌──────────────────────────────────────────────────┐   │
│  │ tools/: 34 tools in 5 modules                    │   │
│  │   document (5) │ section (4) │ job (4)           │   │
│  │   knowledge (2) │ table (19)                     │   │
│  └──────────────────────────────────────────────────┘   │
│  ┌──────────────────────────────────────────────────┐   │
│  │ resources/: 12 resources in 2 modules            │   │
│  └──────────────────────────────────────────────────┘   │
└─────────────────────┬───────────────────────────────────┘
                      │
┌─────────────────────▼───────────────────────────────────┐
│                  ETL Pipeline (DDD)                     │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐               │
│  │ PyMuPDF  │  │  Asset   │  │ LightRAG │               │
│  │ Adapter  │→ │  Parser  │→ │  Index   │               │
│  └──────────┘  └──────────┘  └──────────┘               │
└─────────────────────┬───────────────────────────────────┘
                      │
┌─────────────────────▼───────────────────────────────────┐
│                   Local Storage                         │
│  ./data/                                                │
│  ├── doc_{id}/        # Document Assets                 │
│  ├── tables/          # A2T Tables (JSON/MD/XLSX)       │
│  │   └── drafts/      # Table Drafts (Persistence)      │
│  └── lightrag/        # Knowledge Graph                 │
└─────────────────────────────────────────────────────────┘
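
The Local Storage layer in the diagram maps onto a handful of directories. The sketch below resolves them with `pathlib`, assuming the layout is exactly as drawn; `storage_paths` is a hypothetical helper and the document ID `abc123` is made up for illustration (the real ID format is not specified here).

```python
from pathlib import Path


def storage_paths(data_root: Path, doc_id: str) -> dict[str, Path]:
    """Return the directories shown in the diagram for one document."""
    return {
        "document": data_root / f"doc_{doc_id}",      # document assets
        "tables": data_root / "tables",                # A2T tables
        "drafts": data_root / "tables" / "drafts",     # persisted drafts
        "knowledge_graph": data_root / "lightrag",     # LightRAG index
    }


paths = storage_paths(Path("data"), "abc123")  # "abc123" is a made-up ID
for name, path in paths.items():
    print(f"{name}: {path.as_posix()}")
```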

📁 Project Structure (DDD)

asset-aware-mcp/
├── src/
│   ├── domain/              # 🔵 Domain: Entities, Value Objects, Interfaces
│   ├── application/         # 🟢 Application: Doc Service, Table Service (A2T), Asset Service
│   ├── infrastructure/      # 🟠 Infrastructure: PyMuPDF, LightRAG, Excel Renderer
│   └── presentation/        # 🔴 Presentation: MCP Server (FastMCP)
├── data/                    # Document and Asset Storage
├── docs/
│   └── spec.md              # Technical Specification
├── tests/                   # Unit and Integration Tests
├── vscode-extension/        # VS Code Management Extension
└── pyproject.toml           # uv Project Config

🚀 Quick Start

# Install dependencies (using uv)
uv sync

# Run MCP Server
uv run python -m src.presentation.server

# Or use the VS Code extension for graphical management
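
An MCP client usually needs to be told how to launch the server. As a rough sketch, the snippet below generates a stdio server entry from the run command above. The `servers` / `type` / `command` / `args` layout is an assumption modeled on VS Code's MCP configuration file; other clients use different key names, so check your client's documentation before copying it. The label `asset-aware` is arbitrary.

```python
import json

# Launch command taken from the Quick Start section.
server_entry = {
    "type": "stdio",
    "command": "uv",
    "args": ["run", "python", "-m", "src.presentation.server"],
}

# "asset-aware" is an arbitrary label chosen for this example; the
# top-level "servers" key is an assumption about the client's format.
config = {"servers": {"asset-aware": server_entry}}

print(json.dumps(config, indent=2))
```

Generating the file from Python rather than writing it by hand keeps the launch command in one place if it changes.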

🔌 MCP Tools

Document & Asset Tools

| Tool | Purpose |
|------|---------|
| ingest_documents | Process PDF files with optional Marker backend (use_marker=True for blocks.json) |
| fetch_document_asset | Precisely retrieve tables (Markdown), figures (Base64), or sections |
| consult_knowledge_graph | Query the knowledge graph and compare across documents |
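
On the wire, an MCP client invokes a tool with a JSON-RPC 2.0 `tools/call` request. The sketch below builds one for `ingest_documents`. Only `use_marker` is taken from the tool description; the `file_paths` argument name and the `tool_call_request` helper are hypothetical, so consult the server's tool schema for the real parameters.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC requests need unique ids


def tool_call_request(tool_name: str, arguments: dict) -> dict:
    """Build a JSON-RPC 2.0 request for the MCP `tools/call` method."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# `file_paths` is a hypothetical argument name; `use_marker` comes from
# the tool description.
request = tool_call_request(
    "ingest_documents",
    {"file_paths": ["paper.pdf"], "use_marker": True},
)
print(json.dumps(request, indent=2))
```

In practice an MCP client library or the agent host assembles this message for you; the sketch only shows what is being sent.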

Section Navigation Tools (Dynamic Hierarchy)

| Tool | Purpose |
|------|---------|
| list_section_tree | Display complete section hierarchy tree (supports any depth) |
| get_section_detail | Get detailed info for a specific section |
| get_section_blocks | Extract all blocks from a section with page + bbox |
| search_sections | Search section titles |
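
A section hierarchy of arbitrary depth is naturally a recursive tree. The following is a minimal sketch of that idea, not the project's actual data model: the `Section` class, its fields, and both functions are hypothetical, and the sample headings are invented.

```python
from dataclasses import dataclass, field
from typing import Iterator


@dataclass
class Section:
    title: str
    page: int
    children: list["Section"] = field(default_factory=list)


def walk(section: Section, depth: int = 0) -> Iterator[tuple[int, Section]]:
    """Yield (depth, section) pairs in document order, at any depth."""
    yield depth, section
    for child in section.children:
        yield from walk(child, depth + 1)


def search_titles(root: Section, query: str) -> list[str]:
    """Case-insensitive substring search over every section title."""
    needle = query.lower()
    return [s.title for _, s in walk(root) if needle in s.title.lower()]


tree = Section("Methods", 3, [
    Section("Study Design", 3),
    Section("Statistical Analysis", 4, [Section("Survival Analysis", 5)]),
])

# Browse: print the tree with indentation proportional to depth.
for depth, section in walk(tree):
    print("  " * depth + f"{section.title} (p. {section.page})")

# Search: find sections whose title mentions "analysis".
matches = search_titles(tree, "analysis")
print(matches)
```

Because `walk` recurses through `children`, browsing and searching need no special handling for deeper heading levels.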

A2T (Anything to Table) Tools

| Tool | Purpose |
|------|---------|
| plan_table_schema | AI-driven schema planning & brainstorming |
| create_table_draft | Start a persistent draft session (token-efficient) |
| add_rows_to_draft | Batch add data to draft |
| commit_draft_to_table | Finalize draft into a formal table |
| resume_draft / resume_table | Resume work with minimal context (saves tokens) |
| update_cell | Precise cell-level editing |
| render_table | Render to professional Excel file (with conditional formatting) |
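
The draft workflow (create, add rows in batches, resume later with little context) can be illustrated with a small stand-alone model. This is a sketch of the general pattern only: the `TableDraft` class, its JSON file format, and the sample columns are all hypothetical and do not reflect how asset-aware-mcp stores drafts internally.

```python
import json
import tempfile
from pathlib import Path


class TableDraft:
    """Hypothetical draft that persists its rows to a JSON file."""

    def __init__(self, path: Path, columns: list[str]):
        self.path = path
        self.columns = columns
        self.rows: list[dict] = []

    def add_rows(self, rows: list[dict]) -> None:
        """Validate a batch, append it, and save the draft to disk."""
        for row in rows:
            missing = set(self.columns) - set(row)
            if missing:
                raise ValueError(f"row is missing columns: {sorted(missing)}")
        self.rows.extend(rows)
        self.path.write_text(
            json.dumps({"columns": self.columns, "rows": self.rows})
        )

    @classmethod
    def load(cls, path: Path) -> "TableDraft":
        """Reload a draft from disk, for example in a later session."""
        data = json.loads(path.read_text())
        draft = cls(path, data["columns"])
        draft.rows = data["rows"]
        return draft

    def resume_summary(self) -> dict:
        """Return a compact summary instead of replaying every row."""
        return {
            "columns": self.columns,
            "row_count": len(self.rows),
            "last_row": self.rows[-1] if self.rows else None,
        }


draft_path = Path(tempfile.mkdtemp()) / "draft.json"
draft = TableDraft(draft_path, ["drug", "dose_mg"])
draft.add_rows([
    {"drug": "A", "dose_mg": 10},
    {"drug": "B", "dose_mg": 20},
])

# A later session reloads the draft and asks only for the summary.
summary = TableDraft.load(draft_path).resume_summary()
print(summary)
```

Returning a summary rather than the full row list is what keeps resumption cheap in tokens: the agent learns where it left off without re-reading the whole table.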

🔧 Tech Stack

| Category | Technology |
|----------|------------|
| Language | Python 3.10+ |
| ETL | PyMuPDF (fitz) + Marker (optional, high-precision) |
| RAG | LightRAG (lightrag-hku) |
| MCP | FastMCP |
| Storage | Local filesystem (JSON/Markdown/PNG) |

📋 Documentation

📄 License

Apache License 2.0
