
Medical RAG with Asset-Aware MCP - Precise PDF asset retrieval (tables, figures, sections) for AI Agents


asset-aware-mcp

๐Ÿฅ Medical RAG with Asset-Aware MCP - ่ฎ“ AI Agent ็ฒพๆบ–ๅญ˜ๅ– PDF ๆ–‡็ปไธญ็š„่กจๆ ผใ€็ซ ็ฏ€่ˆ‡็Ÿฅ่ญ˜ๅœ–่ญœ


🎯 Why Asset-Aware MCP?

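An AI cannot directly read image files on your computer. Assuming that it can is a common misconception.

| Approach | Can the AI analyze the image content? | Notes |
|----------|---------------------------------------|-------|
| ❌ Give it a PNG path | No | The AI cannot access the local file system |
| ✅ Asset-Aware MCP | Yes | The image is delivered as Base64 through MCP, so the AI's vision capability can interpret it directly |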

In practice

# Once the image has been fetched through MCP, the AI can analyze it directly:

User: What is this figure about?

AI: This is the architecture diagram of Scaled Dot-Product Attention:
    1. Inputs Q (Query), K (Key), V (Value)
    2. MatMul (matrix multiplication) of Q and K
    3. Scale (by 1/√dₖ)
    4. Optional Mask (used in the decoder)
    5. SoftMax normalization
    6. A final MatMul with V produces the output

This is the value of Asset-Aware MCP: it lets AI Agents genuinely "understand" the figures and tables in your PDF literature.
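To make the Base64 step concrete, here is a minimal sketch of how image bytes can be wrapped for transport and recovered on the other side. The function names and the payload shape (`mime_type`, `data`) are assumptions for illustration only, not the project's actual API.

```python
import base64


def encode_image_for_transport(image_bytes: bytes, mime_type: str = "image/png") -> dict:
    """Wrap raw image bytes in a JSON-friendly payload (hypothetical shape)."""
    return {
        "mime_type": mime_type,
        # Base64 turns arbitrary bytes into ASCII text that is safe inside JSON.
        "data": base64.b64encode(image_bytes).decode("ascii"),
    }


def decode_image_payload(payload: dict) -> bytes:
    """Recover the original bytes from a payload built above."""
    return base64.b64decode(payload["data"])


# The 8-byte PNG file signature stands in for a real extracted figure.
png_signature = b"\x89PNG\r\n\x1a\n"
payload = encode_image_for_transport(png_signature)
round_tripped = decode_image_payload(payload)
```

The point of the encoding is that the agent never needs file-system access: it receives the image content itself as text, which a vision-capable model can then interpret.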


✨ Features

  • 📄 Asset-Aware ETL - PDF → Markdown, using PyMuPDF to automatically identify tables, sections, and images
  • 🔄 Async Job Pipeline - Processes tasks asynchronously and tracks progress for large documents
  • 🗺️ Document Manifest - A structured list of assets that lets Agents "see the map" before precisely accessing data
  • 🧠 LightRAG Integration - Knowledge Graph + Vector Index, supporting cross-document comparison and reasoning
  • 📊 A2T (Anything to Table) - Automatically organizes information extracted by Agents into professional Excel tables, supporting CRUD, drafting, and token-efficient resumption
  • 🔌 MCP Server - Exposes tools and resources to Copilot/Claude via FastMCP
  • 🏥 Medical Research Focus - Optimized for medical literature, supporting Base64 image transmission for Vision AI analysis

๐Ÿ—๏ธ Architecture

โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚                    AI Agent (Copilot)                   โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
                      โ”‚ MCP Protocol (Tools & Resources)
โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ–ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚                 MCP Server (server.py)                  โ”‚
โ”‚  โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ” โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ” โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”   โ”‚
โ”‚  โ”‚   ingest    โ”‚ โ”‚  inspect    โ”‚ โ”‚     fetch       โ”‚   โ”‚
โ”‚  โ”‚  documents  โ”‚ โ”‚  manifest   โ”‚ โ”‚     asset       โ”‚   โ”‚
โ”‚  โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜   โ”‚
โ”‚  โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”   โ”‚
โ”‚  โ”‚          A2T (Anything to Table) Workflow       โ”‚   โ”‚
โ”‚  โ”‚  [Plan] โ†’ [Draft] โ†’ [Batch Add] โ†’ [Commit]      โ”‚   โ”‚
โ”‚  โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜   โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
                      โ”‚
โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ–ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚                  ETL Pipeline (DDD)                     โ”‚
โ”‚  โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”  โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”  โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”              โ”‚
โ”‚  โ”‚ PyMuPDF  โ”‚  โ”‚  Asset   โ”‚  โ”‚ LightRAG โ”‚              โ”‚
โ”‚  โ”‚ Adapter  โ”‚โ†’ โ”‚  Parser  โ”‚โ†’ โ”‚  Index   โ”‚              โ”‚
โ”‚  โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜  โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜  โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜              โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
                      โ”‚
โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ–ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚                   Local Storage                         โ”‚
โ”‚  ./data/                                                โ”‚
โ”‚  โ”œโ”€โ”€ doc_{id}/        # Document Assets                 โ”‚
โ”‚  โ”œโ”€โ”€ tables/          # A2T Tables (JSON/MD/XLSX)       โ”‚
โ”‚  โ”‚   โ””โ”€โ”€ drafts/      # Table Drafts (Persistence)      โ”‚
โ”‚  โ””โ”€โ”€ lightrag/        # Knowledge Graph                 โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
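The Local Storage layer in the diagram maps onto a handful of directories. As a rough sketch, the layout could be derived with `pathlib` as below; the helper names are hypothetical, and only the directory names (`doc_{id}`, `tables`, `tables/drafts`, `lightrag`) come from the diagram.

```python
import tempfile
from pathlib import Path


def storage_paths(root: Path, doc_id: str) -> dict[str, Path]:
    """Map each storage area from the diagram to a concrete path."""
    return {
        "document": root / f"doc_{doc_id}",       # document assets
        "tables": root / "tables",                # A2T tables
        "drafts": root / "tables" / "drafts",     # persisted table drafts
        "lightrag": root / "lightrag",            # knowledge graph
    }


def ensure_layout(root: Path, doc_id: str) -> dict[str, Path]:
    """Create any missing directories and return the path mapping."""
    paths = storage_paths(root, doc_id)
    for path in paths.values():
        path.mkdir(parents=True, exist_ok=True)
    return paths


# Build the layout in a throwaway directory so the sketch leaves no files behind.
with tempfile.TemporaryDirectory() as tmp:
    paths = ensure_layout(Path(tmp) / "data", "example")
    all_exist = all(path.is_dir() for path in paths.values())
    created = sorted(path.relative_to(tmp).as_posix() for path in paths.values())
```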

๐Ÿ“ Project Structure (DDD)

asset-aware-mcp/
โ”œโ”€โ”€ src/
โ”‚   โ”œโ”€โ”€ domain/              # ๐Ÿ”ต Domain: Entities, Value Objects, Interfaces
โ”‚   โ”œโ”€โ”€ application/         # ๐ŸŸข Application: Doc Service, Table Service (A2T), Asset Service
โ”‚   โ”œโ”€โ”€ infrastructure/      # ๐ŸŸ  Infrastructure: PyMuPDF, LightRAG, Excel Renderer
โ”‚   โ””โ”€โ”€ presentation/        # ๐Ÿ”ด Presentation: MCP Server (FastMCP)
โ”œโ”€โ”€ data/                    # Document and Asset Storage
โ”œโ”€โ”€ docs/
โ”‚   โ””โ”€โ”€ spec.md              # Technical Specification
โ”œโ”€โ”€ tests/                   # Unit and Integration Tests
โ”œโ”€โ”€ vscode-extension/        # VS Code Management Extension
โ””โ”€โ”€ pyproject.toml           # uv Project Config

🚀 Quick Start

# Install dependencies (using uv)
uv sync

# Run MCP Server
uv run python -m src.presentation.server

# Or use the VS Code extension for graphical management

🔌 MCP Tools

| Tool | Purpose |
|------|---------|
| fetch_document_asset | Precisely retrieve tables (MD) / figures (B64) / sections |
| consult_knowledge_graph | Knowledge graph query, cross-document comparison |
| plan_table_schema | AI-driven schema planning & brainstorming (🆕) |
| create_table_draft | Start a persistent draft session (token-efficient) |
| add_rows_to_draft | Batch add data to a draft |
| commit_draft_to_table | Finalize a draft into a formal table |
| resume_draft / resume_table | Resume work with minimal context (saves tokens) |
| update_cell | Precise cell-level editing |
| render_table | Render to a professional Excel file (with conditional formatting) |
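The draft-oriented tools follow the Plan → Draft → Batch Add → Commit flow shown in the architecture diagram. The in-memory class below is a simplified sketch of that flow. It is not the server's implementation: the class, its methods, and the sample columns are hypothetical stand-ins that loosely mirror `create_table_draft`, `add_rows_to_draft`, `update_cell`, and `commit_draft_to_table`.

```python
from dataclasses import dataclass, field


@dataclass
class TableDraft:
    title: str
    columns: list[str]  # the schema agreed on during planning
    rows: list[dict[str, str]] = field(default_factory=list)
    committed: bool = False

    def add_rows(self, new_rows: list[dict[str, str]]) -> int:
        """Append a batch of rows, rejecting keys outside the schema."""
        if self.committed:
            raise RuntimeError("draft is already committed")
        for row in new_rows:
            unknown = set(row) - set(self.columns)
            if unknown:
                raise ValueError(f"unknown columns: {sorted(unknown)}")
            # Missing cells are stored as empty strings so every row is complete.
            self.rows.append({col: row.get(col, "") for col in self.columns})
        return len(self.rows)

    def update_cell(self, row_index: int, column: str, value: str) -> None:
        """Edit a single cell in place."""
        if column not in self.columns:
            raise KeyError(column)
        self.rows[row_index][column] = value

    def commit(self) -> list[list[str]]:
        """Freeze the draft and return a header row followed by data rows."""
        self.committed = True
        return [self.columns] + [[row[col] for col in self.columns] for row in self.rows]


draft = TableDraft(title="example", columns=["study", "n"])
row_count = draft.add_rows([{"study": "A", "n": "10"}, {"study": "B"}])
draft.update_cell(1, "n", "20")
table = draft.commit()
```

Keeping the draft as a separate object is what makes resumption cheap: an agent only needs the draft's identity and schema to continue adding rows, not the full conversation that produced the earlier ones.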

🔧 Tech Stack

| Category | Technology |
|----------|------------|
| Language | Python 3.10+ |
| ETL | PyMuPDF (fitz) |
| RAG | LightRAG (lightrag-hku) |
| MCP | FastMCP |
| Storage | Local filesystem (JSON/Markdown/PNG) |

📋 Documentation

📄 License

Apache License 2.0


