
Medical RAG with Asset-Aware MCP - Precise PDF asset retrieval (tables, figures, sections) for AI Agents

Project description

asset-aware-mcp

๐Ÿฅ Medical RAG with Asset-Aware MCP - ่ฎ“ AI Agent ็ฒพๆบ–ๅญ˜ๅ– PDF ๆ–‡็ปไธญ็š„่กจๆ ผใ€็ซ ็ฏ€่ˆ‡็Ÿฅ่ญ˜ๅœ–่ญœ


✨ Features

  • 📄 Asset-Aware ETL - Converts PDF → Markdown, using PyMuPDF to automatically identify tables, sections, and images
  • 🔄 Async Job Pipeline - Processes tasks asynchronously and tracks progress for large documents
  • 🗺️ Document Manifest - A structured list of each document's assets, letting Agents "see the map" before accessing data precisely
  • 🧠 LightRAG Integration - Knowledge Graph + Vector Index, supporting cross-document comparison and reasoning
  • 🔌 MCP Server - Exposes tools and resources to Copilot/Claude via FastMCP
  • 🏥 Medical Research Focus - Optimized for medical literature; supports Base64 image transmission for Vision AI analysis

๐Ÿ—๏ธ Architecture

┌─────────────────────────────────────────────────────────┐
│                    AI Agent (Copilot)                   │
└─────────────────────┬───────────────────────────────────┘
                      │ MCP Protocol (Tools & Resources)
┌─────────────────────▼───────────────────────────────────┐
│                 MCP Server (server.py)                  │
│  ┌─────────────┐ ┌─────────────┐ ┌──────────────────┐   │
│  │   ingest    │ │  inspect    │ │     fetch        │   │
│  │  documents  │ │  manifest   │ │     asset        │   │
│  └─────────────┘ └─────────────┘ └──────────────────┘   │
│  ┌──────────────────────────────────────────────────┐   │
│  │          consult_knowledge_graph                 │   │
│  └──────────────────────────────────────────────────┘   │
└─────────────────────┬───────────────────────────────────┘
                      │
┌─────────────────────▼───────────────────────────────────┐
│                  ETL Pipeline (DDD)                     │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐               │
│  │ PyMuPDF  │  │  Asset   │  │ LightRAG │               │
│  │ Adapter  │→ │  Parser  │→ │  Index   │               │
│  └──────────┘  └──────────┘  └──────────┘               │
└─────────────────────┬───────────────────────────────────┘
                      │
┌─────────────────────▼───────────────────────────────────┐
│                   Local Storage                         │
│  ./data/                                                │
│  ├── doc_{id}/                                          │
│  │   ├── full.md          # Markdown Content            │
│  │   ├── manifest.json    # Asset Map                   │
│  │   └── images/          # Extracted Figures           │
│  └── lightrag/            # Knowledge Graph             │
└─────────────────────────────────────────────────────────┘
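The Asset Parser stage and the storage layout in the diagram can be illustrated with a stdlib-only sketch: scan the generated Markdown for headings and pipe tables, record their line ranges, and write `full.md`, `manifest.json`, and an `images/` folder under `doc_{id}/` (assuming `{id}` is replaced by the document ID). The manifest schema (`sections`, `tables`, `start`/`end` line indices) and the function names are assumptions for illustration; the project's parser works from PyMuPDF output and its manifest format may differ.

```python
import json
from pathlib import Path


def build_manifest(doc_id: str, markdown: str) -> dict:
    """Record where each heading and pipe table sits in the Markdown (0-based line indices).

    The schema (sections / tables, id, title, line, start, end) is an assumption.
    """
    lines = markdown.splitlines()
    sections: list[dict] = []
    tables: list[dict] = []
    table_start = None
    for i, line in enumerate(lines):
        stripped = line.strip()
        if stripped.startswith("#"):
            title = stripped.lstrip("#").strip()
            sections.append({"id": f"sec_{len(sections) + 1}", "title": title, "line": i})
        is_table_row = stripped.startswith("|")
        if is_table_row and table_start is None:
            table_start = i  # first row of a new table
        elif not is_table_row and table_start is not None:
            tables.append({"id": f"tab_{len(tables) + 1}", "start": table_start, "end": i - 1})
            table_start = None
    if table_start is not None:  # the document ends inside a table
        tables.append({"id": f"tab_{len(tables) + 1}", "start": table_start, "end": len(lines) - 1})
    return {"doc_id": doc_id, "sections": sections, "tables": tables}


def save_document(root: Path, doc_id: str, markdown: str) -> Path:
    """Write full.md, manifest.json and an images/ folder under root/doc_<doc_id>/."""
    doc_dir = root / f"doc_{doc_id}"
    (doc_dir / "images").mkdir(parents=True, exist_ok=True)
    (doc_dir / "full.md").write_text(markdown, encoding="utf-8")
    manifest = build_manifest(doc_id, markdown)
    (doc_dir / "manifest.json").write_text(json.dumps(manifest, indent=2), encoding="utf-8")
    return doc_dir
```

For example, `save_document(Path("data"), "demo", md_text)` would create `data/doc_demo/` with the three entries shown under `doc_{id}/` in the diagram. Storing line ranges instead of copies keeps `full.md` as the single source of content.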

📁 Project Structure (DDD)

asset-aware-mcp/
├── src/
│   ├── domain/              # 🔵 Domain: Entities, Value Objects, Interfaces
│   ├── application/         # 🟢 Application: Doc Service, Job Service, Asset Service
│   ├── infrastructure/      # 🟠 Infrastructure: PyMuPDF, LightRAG, File Storage
│   └── presentation/        # 🔴 Presentation: MCP Server (FastMCP)
├── data/                    # Document and Asset Storage
├── docs/
│   └── spec.md              # Technical Specification
├── tests/                   # Unit and Integration Tests
├── vscode-extension/        # VS Code Management Extension
└── pyproject.toml           # uv Project Config
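In a DDD layout like this, the domain declares interfaces, infrastructure implements them, and the application layer depends only on the interfaces, so storage can change without touching use cases. The sketch shows that dependency direction with hypothetical `Asset`, `AssetRepository`, and `AssetService` types; it is an illustration of the layering, not code taken from the project.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Protocol


# --- domain/: entities, value objects and interfaces, with no I/O ---
@dataclass(frozen=True)
class AssetRef:
    """Value object identifying one asset inside one document."""
    doc_id: str
    asset_id: str  # e.g. "tab_1"; this naming scheme is an assumption


@dataclass(frozen=True)
class Asset:
    ref: AssetRef
    kind: str     # "table", "figure" or "section"
    content: str  # Markdown for tables/sections, Base64 text for figures


class AssetRepository(Protocol):
    """Interface owned by the domain; infrastructure supplies the implementation."""

    def get(self, ref: AssetRef) -> Asset | None: ...


# --- infrastructure/: a concrete store (here in memory instead of files) ---
class InMemoryAssetRepository:
    def __init__(self, assets: list[Asset]) -> None:
        self._by_ref = {asset.ref: asset for asset in assets}

    def get(self, ref: AssetRef) -> Asset | None:
        return self._by_ref.get(ref)


# --- application/: use cases that depend only on the domain interface ---
class AssetService:
    def __init__(self, repo: AssetRepository) -> None:
        self._repo = repo

    def fetch(self, doc_id: str, asset_id: str) -> Asset:
        asset = self._repo.get(AssetRef(doc_id, asset_id))
        if asset is None:
            raise KeyError(f"asset {asset_id!r} not found in document {doc_id!r}")
        return asset
```

In the presentation layer, an MCP tool handler would construct the service with a file-backed repository and call `fetch`; replacing the in-memory store then affects only `infrastructure/`.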

🚀 Quick Start

# Install dependencies (using uv)
uv sync

# Run MCP Server
uv run python -m src.presentation.server

# Or use the VS Code extension for graphical management

🔌 MCP Tools

| Tool | Purpose |
|---|---|
| ingest_documents | Ingest PDFs and trigger the ETL pipeline (supports async) |
| get_job_status | Check the progress of an ETL job |
| list_documents | List all processed documents |
| inspect_document_manifest | View a document's structure map (tables/figures/sections) |
| fetch_document_asset | Precisely retrieve a table (Markdown), figure (Base64), or section |
| consult_knowledge_graph | Query the knowledge graph for cross-document comparison |
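The manifest-first workflow means an agent inspects the asset map before fetching only the asset it needs. As a rough local stand-in for those two steps, this sketch reads `manifest.json` from a `doc_{id}/` folder and returns a table as Markdown or a figure as Base64. The helper names (`read_manifest`, `fetch_asset`) and the manifest fields (`tables[].start`/`end`, `figures[].file`) are assumptions; the real tools run inside the MCP server and their responses may be shaped differently. Sections are omitted for brevity.

```python
import base64
import json
from pathlib import Path


def read_manifest(doc_dir: Path) -> dict:
    """Load the asset map so the caller can pick an asset ID before fetching content."""
    return json.loads((doc_dir / "manifest.json").read_text(encoding="utf-8"))


def fetch_asset(doc_dir: Path, asset_id: str) -> dict:
    """Return one table as Markdown or one figure as Base64, looked up via the manifest."""
    manifest = read_manifest(doc_dir)
    for table in manifest.get("tables", []):
        if table["id"] == asset_id:
            lines = (doc_dir / "full.md").read_text(encoding="utf-8").splitlines()
            # start/end are assumed to be inclusive 0-based line indices into full.md
            body = "\n".join(lines[table["start"] : table["end"] + 1])
            return {"type": "table", "markdown": body}
    for figure in manifest.get("figures", []):
        if figure["id"] == asset_id:
            data = (doc_dir / "images" / figure["file"]).read_bytes()
            return {
                "type": "figure",
                "mime_type": "image/png",  # assumes figures are stored as PNG
                "base64": base64.b64encode(data).decode("ascii"),
            }
    raise KeyError(f"asset {asset_id!r} is not listed in the manifest")
```

Returning figures as Base64 text keeps the response JSON-serializable, which is what allows an image to be handed to a Vision model through a text-based protocol.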

🔧 Tech Stack

| Category | Technology |
|---|---|
| Language | Python 3.10+ |
| ETL | PyMuPDF (fitz) |
| RAG | LightRAG (lightrag-hku) |
| MCP | FastMCP |
| Storage | Local filesystem (JSON/Markdown/PNG) |

📋 Documentation

📄 License

Apache License 2.0



Download files

Download the file for your platform.

Source Distribution

asset_aware_mcp-0.1.1.tar.gz (373.9 kB)

Uploaded Source

Built Distribution


asset_aware_mcp-0.1.1-py3-none-any.whl (52.8 kB)

Uploaded Python 3

File details

Details for the file asset_aware_mcp-0.1.1.tar.gz.

File metadata

  • Download URL: asset_aware_mcp-0.1.1.tar.gz
  • Upload date:
  • Size: 373.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for asset_aware_mcp-0.1.1.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | 867d0c41dc3f0caaf51c5b88b315716fb944bfd775598ce5720f14747ec011b8 |
| MD5 | 8a61cc3783cb3e8d7ffe2123d4202148 |
| BLAKE2b-256 | 80e880ae99580f93c5d55e82fafb832448e0d297b03658ac91600ac47762c0d2 |
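To check a downloaded file against the published digests, hash it locally and compare. A minimal stdlib sketch follows; `sha256_of` and `verify` are illustrative helper names, and only the expected digest value is taken from the table for the source distribution.

```python
import hashlib
from pathlib import Path

# SHA256 digest listed for asset_aware_mcp-0.1.1.tar.gz
EXPECTED_SDIST_SHA256 = "867d0c41dc3f0caaf51c5b88b315716fb944bfd775598ce5720f14747ec011b8"


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash the file in fixed-size chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected_hex: str) -> bool:
    # Hex digests are compared case-insensitively.
    return sha256_of(path) == expected_hex.strip().lower()
```

`verify(Path("asset_aware_mcp-0.1.1.tar.gz"), EXPECTED_SDIST_SHA256)` returns `True` only when the local file's SHA256 matches the listed digest; the wheel can be checked the same way with its own SHA256 value.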


Provenance

The following attestation bundles were made for asset_aware_mcp-0.1.1.tar.gz:

Publisher: release.yml on u9401066/asset-aware-mcp

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file asset_aware_mcp-0.1.1-py3-none-any.whl.

File metadata

File hashes

Hashes for asset_aware_mcp-0.1.1-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | 27ce1f05298290b75f98e0cb299976401217bff606963376b1fd1fcae97fbc03 |
| MD5 | f8d83ccc55661cbaa01e1af11b103176 |
| BLAKE2b-256 | ebeb3b9c27a8b642e442817ae8e92f14b2af1ea5344d206aadfdbf86d68b838c |


Provenance

The following attestation bundles were made for asset_aware_mcp-0.1.1-py3-none-any.whl:

Publisher: release.yml on u9401066/asset-aware-mcp

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
