Medical RAG with Asset-Aware MCP - Precise PDF asset retrieval (tables, figures, sections) for AI Agents
asset-aware-mcp
Medical RAG with Asset-Aware MCP - Precise PDF asset retrieval (tables, figures, sections) and Knowledge Graph for AI Agents.
Why Asset-Aware MCP?
A common misconception is that an AI agent can read image files on your computer directly. It cannot: given only a file path, the model has no access to the local file system.
| Method | Can AI analyze image content? | Description |
|---|---|---|
| Provide PNG path | No | AI cannot access the local file system |
| Asset-Aware MCP | Yes | Retrieves the image as Base64 via MCP, so the AI's vision model can interpret it directly |
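To make the second row concrete, here is a minimal sketch of how a server-side tool might turn a PNG on disk into a Base64 payload that a vision-capable model can consume. The function name `encode_image_asset` is a hypothetical helper, not the project's actual API; the dictionary keys (`type`, `data`, `mimeType`) loosely follow the shape of an MCP image content block.

```python
import base64
from pathlib import Path


def encode_image_asset(path: Path, mime_type: str = "image/png") -> dict:
    """Read an image file and wrap it as a Base64 payload (hypothetical helper)."""
    raw = path.read_bytes()
    return {
        "type": "image",
        # Base64 output is pure ASCII, so it can travel inside a JSON message.
        "data": base64.b64encode(raw).decode("ascii"),
        "mimeType": mime_type,
    }
```

The agent never sees the file path; it receives the encoded bytes in the tool response, which is why it can analyze the figure.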
Real-world Effect
# After retrieving the image via MCP, the AI can analyze it directly:
User: What is this figure about?
AI: This is the architecture diagram for Scaled Dot-Product Attention:
1. Inputs: Q (Query), K (Key), V (Value)
2. MatMul of Q and K
3. Scale (1/√dₖ)
4. Optional Mask (for decoder)
5. SoftMax normalization
6. Final MatMul with V to get the output
This is the value of Asset-Aware MCP - enabling AI Agents to truly "see" and understand charts and tables in your PDF literature.
Features
- Asset-Aware ETL - Converts PDF to Markdown, using PyMuPDF to automatically identify tables, sections, and images.
- Async Job Pipeline - Supports asynchronous task processing and progress tracking for large documents.
- Document Manifest - Provides a structured "map" of the document so Agents can access data precisely.
- LightRAG Integration - Knowledge Graph + Vector Index, supporting cross-document comparison and reasoning.
- A2T (Anything to Table) - Automatically organizes information extracted by Agents into professional Excel tables, with support for CRUD operations, drafts, and token-efficient resumption.
- MCP Server - Exposes tools and resources to Copilot/Claude via FastMCP.
- Medical Research Focus - Optimized for medical literature, supporting Base64 image transmission for Vision AI analysis.
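As an illustration of the asynchronous job idea in the list above, the following sketch uses `asyncio` to run an ingestion job in the background while a caller polls its progress. Everything here is an assumption for illustration: `Job`, `run_job`, and `main` are hypothetical names, and the per-page work is simulated rather than real ETL.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Job:
    """Tracks the state of one ingestion job (hypothetical structure)."""
    job_id: str
    total_pages: int
    done_pages: int = 0
    status: str = "pending"  # pending -> running -> completed

    @property
    def progress(self) -> float:
        """Fraction of pages processed, between 0.0 and 1.0."""
        return self.done_pages / self.total_pages if self.total_pages else 1.0


async def run_job(job: Job) -> None:
    """Process pages one at a time, yielding control so callers can poll."""
    job.status = "running"
    for _ in range(job.total_pages):
        await asyncio.sleep(0)  # stand-in for real per-page ETL work
        job.done_pages += 1
    job.status = "completed"


async def main() -> Job:
    job = Job(job_id="job-1", total_pages=3)
    task = asyncio.create_task(run_job(job))  # scheduled; runs in the background
    while not task.done():
        await asyncio.sleep(0)  # a real client would query a status tool here
    await task
    return job


if __name__ == "__main__":
    finished = asyncio.run(main())
    print(finished.status, f"{finished.progress:.0%}")
```

The point of the pattern is that starting a job returns immediately, so a large document does not block the agent while it is being parsed.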
Architecture
┌─────────────────────────────────────────────────────────┐
│                   AI Agent (Copilot)                    │
└────────────────────────────┬────────────────────────────┘
                             │ MCP Protocol (Tools & Resources)
┌────────────────────────────▼────────────────────────────┐
│                 MCP Server (server.py)                  │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────┐  │
│  │   ingest    │  │   inspect   │  │      fetch      │  │
│  │  documents  │  │  manifest   │  │      asset      │  │
│  └─────────────┘  └─────────────┘  └─────────────────┘  │
│  ┌───────────────────────────────────────────────────┐  │
│  │         A2T (Anything to Table) Workflow          │  │
│  │     [Plan] → [Draft] → [Batch Add] → [Commit]     │  │
│  └───────────────────────────────────────────────────┘  │
└────────────────────────────┬────────────────────────────┘
                             │
┌────────────────────────────▼────────────────────────────┐
│                   ETL Pipeline (DDD)                    │
│     ┌──────────┐     ┌──────────┐     ┌──────────┐      │
│     │ PyMuPDF  │     │  Asset   │     │ LightRAG │      │
│     │ Adapter  │ ──→ │  Parser  │ ──→ │  Index   │      │
│     └──────────┘     └──────────┘     └──────────┘      │
└────────────────────────────┬────────────────────────────┘
                             │
┌────────────────────────────▼────────────────────────────┐
│                      Local Storage                      │
│  ./data/                                                │
│  ├── doc_{id}/    # Document Assets                     │
│  ├── tables/      # A2T Tables (JSON/MD/XLSX)           │
│  │   └── drafts/  # Table Drafts (Persistence)          │
│  └── lightrag/    # Knowledge Graph                     │
└─────────────────────────────────────────────────────────┘
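The inspect-then-fetch flow in the diagram (look at the manifest first, then request one specific asset) can be illustrated with a small data-structure sketch. The field names (`asset_id`, `kind`, `page`, `path`) and the method `find_assets` are assumptions made for this example; the project's real manifest schema is not shown in this README.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AssetEntry:
    """One addressable item in a document (hypothetical fields)."""
    asset_id: str
    kind: str  # "table", "figure", or "section"
    page: int
    path: str  # illustrative location under ./data/doc_{id}/


@dataclass
class Manifest:
    """A structured map of a document that an agent can inspect before fetching."""
    doc_id: str
    assets: list[AssetEntry] = field(default_factory=list)

    def find_assets(self, kind: str) -> list[AssetEntry]:
        """Return every asset of the given kind, in page order."""
        matches = (a for a in self.assets if a.kind == kind)
        return sorted(matches, key=lambda a: a.page)


manifest = Manifest(
    doc_id="doc_demo",
    assets=[
        AssetEntry("tab_1", "table", 7, "doc_demo/tab_1.md"),
        AssetEntry("fig_2", "figure", 4, "doc_demo/fig_2.png"),
        AssetEntry("fig_1", "figure", 2, "doc_demo/fig_1.png"),
    ],
)
figure_ids = [a.asset_id for a in manifest.find_assets("figure")]
```

With a map like this, the agent can ask for a single figure by ID instead of pulling the whole document into its context.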
Project Structure (DDD)
asset-aware-mcp/
├── src/
│   ├── domain/          # Domain: Entities, Value Objects, Interfaces
│   ├── application/     # Application: Doc Service, Table Service (A2T), Asset Service
│   ├── infrastructure/  # Infrastructure: PyMuPDF, LightRAG, Excel Renderer
│   └── presentation/    # Presentation: MCP Server (FastMCP)
├── data/                # Document and Asset Storage
├── docs/
│   └── spec.md          # Technical Specification
├── tests/               # Unit and Integration Tests
├── vscode-extension/    # VS Code Management Extension
└── pyproject.toml       # uv Project Config
Quick Start
# Install dependencies (using uv)
uv sync
# Run MCP Server
uv run python -m src.presentation.server
# Or use the VS Code extension for graphical management
MCP Tools
| Tool | Purpose |
|---|---|
| fetch_document_asset | Precisely retrieve tables (Markdown), figures (Base64), or sections |
| consult_knowledge_graph | Knowledge graph query, cross-document comparison |
| plan_table_schema | AI-driven schema planning and brainstorming |
| create_table_draft | Start a persistent draft session (token-efficient) |
| add_rows_to_draft | Batch add data to a draft |
| commit_draft_to_table | Finalize a draft into a formal table |
| resume_draft / resume_table | Resume work with minimal context (saves tokens) |
| update_cell | Precise cell-level editing |
| render_table | Render to a professional Excel file (with conditional formatting) |
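The draft workflow behind the table tools (plan, draft, batch add, commit) can be pictured as a small state holder persisted to disk, which is what lets a session resume without replaying earlier rows through the model. The sketch below is hypothetical: `TableDraft`, its methods, and the JSON layout are illustrative and do not mirror the project's actual tool signatures.

```python
import json
from dataclasses import asdict, dataclass, field
from pathlib import Path


@dataclass
class TableDraft:
    """In-progress table with a fixed column schema (hypothetical structure)."""
    draft_id: str
    columns: list[str]
    rows: list[dict] = field(default_factory=list)

    def add_rows(self, new_rows: list[dict]) -> int:
        """Batch-append rows; reject the whole batch if any row breaks the schema."""
        for row in new_rows:
            if set(row) != set(self.columns):
                raise ValueError(f"row keys {sorted(row)} do not match {self.columns}")
        self.rows.extend(new_rows)
        return len(self.rows)

    def save(self, drafts_dir: Path) -> Path:
        """Persist the draft so a later session can pick it up."""
        drafts_dir.mkdir(parents=True, exist_ok=True)
        path = drafts_dir / f"{self.draft_id}.json"
        path.write_text(json.dumps(asdict(self)), encoding="utf-8")
        return path

    @classmethod
    def resume(cls, path: Path) -> "TableDraft":
        """Reload a draft from disk instead of rebuilding it from the conversation."""
        return cls(**json.loads(path.read_text(encoding="utf-8")))

    def commit(self) -> list[list]:
        """Finalize: a header row followed by data rows in column order."""
        return [self.columns] + [[row[c] for c in self.columns] for row in self.rows]
```

Keeping the rows on disk rather than in the prompt is the design choice that makes resumption cheap: only the draft ID needs to travel between turns.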
Tech Stack
| Category | Technology |
|---|---|
| Language | Python 3.10+ |
| ETL | PyMuPDF (fitz) |
| RAG | LightRAG (lightrag-hku) |
| MCP | FastMCP |
| Storage | Local filesystem (JSON/Markdown/PNG) |
Documentation
- Technical Spec - Detailed technical specification
- Architecture - System architecture
- Constitution - Project principles
License