Advanced AI-native Office document skills for Concinno: LLM-ready PDF extraction via docling (IBM) plus an Anthropic Office MCP bridge. docling is MIT-licensed and pulls in PyTorch.
concinno-skills-office-advanced
Advanced AI-native Office document skills for
Concinno. Sibling package to
concinno-skills-office.
| Sibling | Role |
|---|---|
| concinno-skills-office | Write Office docs with native Python libs (python-docx / openpyxl / xlsxwriter / python-pptx / docxtpl). No AI, no network. |
| concinno-skills-office-advanced | Read Office docs with an AI pipeline (docling) + bridge Anthropic's official MCP skill server. |
Status
MVP (0.1.0) — three tools:
| Tool | Library | Purpose |
|---|---|---|
| PdfAiExtract | docling (IBM, MIT) | LLM-ready PDF → markdown + structured tables |
| DoclingPageImage | docling | render a single PDF page → PNG (base64 / bytes) for multimodal LLM |
| AnthropicOfficeMcpBridge | concinno.tools.mcp_bridge (2.15.0+) | invoke Anthropic's official docx/xlsx/pptx/pdf skills over MCP stdio |
⚠ Heavy dependency footprint
docling transitively depends on PyTorch plus vision / OCR model weights. Installing this package adds roughly 2 GB to the environment, and the first `PdfAiExtract.call` downloads model weights from Hugging Face (cached under `~/.cache/huggingface`).

Prefer RunPod or high-spec desktops for benchmarks and heavy batches. A CPU-only laptop can run the pipeline, but first-page latency on a 10-page PDF is typically tens of seconds, not hundreds of milliseconds.
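When planning disk space (for example, pointing the cache at a larger RunPod volume), it helps to know where that cache actually resolves. The sketch below assumes that docling fetches its weights through `huggingface_hub`, which honors the `HF_HOME` and `XDG_CACHE_HOME` environment variables. The helper names are hypothetical and are not part of this package.

```python
import os
from pathlib import Path


def huggingface_cache_dir() -> Path:
    """Resolve the Hugging Face cache root the way huggingface_hub defaults it.

    HF_HOME wins if set; otherwise $XDG_CACHE_HOME/huggingface; otherwise
    ~/.cache/huggingface (the location named in this README).
    """
    hf_home = os.environ.get("HF_HOME")
    if hf_home:
        return Path(hf_home).expanduser()
    xdg = os.environ.get("XDG_CACHE_HOME", "~/.cache")
    return Path(xdg).expanduser() / "huggingface"


def directory_size_bytes(root: Path) -> int:
    """Sum regular-file sizes under root; 0 if it does not exist yet.

    Symlinks are skipped because the hub cache links snapshot files to
    shared blobs, and following them would double-count.
    """
    if not root.exists():
        return 0
    return sum(
        p.stat().st_size
        for p in root.rglob("*")
        if p.is_file() and not p.is_symlink()
    )
```

Set `HF_HOME` before the first `PdfAiExtract.call` if the default location sits on a small disk.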
Install
```bash
pip install concinno-skills-office-advanced
```
Requires concinno >= 2.15.1. The MCP bridge adapter lives in Concinno
core (concinno.tools.mcp_bridge) — this package depends on it rather
than re-implementing JSON-RPC.
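For orientation, MCP over stdio exchanges newline-delimited JSON-RPC 2.0 messages, and `tools/list` / `tools/call` are the methods a bridge like this ultimately sends. The sketch below shows only that wire framing and is not Concinno's implementation. The helper name is hypothetical, the tool name and arguments are placeholders, and the `initialize` handshake that a real client performs first is omitted.

```python
import json
from itertools import count
from typing import Optional

_ids = count(1)  # JSON-RPC request ids must be unique per session


def jsonrpc_request(method: str, params: Optional[dict] = None) -> bytes:
    """Encode one JSON-RPC 2.0 request as a single newline-terminated line.

    json.dumps escapes any newlines inside strings, so each message stays
    on exactly one line, as stdio framing requires.
    """
    msg = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        msg["params"] = params
    return (json.dumps(msg, separators=(",", ":")) + "\n").encode("utf-8")


# Ask the server what it offers, then call one tool by name (placeholder args).
list_req = jsonrpc_request("tools/list")
call_req = jsonrpc_request(
    "tools/call",
    {"name": "docx", "arguments": {"title": "Report"}},
)
```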
Usage via Concinno ToolRegistry
When the consumer sets CONCINNO_LOAD_PLUGINS=1, the default registry
auto-mounts all three tools:
```python
import os

# Opt in to plugin discovery before the default registry is built.
os.environ["CONCINNO_LOAD_PLUGINS"] = "1"

from concinno.tools.registry import get_default_registry

reg = get_default_registry()
for name in ("PdfAiExtract", "DoclingPageImage", "AnthropicOfficeMcpBridge"):
    assert name in reg.list_deferred()
```
Direct Python usage
PdfAiExtract — LLM-ready markdown + tables
```python
from concinno_skills_office_advanced import PdfAiExtract

out = PdfAiExtract().call(
    action="extract",
    path="./report.pdf",
    output_format="markdown",  # or "json" / "text"
)
# out = {
#     "ok": True,
#     "markdown": "# Report …",
#     "tables": [{"page": 1, "html": "<table>…</table>", "csv": "a,b\n1,2\n"}],
#     "page_count": 12,
# }
```
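Because tables come back as CSV strings tagged with their page number, persisting them is plain Python. The following sketch assumes the result shape shown in the comment above (`tables`, `page`, `csv`), plus an `error` key on failure as described for the MCP bridge. `save_tables` and its file-naming scheme are hypothetical.

```python
from pathlib import Path
from typing import Dict, List


def save_tables(out: dict, dest: Path) -> List[Path]:
    """Write each extracted table to dest/page<N>_table<K>.csv.

    K counts tables within the same page, starting at 1.
    Returns the paths written, in extraction order.
    """
    if "error" in out:
        raise RuntimeError(out["error"])
    dest.mkdir(parents=True, exist_ok=True)
    written: List[Path] = []
    per_page: Dict[int, int] = {}
    for table in out.get("tables", []):
        page = table["page"]
        per_page[page] = per_page.get(page, 0) + 1
        path = dest / f"page{page}_table{per_page[page]}.csv"
        # Write bytes so the CSV's own line endings are preserved exactly.
        path.write_bytes(table["csv"].encode("utf-8"))
        written.append(path)
    return written
```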
DoclingPageImage — PDF page → PNG for multimodal LLM
```python
from concinno_skills_office_advanced import DoclingPageImage

out = DoclingPageImage().call(
    action="render",
    path="./report.pdf",
    page=1,
    dpi=150,  # 36–600
)
# out["image_base64"] → "iVBORw0KGgo…"
# out["mime"]         → "image/png"
```
Set `return_bytes=True` to get `image_bytes` (raw PNG bytes) instead of base64. This is useful when piping directly to disk.
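Either form ends up as the same bytes on disk. The sketch below assumes only the keys named above (`image_base64`, `image_bytes`) plus an `error` key on failure. `write_page_png` is a hypothetical helper. The 8-byte check is the standard PNG file signature, which is why the base64 output starts with `iVBORw0KGgo`.

```python
import base64
from pathlib import Path

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # first 8 bytes of every PNG file


def write_page_png(out: dict, dest: Path) -> Path:
    """Save a DoclingPageImage result, accepting either output form."""
    if "error" in out:
        raise RuntimeError(out["error"])
    data = out.get("image_bytes")  # present when return_bytes=True
    if data is None:
        data = base64.b64decode(out["image_base64"], validate=True)
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("result is not a PNG image")
    dest.write_bytes(data)
    return dest
```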
AnthropicOfficeMcpBridge — Anthropic official skills via MCP
Configure the MCP server command once (any of the three methods):
```bash
# 1. Env var
export ANTHROPIC_OFFICE_MCP_CMD='npx -y @anthropic-ai/office-mcp'

# 2. Concinno credential store (persisted)
python -c "from concinno.core.credentials import CredentialStore; \
CredentialStore().set('anthropic_office_mcp_cmd', \
'npx -y @anthropic-ai/office-mcp')"

# 3. Pass server_cmd= kwarg directly per call (no persistence)
```
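All three methods store the same thing: a shell-style command string. The sketch below shows how such a string can be turned into an argv list for a stdio subprocess without invoking a shell. The precedence order (kwarg, then env var, then credential store) and the `credential_lookup` callback are assumptions made for illustration. This README does not document the bridge's actual order.

```python
import os
import shlex
from typing import Callable, List, Optional


def resolve_server_argv(
    server_cmd: Optional[str] = None,
    credential_lookup: Callable[[str], Optional[str]] = lambda key: None,
) -> Optional[List[str]]:
    """Return the MCP server argv, or None when nothing is configured.

    ASSUMED precedence: per-call kwarg > ANTHROPIC_OFFICE_MCP_CMD env var
    > credential store entry 'anthropic_office_mcp_cmd'.
    """
    cmd = (
        server_cmd
        or os.environ.get("ANTHROPIC_OFFICE_MCP_CMD")
        or credential_lookup("anthropic_office_mcp_cmd")
    )
    if not cmd:
        return None
    # shlex.split honors quoting, so the result can go straight to
    # subprocess.Popen(argv, stdin=PIPE, stdout=PIPE) with no shell.
    return shlex.split(cmd)
```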
Then invoke:
```python
from concinno_skills_office_advanced import AnthropicOfficeMcpBridge

# List what the server advertises
AnthropicOfficeMcpBridge().call(action="list_skills")
# → {"ok": True, "skills": [{"name": "docx", "description": "…"}, …]}

# Invoke a skill
AnthropicOfficeMcpBridge().call(
    action="invoke",
    skill_name="docx",  # docx / xlsx / pptx / pdf
    args={"title": "Report", "paragraphs": [...]},
)
# → {"ok": True, "skill": "docx", "content": ...}
```
If none of the env var, credential store entry, or `server_cmd` kwarg is set, the tool returns `{"error": "Anthropic Office MCP server not configured — …"}` instead of silently swallowing the call.
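Because misconfiguration surfaces as a returned dict rather than as an exception, callers who prefer exceptions can wrap each call. The sketch below relies on the `ok` / `error` keys used throughout this README. `OfficeToolError` and `unwrap` are hypothetical names.

```python
from typing import Any, Mapping


class OfficeToolError(RuntimeError):
    """Raised when a Concinno office tool reports failure."""


def unwrap(result: Mapping[str, Any]) -> Mapping[str, Any]:
    """Return result unchanged on success; raise on an error payload."""
    if "error" in result:
        raise OfficeToolError(result["error"])
    if result.get("ok") is not True:
        raise OfficeToolError(f"unexpected result: {result!r}")
    return result


# Usage: skills = unwrap(AnthropicOfficeMcpBridge().call(action="list_skills"))["skills"]
```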
Safety
- `path` must point to an existing local file under `Path.home()` or `Path.cwd()`. URLs (`http://`, `https://`, `file://`, `data:`) are rejected before any filesystem access.
- System directories (`/etc`, `/Windows`, `/System`, `/Program Files`, …) are blocked even when they happen to live on the same volume as `$HOME`.
- `PdfAiExtract` / `DoclingPageImage` also validate the suffix (`.pdf`).
- Files larger than 100 MB trigger a `logger.warning`. The tool still runs, but you get a stderr signal that cold-start latency will spike.
- `AnthropicOfficeMcpBridge` fails closed on unknown skill names (default allow-list: `docx`, `xlsx`, `pptx`, `pdf`; override via `ANTHROPIC_OFFICE_MCP_SKILLS=docx,xlsx,…`).
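The rules above can be approximated in standard-library Python (3.9+ for `Path.is_relative_to`). This is a sketch and not the package's code. The blocked-directory tuple contains only the illustrative subset named above, "100 MB" is read as 100 MiB, and `validate_pdf_path`, `allowed_skills`, and `check_skill` are hypothetical names.

```python
import logging
import os
from pathlib import Path

logger = logging.getLogger(__name__)

URL_PREFIXES = ("http://", "https://", "file://", "data:")
# Illustrative subset only: the package's full list is elided above.
BLOCKED_DIRS = ("/etc", "/Windows", "/System", "/Program Files")
LARGE_FILE_BYTES = 100 * 1024 * 1024  # "100 MB", read here as 100 MiB
DEFAULT_SKILLS = ("docx", "xlsx", "pptx", "pdf")


def _is_blocked(p: Path) -> bool:
    """True if p sits inside a blocked system directory (drive letter ignored)."""
    posix = ("/" + "/".join(p.parts[1:])).casefold()
    return any(
        posix == d.casefold() or posix.startswith(d.casefold() + "/")
        for d in BLOCKED_DIRS
    )


def validate_pdf_path(raw: str) -> Path:
    """Apply the path rules above; return the resolved path or raise."""
    # 1. Reject URLs before touching the filesystem.
    if raw.strip().lower().startswith(URL_PREFIXES):
        raise ValueError(f"URLs are not accepted: {raw!r}")
    literal = Path.cwd() / Path(raw).expanduser()
    path = literal.resolve()
    # 2. Block system directories, checking both literal and resolved forms.
    if _is_blocked(literal) or _is_blocked(path):
        raise ValueError(f"system directory is blocked: {path}")
    # 3. Require the file to live under $HOME or the working directory.
    roots = (Path.home().resolve(), Path.cwd().resolve())
    if not any(path.is_relative_to(root) for root in roots):
        raise ValueError(f"path must be under home or cwd: {path}")
    # 4. Must be an existing .pdf file.
    if not path.is_file():
        raise FileNotFoundError(path)
    if path.suffix.lower() != ".pdf":
        raise ValueError(f"expected a .pdf file: {path}")
    # 5. Large inputs only warn; the call still proceeds.
    if path.stat().st_size > LARGE_FILE_BYTES:
        logger.warning("%s exceeds 100 MB; expect a slow cold start", path)
    return path


def allowed_skills() -> frozenset:
    """Default allow-list, overridable via ANTHROPIC_OFFICE_MCP_SKILLS."""
    raw = os.environ.get("ANTHROPIC_OFFICE_MCP_SKILLS", "")
    names = {s.strip() for s in raw.split(",") if s.strip()}
    return frozenset(names) if names else frozenset(DEFAULT_SKILLS)


def check_skill(name: str) -> str:
    """Fail closed: any skill not on the allow-list is refused."""
    if name not in allowed_skills():
        raise PermissionError(f"skill {name!r} is not on the allow-list")
    return name
```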
License
Apache-2.0. docling is MIT.
Download files
File details
Details for the file concinno_skills_office_advanced-0.1.0.tar.gz.
File metadata
- Download URL: concinno_skills_office_advanced-0.1.0.tar.gz
- Upload date:
- Size: 18.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f7162eddc0a25acd04ffe9edfbb8201af70c10ee2f7ca909575afbf3e3f61983 |
| MD5 | 7c110cfe44bded5451d8a87a6399203f |
| BLAKE2b-256 | b7f738bcbeec0070f6aa255e04f573a4b8b63382d58a3b6627f6dd8b5e3536fc |
File details
Details for the file concinno_skills_office_advanced-0.1.0-py3-none-any.whl.
File metadata
- Download URL: concinno_skills_office_advanced-0.1.0-py3-none-any.whl
- Upload date:
- Size: 16.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3a263f702af43fd50436f30f5ad65568f850433b3ffa332038ac2e1887499f80 |
| MD5 | d9ff761d18e40d86a71013c94c62ae82 |
| BLAKE2b-256 | 11ae771eb68136476047b72e77b2bbd0e0c857fc2992fe8b0fe5001138c395bf |
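To check a downloaded artifact against the SHA256 digests listed for the 0.1.0 files, you can hash the file in chunks with `hashlib`. The helper names in the sketch below are illustrative. Alternatively, pip can enforce the same check itself in hash-checking mode (`--require-hashes` with `--hash=sha256:…` entries in a requirements file).

```python
import hashlib
from pathlib import Path

# SHA256 digests as published for the 0.1.0 distributions.
EXPECTED_SHA256 = {
    "concinno_skills_office_advanced-0.1.0.tar.gz":
        "f7162eddc0a25acd04ffe9edfbb8201af70c10ee2f7ca909575afbf3e3f61983",
    "concinno_skills_office_advanced-0.1.0-py3-none-any.whl":
        "3a263f702af43fd50436f30f5ad65568f850433b3ffa332038ac2e1887499f80",
}


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in fixed-size chunks so large files need little memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: Path) -> bool:
    """True if the file matches its published digest (matched by file name)."""
    expected = EXPECTED_SHA256.get(path.name)
    if expected is None:
        raise KeyError(f"no published digest for {path.name}")
    return sha256_of(path) == expected
```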