
Advanced AI-native Office document skills for Concinno — LLM-ready PDF extraction via docling (IBM) + Anthropic Office MCP bridge. docling MIT, pulls PyTorch.


concinno-skills-office-advanced

Advanced AI-native Office document skills for Concinno. Sibling package to concinno-skills-office.

| Sibling | Role |
| --- | --- |
| concinno-skills-office | Write Office docs with native Python libs (python-docx / openpyxl / xlsxwriter / python-pptx / docxtpl). No AI, no network. |
| concinno-skills-office-advanced | Read Office docs with AI pipeline (docling) + bridge Anthropic's official MCP skill server. |

Status

MVP (0.1.0) — three tools:

| Tool | Library | Purpose |
| --- | --- | --- |
| PdfAiExtract | docling (IBM, MIT) | LLM-ready PDF → markdown + structured tables |
| DoclingPageImage | docling | render a single PDF page → PNG (base64 / bytes) for multimodal LLM |
| AnthropicOfficeMcpBridge | concinno.tools.mcp_bridge (2.15.0+) | invoke Anthropic's official docx/xlsx/pptx/pdf skills over MCP stdio |

⚠ Heavy dependency footprint

docling transitively depends on PyTorch plus vision / OCR model weights. Installing this package adds roughly 2 GB to the environment, and the first PdfAiExtract.call downloads model weights from HuggingFace (cached under ~/.cache/huggingface).

Prefer RunPod or a high-spec desktop for benchmarks and heavy batches. A CPU-only laptop can run the pipeline, but first-page latency on a 10-page PDF is typically tens of seconds, not hundreds of milliseconds.
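To see how much disk space the cached weights actually take, a small standard-library sketch like the following can help. It assumes the default cache location named above (~/.cache/huggingface); the helper names dir_size_bytes and format_gb are made up for this illustration and are not part of this package.

```python
from pathlib import Path


def dir_size_bytes(root: Path) -> int:
    """Sum the sizes of regular files under root (0 if root is missing).

    Symlinks are skipped so that cache layouts which link to shared
    blobs are not counted twice.
    """
    if not root.is_dir():
        return 0
    return sum(
        p.stat().st_size
        for p in root.rglob("*")
        if p.is_file() and not p.is_symlink()
    )


def format_gb(n_bytes: int) -> str:
    """Render a byte count in decimal gigabytes with two decimals."""
    return f"{n_bytes / 1e9:.2f} GB"


if __name__ == "__main__":
    # Assumed default cache location; adjust if your setup relocates it.
    cache = Path.home() / ".cache" / "huggingface"
    print(f"{cache}: {format_gb(dir_size_bytes(cache))}")
```

Running it before and after the first PdfAiExtract.call gives a rough measure of what the model download added.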

Install

pip install concinno-skills-office-advanced

Requires concinno >= 2.15.1. The MCP bridge adapter lives in Concinno core (concinno.tools.mcp_bridge) — this package depends on it rather than re-implementing JSON-RPC.

Usage via Concinno ToolRegistry

When the consumer sets CONCINNO_LOAD_PLUGINS=1, the default registry auto-mounts all three tools:

import os
os.environ["CONCINNO_LOAD_PLUGINS"] = "1"

from concinno.tools.registry import get_default_registry

reg = get_default_registry()
for name in ("PdfAiExtract", "DoclingPageImage", "AnthropicOfficeMcpBridge"):
    assert name in reg.list_deferred()

Direct Python usage

PdfAiExtract — LLM-ready markdown + tables

from concinno_skills_office_advanced import PdfAiExtract

out = PdfAiExtract().call(
    action="extract",
    path="./report.pdf",
    output_format="markdown",  # or "json" / "text"
)
# out = {
#     "ok": True,
#     "markdown": "# Report …",
#     "tables": [{"page": 1, "html": "<table>…</table>", "csv": "a,b\n1,2\n"}],
#     "page_count": 12,
# }
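
The "csv" field of each table entry can be parsed with the standard library. The sketch below works on a hand-written sample that copies the result shape shown above; it is not real extractor output, and the helper names table_rows and tables_by_page are hypothetical.

```python
import csv
import io


def table_rows(table: dict) -> list[list[str]]:
    """Parse the "csv" field of one table entry into a list of rows."""
    return list(csv.reader(io.StringIO(table["csv"])))


def tables_by_page(out: dict) -> dict[int, list[list[list[str]]]]:
    """Group parsed tables by their "page" value."""
    grouped: dict[int, list[list[list[str]]]] = {}
    for table in out.get("tables", []):
        grouped.setdefault(table["page"], []).append(table_rows(table))
    return grouped


# Hand-written sample in the documented shape (values are placeholders).
sample = {
    "ok": True,
    "markdown": "# Report",
    "tables": [{"page": 1, "html": "<table></table>", "csv": "a,b\n1,2\n"}],
    "page_count": 12,
}

rows = table_rows(sample["tables"][0])
# rows == [["a", "b"], ["1", "2"]]
```

Going through csv.reader rather than splitting on commas keeps quoted cells that contain commas or newlines intact.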

DoclingPageImage — PDF page → PNG for multimodal LLM

from concinno_skills_office_advanced import DoclingPageImage

out = DoclingPageImage().call(
    action="render",
    path="./report.pdf",
    page=1,
    dpi=150,  # 36–600
)
# out["image_base64"]  → "iVBORw0KGgo…"
# out["mime"]          → "image/png"

Set return_bytes=True to get image_bytes (raw PNG bytes) instead of base64 — useful when piping directly to disk.
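
A caller that wants to handle both return styles could normalise the result as sketched below. The field names image_base64 and image_bytes come from the description above; the helpers page_image_bytes and save_page_image are hypothetical, and the signature check is an extra sanity test added for this illustration.

```python
import base64
from pathlib import Path

# Every PNG file starts with these eight bytes.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def page_image_bytes(out: dict) -> bytes:
    """Return raw PNG bytes from either image_bytes or image_base64."""
    if "image_bytes" in out:
        data = out["image_bytes"]
    else:
        data = base64.b64decode(out["image_base64"])
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("result does not look like a PNG")
    return data


def save_page_image(out: dict, dest: Path) -> int:
    """Write the PNG to dest and return the number of bytes written."""
    return dest.write_bytes(page_image_bytes(out))
```

With a real result this would be called as save_page_image(out, Path("page-1.png")). The "iVBORw0KGgo" prefix in the example output above is the base64 encoding of the PNG signature, which is why the check works on decoded data.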

AnthropicOfficeMcpBridge — Anthropic official skills via MCP

Configure the MCP server command once (any of the three methods):

# 1. Env var
export ANTHROPIC_OFFICE_MCP_CMD='npx -y @anthropic-ai/office-mcp'

# 2. Concinno credential store (persisted)
python -c "from concinno.core.credentials import CredentialStore; \
    CredentialStore().set('anthropic_office_mcp_cmd', \
    'npx -y @anthropic-ai/office-mcp')"

# 3. Pass server_cmd= kwarg directly per call (no persistence)

Then invoke:

from concinno_skills_office_advanced import AnthropicOfficeMcpBridge

# List what the server advertises
AnthropicOfficeMcpBridge().call(action="list_skills")
# → {"ok": True, "skills": [{"name": "docx", "description": "…"}, …]}

# Invoke a skill
AnthropicOfficeMcpBridge().call(
    action="invoke",
    skill_name="docx",     # docx / xlsx / pptx / pdf
    args={"title": "Report", "paragraphs": [...]},
)
# → {"ok": True, "skill": "docx", "content": ...}

If the server command is not set through any of the three methods (env var, credential store, or server_cmd kwarg), the tool returns {"error": "Anthropic Office MCP server not configured — …"} rather than silently swallowing the call.
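
The lookup across the three configuration sources, and the caller-side handling of the error result, might look roughly like this. This is a sketch only: the precedence order (kwarg, then env var, then credential store) is an assumption not confirmed above, and resolve_server_cmd, credential_lookup, and unwrap are hypothetical names, not this package's API.

```python
import os
import shlex


def resolve_server_cmd(server_cmd=None, env=None, credential_lookup=None):
    """Return the MCP server command as an argv list, or None if unconfigured.

    Assumed precedence (not confirmed by the package):
    explicit kwarg, then env var, then credential store.
    """
    env = os.environ if env is None else env
    candidates = (
        server_cmd,
        env.get("ANTHROPIC_OFFICE_MCP_CMD"),
        # credential_lookup stands in for a credential-store getter.
        credential_lookup("anthropic_office_mcp_cmd") if credential_lookup else None,
    )
    for value in candidates:
        if value and value.strip():
            # Split like a POSIX shell would, so quoted arguments survive.
            return shlex.split(value)
    return None


def unwrap(result: dict) -> dict:
    """Raise if a tool result carries an "error" key; otherwise return it."""
    if "error" in result:
        raise RuntimeError(result["error"])
    return result
```

Wrapping calls as unwrap(AnthropicOfficeMcpBridge().call(action="list_skills")) turns a missing configuration into an exception at the call site instead of a dict that is easy to overlook.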

Safety

  • path must point to an existing local file under Path.home() or Path.cwd(). URLs (http://, https://, file://, data:) are rejected before any filesystem access.
  • System directories (/etc, /Windows, /System, /Program Files, …) are blocked even when they happen to live on the same volume as $HOME.
  • PdfAiExtract / DoclingPageImage also validate the suffix (.pdf).
  • Files larger than 100 MB trigger a logger.warning — the tool still runs, but you get a stderr signal that cold-start latency will spike.
  • AnthropicOfficeMcpBridge fails closed on unknown skill names (default allow-list: docx, xlsx, pptx, pdf; override via ANTHROPIC_OFFICE_MCP_SKILLS=docx,xlsx,…).
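
For readers who want to apply similar checks in their own code before calling the tools, the rules above can be approximated as follows. This is an illustrative sketch, not the package's implementation: check_pdf_path and allowed_skills are hypothetical names, the system-directory block and the size warning are left out, and Path.is_relative_to requires Python 3.9 or newer.

```python
from pathlib import Path

BLOCKED_SCHEMES = ("http://", "https://", "file://", "data:")
DEFAULT_SKILLS = ("docx", "xlsx", "pptx", "pdf")


def check_pdf_path(raw: str, roots=None) -> Path:
    """Validate raw as a local .pdf under one of roots, else raise ValueError."""
    # Reject URL-like inputs before touching the filesystem.
    if raw.lower().startswith(BLOCKED_SCHEMES):
        raise ValueError("URLs are not accepted")
    if roots is None:
        roots = (Path.home(), Path.cwd())
    path = Path(raw).expanduser().resolve()
    if path.suffix.lower() != ".pdf":
        raise ValueError("expected a .pdf file")
    # Resolve both sides so symlinked roots compare correctly.
    if not any(path.is_relative_to(Path(r).resolve()) for r in roots):
        raise ValueError("path is outside the allowed roots")
    if not path.is_file():
        raise ValueError("file does not exist")
    return path


def allowed_skills(env) -> frozenset:
    """Parse ANTHROPIC_OFFICE_MCP_SKILLS, falling back to the default list."""
    raw = env.get("ANTHROPIC_OFFICE_MCP_SKILLS", "")
    names = [s.strip() for s in raw.split(",") if s.strip()]
    return frozenset(names or DEFAULT_SKILLS)
```

Resolving the path before the containment test means that "../" segments and symlinks cannot be used to step outside the allowed roots.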

License

Apache-2.0. docling is MIT.
