Safe local document-to-markdown preprocessing for OpenClaw, Claude Code, Codex, Hermes, and other agents.
agent-markitdown
Safe local document-to-markdown preprocessing for agents.
Built for OpenClaw first, but intentionally usable from Claude Code, Codex, Hermes Agent, and anything else that can run a local CLI or Python package.
What it is
agent-markitdown wraps Microsoft's excellent markitdown with an agent-oriented safety and workflow layer:
- local files only
- convert_local() only
- plugins off by default
- extension allowlist
- size guardrail
- deterministic JSON output
- extraction warnings when markdown may be incomplete
- review-pack generation for LLM handoff
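As a rough illustration of what the "local files only", allowlist, and size-guardrail checks mean in practice, here is a minimal standalone sketch. It does not reproduce agent-markitdown's internals: the function name `preflight`, the shortened allowlist, and the 25 MiB cap are all hypothetical values chosen for the example.

```python
from pathlib import Path

# Hypothetical values for illustration only; the real allowlist and size cap
# are defined by agent-markitdown, not by this sketch.
ALLOWED_EXTENSIONS = {".pdf", ".docx", ".md", ".txt"}
MAX_BYTES = 25 * 1024 * 1024


def preflight(path, allowed=ALLOWED_EXTENSIONS, max_bytes=MAX_BYTES):
    """Return a list of reasons to reject the input; an empty list means it passes."""
    # Anything that looks like a URL is refused before touching the filesystem.
    if "://" in path:
        return ["remote URLs are not accepted"]

    problems = []
    p = Path(path)
    if p.suffix.lower() not in allowed:
        problems.append(f"extension {p.suffix!r} is not on the allowlist")
    if not p.is_file():
        problems.append("not a regular local file")
    elif p.stat().st_size > max_bytes:
        problems.append(f"file exceeds {max_bytes} bytes")
    return problems
```

Returning a list of reasons rather than raising on the first failure lets a host agent report every problem with an upload in one pass.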
Why this exists
Raw file uploads are awkward for agent workflows.
For supported document types, agents usually work better when they receive clean markdown instead of a binary attachment or a heavyweight vision/PDF pass.
That means:
- lower context overhead
- easier quoting and summarization
- better portability across agent runtimes
- safer, narrower preprocessing than raw markitdown convert()
What it is not
This package does not magically patch every agent runtime on earth.
It gives you a safe preprocessing layer plus integration assets. Each host agent still needs a tiny adapter or instruction layer telling it to run agent-markitdown before review.
OpenClaw gets a ready-made skill. Other agents get drop-in snippets.
Status
- GitHub repo: live
- CI/release workflows: included
- PyPI publish path: ready once a token or trusted publisher is configured
Installation
uv venv .venv
uv pip install --python .venv/bin/python .
# or with test/dev dependencies
uv pip install --python .venv/bin/python '.[dev]'
Or from PyPI later:
pip install agent-markitdown
CLI
Convert one file to stdout
agent-markitdown convert ./report.pdf
Convert and emit JSON
agent-markitdown convert ./report.docx --json
JSON output includes a warnings array. It is empty for ordinary text extraction, and it calls out cases where the markdown should not be treated as complete, such as very low extracted text or image inputs that may need OCR/vision review.
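A host agent might use the warnings array to decide whether the markdown is safe to rely on or whether the original file needs an OCR/vision pass. The sketch below assumes a JSON object per file. Only the `warnings` key is described in this README; the `markdown` key, the sample warning text, and the function name `needs_fallback` are hypothetical.

```python
import json

# Hypothetical payload: only the "warnings" array is documented above.
# The "markdown" key and the warning wording are invented for illustration.
SAMPLE = '{"markdown": "# Report", "warnings": ["very low extracted text"]}'


def needs_fallback(raw_json):
    """Return True when the converter flagged the markdown as possibly incomplete."""
    payload = json.loads(raw_json)
    # A missing key is treated the same as an empty warnings array.
    return bool(payload.get("warnings", []))
```

With this shape, an adapter could pass the markdown straight through when `needs_fallback` is false and route the original file to a heavier review path otherwise.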
Write sidecar markdown files
agent-markitdown convert ./report.pdf ./notes.docx --sidecar
Build one review bundle for an agent
agent-markitdown review-pack ./report.pdf ./notes.docx -o review-pack.md
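For hosts that drive the CLI from Python, a thin wrapper around the review-pack command above could look like the following. This is a sketch, not part of the package: the helper names are hypothetical, and it assumes the CLI is on PATH and exits non-zero on failure.

```python
import subprocess


def review_pack_command(paths, output="review-pack.md"):
    """Build the argv for the review-pack invocation shown above."""
    if not paths:
        raise ValueError("at least one input file is required")
    # Passing a list (not a shell string) avoids shell interpretation of file names.
    return ["agent-markitdown", "review-pack", *paths, "-o", output]


def build_review_pack(paths, output="review-pack.md"):
    """Run the CLI and return the output path.

    Assumes a non-zero exit code signals failure, in which case
    subprocess raises CalledProcessError.
    """
    subprocess.run(review_pack_command(paths, output), check=True)
    return output
```

Keeping command construction separate from execution makes the adapter easy to test without the CLI installed.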
Health check
agent-markitdown doctor
Supported extensions
- .pdf
- .docx
- .pptx
- .xlsx
- .xls
- .html, .htm
- .csv, .tsv
- .json, .xml
- .txt, .md, .rtf
- .epub
- .jpg, .jpeg, .png, .gif, .bmp, .tif, .tiff, .webp
OpenClaw
See integrations/openclaw/SKILL.md.
That skill tells OpenClaw to preprocess supported uploaded documents into markdown before deeper review/summarization work.
Install the OpenClaw skill into a workspace:
./scripts/install-openclaw-skill.sh
Other agents
- Claude Code: integrations/claude-code/AGENTS.md
- Codex: integrations/codex/AGENTS.md
- Hermes Agent: integrations/hermes-agent/SKILL.md
For copyable host-side patterns, see:
- examples/review-pack-consumers/ for a generic review-pack handoff
- examples/auto-preprocess-adapters/ for profile-specific prompt adapters that can sit in front of agent CLIs
Security stance
This package intentionally avoids the broadest markitdown surfaces.
- no remote URLs
- no convert()
- no plugins unless explicitly enabled
- no ZIP traversal support
- explicit extension allowlist
- configurable size cap
- warnings for low-text extraction and image inputs that may need OCR/vision
If you're handling untrusted uploads in a server context, keep validating paths and storing uploads in a controlled temp area. This package narrows the blast radius; it does not replace sane host hygiene.
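One way to keep uploads inside a controlled area is to resolve every candidate path and confirm it still sits under the upload root before handing it to the converter. The sketch below is host-side code, not something agent-markitdown provides; `resolve_inside` and its parameters are hypothetical names.

```python
from pathlib import Path


def resolve_inside(upload_root, candidate):
    """Resolve candidate under upload_root, rejecting anything that escapes it."""
    root = Path(upload_root).resolve()
    # resolve() collapses ".." segments and follows symlinks; an absolute
    # candidate replaces root entirely, which the check below also catches.
    target = (root / candidate).resolve()
    if root not in target.parents:
        raise ValueError(f"{candidate!r} resolves outside the upload area")
    return target
```

A server would call this on the client-supplied file name and pass only the returned path to the preprocessing step.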
Release flow
- CI runs on push/PR
- release workflow runs on v* tags
- tagged releases build wheel + sdist and attach them to a GitHub release
- PyPI publish is attempted automatically when either:
  - PYPI_API_TOKEN repo secret exists, or
  - PYPI_TRUSTED_PUBLISHING=true repo variable is set and PyPI trusted publishing is configured
See docs/publishing.md and docs/release-checklist.md.
Attribution
This project depends on and is inspired by Microsoft's markitdown, which is MIT licensed.
File details
Details for the file agent_markitdown-0.1.1.tar.gz.
File metadata
- Download URL: agent_markitdown-0.1.1.tar.gz
- Upload date:
- Size: 102.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 29cfbf8f8ed7b6997860e62e8d42c612bcb180086d81a50eacbd2261c2d4ec72 |
| MD5 | 3b1ee176a86ad497d56a61eebc4bfd35 |
| BLAKE2b-256 | 2c60cf54d90c26765f36d5f69654842a0f3eaed671f6a15c01728a498a20f8c8 |
File details
Details for the file agent_markitdown-0.1.1-py3-none-any.whl.
File metadata
- Download URL: agent_markitdown-0.1.1-py3-none-any.whl
- Upload date:
- Size: 9.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 6f6074f400472f55b5b1e411fc4e5c9050b5da63e1b0eb4679d8032c1138b54c |
| MD5 | f7e269adcfa4845adcdac924d2206e01 |
| BLAKE2b-256 | 4a44210a23fe0c7a1e13cac8c1f9cf2658b54c9b85044c113f332c500e007e50 |