
Safe local document-to-markdown preprocessing for OpenClaw, Claude Code, Codex, Hermes, and other agents.


agent-markitdown


Safe local document-to-markdown preprocessing for agents.

Built for OpenClaw first, but intentionally usable from Claude Code, Codex, Hermes Agent, and anything else that can run a local CLI or import a Python package.

What it is

agent-markitdown wraps Microsoft's excellent markitdown with an agent-oriented safety and workflow layer:

  • local files only
  • convert_local() only
  • plugins off by default
  • extension allowlist
  • size guardrail
  • deterministic JSON output
  • extraction warnings when markdown may be incomplete
  • review-pack generation for LLM handoff
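"Deterministic JSON output" means the same input always serializes to the same bytes, which keeps agent transcripts and diffs stable. A minimal sketch of one way to get that with the standard library, assuming a hypothetical record shape (path, markdown, warnings); the package's actual field names and serializer settings may differ:

```python
import json
from typing import Any


def to_deterministic_json(records: list[dict[str, Any]]) -> str:
    """Serialize conversion records so identical content gives identical output.

    Hypothetical record shape: {"path": ..., "markdown": ..., "warnings": [...]}.
    """
    # sort_keys fixes key order regardless of how each dict was built;
    # fixed separators avoid whitespace drift between runs;
    # ensure_ascii=False keeps non-ASCII document text readable.
    return json.dumps(records, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
```

Two dicts with the same content but different insertion order produce the same string, so a host can safely cache or compare results.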

Why this exists

Raw file uploads are awkward for agent workflows.

For supported document types, agents usually work better when they receive clean markdown instead of a binary attachment or a heavyweight vision/PDF pass.

That means:

  • lower context overhead
  • easier quoting and summarization
  • better portability across agent runtimes
  • safer, narrower preprocessing than raw markitdown convert()

What it is not

This package does not magically patch every agent runtime on earth.

It gives you a safe preprocessing layer plus integration assets. Each host agent still needs a tiny adapter or instruction layer telling it to run agent-markitdown before review.
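A host adapter can be as small as a function that shells out to the CLI before the agent reads a file. A minimal sketch, assuming the JSON output for a single file is one object with a warnings key (the other fields are not documented here); the function names are hypothetical, and only the convert subcommand and --json flag come from this README:

```python
import json
import subprocess
from typing import Any, Callable


def build_convert_command(path: str) -> list[str]:
    # Only the documented `convert` subcommand and `--json` flag are used.
    return ["agent-markitdown", "convert", path, "--json"]


def preprocess_for_agent(
    path: str,
    runner: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> dict[str, Any]:
    """Run agent-markitdown on one local file and return the parsed JSON.

    `runner` is injectable so the adapter can be tested without the CLI.
    check=True makes a nonzero exit raise CalledProcessError.
    """
    proc = runner(build_convert_command(path), capture_output=True, text=True, check=True)
    return json.loads(proc.stdout)


def needs_manual_review(result: dict[str, Any]) -> bool:
    # Any entry in the warnings array means the markdown may be incomplete.
    return bool(result.get("warnings"))
```

The agent then reads the markdown when needs_manual_review() is false and falls back to OCR, vision, or a human when it is true.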

OpenClaw gets a ready-made skill. Other agents get drop-in snippets.

Status

  • GitHub repo: live
  • CI/release workflows: included
  • PyPI publish path: ready once a token or trusted publisher is configured

Installation

uv venv .venv
uv pip install --python .venv/bin/python .
# or with test/dev dependencies
uv pip install --python .venv/bin/python '.[dev]'

Or from PyPI:

pip install agent-markitdown

CLI

Convert one file to stdout

agent-markitdown convert ./report.pdf

Convert and emit JSON

agent-markitdown convert ./report.docx --json

JSON output includes a warnings array. It is empty for ordinary text extraction and flags cases where the markdown should not be treated as complete, such as when very little text was extracted or when an image input may need OCR or vision review.
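The two warning cases above can be sketched as simple heuristics. This is an illustration, not the package's code: the 200-character threshold and the warning wording are assumptions, and only the image extensions come from the supported list in this README.

```python
from pathlib import Path

# Image extensions from the supported list; MIN_CHARS is a hypothetical threshold.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".gif", ".bmp", ".tif", ".tiff", ".webp"}
MIN_CHARS = 200


def extraction_warnings(path: str, markdown: str, min_chars: int = MIN_CHARS) -> list[str]:
    warnings: list[str] = []
    if Path(path).suffix.lower() in IMAGE_SUFFIXES:
        warnings.append("image input: markdown may be incomplete; consider OCR/vision review")
    # Count non-whitespace characters so blank pages and layout padding
    # do not masquerade as extracted text.
    text_chars = sum(1 for ch in markdown if not ch.isspace())
    if text_chars < min_chars:
        warnings.append(f"low extracted text ({text_chars} chars)")
    return warnings
```

A scanned PDF with no text layer would trip the low-text check even though its extension is not an image type, which is exactly the case where an agent should not trust the markdown alone.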

Write sidecar markdown files

agent-markitdown convert ./report.pdf ./notes.docx --sidecar
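The README does not say how sidecar files are named. A minimal sketch of one plausible convention, assumed here for illustration: append .md next to the source, so report.pdf becomes report.pdf.md. Appending rather than replacing the suffix keeps report.pdf and report.docx from both landing on report.md.

```python
from pathlib import Path


def sidecar_path(source: str) -> Path:
    """Hypothetical naming: <name>.<ext>.md in the same directory as the source."""
    p = Path(source)
    return p.with_name(p.name + ".md")


def write_sidecar(source: str, markdown: str) -> Path:
    out = sidecar_path(source)
    # Always write UTF-8 so the sidecar is portable across agent runtimes.
    out.write_text(markdown, encoding="utf-8")
    return out
```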

Build one review bundle for an agent

agent-markitdown review-pack ./report.pdf ./notes.docx -o review-pack.md
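Conceptually, a review pack concatenates several converted documents into one markdown file with clear per-file boundaries. A minimal sketch, assuming a hypothetical layout (a title line, one "## <file name>" heading per source, and horizontal rules between sections); the real review-pack format may differ:

```python
from pathlib import Path


def build_review_pack(docs: list[tuple[str, str]]) -> str:
    """Join (source path, markdown) pairs into one handoff document.

    Per-file headings let an LLM say which file a quoted passage came from.
    """
    sections = [f"# Review pack ({len(docs)} files)"]
    for path, markdown in docs:
        sections.append(f"## {Path(path).name}\n\n{markdown.strip()}")
    return "\n\n---\n\n".join(sections) + "\n"
```

Input order is preserved, so the pack reads in the same order the files were given on the command line.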

Health check

agent-markitdown doctor
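The README does not list what doctor checks. A minimal sketch of the kind of environment probe a health check might run, assuming two hypothetical checks (Python version and whether markitdown is importable); the 3.10 floor is an assumption, not a documented requirement:

```python
import importlib.util
import sys


def doctor() -> dict[str, bool]:
    """Hypothetical subset of checks a 'doctor' command might run."""
    return {
        # Assumed minimum Python version for recent markitdown releases.
        "python_ok": sys.version_info >= (3, 10),
        # find_spec returns None when the package is absent, without importing it.
        "markitdown_installed": importlib.util.find_spec("markitdown") is not None,
    }
```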

Supported extensions

  • .pdf
  • .docx
  • .pptx
  • .xlsx
  • .xls
  • .html, .htm
  • .csv, .tsv
  • .json, .xml
  • .txt, .md, .rtf
  • .epub
  • .jpg, .jpeg, .png, .gif, .bmp, .tif, .tiff, .webp
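An extension allowlist is a set-membership check on the file suffix. A minimal sketch mirroring the list above; the case-insensitive comparison is an assumption about how the package treats names like REPORT.PDF:

```python
from pathlib import Path

# Mirrors the supported-extensions list above.
ALLOWED_EXTENSIONS = frozenset({
    ".pdf", ".docx", ".pptx", ".xlsx", ".xls",
    ".html", ".htm", ".csv", ".tsv", ".json", ".xml",
    ".txt", ".md", ".rtf", ".epub",
    ".jpg", ".jpeg", ".png", ".gif", ".bmp", ".tif", ".tiff", ".webp",
})


def is_allowed(path: str) -> bool:
    # Only the final suffix counts, compared case-insensitively:
    # 'REPORT.PDF' passes; 'bundle.zip', 'archive.tar.gz', and
    # extensionless files do not.
    return Path(path).suffix.lower() in ALLOWED_EXTENSIONS
```

An allowlist fails closed: a new or unusual format is rejected until someone deliberately adds it, rather than being handed to a converter nobody reviewed.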

OpenClaw

See integrations/openclaw/SKILL.md.

That skill tells OpenClaw to preprocess supported uploaded documents into markdown before deeper review/summarization work.

Install the OpenClaw skill into a workspace:

./scripts/install-openclaw-skill.sh

Other agents

For copyable host-side patterns, see:

Security stance

This package intentionally avoids the broadest markitdown surfaces.

  • no remote URLs
  • no convert()
  • no plugins unless explicitly enabled
  • no ZIP traversal support
  • explicit extension allowlist
  • configurable size cap
  • warnings for low-text extraction and image inputs that may need OCR/vision
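The "no remote URLs" and "configurable size cap" rules amount to a pre-flight check before any converter runs. A minimal sketch under stated assumptions: the 25 MiB default and the RejectedInput exception are hypothetical, and the real package may validate differently.

```python
from pathlib import Path
from urllib.parse import urlparse

DEFAULT_MAX_BYTES = 25 * 1024 * 1024  # hypothetical default; the real cap is configurable


class RejectedInput(ValueError):
    """Raised when an input falls outside the local-file-only policy."""


def check_local_input(raw: str, max_bytes: int = DEFAULT_MAX_BYTES) -> Path:
    # Refuse anything with a URL scheme (https:, file:, ...). A one-letter
    # scheme is let through so Windows drive paths like 'C:\\x.pdf' still work.
    scheme = urlparse(raw).scheme
    if scheme and len(scheme) > 1:
        raise RejectedInput(f"remote or URL input not allowed: {raw}")
    path = Path(raw)
    if not path.is_file():
        raise RejectedInput(f"not a regular local file: {raw}")
    size = path.stat().st_size
    if size > max_bytes:
        raise RejectedInput(f"{size} bytes exceeds cap of {max_bytes}")
    return path
```

Checking size before conversion matters because some formats (spreadsheets, EPUBs) can expand far beyond their on-disk size once parsed.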

If you're handling untrusted uploads in a server context, keep validating paths and storing uploads in a controlled temp area. This package narrows the blast radius; it does not replace sane host hygiene.
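On the host side, "a controlled temp area" usually means a private directory plus a check that every stored file stays inside it. A minimal sketch of that hygiene; both helpers are hypothetical and not part of agent-markitdown:

```python
import tempfile
from pathlib import Path


def new_upload_area() -> Path:
    # mkdtemp creates a fresh directory readable only by the current user.
    return Path(tempfile.mkdtemp(prefix="uploads-"))


def stage_upload(upload_root: Path, filename: str, data: bytes) -> Path:
    """Store an upload under upload_root, refusing names that escape it."""
    root = upload_root.resolve()
    # Keep only the final path component, then resolve to catch '..' and
    # empty names. Path.is_relative_to requires Python 3.9+.
    target = (root / Path(filename).name).resolve()
    if not target.is_relative_to(root) or target == root:
        raise ValueError(f"refusing unsafe upload name: {filename!r}")
    target.write_bytes(data)
    return target
```

A name like ../../etc/passwd is reduced to passwd inside the upload area, and a bare .. is rejected outright. Only the staged path is then passed to agent-markitdown.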

Release flow

  • CI runs on push/PR
  • release workflow runs on v* tags
  • tagged releases build wheel + sdist and attach them to a GitHub release
  • PyPI publish is attempted automatically when either:
    • PYPI_API_TOKEN repo secret exists, or
    • PYPI_TRUSTED_PUBLISHING=true repo variable is set and PyPI trusted publishing is configured

See docs/publishing.md and docs/release-checklist.md.

Attribution

This project depends on and is inspired by Microsoft's markitdown, which is MIT licensed.


Download files


Source Distribution

agent_markitdown-0.1.1.tar.gz (102.1 kB)

Uploaded Source

Built Distribution


agent_markitdown-0.1.1-py3-none-any.whl (9.2 kB)

Uploaded Python 3

File details

Details for the file agent_markitdown-0.1.1.tar.gz.

File metadata

  • Download URL: agent_markitdown-0.1.1.tar.gz
  • Size: 102.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for agent_markitdown-0.1.1.tar.gz
Algorithm Hash digest
SHA256 29cfbf8f8ed7b6997860e62e8d42c612bcb180086d81a50eacbd2261c2d4ec72
MD5 3b1ee176a86ad497d56a61eebc4bfd35
BLAKE2b-256 2c60cf54d90c26765f36d5f69654842a0f3eaed671f6a15c01728a498a20f8c8
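To check a downloaded file against the SHA256 digest listed above, hash it locally and compare. A minimal standard-library sketch; the helper names are illustrative:

```python
import hashlib
from pathlib import Path

# SHA256 digest listed above for agent_markitdown-0.1.1.tar.gz.
EXPECTED_SDIST_SHA256 = "29cfbf8f8ed7b6997860e62e8d42c612bcb180086d81a50eacbd2261c2d4ec72"


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    h = hashlib.sha256()
    with path.open("rb") as f:
        # Read in fixed-size chunks so large archives need not fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path: Path, expected: str) -> bool:
    # hexdigest() is lowercase, so normalize the expected value too.
    return sha256_of(path) == expected.lower()
```

For installs, pip's hash-checking mode does the same comparison automatically when a requirements file pins the package with a --hash=sha256:... option.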


File details

Details for the file agent_markitdown-0.1.1-py3-none-any.whl.

File hashes

Hashes for agent_markitdown-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 6f6074f400472f55b5b1e411fc4e5c9050b5da63e1b0eb4679d8032c1138b54c
MD5 f7e269adcfa4845adcdac924d2206e01
BLAKE2b-256 4a44210a23fe0c7a1e13cac8c1f9cf2658b54c9b85044c113f332c500e007e50

