Document to Markdown converter with LLM enhancement
Markitai
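An out-of-the-box Markdown converter with native support for LLM enhancement.
Features
- Multi-format support - DOCX/DOC, PPTX/PPT, XLSX/XLS, PDF, TXT, MD, JPG/PNG/WebP, URLs
- LLM enhancement - format cleanup, metadata generation, image analysis
- Batch processing - concurrent conversion, resume from checkpoint, progress display
- OCR - text extraction from scanned PDFs and images
- URL conversion - converts web pages directly, with browser rendering for SPAs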
Installation
One-line install (recommended)
# Linux/macOS
curl -fsSL https://raw.githubusercontent.com/Ynewtime/markitai/main/scripts/setup.sh | sh
# Windows (PowerShell)
irm https://raw.githubusercontent.com/Ynewtime/markitai/main/scripts/setup.ps1 | iex
Manual install
# Requires Python 3.11+
uv tool install markitai
# Or use pip
pip install --user markitai
Quick start
# Basic conversion
markitai document.docx
# URL conversion
markitai https://example.com/article
# LLM enhancement
markitai document.docx --llm
# Use a preset
markitai document.pdf --preset rich # LLM + alt + desc + screenshot
markitai document.pdf --preset standard # LLM + alt + desc
markitai document.pdf --preset minimal # Basic conversion only
# Batch processing
markitai ./docs -o ./output
# Resume an interrupted batch
markitai ./docs -o ./output --resume
# Batch URL processing (.urls files are detected automatically)
markitai urls.urls -o ./output
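For scripted use, the commands above can be assembled programmatically. The following is a minimal sketch, not part of Markitai itself: `build_markitai_command` and `run_markitai` are hypothetical helpers that use only the flags shown in this README (`-o`, `--llm`, `--preset`, `--resume`) and assume the `markitai` executable is on `PATH`.

```python
import subprocess
from typing import Optional


def build_markitai_command(
    source: str,
    output_dir: Optional[str] = None,
    llm: bool = False,
    preset: Optional[str] = None,
    resume: bool = False,
) -> list[str]:
    """Hypothetical helper: assemble a markitai argv from README-documented flags."""
    cmd = ["markitai", source]
    if output_dir is not None:
        cmd += ["-o", output_dir]
    if llm:
        cmd.append("--llm")
    if preset is not None:
        cmd += ["--preset", preset]
    if resume:
        cmd.append("--resume")
    return cmd


def run_markitai(cmd: list[str]) -> int:
    """Run the command and return its exit code (needs markitai installed)."""
    return subprocess.run(cmd, check=False).returncode
```

For example, `run_markitai(build_markitai_command("./docs", output_dir="./output", resume=True))` would invoke the same resumable batch run as the command shown above.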
Output structure
output/
├── document.docx.md # Basic Markdown
├── document.docx.llm.md # LLM-optimized version
├── assets/
│ ├── document.docx.0001.jpg
│ └── images.json # Image descriptions
├── screenshots/ # Page screenshots (with --screenshot)
│ └── example_com.full.jpg
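When post-processing results in a script, it can help to compute where the outputs are expected to land. The sketch below is an assumption inferred only from the example tree above (source file name plus `.md` or `.llm.md`, with `assets/images.json` alongside); `expected_outputs` is a hypothetical helper, and the actual naming rules may differ for other inputs.

```python
from pathlib import Path


def expected_outputs(source_name: str, output_dir: str = "output") -> dict[str, Path]:
    """Hypothetical helper: paths inferred from the example tree in this README."""
    out = Path(output_dir)
    return {
        "markdown": out / f"{source_name}.md",          # basic Markdown
        "llm_markdown": out / f"{source_name}.llm.md",  # LLM-optimized version
        "assets_dir": out / "assets",                   # extracted images
        "image_descriptions": out / "assets" / "images.json",
    }


paths = expected_outputs("document.docx")
for label, path in paths.items():
    print(f"{label}: {path.as_posix()} (exists: {path.exists()})")
```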
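Configuration
Priority: command line > environment variables > config file > defaults
# Show the current configuration
markitai config list
# Initialize a config file
markitai config init -o .
# Show cache status
markitai cache stats
# Clear the cache
markitai cache clear
Config file path: ./markitai.json or ~/.markitai/config.json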
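The priority order (command line, then environment variables, then config file, then defaults) can be illustrated with a layered lookup. This is only a sketch of the general technique, not Markitai's actual implementation; `resolve_settings` and the keys `llm`, `model`, and `concurrency` are hypothetical names chosen for illustration.

```python
from collections import ChainMap
from typing import Any, Mapping


def resolve_settings(
    cli: Mapping[str, Any],
    env: Mapping[str, Any],
    config_file: Mapping[str, Any],
    defaults: Mapping[str, Any],
) -> dict[str, Any]:
    """Merge four layers; a key found in an earlier layer wins."""
    # ChainMap searches its maps left to right, so cli overrides env,
    # env overrides the config file, and defaults are the fallback.
    return dict(ChainMap(dict(cli), dict(env), dict(config_file), dict(defaults)))


merged = resolve_settings(
    cli={"llm": True},                                    # hypothetical keys
    env={"llm": False, "model": "env-model"},
    config_file={"model": "file-model", "concurrency": 8},
    defaults={"llm": False, "model": "default-model", "concurrency": 4},
)
print(merged)
```

Here `llm` comes from the command line, `model` from the environment, and `concurrency` from the config file, since no higher-priority layer sets it.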
Environment variables
| Variable | Description |
|---|---|
| OPENAI_API_KEY | OpenAI API key |
| GEMINI_API_KEY | Google Gemini API key |
| DEEPSEEK_API_KEY | DeepSeek API key |
| ANTHROPIC_API_KEY | Anthropic API key |
| JINA_API_KEY | Jina Reader API key (URL conversion) |
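Before running with `--llm`, a script may want to check which of these keys are actually set. The snippet below is a small sketch under that assumption; `configured_api_keys` is a hypothetical helper, and it only inspects the variable names listed in the table above without validating the keys themselves.

```python
import os
from typing import Mapping, Optional

# Variable names taken from the table above.
API_KEY_VARIABLES = (
    "OPENAI_API_KEY",
    "GEMINI_API_KEY",
    "DEEPSEEK_API_KEY",
    "ANTHROPIC_API_KEY",
    "JINA_API_KEY",
)


def configured_api_keys(environ: Optional[Mapping[str, str]] = None) -> list[str]:
    """Return the names of the listed variables that have a non-blank value."""
    env = os.environ if environ is None else environ
    return [name for name in API_KEY_VARIABLES if env.get(name, "").strip()]


if __name__ == "__main__":
    found = configured_api_keys()
    print("Configured keys:", ", ".join(found) if found else "none")
```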
Dependencies
- pymupdf4llm - PDF conversion
- markitdown - Office document and URL conversion
- LiteLLM - LLM gateway
- RapidOCR - OCR
Documentation
License
MIT
Download files
Source Distribution
markitai-0.3.0.tar.gz (6.0 MB)
Built Distribution
markitai-0.3.0-py3-none-any.whl (192.3 kB)
File details
Details for the file markitai-0.3.0.tar.gz.
File metadata
- Download URL: markitai-0.3.0.tar.gz
- Upload date:
- Size: 6.0 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.9.26
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 7d4290bb74d1924018b57b504495b94f26d2beea45becdf6f40e2856ce1b7ae9 |
| MD5 | c998c3b328dceb62a2768981b8d2237a |
| BLAKE2b-256 | 1ac0ac785b8970089549dcbc0d90f770a2cba73708cbe469f219ddbef3e31f5b |
File details
Details for the file markitai-0.3.0-py3-none-any.whl.
File metadata
- Download URL: markitai-0.3.0-py3-none-any.whl
- Upload date:
- Size: 192.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.9.26
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 1b1c7485032d19ddf67c4694534391f5fe60832e5aa2f404bbb9e5c2341a8b6e |
| MD5 | 42456bf1f683134a953db7072037b340 |
| BLAKE2b-256 | 0cba74101c6d2c6d9f1293cd4008ff40bcd04da507eeac79cfcb80d4caa2b920 |
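A downloaded file can be checked against the published digests with Python's standard `hashlib`. This is a minimal sketch: `file_digests` and `matches_published_sha256` are hypothetical helper names, and BLAKE2b-256 is computed here as BLAKE2b with a 32-byte digest.

```python
import hashlib


def file_digests(path: str) -> dict[str, str]:
    """Compute SHA256, MD5, and BLAKE2b-256 hex digests of a file in one pass."""
    hashers = {
        "sha256": hashlib.sha256(),
        "md5": hashlib.md5(),
        "blake2b_256": hashlib.blake2b(digest_size=32),  # 32 bytes = 256 bits
    }
    with open(path, "rb") as fh:
        # Read in 1 MiB chunks so large archives are not loaded into memory at once.
        for chunk in iter(lambda: fh.read(1024 * 1024), b""):
            for hasher in hashers.values():
                hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}


def matches_published_sha256(path: str, expected_hex: str) -> bool:
    """Compare a file's SHA256 with a published hex digest (case-insensitive)."""
    return file_digests(path)["sha256"] == expected_hex.strip().lower()
```

For example, `matches_published_sha256("markitai-0.3.0-py3-none-any.whl", "1b1c7485032d19ddf67c4694534391f5fe60832e5aa2f404bbb9e5c2341a8b6e")` should return `True` for an intact copy of the wheel listed above.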