Document to Markdown converter with LLM enhancement

Project description

Markitai

An out-of-the-box Markdown converter with native support for LLM enhancement.

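Features

  • Multi-format support - DOCX/DOC, PPTX/PPT, XLSX/XLS, PDF, TXT, MD, JPG/PNG/WebP, URLs
  • LLM enhancement - format cleanup, metadata generation, image analysis
  • Batch processing - concurrent conversion, resumable runs, progress display
  • OCR - text extraction from scanned PDFs and images
  • URL conversion - converts web pages directly, with browser rendering for SPAs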

Installation

One-line install (recommended)

# Linux/macOS
curl -fsSL https://raw.githubusercontent.com/Ynewtime/markitai/main/scripts/setup.sh | sh

# Windows (PowerShell)
irm https://raw.githubusercontent.com/Ynewtime/markitai/main/scripts/setup.ps1 | iex

Manual installation

# Requires Python 3.11+
uv tool install markitai

# Or use pip
pip install --user markitai

Quick start

# Basic conversion
markitai document.docx

# URL conversion
markitai https://example.com/article

# LLM enhancement
markitai document.docx --llm

# Use a preset
markitai document.pdf --preset rich      # LLM + alt + desc + screenshot
markitai document.pdf --preset standard  # LLM + alt + desc
markitai document.pdf --preset minimal   # basic conversion only

# Batch processing
markitai ./docs -o ./output

# Resume an interrupted run
markitai ./docs -o ./output --resume

# Batch URL processing (.urls files are detected automatically)
markitai urls.urls -o ./output
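
When the commands above need to be driven from a script, one option is to assemble the argument list in Python and hand it to `subprocess`. The sketch below is only an illustration: `build_command` is a hypothetical helper, not part of markitai, and it uses only the flags shown in this README (`-o`, `--llm`, `--preset`, `--resume`).

```python
import shlex


def build_command(source, output=None, llm=False, preset=None, resume=False):
    """Assemble a markitai command line as an argument list (hypothetical helper)."""
    cmd = ["markitai", source]
    if output is not None:
        cmd += ["-o", output]
    if llm:
        cmd.append("--llm")
    if preset is not None:
        cmd += ["--preset", preset]
    if resume:
        cmd.append("--resume")
    return cmd


cmd = build_command("./docs", output="./output", resume=True)
print(shlex.join(cmd))  # markitai ./docs -o ./output --resume
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once markitai is installed and on the `PATH`.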

Output structure

output/
├── document.docx.md        # Basic Markdown
├── document.docx.llm.md    # LLM-enhanced version
├── assets/
│   ├── document.docx.0001.jpg
│   └── images.json         # Image descriptions
└── screenshots/            # Page screenshots (with --screenshot)
    └── example_com.full.jpg
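
The tree above suggests that output names are formed by appending a suffix to the full input file name, original extension included. The sketch below assumes that pattern holds in general, which this README does not state. `expected_outputs` is a hypothetical helper for locating results in a script, not a markitai API.

```python
from pathlib import Path


def expected_outputs(input_file: str, output_dir: str = "output") -> dict[str, Path]:
    """Guess output paths by appending suffixes to the input file name (assumed pattern)."""
    name = Path(input_file).name  # keeps the original extension, e.g. "document.docx"
    out = Path(output_dir)
    return {
        "markdown": out / f"{name}.md",
        "llm_markdown": out / f"{name}.llm.md",
        "image_descriptions": out / "assets" / "images.json",
    }


paths = expected_outputs("docs/document.docx")
print(paths["markdown"].as_posix())      # output/document.docx.md
print(paths["llm_markdown"].as_posix())  # output/document.docx.llm.md
```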

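Configuration

Priority: command line > environment variables > config file > defaults

# Show the current configuration
markitai config list

# Initialize a config file
markitai config init -o .

# Show cache statistics
markitai cache stats

# Clear the cache
markitai cache clear

Config file paths: ./markitai.json, ~/.markitai/config.json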
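
The priority order stated in this section (command line > environment variables > config file > defaults) can be illustrated with a layered lookup. This is a minimal sketch of the general technique using `collections.ChainMap`, not markitai's actual implementation. The setting names (`llm`, `output_dir`, `concurrency`) are made up for the example.

```python
from collections import ChainMap


def resolve_settings(cli, env, file_cfg, defaults):
    """Merge four setting sources; earlier sources win over later ones."""
    # Drop unset (None) values so they do not shadow lower-priority sources.
    layers = [
        {key: value for key, value in layer.items() if value is not None}
        for layer in (cli, env, file_cfg, defaults)
    ]
    return dict(ChainMap(*layers))


settings = resolve_settings(
    cli={"output_dir": "./out", "llm": None},       # hypothetical keys
    env={"llm": True, "output_dir": "./env"},
    file_cfg={"llm": False, "concurrency": 4},
    defaults={"llm": False, "concurrency": 2, "output_dir": "./output"},
)
print(settings["output_dir"])   # ./out  (command line wins)
print(settings["llm"])          # True   (environment beats config file)
print(settings["concurrency"])  # 4      (config file beats default)
```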

Environment variables

Variable            Description
OPENAI_API_KEY      OpenAI API Key
GEMINI_API_KEY      Google Gemini API Key
DEEPSEEK_API_KEY    DeepSeek API Key
ANTHROPIC_API_KEY   Anthropic API Key
JINA_API_KEY        Jina Reader API Key (URL conversion)
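
Before running with `--llm`, it can help to check which of the keys listed above are actually set. The sketch below only reads the environment. `configured_providers` is a hypothetical helper, and how markitai itself selects a provider is not covered in this README.

```python
import os

# Variable names are taken from the table above.
API_KEY_VARS = {
    "OPENAI_API_KEY": "OpenAI",
    "GEMINI_API_KEY": "Google Gemini",
    "DEEPSEEK_API_KEY": "DeepSeek",
    "ANTHROPIC_API_KEY": "Anthropic",
    "JINA_API_KEY": "Jina Reader",
}


def configured_providers(environ=os.environ):
    """Return the services whose API key variable is set and non-empty."""
    return [name for var, name in API_KEY_VARS.items() if environ.get(var)]


print(configured_providers() or "no API keys found in the environment")
```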

Dependencies

Documentation

License

MIT

Download files

Source Distribution

markitai-0.3.0.tar.gz (6.0 MB)

Uploaded Source

Built Distribution

markitai-0.3.0-py3-none-any.whl (192.3 kB)

Uploaded Python 3

File details

Details for the file markitai-0.3.0.tar.gz.

File metadata

  • Download URL: markitai-0.3.0.tar.gz
  • Upload date:
  • Size: 6.0 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.9.26

File hashes

Hashes for markitai-0.3.0.tar.gz
Algorithm     Hash digest
SHA256        7d4290bb74d1924018b57b504495b94f26d2beea45becdf6f40e2856ce1b7ae9
MD5           c998c3b328dceb62a2768981b8d2237a
BLAKE2b-256   1ac0ac785b8970089549dcbc0d90f770a2cba73708cbe469f219ddbef3e31f5b
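
A downloaded archive can be checked against the SHA256 digest listed above with the standard-library `hashlib` module. The sketch below assumes the file was saved as `markitai-0.3.0.tar.gz` in the current directory. The helper names are illustrative.

```python
import hashlib
from pathlib import Path

# SHA256 digest copied from the table above.
EXPECTED_SHA256 = "7d4290bb74d1924018b57b504495b94f26d2beea45becdf6f40e2856ce1b7ae9"


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected):
    """Return True when the file's SHA256 digest matches the expected hex string."""
    return sha256_of(path) == expected.strip().lower()


if __name__ == "__main__":
    archive = Path("markitai-0.3.0.tar.gz")  # assumed download location
    if archive.exists():
        print("digest matches" if verify(archive, EXPECTED_SHA256) else "digest MISMATCH")
```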

File details

Details for the file markitai-0.3.0-py3-none-any.whl.

File metadata

  • Download URL: markitai-0.3.0-py3-none-any.whl
  • Upload date:
  • Size: 192.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.9.26

File hashes

Hashes for markitai-0.3.0-py3-none-any.whl
Algorithm     Hash digest
SHA256        1b1c7485032d19ddf67c4694534391f5fe60832e5aa2f404bbb9e5c2341a8b6e
MD5           42456bf1f683134a953db7072037b340
BLAKE2b-256   0cba74101c6d2c6d9f1293cd4008ff40bcd04da507eeac79cfcb80d4caa2b920
