Markitai

Opinionated Markdown converter with native LLM enhancement support.

Features

  • Multi-format Support - DOCX/DOC, PPTX/PPT, XLSX/XLS, PDF, HTML, EPUB, CSV, TXT, MD, JPG/PNG/WebP/GIF/BMP/TIFF, URLs, and 10+ more via optional converters
  • LLM Enhancement - Format cleaning, metadata generation, image analysis
  • Local Providers - Use existing Claude Code, GitHub Copilot, ChatGPT, or Gemini CLI subscriptions — no API keys needed
  • Batch Processing - Concurrent conversion, resume capability, progress display
  • OCR Recognition - Text extraction from scanned PDFs and images
  • URL Conversion - Smart strategy chain (Defuddle → Jina → Static → Playwright → Cloudflare) with SPA auto-detection
  • Cloudflare Integration - Cloud-based URL rendering (Browser Rendering) and file conversion (Workers AI toMarkdown) via --cloudflare
  • Smart Caching - LLM result caching, SPA domain learning, auto-proxy detection
  • Fetch Security - Configurable strategy priority, domain/IP exemption with NO_PROXY support for information security compliance
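
The URL strategy chain listed above amounts to trying fetchers in priority order until one succeeds. The sketch below illustrates that general fallback pattern only. The function names and stub fetchers are hypothetical and are not Markitai's internal API.

```python
from collections.abc import Callable

# Hypothetical fetcher type: takes a URL, returns Markdown, raises on failure.
Fetcher = Callable[[str], str]


def fetch_with_fallback(url: str, strategies: list[tuple[str, Fetcher]]) -> tuple[str, str]:
    """Try each (name, fetcher) in order; return (name, markdown) from the first success."""
    errors: list[str] = []
    for name, fetcher in strategies:
        try:
            return name, fetcher(url)
        except Exception as exc:  # a real tool would likely catch narrower errors
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all strategies failed: " + "; ".join(errors))


# Stub fetchers standing in for real strategies (illustrative only).
def failing_fetcher(url: str) -> str:
    raise ConnectionError("service unavailable")


def static_fetcher(url: str) -> str:
    return f"# Content of {url}"


if __name__ == "__main__":
    chain = [
        ("defuddle", failing_fetcher),
        ("jina", failing_fetcher),
        ("static", static_fetcher),
    ]
    print(fetch_with_fallback("https://example.com/article", chain))
```

Keeping the chain as an ordered list of named callables is one way to make the priority configurable, since reordering the list changes which strategy runs first.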

Installation

One-Click Setup (Recommended)

# Linux/macOS
curl -fsSL https://raw.githubusercontent.com/Ynewtime/markitai/main/scripts/setup.sh | sh

# Windows (PowerShell)
powershell -ExecutionPolicy ByPass -c "irm https://raw.githubusercontent.com/Ynewtime/markitai/main/scripts/setup.ps1 | iex"

Manual Installation

# Requires Python 3.11-3.13 (3.14 not yet supported)
uv tool install markitai

# Or using uv pip (for virtual environment)
uv pip install markitai
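
To check an interpreter against the version range stated above before installing, a small script along these lines may help. It assumes only the range given in this README (3.11 through 3.13); the helper name is illustrative.

```python
import sys

# Supported range as stated in this README: Python 3.11 through 3.13.
MIN_SUPPORTED = (3, 11)
MAX_SUPPORTED = (3, 13)


def is_supported(version: tuple[int, int]) -> bool:
    """Return True if (major, minor) falls inside the documented range."""
    return MIN_SUPPORTED <= version <= MAX_SUPPORTED


if __name__ == "__main__":
    current = sys.version_info[:2]
    status = "supported" if is_supported(current) else "not supported"
    print(f"Python {current[0]}.{current[1]} is {status} by markitai")
```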

Quick Start

First Run

# Interactive mode (recommended for new users)
markitai -I

# Or convert a file directly
markitai document.pdf

# With LLM enhancement
markitai document.pdf --llm

Check Setup

# Verify all dependencies
markitai doctor

# Auto-fix missing components
markitai doctor --fix

Common Tasks

# Basic conversion
markitai document.docx

# URL conversion
markitai https://example.com/article

# LLM enhancement
markitai document.docx --llm

# Using presets
markitai document.pdf --preset rich      # LLM + alt + desc + screenshot
markitai document.pdf --preset standard  # LLM + alt + desc
markitai document.pdf --preset minimal   # Basic conversion only

# Cloudflare cloud rendering
markitai https://example.com --cloudflare

# Batch processing
markitai ./docs -o ./output

# Resume interrupted job
markitai ./docs -o ./output --resume

# Batch URL processing (.urls files are detected automatically)
markitai urls.urls -o ./output

Output Structure

output/
├── document.docx.md            # Basic Markdown (skipped in --llm mode unless --keep-base)
├── document.docx.llm.md        # LLM-enhanced version (when --llm is used)
└── .markitai/                  # Metadata namespace (isolated from user content)
    ├── assets/
    │   ├── document.docx.0001.jpg
    │   └── images.json         # Image descriptions
    ├── screenshots/            # Page screenshots (with --screenshot)
    │   └── example_com.full.jpg
    ├── reports/                # Conversion reports (JSON)
    └── states/                 # Batch state files (for --resume)

In --llm mode, only .llm.md is written by default. Use --keep-base to also write the base .md.
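
The naming rule above can be expressed as a small helper, for example to check which files to expect after a run. This is a sketch based only on the layout shown in this README for file inputs; `expected_outputs` is a hypothetical name, and naming for URL inputs is not covered.

```python
from pathlib import Path


def expected_outputs(
    source: str,
    output_dir: str,
    llm: bool = False,
    keep_base: bool = False,
) -> list[Path]:
    """List the Markdown files a conversion is expected to write."""
    out = Path(output_dir)
    name = Path(source).name  # full input filename, extension included
    base = out / f"{name}.md"
    enhanced = out / f"{name}.llm.md"
    if not llm:
        return [base]
    # In --llm mode the base file is written only with --keep-base.
    return [base, enhanced] if keep_base else [enhanced]


if __name__ == "__main__":
    for path in expected_outputs("document.docx", "output", llm=True, keep_base=True):
        print(path)
```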

Configuration

Priority: CLI arguments > Environment variables > Config file > Defaults
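
A layered lookup of this kind can be sketched with `collections.ChainMap`, where the first mapping that contains a key wins. The setting names below are made up for illustration and are not Markitai's actual configuration keys.

```python
from collections import ChainMap


def resolve_settings(cli: dict, env: dict, config_file: dict, defaults: dict) -> ChainMap:
    """Layer settings so earlier sources win: CLI > env > config file > defaults."""
    # Drop CLI options that were not given, so they do not mask lower layers.
    cli_given = {key: value for key, value in cli.items() if value is not None}
    return ChainMap(cli_given, env, config_file, defaults)


if __name__ == "__main__":
    settings = resolve_settings(
        cli={"preset": "rich", "concurrency": None},
        env={"preset": "standard", "concurrency": 4},
        config_file={"concurrency": 2, "output_dir": "./output"},
        defaults={"preset": "minimal", "concurrency": 1, "output_dir": "."},
    )
    print(dict(settings))
```

Here `preset` comes from the CLI layer, `concurrency` from the environment layer, and `output_dir` from the config file layer.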

# View configuration
markitai config list

# Initialize config file
markitai init

# View cache status
markitai cache stats

# Clear cache
markitai cache clear

# Check system health and dependencies
markitai doctor

Config file location: ./markitai.json or ~/.markitai/config.json

Local Providers (Subscription-based)

Use your existing subscriptions — no API keys needed:

# Claude Agent (requires Claude Code CLI)
markitai document.pdf --llm  # Configure claude-agent/sonnet in config

# GitHub Copilot (requires Copilot CLI)
markitai document.pdf --llm  # Configure copilot/gpt-5.2 in config

# ChatGPT (OAuth Device Code — no SDK needed)
markitai auth login chatgpt  # One-time browser login
markitai document.pdf --llm  # Configure chatgpt/gpt-5.2 in config

# Gemini CLI (reuses ~/.gemini/oauth_creds.json)
markitai document.pdf --llm  # Configure gemini-cli/gemini-2.5-pro in config

Install CLI tools (for claude-agent / copilot):

# Claude Code CLI
curl -fsSL https://claude.ai/install.sh | bash

# GitHub Copilot CLI
curl -fsSL https://gh.io/copilot-install | bash

Check provider authentication status:

markitai auth status

Environment Variables

Variable               Description
OPENAI_API_KEY         OpenAI API Key
GEMINI_API_KEY         Google Gemini API Key
DEEPSEEK_API_KEY       DeepSeek API Key
ANTHROPIC_API_KEY      Anthropic API Key
JINA_API_KEY           Jina Reader API Key (URL conversion)
CLOUDFLARE_API_TOKEN   Cloudflare API Token (Browser Rendering / Workers AI)
CLOUDFLARE_ACCOUNT_ID  Cloudflare Account ID

License

MIT
