Document to Markdown converter with LLM enhancement

Markitai

Opinionated Markdown converter with native LLM enhancement support.

Features

  • Multi-format Support - DOCX/DOC, PPTX/PPT, XLSX/XLS, PDF, TXT, MD, JPG/PNG/WebP, URLs
  • LLM Enhancement - Format cleaning, metadata generation, image analysis
  • Batch Processing - Concurrent conversion, resume capability, progress display
  • OCR - Text extraction from scanned PDFs and images
  • URL Conversion - Direct webpage conversion with SPA browser rendering support
  • Smart Caching - LLM result caching, SPA domain learning, auto-proxy detection

Installation

One-Click Setup (Recommended)

# Linux/macOS
curl -fsSL https://raw.githubusercontent.com/Ynewtime/markitai/main/scripts/setup.sh | sh

# Windows (PowerShell)
irm https://raw.githubusercontent.com/Ynewtime/markitai/main/scripts/setup.ps1 | iex

Manual Installation

# Requires Python 3.11+
uv tool install markitai

# Or using pip
pip install --user markitai
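
The Python 3.11+ requirement can be checked before running either install command. The following is a minimal sketch and not part of Markitai; the helper name `version_ok` is hypothetical, and it assumes the interpreter is available as `python3` on PATH.

```shell
# Hypothetical helper (not part of markitai): succeeds when a
# "major.minor[.patch]" version string is at least 3.11.
version_ok() {
    major=${1%%.*}
    rest=${1#*.}
    minor=${rest%%.*}
    [ "$major" -gt 3 ] || { [ "$major" -eq 3 ] && [ "$minor" -ge 11 ]; }
}

# Ask the interpreter on PATH for its version, e.g. "3.12".
# Falls back to an empty string when python3 is missing.
current=$(python3 -c 'import sys; print("%d.%d" % sys.version_info[:2])' 2>/dev/null) || current=""

if [ -n "$current" ] && version_ok "$current"; then
    echo "Python $current is new enough"
else
    echo "Python 3.11+ not found on PATH" >&2
fi
```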

Quick Start

# Basic conversion
markitai document.docx

# URL conversion
markitai https://example.com/article

# LLM enhancement
markitai document.docx --llm

# Using presets
markitai document.pdf --preset rich      # LLM + alt + desc + screenshot
markitai document.pdf --preset standard  # LLM + alt + desc
markitai document.pdf --preset minimal   # Basic conversion only

# Batch processing
markitai ./docs -o ./output

# Resume interrupted job
markitai ./docs -o ./output --resume

# Batch URL processing (auto-detect .urls files)
markitai urls.urls -o ./output

Output Structure

output/
├── document.docx.md        # Basic Markdown
├── document.docx.llm.md    # LLM-enhanced version
├── assets/
│   ├── document.docx.0001.jpg
│   └── images.json         # Image descriptions
└── screenshots/            # Page screenshots (with --screenshot)
    └── example_com.full.jpg
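
The naming pattern in the tree above (input file name plus `.md` or `.llm.md`) can be sketched as a small helper. This is an illustration inferred from the tree only, not Markitai's actual implementation; `output_paths` is a hypothetical name.

```shell
# Hypothetical helper: derive the Markdown output paths for one input file,
# following the naming pattern visible in the output tree.
output_paths() {
    input=$1
    out_dir=$2
    # Keep the full file name, including its original extension.
    name=$(basename "$input")
    printf '%s\n' "$out_dir/$name.md" "$out_dir/$name.llm.md"
}

output_paths ./docs/document.docx ./output
```

Run as written, this prints `./output/document.docx.md` and `./output/document.docx.llm.md`, one per line.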

Configuration

Priority (highest to lowest): CLI arguments > Environment variables > Config file > Defaults
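
The precedence rule amounts to "take the first source that provides a value". A minimal sketch of that idea follows; `resolve_setting` and the sample values are hypothetical and only illustrate the ordering, not how Markitai resolves settings internally.

```shell
# Hypothetical helper: print the first non-empty argument.
# Arguments are passed in priority order: CLI, env, config file, default.
resolve_setting() {
    for candidate in "$@"; do
        if [ -n "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    # No source provided a value.
    return 1
}

# Illustrative values only: nothing was given on the command line,
# so the environment value wins over the config file and the default.
cli_value=""
env_value="from-env"
file_value="from-file"
default_value="from-default"

resolve_setting "$cli_value" "$env_value" "$file_value" "$default_value"
```

With these sample values the call prints `from-env`.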

# View configuration
markitai config list

# Initialize config file
markitai config init -o .

# View cache status
markitai cache stats

# Clear cache
markitai cache clear

Config file location: ./markitai.json or ~/.markitai/config.json
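
To see which of the two config files is present on a machine, a lookup along these lines may help. It is a sketch under one assumption: that the project-local file is checked before the per-user one, following the order in which the paths are listed. The source does not state the lookup order, and `find_config` is a hypothetical helper.

```shell
# Hypothetical helper: print the first config file that exists.
# Assumption: ./markitai.json is checked before ~/.markitai/config.json.
find_config() {
    project_dir=$1
    home_dir=$2
    for path in "$project_dir/markitai.json" "$home_dir/.markitai/config.json"; do
        if [ -f "$path" ]; then
            printf '%s\n' "$path"
            return 0
        fi
    done
    return 1
}

find_config . "$HOME" || echo "no config file found"
```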

Environment Variables

Variable            Description
OPENAI_API_KEY      OpenAI API Key
GEMINI_API_KEY      Google Gemini API Key
DEEPSEEK_API_KEY    DeepSeek API Key
ANTHROPIC_API_KEY   Anthropic API Key
JINA_API_KEY        Jina Reader API Key (URL conversion)
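
Before running with `--llm`, it can be useful to see which of these keys are present in the current shell. The sketch below only uses the variable names listed above; the helper `list_api_keys` is hypothetical and not a Markitai command.

```shell
# Hypothetical helper: report which API key variables are set.
list_api_keys() {
    for name in OPENAI_API_KEY GEMINI_API_KEY DEEPSEEK_API_KEY ANTHROPIC_API_KEY JINA_API_KEY; do
        # Indirect lookup of the variable called $name (POSIX-compatible).
        eval "value=\${$name:-}"
        if [ -n "$value" ]; then
            echo "$name is set"
        else
            echo "$name is not set"
        fi
    done
}

list_api_keys
```

The helper prints only the variable names, never the key values, so its output is safe to paste into a bug report.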

Dependencies

Documentation

License

MIT

Download files

Source Distribution

markitai-0.3.1.tar.gz (6.0 MB)

Uploaded Source

Built Distribution

markitai-0.3.1-py3-none-any.whl (206.9 kB)

Uploaded Python 3

File details

Details for the file markitai-0.3.1.tar.gz.

File metadata

  • Download URL: markitai-0.3.1.tar.gz
  • Upload date:
  • Size: 6.0 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for markitai-0.3.1.tar.gz
Algorithm     Hash digest
SHA256        6c80dfa4e3ba5653aa0847affef1a4c31f41ee6fe13e823b7d76ca33cc09b7a4
MD5           13c94dd1280b1713dbaeb35f9962e1ad
BLAKE2b-256   160c73c406fe32e155139d538de0d935727e7a0e7758bb368d1d2a75ec24b77f

Provenance

The following attestation bundles were made for markitai-0.3.1.tar.gz:

Publisher: publish.yml on Ynewtime/markitai

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file markitai-0.3.1-py3-none-any.whl.

File metadata

  • Download URL: markitai-0.3.1-py3-none-any.whl
  • Upload date:
  • Size: 206.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for markitai-0.3.1-py3-none-any.whl
Algorithm     Hash digest
SHA256        e495e5bc302d29b658b66a265d8f3f55692182bbb6296448d8e6f516439bccea
MD5           ecdca6f145330369f4e7d42ceb73d8f0
BLAKE2b-256   d0e50b545975d23807d28e4d874ba2dbe8e4e20f54c177b621eace73c66d9124

Provenance

The following attestation bundles were made for markitai-0.3.1-py3-none-any.whl:

Publisher: publish.yml on Ynewtime/markitai

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
