A document converter for Word, PDF, Excel, and PowerPoint to Markdown.

Openize.MarkItDown for Python

Openize.MarkItDown for Python converts documents into Markdown format. It supports multiple file formats, provides flexible output handling, and integrates with popular LLMs for post-processing, including OpenAI, Claude, Gemini, and Mistral.

Features

  • Convert .docx, .pdf, .xlsx, and .pptx to Markdown.
  • Save Markdown files locally or send them to an LLM (OpenAI, Claude, Gemini, Mistral).
  • Structured with the Factory & Strategy Pattern for scalability.
  • Accepts both Windows-style and Linux-style file paths.
  • Command-line interface for easy use.
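
The Factory and Strategy patterns named in the feature list could be combined roughly as follows. This is a minimal illustrative sketch, not the package's actual code: `Converter`, `ConverterFactory`, `OutputStrategy`, `SaveLocally`, `SendToLLM`, and `process` are hypothetical names, and the converters are stubs that do no real parsing.

```python
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Callable


class Converter(ABC):
    """Hypothetical base class: one subclass per input format."""

    @abstractmethod
    def to_markdown(self, path: Path) -> str:
        ...


class DocxConverter(Converter):
    def to_markdown(self, path: Path) -> str:
        # Stub: a real converter would parse the file here.
        return f"# {path.stem}\n\n(converted from Word)"


class PdfConverter(Converter):
    def to_markdown(self, path: Path) -> str:
        # Stub: a real converter would parse the file here.
        return f"# {path.stem}\n\n(converted from PDF)"


class ConverterFactory:
    """Factory: choose a converter from the file extension."""

    _registry = {".docx": DocxConverter, ".pdf": PdfConverter}

    @classmethod
    def create(cls, path: Path) -> Converter:
        converter_cls = cls._registry.get(path.suffix.lower())
        if converter_cls is None:
            raise ValueError(f"Unsupported file type: {path.suffix!r}")
        return converter_cls()


class OutputStrategy(ABC):
    """Strategy: decide what happens to the generated Markdown."""

    @abstractmethod
    def handle(self, markdown: str, source: Path) -> str:
        ...


class SaveLocally(OutputStrategy):
    def __init__(self, output_dir: Path) -> None:
        self.output_dir = output_dir

    def handle(self, markdown: str, source: Path) -> str:
        self.output_dir.mkdir(parents=True, exist_ok=True)
        target = self.output_dir / f"{source.stem}.md"
        target.write_text(markdown, encoding="utf-8")
        return str(target)


class SendToLLM(OutputStrategy):
    def __init__(self, send: Callable[[str], str]) -> None:
        self.send = send  # any callable that forwards text to an LLM

    def handle(self, markdown: str, source: Path) -> str:
        return self.send(markdown)


def process(path: Path, strategy: OutputStrategy) -> str:
    """Convert with the factory-selected converter, then apply the strategy."""
    markdown = ConverterFactory.create(path).to_markdown(path)
    return strategy.handle(markdown, path)
```

In a design like this, supporting a new format means registering one more converter class, and a new output target means adding one more strategy, without touching `process`.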

Requirements

This package depends on Aspose libraries, which are commercial products.

You'll need to obtain valid licenses for these libraries separately. The package will install these dependencies, but you're responsible for complying with Aspose's licensing terms.

LLM integration may require the following additional packages or valid API credentials:

  • openai (for OpenAI)
  • anthropic (for Claude)
  • requests (used for Gemini and Mistral REST APIs)
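
Before enabling an LLM, it can help to check that the matching package is importable. The sketch below uses only the standard library; the `LLM_PACKAGES` mapping and the `missing_package` helper are illustrative names built from the list above and are not part of the package.

```python
from importlib.util import find_spec
from typing import Optional

# Provider name -> package listed in the requirements above.
# The mapping itself is an assumption made for this example.
LLM_PACKAGES = {
    "openai": "openai",
    "claude": "anthropic",
    "gemini": "requests",
    "mistral": "requests",
}


def missing_package(llm_name: str) -> Optional[str]:
    """Return the name of the package that cannot be imported, or None if it is available."""
    package = LLM_PACKAGES[llm_name.lower()]
    return None if find_spec(package) is not None else package
```

For example, `missing_package("claude")` returns `"anthropic"` when that package is not installed and `None` otherwise.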

Installation

pip install openize-markitdown-python

Usage

Command Line Interface

# Convert a file and save locally
markitdown document.docx -o output_folder

# Process with an LLM (requires appropriate API key)
markitdown document.docx -o output_folder --llm openai
markitdown document.docx -o output_folder --llm claude
markitdown document.docx -o output_folder --llm gemini
markitdown document.docx -o output_folder --llm mistral
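
The same commands can be driven from a Python script. The sketch below builds the argument list from the two options shown above (`-o` and `--llm`) and nothing else; `build_command` and `run_markitdown` are hypothetical helper names, and the list of accepted LLM names is taken from the examples rather than from the CLI's own help output.

```python
import subprocess
from typing import List, Optional

# LLM names as they appear in the examples above.
LLM_CHOICES = ("openai", "claude", "gemini", "mistral")


def build_command(input_file: str, output_dir: str, llm: Optional[str] = None) -> List[str]:
    """Build the argv list for a markitdown call without running it."""
    if llm is not None and llm not in LLM_CHOICES:
        raise ValueError(f"Unknown LLM {llm!r}; expected one of {LLM_CHOICES}")
    command = ["markitdown", input_file, "-o", output_dir]
    if llm is not None:
        command += ["--llm", llm]
    return command


def run_markitdown(input_file: str, output_dir: str, llm: Optional[str] = None) -> None:
    """Run the CLI; raises CalledProcessError on a non-zero exit status."""
    subprocess.run(build_command(input_file, output_dir, llm), check=True)
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.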

Python API

from openize.markitdown.core import MarkItDown

input_file = "report.pdf"
output_dir = "output_markdown"

# Convert the PDF to Markdown and pass the result to Gemini (requires GEMINI_API_KEY).
converter = MarkItDown(output_dir, llm_client_name="gemini")
converter.convert_document(input_file)

print("Conversion completed and data sent to Gemini.")
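
To convert several files in one run, the call above can be wrapped in a small loop that skips unsupported extensions. This is a sketch under stated assumptions: `convert_supported` is a hypothetical helper, and the conversion function is passed in as a parameter so the loop does not depend on any particular converter.

```python
from pathlib import Path
from typing import Callable, Iterable, List

# Extensions listed under Features.
SUPPORTED_EXTENSIONS = {".docx", ".pdf", ".xlsx", ".pptx"}


def convert_supported(paths: Iterable[str], convert: Callable[[str], object]) -> List[str]:
    """Call `convert` on each path with a supported extension; return the paths handled."""
    converted = []
    for path in paths:
        if Path(path).suffix.lower() in SUPPORTED_EXTENSIONS:
            convert(path)
            converted.append(path)
    return converted
```

With the package installed, `convert` could be `MarkItDown("output_markdown", llm_client_name="gemini").convert_document`, mirroring the example above.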

Environment Variables

The following environment variables control license activation and LLM access:

Variable             Description
ASPOSE_LICENSE_PATH  Required to activate Aspose license (if using paid APIs)
OPENAI_API_KEY       Required for OpenAI integration
OPENAI_MODEL         (Optional) OpenAI model name (default: gpt-4)
CLAUDE_API_KEY       Required for Claude integration
CLAUDE_MODEL         (Optional) Claude model name (default: claude-v1)
GEMINI_API_KEY       Required for Gemini integration
GEMINI_MODEL         (Optional) Gemini model name (default: gemini-pro)
MISTRAL_API_KEY      Required for Mistral integration
MISTRAL_MODEL        (Optional) Mistral model name (default: mistral-medium)
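
The key and model lookup described in the table could be reproduced in your own code along these lines. How the package reads these variables internally is not documented here, so treat this as an assumption-laden sketch: `llm_settings` and `DEFAULT_MODELS` are hypothetical names, and only the variable names and default values come from the table.

```python
import os
from typing import Mapping, Optional, Tuple

# Default model names as listed in the table above.
DEFAULT_MODELS = {
    "OPENAI": "gpt-4",
    "CLAUDE": "claude-v1",
    "GEMINI": "gemini-pro",
    "MISTRAL": "mistral-medium",
}


def llm_settings(provider: str, environ: Optional[Mapping[str, str]] = None) -> Tuple[str, str]:
    """Return (api_key, model) for a provider, falling back to the default model."""
    env = os.environ if environ is None else environ
    prefix = provider.upper()
    if prefix not in DEFAULT_MODELS:
        raise ValueError(f"Unknown provider: {provider!r}")
    api_key = env.get(f"{prefix}_API_KEY")
    if not api_key:
        raise RuntimeError(f"{prefix}_API_KEY is not set")
    # An empty or missing *_MODEL variable falls back to the default.
    model = env.get(f"{prefix}_MODEL") or DEFAULT_MODELS[prefix]
    return api_key, model
```

Accepting the environment as a parameter keeps the function easy to test without modifying the real process environment.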

Setting Environment Variables

Unix-based (Linux/macOS):

export ASPOSE_LICENSE_PATH="/path/to/license"
export OPENAI_API_KEY="your-openai-key"
export CLAUDE_API_KEY="your-claude-key"
export GEMINI_API_KEY="your-gemini-key"
export MISTRAL_API_KEY="your-mistral-key"

Windows PowerShell:

$env:ASPOSE_LICENSE_PATH = "C:\path\to\license"
$env:OPENAI_API_KEY = "your-openai-key"
$env:CLAUDE_API_KEY = "your-claude-key"
$env:GEMINI_API_KEY = "your-gemini-key"
$env:MISTRAL_API_KEY = "your-mistral-key"
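
After setting the variables, a quick pre-flight check can confirm that `ASPOSE_LICENSE_PATH` points at something that exists before any conversion starts. The helper below is a hypothetical sketch and not part of the package; it only checks that the path exists and makes no assumption about the license format.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def license_path_status(environ: Optional[Mapping[str, str]] = None) -> str:
    """Return 'unset', 'missing', or 'ok' for ASPOSE_LICENSE_PATH."""
    env = os.environ if environ is None else environ
    value = env.get("ASPOSE_LICENSE_PATH")
    if not value:
        return "unset"
    return "ok" if Path(value).exists() else "missing"
```

A result of `"missing"` usually indicates a typo in the path or a variable set in a different shell session.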

License

This package is licensed under the MIT License. However, it depends on Aspose libraries, which are proprietary and closed-source.

⚠️ You must obtain valid licenses for Aspose libraries separately. This repository does not include or distribute any proprietary components.
