A document converter for Word, PDF, Excel, and PowerPoint to Markdown.
Openize.MarkItDown for Python
Openize.MarkItDown for Python converts documents into Markdown format. It supports multiple file formats, provides flexible output handling, and integrates with popular LLMs for post-processing, including OpenAI, Claude, Gemini, and Mistral.
Features
- Convert .docx, .pdf, .xlsx, and .pptx to Markdown.
- Save Markdown files locally or send them to an LLM (OpenAI, Claude, Gemini, Mistral).
- Structured with the Factory & Strategy Pattern for scalability.
- Works with Windows and Linux-compatible paths.
- Command-line interface for easy use.
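As a rough illustration of what a Factory and Strategy layout can look like, here is a minimal sketch. Every class and method name in it (`ConverterStrategy`, `DocxStrategy`, `PdfStrategy`, `ConverterFactory`, `for_file`, `to_markdown`) is hypothetical and chosen for the example; none of them is taken from the package's actual source.

```python
from abc import ABC, abstractmethod
from pathlib import Path


class ConverterStrategy(ABC):
    """Hypothetical strategy interface: one subclass per input format."""

    @abstractmethod
    def to_markdown(self, path: Path) -> str:
        ...


class DocxStrategy(ConverterStrategy):
    def to_markdown(self, path: Path) -> str:
        # Placeholder body: a real strategy would call a document library here.
        return f"# Converted from Word: {path.name}"


class PdfStrategy(ConverterStrategy):
    def to_markdown(self, path: Path) -> str:
        # Placeholder body, same idea as above.
        return f"# Converted from PDF: {path.name}"


class ConverterFactory:
    """Hypothetical factory: maps a file extension to a strategy instance."""

    _registry = {".docx": DocxStrategy, ".pdf": PdfStrategy}

    @classmethod
    def for_file(cls, path: Path) -> ConverterStrategy:
        suffix = path.suffix.lower()  # treat .PDF and .pdf alike
        try:
            return cls._registry[suffix]()
        except KeyError:
            raise ValueError(f"Unsupported file type: {suffix}") from None


strategy = ConverterFactory.for_file(Path("report.PDF"))
print(strategy.to_markdown(Path("report.PDF")))
```

With this shape, supporting a new format means adding one strategy class and one registry entry, while calling code keeps using `for_file` unchanged.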
Requirements
This package depends on Aspose libraries, which are commercial products.
You'll need to obtain valid licenses for these libraries separately. The package will install these dependencies, but you're responsible for complying with Aspose's licensing terms.
LLM integration may require the following additional packages or valid API credentials:
- openai (for OpenAI)
- anthropic (for Claude)
- requests (used for Gemini and Mistral REST APIs)
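Before enabling an LLM, it can help to confirm that the matching package is importable. The sketch below uses only the standard library; the `LLM_PACKAGES` mapping and the `missing_package` helper are illustrative names built from the list above, not part of the package.

```python
from importlib.util import find_spec
from typing import Optional

# LLM name -> Python package, following the list above (mapping is illustrative).
LLM_PACKAGES = {
    "openai": "openai",
    "claude": "anthropic",
    "gemini": "requests",
    "mistral": "requests",
}


def missing_package(llm_name: str) -> Optional[str]:
    """Return the package to install for this LLM, or None if it is importable."""
    package = LLM_PACKAGES[llm_name]
    return None if find_spec(package) is not None else package


for name in LLM_PACKAGES:
    package = missing_package(name)
    if package is not None:
        print(f"{name}: install '{package}' first")
```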
Installation
pip install openize-markitdown-python
Usage
Command Line Interface
# Convert a file and save locally
markitdown document.docx -o output_folder
# Process with an LLM (requires appropriate API key)
markitdown document.docx -o output_folder --llm openai
markitdown document.docx -o output_folder --llm claude
markitdown document.docx -o output_folder --llm gemini
markitdown document.docx -o output_folder --llm mistral
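To run the same commands over many files, the CLI can be driven from a short Python script. This is a sketch that assumes the `markitdown` executable is on `PATH` and uses only the `-o` and `--llm` options shown above; `build_command` and `convert_all` are hypothetical helper names.

```python
import subprocess
from pathlib import Path
from typing import List, Optional


def build_command(source: Path, output_dir: Path, llm: Optional[str] = None) -> List[str]:
    """Assemble the argument list for one markitdown invocation."""
    cmd = ["markitdown", str(source), "-o", str(output_dir)]
    if llm is not None:
        cmd += ["--llm", llm]
    return cmd


def convert_all(folder: Path, output_dir: Path, llm: Optional[str] = None) -> None:
    """Run the CLI once per .docx file in `folder`, stopping on the first failure."""
    for source in sorted(folder.glob("*.docx")):
        subprocess.run(build_command(source, output_dir, llm), check=True)
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.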
Python API
from openize.markitdown.core import MarkItDown
input_file = "report.pdf"
output_dir = "output_markdown"
converter = MarkItDown(output_dir, llm_client_name="gemini")
converter.convert_document(input_file)
print("Conversion completed and data sent to Gemini.")
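The same API can be wrapped to process a whole folder. The sketch below relies only on the constructor and the `convert_document` call shown above; it assumes one `MarkItDown` instance can be reused across several files, which the README does not confirm. `SUPPORTED`, `supported_files`, and `convert_folder` are hypothetical names.

```python
from pathlib import Path
from typing import Iterable, List

# Extensions listed under Features.
SUPPORTED = {".docx", ".pdf", ".xlsx", ".pptx"}


def supported_files(paths: Iterable[Path]) -> List[Path]:
    """Keep only paths whose extension the converter claims to handle."""
    return [p for p in paths if p.suffix.lower() in SUPPORTED]


def convert_folder(folder: str, output_dir: str, llm_client_name: str) -> int:
    """Convert every supported file in `folder`; return how many were processed."""
    # Imported here so the helper above stays usable without the package installed.
    from openize.markitdown.core import MarkItDown

    # Assumption: a single converter instance may be reused for multiple files.
    converter = MarkItDown(output_dir, llm_client_name=llm_client_name)
    files = supported_files(sorted(Path(folder).iterdir()))
    for path in files:
        converter.convert_document(str(path))
    return len(files)
```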
Environment Variables
The following environment variables are used to control license and LLM access:
Variable | Description |
---|---|
ASPOSE_LICENSE_PATH | Required to activate Aspose license (if using paid APIs) |
OPENAI_API_KEY | Required for OpenAI integration |
OPENAI_MODEL | (Optional) OpenAI model name (default: gpt-4) |
CLAUDE_API_KEY | Required for Claude integration |
CLAUDE_MODEL | (Optional) Claude model name (default: claude-v1) |
GEMINI_API_KEY | Required for Gemini integration |
GEMINI_MODEL | (Optional) Gemini model name (default: gemini-pro) |
MISTRAL_API_KEY | Required for Mistral integration |
MISTRAL_MODEL | (Optional) Mistral model name (default: mistral-medium) |
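A script can check these variables up front instead of failing partway through a conversion. The following sketch mirrors the table above: the variable names and default model names come from the table, while `resolve_llm_settings` is a hypothetical helper and does not describe how the package itself reads its configuration.

```python
import os
from typing import Mapping, Tuple

# Default model names as listed in the table above.
DEFAULT_MODELS = {
    "openai": "gpt-4",
    "claude": "claude-v1",
    "gemini": "gemini-pro",
    "mistral": "mistral-medium",
}


def resolve_llm_settings(llm: str, env: Mapping[str, str] = os.environ) -> Tuple[str, str]:
    """Return (api_key, model) for `llm`, raising if the required key is missing."""
    prefix = llm.upper()
    api_key = env.get(f"{prefix}_API_KEY")
    if not api_key:
        raise RuntimeError(f"{prefix}_API_KEY is not set")
    # Fall back to the documented default when no model override is given.
    model = env.get(f"{prefix}_MODEL") or DEFAULT_MODELS[llm]
    return api_key, model
```

Taking the environment as a parameter keeps the check easy to exercise with a plain dictionary.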
Setting Environment Variables
Unix-based (Linux/macOS):
export ASPOSE_LICENSE_PATH="/path/to/license"
export OPENAI_API_KEY="your-openai-key"
export CLAUDE_API_KEY="your-claude-key"
export GEMINI_API_KEY="your-gemini-key"
export MISTRAL_API_KEY="your-mistral-key"
Windows PowerShell:
$env:ASPOSE_LICENSE_PATH = "C:\path\to\license"
$env:OPENAI_API_KEY = "your-openai-key"
$env:CLAUDE_API_KEY = "your-claude-key"
$env:GEMINI_API_KEY = "your-gemini-key"
$env:MISTRAL_API_KEY = "your-mistral-key"
License
This package is licensed under the MIT License. However, it depends on Aspose libraries, which are proprietary and closed-source.
⚠️ You must obtain valid licenses for Aspose libraries separately. This repository does not include or distribute any proprietary components.
File details
Details for the file openize_markitdown_python-25.6.0.tar.gz.
File metadata
- Download URL: openize_markitdown_python-25.6.0.tar.gz
- Upload date:
- Size: 11.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.10
File hashes
Algorithm | Hash digest |
---|---|
SHA256 | 561495288428ab0e5560d021dff25e0e6636a333aa9f3d6b75f167c434b88834 |
MD5 | 7b8acb5553addf7f25a0f2570ce962d7 |
BLAKE2b-256 | e03ec9948586441a164220ba55851fbe48a0159206810a46b17d60be32eebb98 |
File details
Details for the file openize_markitdown_python-25.6.0-py3-none-any.whl.
File metadata
- Download URL: openize_markitdown_python-25.6.0-py3-none-any.whl
- Upload date:
- Size: 11.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.10
File hashes
Algorithm | Hash digest |
---|---|
SHA256 | a7e0d20c076c4be6b02a549c2026aca0166e830fda015e3aa60ab7b58ed3b0c8 |
MD5 | 108dc2e85d3ab15e8c42b91a64ede2c7 |
BLAKE2b-256 | 82a42aaeb6746c47d38b34e15041df9da69d147b4cc98e0d4043d77e67acf587 |
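A downloaded file can be checked against the SHA256 digests listed above with the standard library. This is a small sketch; `sha256_of` and `matches` are hypothetical helper names, and the local file path is whatever location the download was saved to.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Hash the file in chunks so large downloads do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path: Path, expected_hex: str) -> bool:
    """Compare the file's SHA256 with an expected hex digest, ignoring case and whitespace."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, `matches(Path("openize_markitdown_python-25.6.0.tar.gz"), "<SHA256 from the table>")` should return `True` for an intact download of the source distribution.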