A converter that turns Word, PDF, Excel, and PowerPoint documents into Markdown.

Openize.MarkItDown for Python

Openize.MarkItDown for Python converts documents into Markdown format. It supports multiple file formats, provides flexible output handling, and integrates with popular LLMs for post-processing, including OpenAI, Claude, Gemini, and Mistral.

Features

  • Convert .docx, .pdf, .xlsx, and .pptx to Markdown.
  • Save Markdown files locally or send them to an LLM (OpenAI, Claude, Gemini, Mistral).
  • Built on the Factory and Strategy design patterns for scalability.
  • Accepts both Windows- and Linux-style paths.
  • Command-line interface for easy use.

Requirements

This package depends on Aspose libraries, which are commercial products. You'll need to obtain valid licenses for them separately. The package installs these dependencies, but you are responsible for complying with Aspose's licensing terms.

LLM integration may require the following additional packages, plus valid API credentials:

  • openai (for OpenAI)
  • anthropic (for Claude)
  • requests (used for Gemini and Mistral REST APIs)
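
One way to see which of these optional packages are importable in the current environment is sketched below. It uses only the standard library. The helper name missing_llm_packages and the provider-to-package mapping are illustrative assumptions based on the list above, not part of the package's API.

```python
from importlib.util import find_spec

# Mapping taken from the list above; Gemini and Mistral both go through `requests`.
LLM_PACKAGES = {
    "openai": "openai",
    "claude": "anthropic",
    "gemini": "requests",
    "mistral": "requests",
}


def missing_llm_packages(providers, is_installed=None):
    """Return the sorted package names the providers need that are not importable."""
    if is_installed is None:
        # Default check: a top-level package is importable if find_spec locates it.
        is_installed = lambda name: find_spec(name) is not None
    needed = {LLM_PACKAGES[provider] for provider in providers}
    return sorted(name for name in needed if not is_installed(name))


if __name__ == "__main__":
    print(missing_llm_packages(["openai", "claude", "gemini", "mistral"]))
```

An empty list means every package needed by the chosen providers is already installed. API credentials are a separate requirement.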

Installation

pip install openize-markitdown-python

Usage

Command Line Interface

# Convert a file and save locally
markitdown document.docx -o output_folder

# Process with an LLM (requires appropriate API key)
markitdown document.docx -o output_folder --llm openai
markitdown document.docx -o output_folder --llm claude
markitdown document.docx -o output_folder --llm gemini
markitdown document.docx -o output_folder --llm mistral
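
The same commands can be driven from a Python script, for example inside a build step. The sketch below uses only the flags shown above (-o and --llm). The helper names are hypothetical, and it assumes the CLI reports failure through a non-zero exit status, which the documentation here does not confirm.

```python
import subprocess

SUPPORTED_LLMS = ("openai", "claude", "gemini", "mistral")


def build_markitdown_command(input_file, output_dir, llm=None):
    """Build the argv list for the `markitdown` CLI using only the flags shown above."""
    command = ["markitdown", input_file, "-o", output_dir]
    if llm is not None:
        if llm not in SUPPORTED_LLMS:
            raise ValueError(f"unsupported LLM: {llm!r}")
        command += ["--llm", llm]
    return command


def run_markitdown(input_file, output_dir, llm=None):
    """Run the CLI; check=True raises CalledProcessError on a non-zero exit status."""
    return subprocess.run(build_markitdown_command(input_file, output_dir, llm), check=True)
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with file names that contain spaces.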

Python API

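from openize.markitdown.core import MarkItDown

input_file = "report.pdf"        # document to convert
output_dir = "output_markdown"   # folder that receives the Markdown output

# Convert the document and send the result to Gemini (requires GEMINI_API_KEY).
converter = MarkItDown(output_dir, llm_client_name="gemini")
converter.convert_document(input_file)

print("Conversion completed and data sent to Gemini.")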
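
To convert every supported file in a folder, the converter can be passed into a small loop. The sketch below relies only on the convert_document method shown above. The helper convert_folder is a hypothetical name, and the extension filter is an assumption taken from the Features list.

```python
from pathlib import Path

# Formats named in the Features list.
SUPPORTED_EXTENSIONS = {".docx", ".pdf", ".xlsx", ".pptx"}


def convert_folder(converter, input_dir):
    """Call converter.convert_document() on each supported file; return the file names handled."""
    converted = []
    for path in sorted(Path(input_dir).iterdir()):
        # Skip subdirectories and unsupported formats; match extensions case-insensitively.
        if path.is_file() and path.suffix.lower() in SUPPORTED_EXTENSIONS:
            converter.convert_document(str(path))
            converted.append(path.name)
    return converted
```

With the package installed, a call might look like convert_folder(MarkItDown("output_markdown", llm_client_name="gemini"), "reports"), where "reports" is a placeholder folder name.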

Environment Variables

The following environment variables are used to control license and LLM access:

Variable              Description
ASPOSE_LICENSE_PATH   Required to activate Aspose license (if using paid APIs)
OPENAI_API_KEY        Required for OpenAI integration
OPENAI_MODEL          (Optional) OpenAI model name (default: gpt-4)
CLAUDE_API_KEY        Required for Claude integration
CLAUDE_MODEL          (Optional) Claude model name (default: claude-v1)
GEMINI_API_KEY        Required for Gemini integration
GEMINI_MODEL          (Optional) Gemini model name (default: gemini-pro)
MISTRAL_API_KEY       Required for Mistral integration
MISTRAL_MODEL         (Optional) Mistral model name (default: mistral-medium)
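
A script can check these variables before starting a conversion, so a missing key fails early with a clear message. The sketch below mirrors the table above: variable names and default models are copied from it. The function resolve_llm_settings is a hypothetical helper and does not necessarily reflect how the package reads these variables internally.

```python
import os

# (API key variable, model variable, default model) per provider, from the table above.
LLM_ENV = {
    "openai": ("OPENAI_API_KEY", "OPENAI_MODEL", "gpt-4"),
    "claude": ("CLAUDE_API_KEY", "CLAUDE_MODEL", "claude-v1"),
    "gemini": ("GEMINI_API_KEY", "GEMINI_MODEL", "gemini-pro"),
    "mistral": ("MISTRAL_API_KEY", "MISTRAL_MODEL", "mistral-medium"),
}


def resolve_llm_settings(provider, env=None):
    """Return (api_key, model) for a provider, using the documented default model if none is set."""
    env = os.environ if env is None else env
    key_var, model_var, default_model = LLM_ENV[provider]
    api_key = env.get(key_var)
    if not api_key:
        raise RuntimeError(f"{key_var} must be set to use {provider}")
    # An unset or empty model variable falls back to the default.
    return api_key, env.get(model_var) or default_model
```

Accepting an optional env mapping keeps the helper easy to test without touching the real process environment.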

Setting Environment Variables

Unix-based (Linux/macOS):

export ASPOSE_LICENSE_PATH="/path/to/license"
export OPENAI_API_KEY="your-openai-key"
export CLAUDE_API_KEY="your-claude-key"
export GEMINI_API_KEY="your-gemini-key"
export MISTRAL_API_KEY="your-mistral-key"

Windows PowerShell:

$env:ASPOSE_LICENSE_PATH = "C:\path\to\license"
$env:OPENAI_API_KEY = "your-openai-key"
$env:CLAUDE_API_KEY = "your-claude-key"
$env:GEMINI_API_KEY = "your-gemini-key"
$env:MISTRAL_API_KEY = "your-mistral-key"

Running Tests

To run unit tests for Openize.MarkItDown, follow these steps:

1. Navigate to the package directory

From the root of the repository, change into the package directory:

cd openize-markitdown/packages/markitdown

2. Install test dependencies

Make sure pytest and pytest-mock are installed:

pip install pytest pytest-mock

3. Run tests using pytest

To run all tests:

pytest

To run a specific test file:

pytest tests/test.py

Tip

Use -v for more detailed test output:

pytest -v

License

This package is licensed under the MIT License. However, it depends on Aspose libraries, which are proprietary and closed-source.

⚠️ You must obtain valid licenses for Aspose libraries separately. This repository does not include or distribute any proprietary components.

Download files

Source Distribution

openize_markitdown_python-25.6.1.tar.gz (12.1 kB)

Built Distribution

openize_markitdown_python-25.6.1-py3-none-any.whl (11.5 kB)

File details

Details for the file openize_markitdown_python-25.6.1.tar.gz.

File hashes

Algorithm     Hash digest
SHA256        81fa1c842a3f802ac9fa270b13b192a76af4bcdb3bd4c1e51e04e6f8f4c2d02c
MD5           b44f08c0134c76dbd43e94cb5dc41295
BLAKE2b-256   b1e177d845a309adacaed628ccd77d8622cd1c347f2173eef9ab5f98a2dd6c5a

File details

Details for the file openize_markitdown_python-25.6.1-py3-none-any.whl.

File hashes

Algorithm     Hash digest
SHA256        d32fdd7e9b062e2dfe44de295e03617b98f1ec8970eeffa3f799fe4625e094e3
MD5           0c19e110e712de0937ea9562d37ca409
BLAKE2b-256   e46ed6e6bb645391d11331a8d50366cb4280906f9f35e34e358bdba1f14d7b95
