A document converter for Word, PDF, Excel, and PowerPoint to Markdown.
Openize.MarkItDown for Python
Openize.MarkItDown for Python converts documents into Markdown format. It supports multiple file formats, provides flexible output handling, and integrates with LLMs for extended processing.
Features
- Convert .docx, .pdf, .xlsx, and .pptx files to Markdown.
- Save Markdown files locally or send them to an LLM for processing.
- Structured with the Factory & Strategy Pattern for scalability.
- Works with Windows and Linux-compatible paths.
- Command-line interface for easy use.
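As an illustration of the Factory and Strategy structure mentioned in the list above, here is a minimal sketch. All class and function names (ConverterStrategy, DocxStrategy, PdfStrategy, ConverterFactory, convert) are hypothetical and are not taken from the package; the strategy bodies are placeholders rather than real conversions.

```python
from abc import ABC, abstractmethod
from pathlib import Path


class ConverterStrategy(ABC):
    """Hypothetical strategy interface: one implementation per input format."""

    @abstractmethod
    def to_markdown(self, path: Path) -> str:
        ...


class DocxStrategy(ConverterStrategy):
    def to_markdown(self, path: Path) -> str:
        # Placeholder body; a real strategy would call a document library here.
        return f"# Converted from Word file {path.name}"


class PdfStrategy(ConverterStrategy):
    def to_markdown(self, path: Path) -> str:
        # Placeholder body, same caveat as above.
        return f"# Converted from PDF file {path.name}"


class ConverterFactory:
    """Hypothetical factory that maps a file extension to a strategy."""

    _registry = {".docx": DocxStrategy, ".pdf": PdfStrategy}

    @classmethod
    def for_file(cls, path: Path) -> ConverterStrategy:
        suffix = path.suffix.lower()
        try:
            return cls._registry[suffix]()
        except KeyError:
            raise ValueError(f"Unsupported file type: {suffix!r}") from None


def convert(path_str: str) -> str:
    """Pick a strategy from the file extension and run it."""
    path = Path(path_str)
    return ConverterFactory.for_file(path).to_markdown(path)
```

With this layout, supporting another format means adding one strategy class and one registry entry; the calling code does not change.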
Requirements
This package depends on the Aspose libraries, which are commercial products.
You'll need to obtain valid licenses for these libraries separately. The package will install these dependencies, but you're responsible for complying with Aspose's licensing terms.
Installation
From TestPyPI
pip install openize-markitdown-python
Usage
Command Line Interface
# Convert a file and save locally
markitdown document.docx -o output_folder
# Process with an LLM (requires OPENAI_API_KEY environment variable)
markitdown document.docx -o output_folder --insert_into_llm
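If the CLI needs to be driven from a script, the commands above could be wrapped roughly as follows. This is a sketch that assumes the markitdown executable is on PATH and that it accepts exactly the arguments shown above; build_command and run_conversion are hypothetical helper names, not part of the package.

```python
import subprocess
from typing import List


def build_command(input_file: str, output_dir: str, use_llm: bool = False) -> List[str]:
    """Assemble the argument list for the markitdown CLI as shown above."""
    command = ["markitdown", input_file, "-o", output_dir]
    if use_llm:
        # Flag copied from the CLI example; OPENAI_API_KEY must be set for it.
        command.append("--insert_into_llm")
    return command


def run_conversion(input_file: str, output_dir: str, use_llm: bool = False) -> int:
    """Run the CLI and return its exit code (assumes markitdown is installed)."""
    completed = subprocess.run(build_command(input_file, output_dir, use_llm), check=False)
    return completed.returncode
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.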
Python API
from openize.markitdown.core import MarkItDown
# Define input file and output directory
input_file = "report.pdf"
output_dir = "output_markdown"
# Create MarkItDown instance
converter = MarkItDown(output_dir)
# Convert document and send output to LLM
converter.convert_document(input_file, insert_into_llm=True)
print("Conversion completed and data sent to LLM.")
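To convert a whole folder rather than a single file, one possible approach is to collect the files with supported extensions first and hand each one to a conversion callable. The helper names below (find_supported_files, convert_folder) are hypothetical; the extension list is taken from the Features section.

```python
from pathlib import Path
from typing import Callable, List

# Extensions listed in the Features section.
SUPPORTED_SUFFIXES = {".docx", ".pdf", ".xlsx", ".pptx"}


def find_supported_files(folder: str) -> List[Path]:
    """Return supported files under folder (searched recursively), sorted by path."""
    return sorted(
        p for p in Path(folder).rglob("*")
        if p.is_file() and p.suffix.lower() in SUPPORTED_SUFFIXES
    )


def convert_folder(folder: str, convert_one: Callable[[str], None]) -> int:
    """Call convert_one for each supported file and return how many were processed."""
    files = find_supported_files(folder)
    for path in files:
        convert_one(str(path))
    return len(files)
```

With the package installed, convert_one could be a small function that calls converter.convert_document on the MarkItDown instance created above. Taking the callable as a parameter keeps the sketch independent of the package.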
Environment Variables
- ASPOSE_LICENSE_PATH: Required when using the Aspose paid APIs. Set it to the full path of your Aspose license file.
- OPENAI_API_KEY: Required when using the insert_into_llm=True option or the --llm flag.
- OPENAI_MODEL: Specifies the OpenAI model name (default: gpt-4).
To set these variables:
For Unix-based systems:
export ASPOSE_LICENSE_PATH="/path/to/license"
export OPENAI_API_KEY="your-api-key"
export OPENAI_MODEL="gpt-4"
For Windows (PowerShell):
$env:ASPOSE_LICENSE_PATH = "C:\path\to\license"
$env:OPENAI_API_KEY = "your-api-key"
$env:OPENAI_MODEL = "gpt-4"
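Before starting a conversion, a script may want to check these variables up front. The sketch below reads the three variables described above and applies the documented gpt-4 default; load_settings and the keys of the returned dictionary are hypothetical, and the package itself may read the variables differently.

```python
import os
from typing import Mapping, Optional


def load_settings(env: Optional[Mapping[str, str]] = None, use_llm: bool = False) -> dict:
    """Collect the documented environment variables into a dictionary."""
    env = os.environ if env is None else env
    settings = {
        "license_path": env.get("ASPOSE_LICENSE_PATH"),
        "api_key": env.get("OPENAI_API_KEY"),
        # Default model name as stated in the variable list above.
        "model": env.get("OPENAI_MODEL", "gpt-4"),
    }
    if use_llm and not settings["api_key"]:
        raise RuntimeError("OPENAI_API_KEY must be set when LLM processing is requested")
    return settings
```

Calling load_settings(use_llm=True) fails early with a clear message when the API key is missing, instead of failing midway through a conversion.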
License
This package is licensed under the MIT License. However, it depends on Aspose libraries, which are proprietary, closed-source libraries.
⚠️ Users must obtain a valid license for Aspose libraries separately. This repository does not include or distribute any proprietary components.
Download files
File details
Details for the file openize_markitdown_python-25.4.0.tar.gz.
File metadata
- Download URL: openize_markitdown_python-25.4.0.tar.gz
- Upload date:
- Size: 9.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.10
File hashes
Algorithm | Hash digest
---|---
SHA256 | caee04689d65bdb876cea43e8e96fd88e753be7738461d00f011692b93cfe830
MD5 | f1d664bec1d254e380131c523620adcd
BLAKE2b-256 | 08f3ffd7561652b1dc2bf5645ff95a6ed60a5c8248e14eb998aa7a894fe5b9aa
File details
Details for the file openize_markitdown_python-25.4.0-py3-none-any.whl.
File metadata
- Download URL: openize_markitdown_python-25.4.0-py3-none-any.whl
- Upload date:
- Size: 9.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.10
File hashes
Algorithm | Hash digest
---|---
SHA256 | 7a76ccb1a23711bf01166807ca9a7db07d2ae97e5ac8d4918cca8680532a39de
MD5 | 7416221994482168f8286acf983e7485
BLAKE2b-256 | 4088ed4097e7a92fe4c8ba1147aa9e49a1d327956923de609a3902ab1c4bd7fb