A document converter for Word, PDF, Excel, and PowerPoint to Markdown.

Openize.MarkItDown for Python

Openize.MarkItDown for Python converts Word, PDF, Excel, and PowerPoint documents to Markdown. The output can be saved locally or passed to an OpenAI language model for further processing.

Features

  • Convert .docx, .pdf, .xlsx, and .pptx to Markdown.
  • Save Markdown files locally or send them to an LLM for processing.
  • Built around the Factory and Strategy design patterns for extensibility.
  • Handles both Windows and Linux file paths.
  • Command-line interface for easy use.
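The Factory and Strategy structure mentioned above can be pictured roughly as follows. This is a minimal, self-contained sketch, not the package's actual code: the class names (`ConverterStrategy`, `ConverterFactory`, and the per-format strategies) are hypothetical, and each `convert` method returns placeholder text where a real implementation would call a document library.

```python
from abc import ABC, abstractmethod
from pathlib import Path


class ConverterStrategy(ABC):
    """Hypothetical strategy interface: one implementation per input format."""

    @abstractmethod
    def convert(self, path: Path) -> str:
        """Return the Markdown text for the document at ``path``."""


class DocxStrategy(ConverterStrategy):
    def convert(self, path: Path) -> str:
        # Placeholder: a real strategy would call a Word-processing library here.
        return f"# Converted from Word: {path.name}\n"


class PdfStrategy(ConverterStrategy):
    def convert(self, path: Path) -> str:
        return f"# Converted from PDF: {path.name}\n"


class XlsxStrategy(ConverterStrategy):
    def convert(self, path: Path) -> str:
        return f"# Converted from Excel: {path.name}\n"


class PptxStrategy(ConverterStrategy):
    def convert(self, path: Path) -> str:
        return f"# Converted from PowerPoint: {path.name}\n"


class ConverterFactory:
    """Hypothetical factory: maps a file extension to a strategy instance."""

    _registry = {
        ".docx": DocxStrategy,
        ".pdf": PdfStrategy,
        ".xlsx": XlsxStrategy,
        ".pptx": PptxStrategy,
    }

    @classmethod
    def for_file(cls, path: Path) -> ConverterStrategy:
        # Extension lookup is case-insensitive (".PDF" and ".pdf" both match).
        try:
            strategy_cls = cls._registry[path.suffix.lower()]
        except KeyError:
            raise ValueError(f"Unsupported file type: {path.suffix}") from None
        return strategy_cls()


if __name__ == "__main__":
    doc = Path("report.pdf")
    print(ConverterFactory.for_file(doc).convert(doc))
```

With this layout, supporting a new format means writing one strategy class and registering its extension, without touching the existing converters.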

Requirements

This package depends on the Aspose libraries, which are commercial products:

You'll need to obtain valid licenses for these libraries separately. The package will install these dependencies, but you're responsible for complying with Aspose's licensing terms.

Installation

From PyPI

pip install openize-markitdown-python
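To confirm from Python that the installation succeeded, the standard-library `importlib.metadata` module can report the installed version. The helper name `installed_version` is illustrative; the distribution name is the one used in the `pip install` command above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name="openize-markitdown-python"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    v = installed_version()
    print(f"openize-markitdown-python {v}" if v else "openize-markitdown-python is not installed")
```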

Usage

Command Line Interface

# Convert a file and save locally
markitdown document.docx -o output_folder

# Process with an LLM (requires OPENAI_API_KEY environment variable)
markitdown document.docx -o output_folder --insert_into_llm
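The examples above convert one file per invocation. To process a whole folder, one option is to call the CLI once per file from Python. This sketch assumes `markitdown` is on `PATH` and uses only the `-o` and `--insert_into_llm` options shown above; the helper names `find_documents`, `build_commands`, and `run_all` are hypothetical.

```python
import subprocess
from pathlib import Path

# Extensions listed under Features; other file types are skipped.
SUPPORTED_EXTENSIONS = {".docx", ".pdf", ".xlsx", ".pptx"}


def find_documents(input_dir):
    """Return the supported files directly inside input_dir, sorted by path."""
    return sorted(
        p for p in Path(input_dir).iterdir()
        if p.is_file() and p.suffix.lower() in SUPPORTED_EXTENSIONS
    )


def build_commands(paths, output_dir, insert_into_llm=False):
    """Build one markitdown command (as an argv list) per supported file."""
    commands = []
    for path in paths:
        if Path(path).suffix.lower() not in SUPPORTED_EXTENSIONS:
            continue
        cmd = ["markitdown", str(path), "-o", str(output_dir)]
        if insert_into_llm:
            cmd.append("--insert_into_llm")
        commands.append(cmd)
    return commands


def run_all(commands):
    """Run each command in turn; check=True stops at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all(build_commands(find_documents("documents"), "output_folder"))
```

Passing the arguments as a list rather than a shell string avoids quoting problems with file names that contain spaces.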

Python API

from openize.markitdown.core import MarkItDown

# Define input file and output directory
input_file = "report.pdf"
output_dir = "output_markdown"

# Create MarkItDown instance
converter = MarkItDown(output_dir)

# Convert document and send output to LLM
converter.convert_document(input_file, insert_into_llm=True)

print("Conversion completed and data sent to LLM.")
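The same API can be looped over several files. The sketch below reuses only the `MarkItDown(output_dir)` constructor and the `convert_document(..., insert_into_llm=...)` call shown above; the `convert_many` helper is hypothetical. Because the package does not document which exceptions `convert_document` raises, it catches `Exception` broadly and collects failures instead of stopping at the first one. The converter is passed in as an argument, so the loop can also be exercised with a stand-in object.

```python
from pathlib import Path

# Extensions listed under Features.
SUPPORTED_EXTENSIONS = {".docx", ".pdf", ".xlsx", ".pptx"}


def convert_many(converter, paths, insert_into_llm=False):
    """Call converter.convert_document() on each supported file.

    Returns (converted, failed): a list of converted paths and a list of
    (path, exception) pairs for files that raised an error.
    """
    converted, failed = [], []
    for path in paths:
        if Path(path).suffix.lower() not in SUPPORTED_EXTENSIONS:
            continue  # skip formats the package does not list as supported
        try:
            converter.convert_document(str(path), insert_into_llm=insert_into_llm)
        except Exception as exc:  # assumption: exception types are not documented
            failed.append((str(path), exc))
        else:
            converted.append(str(path))
    return converted, failed


if __name__ == "__main__":
    from openize.markitdown.core import MarkItDown

    converter = MarkItDown("output_markdown")
    done, errors = convert_many(converter, ["report.pdf", "budget.xlsx"])
    print(f"Converted {len(done)} file(s); {len(errors)} failed.")
```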

Environment Variables

  • ASPOSE_LICENSE_PATH: Required when using the paid Aspose APIs. Set it to the full path of your Aspose license file.
  • OPENAI_API_KEY: Required when using the insert_into_llm=True option or the --insert_into_llm flag.
  • OPENAI_MODEL: Specifies the OpenAI model name (default: gpt-4).

To set these variables:

For Unix-based systems:

export ASPOSE_LICENSE_PATH="/path/to/license"
export OPENAI_API_KEY="your-api-key"
export OPENAI_MODEL="gpt-4"

For Windows (PowerShell):

$env:ASPOSE_LICENSE_PATH = "C:\path\to\license"
$env:OPENAI_API_KEY = "your-api-key"
$env:OPENAI_MODEL = "gpt-4"
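Before running a conversion with the LLM option, it can help to check these variables up front and fail with a clear message. The helpers below are hypothetical pre-flight checks that only read the variables; they do not configure the package, and the error message is illustrative. `llm_settings` applies the documented default of `gpt-4` when `OPENAI_MODEL` is unset.

```python
import os


def llm_settings(env=None):
    """Read the LLM-related variables; raise if OPENAI_API_KEY is missing."""
    env = os.environ if env is None else env
    api_key = env.get("OPENAI_API_KEY")
    if not api_key:
        raise RuntimeError("OPENAI_API_KEY must be set to use insert_into_llm=True")
    # Fall back to the documented default model when unset or empty.
    return {"api_key": api_key, "model": env.get("OPENAI_MODEL") or "gpt-4"}


def aspose_license_path(env=None):
    """Return ASPOSE_LICENSE_PATH if it points to an existing file, else None."""
    env = os.environ if env is None else env
    path = env.get("ASPOSE_LICENSE_PATH")
    return path if path and os.path.isfile(path) else None


if __name__ == "__main__":
    print("Model:", llm_settings()["model"])
    print("Aspose license file:", aspose_license_path() or "not found")
```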

License

This package is licensed under the MIT License. However, it depends on Aspose libraries, which are proprietary, closed-source libraries.

⚠️ Users must obtain a valid license for Aspose libraries separately. This repository does not include or distribute any proprietary components.

Download files

Source Distribution

openize_markitdown_python-25.4.0.tar.gz (9.6 kB)

Built Distribution

openize_markitdown_python-25.4.0-py3-none-any.whl

File hashes for openize_markitdown_python-25.4.0.tar.gz

Algorithm     Hash digest
SHA256        caee04689d65bdb876cea43e8e96fd88e753be7738461d00f011692b93cfe830
MD5           f1d664bec1d254e380131c523620adcd
BLAKE2b-256   08f3ffd7561652b1dc2bf5645ff95a6ed60a5c8248e14eb998aa7a894fe5b9aa

File hashes for openize_markitdown_python-25.4.0-py3-none-any.whl

Algorithm     Hash digest
SHA256        7a76ccb1a23711bf01166807ca9a7db07d2ae97e5ac8d4918cca8680532a39de
MD5           7416221994482168f8286acf983e7485
BLAKE2b-256   4088ed4097e7a92fe4c8ba1147aa9e49a1d327956923de609a3902ab1c4bd7fb
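The published digests can be used to check a downloaded file before installing it. This sketch computes a file's SHA256 digest with Python's standard `hashlib` module and compares it with the listed value; the helper names are illustrative, and the usage block assumes the wheel was saved to the current directory under its published name.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Compute the SHA256 hex digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # iter() with a sentinel keeps reading until read() returns b"" (EOF).
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_sha256):
    """Return True if the file's SHA256 digest matches the published value."""
    return sha256_of_file(path) == expected_sha256.strip().lower()


if __name__ == "__main__":
    # Published SHA256 for the wheel; the local file location is an assumption.
    wheel = "openize_markitdown_python-25.4.0-py3-none-any.whl"
    expected = "7a76ccb1a23711bf01166807ca9a7db07d2ae97e5ac8d4918cca8680532a39de"
    print("hash OK" if verify_download(wheel, expected) else "hash MISMATCH")
```

Reading in chunks keeps memory use constant regardless of file size.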
