Convert documents to semantic HTML optimized for LLM context - reduces token congestion

MakeContextSimple

Convert documents to semantic HTML optimized for LLM context consumption.

Overview

MakeContextSimple is a Python utility that converts various document formats into clean, semantic HTML optimized for large language model (LLM) consumption. Unlike Markdown-based converters, MakeContextSimple produces HTML that is:

  • Token-efficient: Less syntax overhead than Markdown for complex structures
  • Semantically rich: HTML tags convey meaning without extra markers
  • Machine-parseable: Standard HTML parsers work reliably
  • Browser-viewable: Output can be directly viewed in any browser
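To illustrate the machine-parseable point, here is a minimal sketch that reads a table with Python's standard-library html.parser. The sample markup and the names TableRows and table_rows are illustrative assumptions, not MakeContextSimple output or API.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of each <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])          # start a new row
        elif tag in ("td", "th") and self.rows:
            self._in_cell = True
            self.rows[-1].append("")      # start a new, empty cell

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.rows[-1][-1] += data     # text may arrive in several chunks


def table_rows(html_text):
    """Return the table content as a list of rows, each a list of cell strings."""
    parser = TableRows()
    parser.feed(html_text)
    parser.close()
    return parser.rows


# Hypothetical sample input, written by hand for this sketch
sample = (
    "<table>"
    "<tr><td>Name</td><td>Age</td><td>City</td></tr>"
    "<tr><td>Alice</td><td>30</td><td>New York</td></tr>"
    "</table>"
)
print(table_rows(sample))
```

No regular expressions or third-party packages are needed; the same approach extends to headings, lists, and links by handling more tags.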

Supported Formats

Category    Formats
Documents   PDF, DOCX, Markdown
Office      PPTX, XLSX
Web         HTML, XML, RSS
Data        CSV, JSON
Text        Plain text, Code files, Config files
Images      JPG, PNG, GIF, WebP, BMP

Installation

Basic Installation

pip install makecontextsimple

With Optional Dependencies

# For PDF support
pip install makecontextsimple[pdf]

# For Office document support
pip install makecontextsimple[docx,pptx,xlsx]

# For image support
pip install makecontextsimple[image]

# For all formats
pip install makecontextsimple[all]

From Source

git clone https://github.com/makecontextsimple/makecontextsimple.git
cd makecontextsimple
pip install -e ".[all]"

Usage

Command Line

# Convert a file to HTML (output to stdout)
makecontextsimple document.pdf

# Convert with custom output file
makecontextsimple document.pdf -o output.html

# Generate minimal HTML for LLM context
makecontextsimple document.pdf --llm

# List supported formats
makecontextsimple --list-formats

Python API

from makecontextsimple import MakeContextSimple

# Initialize converter
converter = MakeContextSimple()

# Convert a file
result = converter.convert("document.pdf")

# Get full HTML document
html = result.to_full_document()
print(html)

# Get minimal HTML for LLM context
llm_context = result.to_llm_context()

# Save directly to file
converter.convert_to_file("document.pdf", "output.html")

# Convert URL content
import requests
response = requests.get("https://example.com/page.html", timeout=10)
response.raise_for_status()  # fail early on HTTP errors
result = converter.convert(response)

Custom Styles

# Use custom CSS
custom_css = """
body { font-family: Arial; max-width: 800px; margin: 0 auto; }
h1 { color: #333; }
"""
result = converter.convert("document.pdf")
html = result.to_full_document(styles=custom_css)

Custom Converters

from html import escape

from makecontextsimple import HTMLConverter, HTMLResult, MakeContextSimple

class MyCustomConverter(HTMLConverter):
    def accepts(self, file_stream, mimetype=None, extension=None, **kwargs):
        # Claim only files with the custom extension
        return extension == ".myformat"

    def convert(self, file_stream, mimetype=None, extension=None, **kwargs):
        content = file_stream.read().decode("utf-8")
        # Custom conversion logic; escape the text so <, > and & survive inside <pre>
        body = f"<pre>{escape(content)}</pre>"
        return HTMLResult(html=body, title="Custom Format")

# Register custom converter
converter = MakeContextSimple()
converter.register_converter(MyCustomConverter(), priority=0)

Architecture

MakeContextSimple follows a plugin-based converter architecture:

MakeContextSimple (orchestrator)
    ├── HTMLConverter (abstract base)
    │   ├── PDFConverter
    │   ├── DOCXConverter
    │   ├── PPTXConverter
    │   ├── XLSXConverter
    │   ├── ImageConverter
    │   ├── CSVConverter
    │   ├── JSONConverter
    │   ├── XMLConverter
    │   ├── HTMLConverter_Builtin
    │   ├── MarkdownConverter
    │   └── PlainTextConverter
    ├── HTMLBuilder (utilities)
    └── HTMLResult (output container)

Key Components

  • MakeContextSimple: Main orchestrator that manages converters and I/O
  • HTMLConverter: Abstract base class for all format converters
  • HTMLBuilder: Utility class for constructing semantic HTML
  • HTMLResult: Container for conversion output with metadata
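The way these components fit together can be sketched in a few lines. This is a simplified stand-in, not the library's implementation: Result, Converter, PlainText, and Orchestrator are hypothetical names, and the rule that lower priority values are tried first is an assumption made for the example.

```python
import io
from abc import ABC, abstractmethod
from dataclasses import dataclass
from html import escape


@dataclass
class Result:
    """Stand-in for the output container: an HTML body plus a title."""
    html: str
    title: str = ""


class Converter(ABC):
    """Stand-in for the abstract converter base class."""

    @abstractmethod
    def accepts(self, stream, extension=None): ...

    @abstractmethod
    def convert(self, stream, extension=None): ...


class PlainText(Converter):
    def accepts(self, stream, extension=None):
        return extension == ".txt"

    def convert(self, stream, extension=None):
        text = stream.read().decode("utf-8")
        return Result(html=f"<pre>{escape(text)}</pre>", title="Plain text")


class Orchestrator:
    """Holds registered converters and dispatches to the first one that accepts."""

    def __init__(self):
        self._converters = []  # list of (priority, converter) pairs

    def register_converter(self, converter, priority=0):
        self._converters.append((priority, converter))
        # Assumption for this sketch: lower priority values are tried first.
        self._converters.sort(key=lambda pair: pair[0])

    def convert(self, stream, extension=None):
        for _, converter in self._converters:
            if converter.accepts(stream, extension=extension):
                return converter.convert(stream, extension=extension)
        raise ValueError(f"no converter accepts {extension!r}")


orchestrator = Orchestrator()
orchestrator.register_converter(PlainText(), priority=10)
result = orchestrator.convert(io.BytesIO(b"a < b"), extension=".txt")
print(result.html)  # <pre>a &lt; b</pre>
```

Keeping format detection (accepts) separate from conversion (convert) is what lets new formats be added without touching the orchestrator.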

Why HTML Over Markdown?

Aspect             Markdown                HTML
Token Efficiency   Good                    Better (15-20% fewer tokens)
Table Syntax       |---| separators        <table> tags
Semantic Meaning   Relies on conventions   Explicit tags
Parsing            Regex/string ops        Standard parsers
Preview            Needs rendering         Native browser

Token Comparison Example

Markdown (180 tokens):

| Name  | Age | City     |
|-------|-----|----------|
| Alice | 30  | New York |

HTML (150 tokens):

<table>
<tr><td>Name</td><td>Age</td><td>City</td></tr>
<tr><td>Alice</td><td>30</td><td>New York</td></tr>
</table>
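Token counts depend on the tokenizer and on the shape of the table, so it can be worth measuring on your own data. The sketch below renders the same rows both ways and counts them with a pluggable counter. All function names are hypothetical, and rough_token_count is only a crude stand-in that should be replaced with the tokenizer of your target model.

```python
import re
from html import escape


def markdown_table(header, rows):
    """Render rows as a pipe-delimited Markdown table."""
    lines = [
        "| " + " | ".join(header) + " |",
        "|" + "|".join("---" for _ in header) + "|",
    ]
    lines += ["| " + " | ".join(row) + " |" for row in rows]
    return "\n".join(lines)


def html_table(header, rows):
    """Render rows as a bare HTML table, one <tr> per line."""
    def tr(cells):
        return "<tr>" + "".join(f"<td>{escape(c)}</td>" for c in cells) + "</tr>"
    return "\n".join(["<table>", tr(header), *map(tr, rows), "</table>"])


def rough_token_count(text):
    # Crude approximation, NOT a real tokenizer: words plus single punctuation marks.
    return len(re.findall(r"\w+|[^\w\s]", text))


def compare(header, rows, count=rough_token_count):
    """Return the count for each representation of the same table."""
    return {
        "markdown": count(markdown_table(header, rows)),
        "html": count(html_table(header, rows)),
    }


header = ["Name", "Age", "City"]
rows = [["Alice", "30", "New York"]]
print(compare(header, rows))             # approximate counts
print(compare(header, rows, count=len))  # character counts
```

Passing a real tokenizer's counting function as count gives figures that are comparable to the ones quoted in this section.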

Plugin System

MakeContextSimple supports third-party plugins through Python package entry points:

# In your plugin's pyproject.toml:
[project.entry-points."makecontextsimple.plugin"]
my_plugin = "my_package:register"

# In your plugin module (my_package), where MyConverter is your HTMLConverter subclass:
def register(converter_instance):
    converter_instance.register_converter(MyConverter(), priority=5)
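On the host side, entry-point discovery typically looks like the sketch below, which uses the standard-library importlib.metadata. The load_plugins function is a hypothetical helper written for illustration; how MakeContextSimple itself discovers and loads plugins is not shown in this document.

```python
from importlib.metadata import entry_points

# Group name taken from the pyproject.toml example above
PLUGIN_GROUP = "makecontextsimple.plugin"


def load_plugins(converter_instance, group=PLUGIN_GROUP):
    """Call every register() hook advertised under `group`; return the plugin names."""
    eps = entry_points()
    # Python 3.10+ exposes .select(); Python 3.8/3.9 return a plain dict of groups.
    if hasattr(eps, "select"):
        selected = eps.select(group=group)
    else:
        selected = eps.get(group, [])

    loaded = []
    for ep in selected:
        register = ep.load()          # imports the object named by "my_package:register"
        register(converter_instance)  # the plugin registers its converters
        loaded.append(ep.name)
    return loaded
```

With no plugins installed the function returns an empty list, so calling it at startup is harmless.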

Development

Setup

git clone https://github.com/makecontextsimple/makecontextsimple.git
cd makecontextsimple
pip install -e ".[dev]"

Running Tests

pytest tests/

Code Style

ruff check src/
ruff format src/

License

MIT License

Contributing

Contributions are welcome! Please see CONTRIBUTING.md for guidelines.

Source Distribution

makecontextsimple-0.1.0.tar.gz (26.3 kB view details)

Uploaded Source

Built Distribution

makecontextsimple-0.1.0-py3-none-any.whl (37.9 kB view details)

Uploaded Python 3

File details

Details for the file makecontextsimple-0.1.0.tar.gz.

File metadata

  • Download URL: makecontextsimple-0.1.0.tar.gz
  • Upload date:
  • Size: 26.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.2

File hashes

Hashes for makecontextsimple-0.1.0.tar.gz
Algorithm Hash digest
SHA256 8026c9d11eb6dc8db0ee7fbce32fa40ac4d9b092269e4e91b63aa5bf4d761a4b
MD5 1960b0dd2f3e27c6e5ab245c9004c928
BLAKE2b-256 93fdd3db9248de86e112c3231d2a7230585770e038f366e9d928b5a8a45ef6c3

File details

Details for the file makecontextsimple-0.1.0-py3-none-any.whl.

File hashes

Hashes for makecontextsimple-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 bc0e96121ef1dfd3c3d1452c0b27235fe7e785a7e31b7dc7201bd22108d4b9db
MD5 d9174fcfb27167d1980c108e91f1a1c2
BLAKE2b-256 eac4e3afdcbda8bd33f1926e55be0730cf0c70c9fc5c11c5ea7a67a382993a42
