Convert PDF, Office, data, and markup files into clean, self-contained HTML — for humans and for LLMs.

Maintainers

he_wei_gui

Project description

everythingtohtml

Convert (almost) any file into clean, self-contained HTML — a universal file reader for your browser and scripts.

everythingtohtml is the spiritual inverse of tools like markitdown: instead of flattening rich documents down to Markdown, it lifts a wide range of formats up into clean, styled, standalone HTML you can open in a browser, embed in a page, or feed to a workflow that wants structured markup.

One small API. One CLI. A pluggable converter registry. No browser, no network required for local files.

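In brief: everythingtohtml is a universal file reader for the browser, and also a Python package and CLI. It converts common files such as PDF, Office, Markdown, CSV, JSON, and EPUB into clean, self-contained HTML that is easy to read, share, and process automatically.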

from everythingtohtml import EverythingToHtml

eth = EverythingToHtml()
result = eth.convert("quarterly-report.docx")
print(result.html)        # a complete <!DOCTYPE html> document
print(result.title)       # best-effort document title

$ everythingtohtml notes.md -o notes.html
$ everythingtohtml data.csv > data.html
$ everythingtohtml https://example.com/feed.rss > feed.html

Why HTML (and not Markdown)?

Markdown is lossy: tables get flattened, styling vanishes, slide structure disappears, and nested data becomes ambiguous. HTML keeps the structure that matters — headings, tables, lists, sections, links, images — while staying:

- Human-friendly: open the output in any browser, no toolchain needed.
- Restyleable: every document ships with a small, overridable stylesheet.
- Structure-preserving: explicit <table>/<section> markup keeps tables, sections, and nested content easy to inspect and process.
- Self-contained: one file, valid HTML5, dark-mode aware.
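
To make the structure-preserving point concrete, here is a minimal sketch that pulls rows back out of a <table> using only Python's standard-library html.parser. The sample markup, the TableRows class, and the extract_rows helper are illustrative assumptions, not output or API from everythingtohtml.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows
        self._row = None    # cells of the row being read, or None
        self._cell = None   # text chunks of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


# Hypothetical sample markup, standing in for a converted spreadsheet.
SAMPLE_HTML = (
    "<table>"
    "<tr><th>Region</th><th>Revenue</th></tr>"
    "<tr><td>EMEA</td><td>1200</td></tr>"
    "</table>"
)


def extract_rows(html):
    """Return the table cells in `html` as a list of rows."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    return parser.rows


print(extract_rows(SAMPLE_HTML))
```

Because the cell boundaries are explicit in the markup, no guessing about column alignment or delimiter escaping is needed.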

Supported formats

Format	Extensions	Extra needed
Plain text	`.txt`, anything textual	— (built in)
Markdown	`.md`, `.markdown`, `.mkd`	— (built in)
HTML (clean/normalize)	`.html`, `.htm`, `.xhtml`	— (built in)
CSV / TSV	`.csv`, `.tsv`	— (built in)
JSON / JSONL	`.json`, `.jsonl`, `.ndjson`	— (built in)
Jupyter notebook	`.ipynb`	— (built in)
RSS / Atom feeds	`.rss`, `.atom`	— (built in)
EPUB e-books	`.epub`	— (built in)
Email	`.eml`	— (built in)
OpenDocument Text	`.odt`	— (built in)
YAML	`.yaml`, `.yml`	`pip install everythingtohtml[yaml]`
reStructuredText	`.rst`	`pip install everythingtohtml[rst]`
Word	`.docx`	`pip install everythingtohtml[docx]`
Word (legacy)	`.doc`	`pip install everythingtohtml[doc]` (LibreOffice recommended)
Excel	`.xlsx`, `.xlsm`	`pip install everythingtohtml[xlsx]`
PowerPoint	`.pptx`	`pip install everythingtohtml[pptx]`
PDF	`.pdf`	`pip install everythingtohtml[pdf]`

Legacy .doc: best results come from having LibreOffice installed (used headlessly for high-fidelity conversion). Without it, a pure-Python olefile fallback recovers the text content.
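
As a rough illustration of the "LibreOffice if available, fallback otherwise" idea, the sketch below checks PATH for a LibreOffice executable and builds a headless conversion command. How everythingtohtml actually locates and invokes LibreOffice is not documented here, so the function names and the choice of candidate executables are assumptions; only the LibreOffice flags themselves (--headless, --convert-to, --outdir) are standard.

```python
import shutil


def find_libreoffice(candidates=("soffice", "libreoffice")):
    """Return the path of the first LibreOffice executable on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None


def headless_convert_command(executable, source, outdir):
    """Build the argument list for a headless LibreOffice HTML export."""
    return [executable, "--headless", "--convert-to", "html", "--outdir", outdir, source]


executable = find_libreoffice()
if executable is None:
    # A caller would switch to a pure-Python text-recovery fallback here.
    print("LibreOffice not found; a fallback would be used")
else:
    print(headless_convert_command(executable, "legacy.doc", "out"))
```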

Want everything? Run pip install "everythingtohtml[all]" (the quotes stop shells such as zsh from treating the square brackets as a glob pattern).

New formats are just a small class away — see Writing a converter.

Installation

# core formats only (tiny dependency footprint)
pip install everythingtohtml

# pull in Office + data formats
pip install "everythingtohtml[all]"

# or cherry-pick
pip install "everythingtohtml[docx,xlsx]"

Requires Python 3.10+.

Usage

Library

from everythingtohtml import EverythingToHtml

eth = EverythingToHtml()

# From a path
result = eth.convert("slides.pptx")

# From bytes or an open stream
with open("data.csv", "rb") as f:
    result = eth.convert(f)

# From a URL (http/https/file/data URIs)
result = eth.convert("https://example.com/posts.atom")

# Give hints when the source is ambiguous (e.g. stdin)
from everythingtohtml import StreamInfo
raw_bytes = b"# Notes\n\nHello"  # e.g. bytes read from stdin
result = eth.convert(raw_bytes, stream_info=StreamInfo(extension=".md"))

result.html          # the full HTML document (str)
result.title         # detected title, or None
result.text_content  # alias for .html (drop-in for markitdown-style code)

Command line

everythingtohtml SOURCE [-o OUTPUT] [--extension .md] [--mimetype text/markdown]

# convert a file to a file
everythingtohtml report.docx -o report.html

# pipe through stdin (give it a hint)
cat notes.md | everythingtohtml --extension .md > notes.html

# fetch and convert a remote feed
everythingtohtml https://hnrss.org/frontpage > hn.html

The CLI is also available as e2h for the impatient.
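
For readers who want to see the shape of that interface in code, here is a minimal argparse sketch that mirrors the documented usage line. It is not the package's actual CLI implementation: the build_parser name, the --output long form, and the help strings are assumptions made for illustration.

```python
import argparse


def build_parser():
    """Parser mirroring: SOURCE [-o OUTPUT] [--extension .md] [--mimetype text/markdown]."""
    parser = argparse.ArgumentParser(prog="everythingtohtml")
    # Zero sources means "read from stdin", as in the piping example.
    parser.add_argument("sources", nargs="*", metavar="SOURCE",
                        help="path or URL; omit to read from stdin")
    parser.add_argument("-o", "--output",
                        help="output file (default: stdout)")
    parser.add_argument("--extension",
                        help="extension hint for ambiguous input, e.g. .md")
    parser.add_argument("--mimetype",
                        help="MIME type hint, e.g. text/markdown")
    return parser


args = build_parser().parse_args(["report.docx", "-o", "report.html"])
print(args.sources, args.output)

# Piped input: no SOURCE, only a hint.
piped = build_parser().parse_args(["--extension", ".md"])
print(piped.sources, piped.extension)
```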

Merging and comparing documents

Need to collate a stack of Word files into one page, or see exactly what changed between two revisions? everythingtohtml does both — for any supported format.

from everythingtohtml import EverythingToHtml

eth = EverythingToHtml()

# Merge several documents into one HTML page (each becomes a section, with a TOC)
merged = eth.merge(["intro.docx", "chapter1.doc", "appendix.pdf"])

# Place them side by side for visual comparison
columns = eth.merge(["draft-v1.docx", "draft-v2.docx"], layout="columns")

# Produce a highlighted, line-by-line diff of two documents' text
changes = eth.diff("spec-old.docx", "spec-new.docx")
with open("changes.html", "w", encoding="utf-8") as f:
    f.write(changes.html)
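
The "each becomes a section, with a TOC" layout can be pictured with a small standalone sketch. This is an assumption about the general shape of a merged page, not the markup that eth.merge() actually emits; merge_sections and the doc-N anchor scheme are hypothetical.

```python
from html import escape


def merge_sections(sections):
    """Join (title, html_fragment) pairs into one page with a linked table of contents."""
    toc_items = []
    bodies = []
    for index, (title, fragment) in enumerate(sections, start=1):
        anchor = f"doc-{index}"  # hypothetical anchor naming scheme
        toc_items.append(f'<li><a href="#{anchor}">{escape(title)}</a></li>')
        bodies.append(
            f'<section id="{anchor}"><h2>{escape(title)}</h2>{fragment}</section>'
        )
    toc = "<nav><ol>" + "".join(toc_items) + "</ol></nav>"
    return (
        '<!DOCTYPE html><html><head><meta charset="utf-8">'
        "<title>Merged</title></head><body>"
        + toc + "".join(bodies) +
        "</body></html>"
    )


page = merge_sections([("Intro", "<p>Hello</p>"), ("Chapter & 1", "<p>World</p>")])
print(page)
```

Titles are escaped because they come from untrusted documents; the fragments are assumed to be already-sanitized converter output.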

From the CLI:

# two or more sources are merged automatically
everythingtohtml intro.docx chapter1.doc appendix.pdf -o handbook.html

# side-by-side layout
everythingtohtml old.docx new.docx --columns -o compare.html

# highlighted diff of exactly two documents
everythingtohtml spec-old.docx spec-new.docx --diff -o changes.html
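
A highlighted, line-by-line HTML diff of two texts can also be produced with nothing but the standard library, which is a handy way to see what such output looks like. The sketch below uses difflib.HtmlDiff; whether everythingtohtml uses difflib internally is not stated in this README, so treat this as one possible technique rather than the package's implementation. The html_diff helper is hypothetical.

```python
import difflib


def html_diff(old_text, new_text, old_label="old", new_label="new"):
    """Return a complete HTML page with a side-by-side, highlighted line diff."""
    differ = difflib.HtmlDiff(wrapcolumn=80)
    return differ.make_file(
        old_text.splitlines(),
        new_text.splitlines(),
        fromdesc=old_label,
        todesc=new_label,
    )


page = html_diff("Title\nFirst draft\n", "Title\nSecond draft\n")
with open("changes-sketch.html", "w", encoding="utf-8") as f:
    f.write(page)
```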

Architecture

everythingtohtml borrows the proven shape of markitdown:

EverythingToHtml            # engine: detection + dispatch + plugins
 ├─ StreamInfo              # immutable bag of hints (ext, mime, charset, …)
 ├─ DocumentConverter       # base class: accepts() + convert()
 │   ├─ MarkdownConverter
 │   ├─ CsvConverter
 │   ├─ DocxConverter (mammoth)
 │   └─ … one small class per format
 └─ DocumentConverterResult # { html, title, metadata }

When you call convert(), the engine:

1. Detects the stream: extension, mimetype, declared charset, and magic-byte sniffing via puremagic fill in a StreamInfo.
2. Dispatches: converters are tried in priority order; each accepts() is a cheap, non-destructive check. Specific formats win over the plain-text catch-all.
3. Converts: the winning converter returns a DocumentConverterResult. If a converter accepts but raises, the engine records the failure and tries the next one, so one greedy converter can't sink the whole conversion.
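
The dispatch-with-fallback behaviour described above can be sketched in a few lines of standalone Python. Every name here (Hints, Converter, BrokenCsv, PlainText, dispatch) and the "lower priority number runs first" convention are assumptions for illustration; the real engine's classes and ordering rules may differ.

```python
from dataclasses import dataclass
from html import escape
from typing import Optional


@dataclass(frozen=True)
class Hints:
    """Stand-in for StreamInfo: an immutable bag of hints."""
    extension: Optional[str] = None
    mimetype: Optional[str] = None


class Converter:
    priority = 10  # assumption: lower numbers are tried first

    def accepts(self, data: bytes, hints: Hints) -> bool:
        raise NotImplementedError

    def convert(self, data: bytes, hints: Hints) -> str:
        raise NotImplementedError


class BrokenCsv(Converter):
    """A specific converter that accepts .csv but fails while converting."""
    priority = 0

    def accepts(self, data, hints):
        return hints.extension == ".csv"

    def convert(self, data, hints):
        raise ValueError("malformed CSV")


class PlainText(Converter):
    """Catch-all, tried last."""
    priority = 100

    def accepts(self, data, hints):
        return True

    def convert(self, data, hints):
        text = data.decode("utf-8", errors="replace")
        return f"<pre>{escape(text)}</pre>"


def dispatch(converters, data, hints):
    """Try accepting converters in priority order; record failures and move on."""
    failures = []
    for converter in sorted(converters, key=lambda c: c.priority):
        if not converter.accepts(data, hints):
            continue
        try:
            return converter.convert(data, hints), failures
        except Exception as exc:  # one failing converter must not end the run
            failures.append((type(converter).__name__, exc))
    raise RuntimeError(f"no converter succeeded: {failures}")


html, failures = dispatch([PlainText(), BrokenCsv()], b"a,b\n1,2", Hints(extension=".csv"))
print(html)
print([name for name, _ in failures])
```

Here the CSV converter is tried first, raises, and is recorded; the plain-text catch-all then produces the result.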

Writing a converter

from everythingtohtml import (
    DocumentConverter,
    DocumentConverterResult,
    EverythingToHtml,
    StreamInfo,
)
from everythingtohtml._html_builder import wrap_document, escape_text

class UpperTextConverter(DocumentConverter):
    def accepts(self, file_stream, stream_info: StreamInfo, **kwargs) -> bool:
        return stream_info.normalized_extension() == ".loud"

    def convert(self, file_stream, stream_info: StreamInfo, **kwargs):
        text = file_stream.read().decode("utf-8").upper()
        return DocumentConverterResult(wrap_document(f"<pre>{escape_text(text)}</pre>"))

eth = EverythingToHtml()
eth.register_converter(UpperTextConverter())

Ship it as a package and expose it as a plugin via entry points, so that any user who constructs EverythingToHtml(enable_plugins=True) picks it up automatically. See docs/PLUGINS.md.

Contributing

Contributions are very welcome — new converters especially. See CONTRIBUTING.md and our Code of Conduct. Found a security issue? See SECURITY.md.

Acknowledgements

The converter-registry design is directly inspired by Microsoft's excellent markitdown. everythingtohtml aims to be its mirror image for teams that want structure-preserving HTML instead of Markdown.

License

MIT © everythingtohtml contributors

Project details

This version

0.1.2

Jun 9, 2026
