Document conversion library — Markdown, HTML, CSV, and JSON transformations with zero heavyweight dependencies.

peasy-document

Pure Python document conversion library with 10 conversion functions across 6 formats — Markdown, HTML, CSV, JSON, YAML, and plain text. Convert between document formats with frozen dataclass results, full conversion metadata, and only one lightweight dependency (markdown). Handles strings, bytes, and file paths uniformly through a single TextInput type, so you never have to worry about I/O boilerplate.

Built from the document conversion engine behind PeasyDocument, which provides interactive browser-based tools for Markdown to HTML conversion, CSV to JSON transformation, and HTML to Markdown extraction. The library covers 10 conversion paths with sub-millisecond performance for typical documents.

Try the interactive tools at peasydocument.com: Markdown to HTML, CSV to JSON, HTML to Text, JSON to YAML, CSV to Markdown Table

Install

# Core library (only markdown dependency)
pip install peasy-document

# With CLI support
pip install "peasy-document[cli]"

# Everything
pip install "peasy-document[all]"

Quick Start

from peasy_document import markdown_to_html, csv_to_json, html_to_text

# Convert Markdown to HTML with tables, code highlighting, and TOC support
result = markdown_to_html("# Hello World\n\nThis is **bold** text.")
print(result.content)
# <h1>Hello World</h1>
# <p>This is <strong>bold</strong> text.</p>

# Convert CSV data to JSON array of objects
result = csv_to_json("name,age\nAlice,30\nBob,25")
print(result.content)
# [{"name": "Alice", "age": "30"}, {"name": "Bob", "age": "25"}]

# Strip HTML to plain text — removes all tags and decodes entities
result = html_to_text("<h1>Title</h1><p>Hello &amp; welcome.</p>")
print(result.content)
# Title
# Hello & welcome.

All functions return frozen dataclasses with conversion metadata — source format, target format, and byte sizes before and after conversion:

# Every ConversionResult carries metadata about the transformation
result = markdown_to_html("# Hello")
print(result.source_format)  # "markdown"
print(result.target_format)  # "html"
print(result.source_size)    # 7 (bytes of input)
print(result.target_size)    # 18 (bytes of output)
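As a rough picture of what a frozen result with byte-size metadata looks like, here is a minimal sketch. `ConversionResultSketch` and `make_result` are hypothetical names used only for illustration; the field names follow the Types table in the API Reference, and measuring sizes as UTF-8 bytes is an assumption rather than a confirmed detail of the library.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class ConversionResultSketch:
    """Hypothetical stand-in that mirrors the documented ConversionResult fields."""

    content: str
    source_format: str
    target_format: str
    source_size: int
    target_size: int


def make_result(source: str, content: str, source_format: str, target_format: str) -> ConversionResultSketch:
    # Assumption: sizes count UTF-8 bytes, so non-ASCII text is larger than len(str).
    return ConversionResultSketch(
        content=content,
        source_format=source_format,
        target_format=target_format,
        source_size=len(source.encode("utf-8")),
        target_size=len(content.encode("utf-8")),
    )


result = make_result("a,b\n1,2", '[{"a": "1", "b": "2"}]', "csv", "json")

try:
    result.content = "changed"  # frozen dataclasses reject attribute assignment
except FrozenInstanceError:
    mutation_blocked = True
else:
    mutation_blocked = False
```

Because the instance is frozen, a result can be passed around or cached without any caller being able to alter it afterwards.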

What You Can Do

Markdown to HTML Conversion

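Markdown is the de facto standard for developer documentation, README files, and technical writing. Created by John Gruber and later formalized by efforts such as the CommonMark specification, Markdown provides a lightweight syntax that maps cleanly to HTML. peasy-document uses the widely used Python-Markdown library under the hood, with default extensions that cover tables, fenced code, syntax highlighting, and a table of contents. Python-Markdown follows Gruber's original syntax, so its output can differ from strict CommonMark in edge cases.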

| Feature | Extension | Enabled by Default |
| --- | --- | --- |
| Pipe tables | tables | Yes |
| Fenced code blocks | fenced_code | Yes |
| Syntax highlighting | codehilite | Yes |
| Table of contents | toc | Yes |
| Custom extensions | Pass any Python-Markdown extension | Via extensions= kwarg |

from peasy_document import markdown_to_html

# Convert Markdown with default extensions: tables, fenced_code, codehilite, toc
result = markdown_to_html("""
# API Documentation

| Method | Endpoint | Description |
|--------|----------|-------------|
| GET    | /users   | List users  |
| POST   | /users   | Create user |

```python
import requests
response = requests.get("/users")
```
""")

# Override extensions for minimal output (tables + toc only)
result = markdown_to_html("content", extensions=["tables", "toc"])

Accepts str, bytes, or Path objects — read from files or process raw data without boilerplate:

from pathlib import Path

# Read Markdown from a file and convert to HTML
result = markdown_to_html(Path("README.md"))

# Process binary content from an HTTP response or database blob
result = markdown_to_html(b"# Binary input works too")

Learn more: Markdown to HTML Tool · Markdown Syntax Guide · CommonMark Specification
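The uniform handling of str, bytes, and Path inputs described above can be pictured with a small normalisation helper. This is only a sketch: `read_text_input` is a hypothetical name, and the UTF-8 default is an assumption, not something the library documents here.

```python
from pathlib import Path
from typing import Union

# Mirrors the documented TextInput alias (str | bytes | Path).
TextInput = Union[str, bytes, Path]


def read_text_input(source: TextInput, encoding: str = "utf-8") -> str:
    """Hypothetical helper: normalise str, bytes, or Path input to a str."""
    if isinstance(source, Path):
        # A Path is treated as a file to read, never as literal content.
        return source.read_text(encoding=encoding)
    if isinstance(source, bytes):
        return source.decode(encoding)
    return source
```

Checking for `Path` first keeps the distinction clear: a plain string is always treated as content, so a file is only read when the caller explicitly wraps the name in `Path(...)`.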

HTML Processing and Extraction

HTML is the backbone of the web, but extracting useful content from HTML documents often requires stripping tags, decoding entities (&amp; to &, &lt; to <), and ignoring non-content elements like <script> and <style> blocks. peasy-document provides two extraction paths: HTML to plain text for content indexing, and HTML to Markdown for content migration or CMS workflows.

Both functions use Python's stdlib html.parser — no external dependencies like BeautifulSoup or lxml required.
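To illustrate how tag stripping can work with only `html.parser`, here is a minimal sketch. `TextExtractor` and `extract_text` are hypothetical names, and the real implementation may differ, for example in how it joins text around inline tags.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical sketch: collect text nodes, skipping <script> and <style>."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        # convert_charrefs=True decodes entities such as &amp; before handle_data.
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside skipped elements.
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self):
        return "\n".join(self._chunks)


def extract_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


print(extract_text("<h1>Title</h1><p>Hello &amp; welcome.</p><script>alert('x')</script>"))
# Title
# Hello & welcome.
```

This simplified version puts every text node on its own line, so text split by inline tags such as `<strong>` would need extra joining logic.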

| Conversion | Use Case | Tags Handled |
| --- | --- | --- |
| HTML to Text | Search indexing, content extraction, text analysis | Strips all tags, decodes entities, ignores <script>/<style> |
| HTML to Markdown | CMS migration, content republishing, documentation conversion | p, h1-h6, a, strong/b, em/i, ul/ol/li, code, pre, br, img |
| Text to HTML | Plain text formatting, email body generation | Wraps paragraphs in <p>, converts single newlines to <br> |

from peasy_document import html_to_text, html_to_markdown, text_to_html

# Strip HTML to plain text — useful for search indexing and content analysis
result = html_to_text("""
<html>
<head><title>Page</title></head>
<body>
  <h1>Welcome</h1>
  <p>This is a <strong>formatted</strong> document with &amp; entities.</p>
  <script>alert('ignored')</script>
</body>
</html>
""")
print(result.content)
# Welcome
# This is a formatted document with & entities.

# Convert HTML to Markdown — preserves links, emphasis, headings, and lists
result = html_to_markdown("""
<h1>Document Title</h1>
<p>Visit <a href="https://example.com">our site</a> for <strong>more info</strong>.</p>
<ul>
  <li>First item</li>
  <li>Second item</li>
</ul>
""")
print(result.content)
# # Document Title
# Visit [our site](https://example.com) for **more info**.
# - First item
# - Second item

# Convert plain text to HTML paragraphs — double newlines become <p> tags
result = text_to_html("First paragraph.\n\nSecond paragraph.\nWith a line break.")
print(result.content)
# <p>First paragraph.</p>
# <p>Second paragraph.<br>With a line break.</p>

Learn more: HTML to Text Tool · HTML to Markdown Tool · HTML Entity Reference
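The paragraph and line-break rule described for text_to_html (blank lines separate `<p>` blocks, single newlines become `<br>`) can be sketched as follows. `text_to_html_sketch` is a hypothetical name, and escaping special characters with `html.escape` is an assumption about sensible behaviour, not a documented guarantee.

```python
import html
import re


def text_to_html_sketch(text: str) -> str:
    """Hypothetical sketch: blank lines split paragraphs, single newlines become <br>."""
    paragraphs = re.split(r"\n\s*\n", text.strip())
    rendered = []
    for paragraph in paragraphs:
        # Assumption: escape &, <, > and quotes so plain text cannot inject markup.
        escaped = html.escape(paragraph.strip())
        rendered.append("<p>" + escaped.replace("\n", "<br>") + "</p>")
    return "\n".join(rendered)


print(text_to_html_sketch("First paragraph.\n\nSecond paragraph.\nWith a line break."))
# <p>First paragraph.</p>
# <p>Second paragraph.<br>With a line break.</p>
```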

CSV and JSON Transformation

CSV (Comma-Separated Values) and JSON (JavaScript Object Notation) are two of the most widely used data interchange formats. CSV, documented in RFC 4180, represents tabular data as rows and columns. JSON, specified in RFC 8259, represents structured data as nested objects and arrays. Converting between the two is a frequent task in data processing pipelines, API integrations, and ETL workflows.

peasy-document handles both directions using Python's stdlib csv and json modules — no pandas or external data libraries required.
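For a sense of how little is needed on top of the stdlib, here is a minimal sketch of the CSV-to-JSON direction using `csv.DictReader` and `json.dumps`. `csv_text_to_json` is a hypothetical name for illustration and is not the library's implementation.

```python
import csv
import io
import json


def csv_text_to_json(text: str, delimiter: str = ",") -> str:
    """Hypothetical sketch: each row becomes an object keyed by the header row."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return json.dumps(list(reader))


print(csv_text_to_json("name,age\nAlice,30\nBob,25"))
# [{"name": "Alice", "age": "30"}, {"name": "Bob", "age": "25"}]
```

CSV carries no type information, so every value stays a string; converting `"30"` to a number would need an explicit extra step.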

| Direction | Input Format | Output Format | Key Features |
| --- | --- | --- | --- |
| CSV to JSON | RFC 4180 CSV with header row | JSON array of objects | Custom delimiters, header-keyed objects |
| JSON to CSV | JSON array of objects | CSV with auto-detected headers | Handles inconsistent keys across objects |

from peasy_document import csv_to_json, json_to_csv

# CSV to JSON — each row becomes a JSON object keyed by header values
result = csv_to_json("name,role,team\nAlice,Engineer,Backend\nBob,Designer,Frontend")
print(result.content)
# [
#   {"name": "Alice", "role": "Engineer", "team": "Backend"},
#   {"name": "Bob", "role": "Designer", "team": "Frontend"}
# ]

# Roundtrip: JSON back to CSV preserves column order
result = json_to_csv(result.content)
print(result.content)
# name,role,team
# Alice,Engineer,Backend
# Bob,Designer,Frontend

# Tab-separated values (TSV) — pass any single-character delimiter
result = csv_to_json("name\tage\nAlice\t30", delimiter="\t")

# Handles inconsistent keys gracefully — union of all keys becomes the header
result = json_to_csv('[{"a": 1, "b": 2}, {"b": 3, "c": 4}]')
# → a,b,c header with empty cells where keys are missing

Learn more: CSV to JSON Tool · JSON to CSV Tool · RFC 4180 CSV Standard
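The union-of-keys behaviour described for json_to_csv can be sketched with `csv.DictWriter`. `json_text_to_csv` is a hypothetical name, and ordering columns by first appearance is an assumption about how the header is built.

```python
import csv
import io
import json


def json_text_to_csv(text: str) -> str:
    """Hypothetical sketch: JSON array of objects to CSV with a union-of-keys header."""
    records = json.loads(text)
    # A dict keeps insertion order, so columns appear in first-seen order.
    header = {}
    for record in records:
        for key in record:
            header.setdefault(key, None)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(header), restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)  # restval fills cells for keys a record lacks
    return buffer.getvalue()


print(json_text_to_csv('[{"a": 1, "b": 2}, {"b": 3, "c": 4}]'))
# a,b,c
# 1,2,
# ,3,4
```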

JSON to YAML Conversion

YAML (YAML Ain't Markup Language) is widely used for configuration files — Kubernetes manifests, Docker Compose files, CI/CD pipelines, and infrastructure-as-code tools all rely on YAML's human-readable format. Converting JSON to YAML is a common need when moving between API responses (JSON) and configuration files (YAML).

peasy-document implements JSON-to-YAML conversion with a recursive pure-Python renderer. No PyYAML dependency is required. The converter handles nested objects, arrays, strings, numbers, booleans, and null values. Special characters in strings are automatically quoted per the YAML 1.2 specification.
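A recursive renderer of this kind can be sketched in a few dozen lines. The names below (`render_scalar`, `render_yaml`) and the exact quoting rules are hypothetical and intentionally conservative. They are not the library's implementation, and they quote more strings than YAML strictly requires.

```python
import json
import re

# Strings a YAML parser might read as booleans or null (illustrative subset).
RESERVED = {"true", "false", "null", "yes", "no", "on", "off", "~", ""}
NEEDS_QUOTES = re.compile(r"[:#\[\]{},&*!|>'\"%@`\n\r\t]|^[-?\s]|\s$")
LOOKS_NUMERIC = re.compile(r"^[-+]?(\d+\.?\d*|\.\d+)([eE][-+]?\d+)?$")


def render_scalar(value) -> str:
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool before int: bool is an int subclass
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return json.dumps(value)
    text = str(value)
    if text.lower() in RESERVED or NEEDS_QUOTES.search(text) or LOOKS_NUMERIC.match(text):
        return json.dumps(text)  # a JSON double-quoted string is valid YAML
    return text


def render_yaml(value, indent: int = 0) -> str:
    """Hypothetical sketch of a recursive JSON-to-YAML renderer (2-space indent)."""
    pad = "  " * indent
    if isinstance(value, dict) and value:
        lines = []
        for key, item in value.items():
            if isinstance(item, (dict, list)) and item:
                # Non-empty containers go on the following lines, one level deeper.
                lines.append(f"{pad}{render_scalar(key)}:")
                lines.append(render_yaml(item, indent + 1))
            else:
                lines.append(f"{pad}{render_scalar(key)}: {render_yaml(item)}")
        return "\n".join(lines)
    if isinstance(value, list) and value:
        lines = []
        for item in value:
            if isinstance(item, (dict, list)) and item:
                lines.append(f"{pad}-")
                lines.append(render_yaml(item, indent + 1))
            else:
                lines.append(f"{pad}- {render_yaml(item)}")
        return "\n".join(lines)
    if isinstance(value, dict):
        return "{}"
    if isinstance(value, list):
        return "[]"
    return render_scalar(value)


print(render_yaml(json.loads('{"tags": ["python", "yaml"], "flag": "yes", "count": 3}')))
# tags:
#   - python
#   - yaml
# flag: "yes"
# count: 3
```

Quoting the string `"yes"` while leaving the number `3` bare is what keeps a round trip through a YAML parser from silently changing types.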

| YAML Feature | Supported | Notes |
| --- | --- | --- |
| Nested objects | Yes | Indented with 2 spaces |
| Arrays | Yes | Block sequence style (- item) |
| Strings with special chars | Yes | Auto-quoted (":", "#", brackets, etc.) |
| Reserved words | Yes | true, false, null, yes, no are quoted when used as strings |
| Numbers and booleans | Yes | Rendered without quotes |

from peasy_document import json_to_yaml

# Convert a JSON config object to YAML — handles nested structures
result = json_to_yaml('{"server": {"host": "localhost", "port": 8080}, "debug": true}')
print(result.content)
# server:
#   host: localhost
#   port: 8080
# debug: true

# Arrays render as YAML block sequences
result = json_to_yaml('{"tags": ["python", "yaml", "json"], "count": 3}')
print(result.content)
# tags:
#   - python
#   - yaml
#   - json
# count: 3

# Special characters in values are auto-quoted for YAML safety
result = json_to_yaml('{"url": "https://example.com:8080/path#section"}')
print(result.content)
# url: "https://example.com:8080/path#section"

Learn more: JSON to YAML Tool · YAML vs JSON Guide · YAML 1.2 Specification

Table Formatting and Rendering

Tabular data can be rendered in multiple output formats depending on the target platform — Markdown tables for documentation, HTML tables for web pages, or structured TableData objects for programmatic access. peasy-document provides three rendering paths from CSV input, all using Python's stdlib csv module.
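As an illustration of the Markdown rendering path, the sketch below pads each column to its widest cell using only the stdlib `csv` module. `csv_text_to_markdown` is a hypothetical name; unlike a production renderer it does not escape pipe characters inside cells.

```python
import csv
import io


def csv_text_to_markdown(text: str, delimiter: str = ",") -> str:
    """Hypothetical sketch: render CSV as a pipe table with padded columns."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    if not rows:
        return ""
    header, body = rows[0], rows[1:]
    # Pad short rows so every row has one cell per header column.
    body = [row + [""] * (len(header) - len(row)) for row in body]
    # Each column is as wide as its longest cell, with a minimum of 3 for "---".
    widths = [
        max(3, len(header[i]), *(len(row[i]) for row in body))
        for i in range(len(header))
    ]

    def line(cells):
        return "| " + " | ".join(c.ljust(w) for c, w in zip(cells, widths)) + " |"

    separator = line(["-" * w for w in widths])
    return "\n".join([line(header), separator, *(line(row) for row in body)])


print(csv_text_to_markdown("Name,Age,City\nAlice,30,NYC\nBob,25,LA"))
# | Name  | Age | City |
# | ----- | --- | ---- |
# | Alice | 30  | NYC  |
# | Bob   | 25  | LA   |
```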

| Function | Output | Use Case |
| --- | --- | --- |
| csv_to_table() | TableData dataclass | Programmatic access to headers, rows, dimensions |
| csv_to_markdown() | Pipe-aligned Markdown table | GitHub README, documentation, Jupyter notebooks |
| csv_to_html() | <table> with <thead>/<tbody> | Web pages, email templates, reports |

from peasy_document import csv_to_table, csv_to_markdown, csv_to_html

# Parse CSV into structured TableData — access headers, rows, and dimensions
table = csv_to_table("Name,Age,City\nAlice,30,NYC\nBob,25,LA")
print(table.headers)       # ['Name', 'Age', 'City']
print(table.row_count)     # 2
print(table.column_count)  # 3
print(table.rows[0])       # ['Alice', '30', 'NYC']

# Render as Markdown table with aligned columns
result = csv_to_markdown("Name,Age,City\nAlice,30,NYC\nBob,25,LA")
print(result.content)
# | Name  | Age | City |
# | ----- | --- | ---- |
# | Alice | 30  | NYC  |
# | Bob   | 25  | LA   |

# Render as HTML table with proper thead/tbody structure
result = csv_to_html("Name,Age\nAlice,30")
print(result.content)
# <table>
#   <thead>
#     <tr>
#       <th>Name</th>
#       <th>Age</th>
#     </tr>
#   </thead>
#   <tbody>
#     <tr>
#       <td>Alice</td>
#       <td>30</td>
#     </tr>
#   </tbody>
# </table>

Learn more: CSV to Markdown Tool · CSV to HTML Tool · Table Formatting Guide

Command-Line Interface

Install with CLI support: pip install "peasy-document[cli]"

The CLI exposes 6 conversion commands. All commands write to stdout by default — use -o / --output to write to a file.

# Convert Markdown to HTML
peasy-document md-to-html README.md -o output.html

# Strip HTML to plain text
peasy-document html-to-text page.html

# Convert CSV to JSON
peasy-document csv-to-json data.csv -o data.json

# Convert JSON array to CSV
peasy-document json-to-csv records.json -o records.csv

# CSV to Markdown table
peasy-document csv-to-markdown data.csv

# HTML to Markdown
peasy-document html-to-markdown page.html -o page.md

| Command | Description | Options |
| --- | --- | --- |
| md-to-html | Convert Markdown file to HTML | -o OUTPUT |
| html-to-text | Strip HTML tags, extract plain text | -o OUTPUT |
| csv-to-json | Convert CSV to JSON array of objects | -o OUTPUT, -d DELIMITER |
| json-to-csv | Convert JSON array to CSV | -o OUTPUT |
| csv-to-markdown | Render CSV as Markdown table | -o OUTPUT, -d DELIMITER |
| html-to-markdown | Convert HTML to Markdown | -o OUTPUT |

API Reference

Conversion Functions

| Function | Input | Output | Dependencies |
| --- | --- | --- | --- |
| markdown_to_html(source, *, extensions=None) | Markdown | HTML | markdown library |
| html_to_text(source) | HTML | Plain text | stdlib only |
| html_to_markdown(source) | HTML | Markdown | stdlib only |
| text_to_html(source) | Plain text | HTML | stdlib only |
| csv_to_json(source, *, delimiter=",") | CSV | JSON | stdlib only |
| json_to_csv(source) | JSON | CSV | stdlib only |
| csv_to_table(source, *, delimiter=",") | CSV | TableData | stdlib only |
| csv_to_markdown(source, *, delimiter=",") | CSV | Markdown table | stdlib only |
| csv_to_html(source, *, delimiter=",") | CSV | HTML table | stdlib only |
| json_to_yaml(source) | JSON | YAML | stdlib only |

All functions accept TextInput (str | bytes | Path) and return ConversionResult or TableData.

Types

| Type | Description | Fields |
| --- | --- | --- |
| TextInput | Union type alias | str, bytes, or Path |
| ConversionResult | Frozen dataclass: conversion output with metadata | content: str, source_format: str, target_format: str, source_size: int, target_size: int |
| TableData | Frozen dataclass: structured table representation | headers: list[str], rows: list[list[str]], row_count: int, column_count: int |

Peasy Developer Tools

Part of the Peasy Tools developer tools ecosystem.

| Package | Description | Website |
| --- | --- | --- |
| peasy-document | Document conversion: Markdown, HTML, CSV, JSON, YAML | peasydocument.com |
| peasy-pdf | PDF manipulation and conversion | peasypdf.com |
| peasy-image | Image format conversion and optimization | peasyimage.com |
| peasytext | Text analysis and transformation | peasytext.com |
| peasy-css | CSS minification and processing | peasycss.com |
| peasy-compress | File compression utilities | peasytools.com |
| peasy-video | Video processing and conversion | peasyvideo.com |
| peasy-audio | Audio processing and conversion | peasyaudio.com |
| peasy-convert | Unified CLI for all Peasy tools | peasytools.com |
| peasy-mcp | Unified MCP hub for AI assistants | peasytools.com |

License

MIT
