
Document conversion library — Markdown, HTML, CSV, and JSON transformations with zero heavyweight dependencies.


peasy-document


Pure Python document conversion library with 10 conversion functions across 6 formats: Markdown, HTML, CSV, JSON, YAML, and plain text. Results are frozen dataclasses with full conversion metadata, and the core install has only one lightweight dependency (markdown). Every function accepts strings, bytes, or Path objects through a single TextInput type, so the same call works on in-memory text, raw binary payloads, and files on disk without extra I/O code.

Built from the document conversion engine behind PeasyDocument, which provides interactive browser-based tools for Markdown to HTML conversion, CSV to JSON transformation, and HTML to Markdown extraction. The library covers 10 conversion paths with sub-millisecond performance for typical documents.

Try the interactive tools at peasytools.com — document conversion for Markdown, HTML, CSV, JSON, and YAML formats.



Install

# Core library (only markdown dependency)
pip install peasy-document

# With CLI support
pip install "peasy-document[cli]"

# Everything
pip install "peasy-document[all]"

Quick Start

from peasy_document import markdown_to_html, csv_to_json, html_to_text

# Convert Markdown to HTML with tables, code highlighting, and TOC support
result = markdown_to_html("# Hello World\n\nThis is **bold** text.")
print(result.content)
# <h1 id="hello-world">Hello World</h1>
# <p>This is <strong>bold</strong> text.</p>

# Convert CSV data to JSON array of objects
result = csv_to_json("name,age\nAlice,30\nBob,25")
print(result.content)
# [{"name": "Alice", "age": "30"}, {"name": "Bob", "age": "25"}]

# Strip HTML to plain text — removes all tags and decodes entities
result = html_to_text("<h1>Title</h1><p>Hello &amp; welcome.</p>")
print(result.content)
# Title
# Hello & welcome.

Every function except csv_to_table (which returns a TableData) returns a frozen ConversionResult dataclass with conversion metadata: source format, target format, and byte sizes before and after conversion:

# Every ConversionResult carries metadata about the transformation
result = markdown_to_html("# Hello")
print(result.source_format)  # "markdown"
print(result.target_format)  # "html"
print(result.source_size)    # 7 (bytes of input)
print(result.target_size)    # size of the HTML output in bytes

What You Can Do

Markdown to HTML Conversion

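Markdown is the de facto standard for developer documentation, README files, and technical writing. It provides a lightweight syntax that maps cleanly to HTML, and the CommonMark specification is the most widely adopted effort to define that syntax unambiguously. peasy-document uses the battle-tested Python-Markdown library under the hood, with sensible defaults that cover the most common use cases out of the box. Note that Python-Markdown follows John Gruber's original Markdown rules rather than CommonMark, so a few edge cases (for example, nested lists need four-space indentation) render differently than on CommonMark-based platforms.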

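| Feature | Extension | Enabled by Default |
|---------|-----------|--------------------|
| Pipe tables | tables | Yes |
| Fenced code blocks | fenced_code | Yes |
| Syntax highlighting | codehilite | Yes (colorized output requires Pygments; without it, code blocks only get CSS language classes) |
| Table of contents | toc | Yes (also adds id attributes to headings) |
| Custom extensions | Pass any Python-Markdown extension | Via extensions= kwarg |
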
from peasy_document import markdown_to_html

# Convert Markdown with default extensions: tables, fenced_code, codehilite, toc
result = markdown_to_html("""
# API Documentation

| Method | Endpoint | Description |
|--------|----------|-------------|
| GET    | /users   | List users  |
| POST   | /users   | Create user |

```python
import requests
response = requests.get("/users")
```
""")

# Override extensions for minimal output (tables + toc only)
result = markdown_to_html("content", extensions=["tables", "toc"])

Accepts str, bytes, or Path objects — read from files or process raw data without boilerplate:

from pathlib import Path

# Read Markdown from a file and convert to HTML
result = markdown_to_html(Path("README.md"))

# Process binary content from an HTTP response or database blob
result = markdown_to_html(b"# Binary input works too")
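A minimal sketch of how a single TextInput parameter can be normalized before conversion. The helper name read_text_input and the UTF-8 default are assumptions for illustration, not peasy-document's actual internals; in particular, this sketch treats every plain str as document content and only reads from disk when given a Path.

```python
from pathlib import Path
from typing import Union

# Hypothetical alias mirroring the documented TextInput type (str | bytes | Path).
TextInput = Union[str, bytes, Path]


def read_text_input(source: TextInput, encoding: str = "utf-8") -> str:
    """Hypothetical helper: normalize any TextInput to a str before converting."""
    if isinstance(source, Path):
        # Path objects are read from disk.
        return source.read_text(encoding=encoding)
    if isinstance(source, bytes):
        # Raw bytes (HTTP bodies, database blobs) are decoded; UTF-8 is assumed.
        return source.decode(encoding)
    if isinstance(source, str):
        # A plain str is taken to be the document text itself, never a filename.
        return source
    raise TypeError(f"unsupported input type: {type(source).__name__}")
```

With this design, a bare string such as "README.md" is converted as Markdown text; wrap filenames in Path(...) to read them.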

Learn more: PeasyDocument · CommonMark Specification

HTML Processing and Extraction

HTML is the backbone of the web, but extracting useful content from HTML documents often requires stripping tags, decoding entities (&amp; to &, &lt; to <), and ignoring non-content elements like <script> and <style> blocks. peasy-document provides two extraction paths, HTML to plain text for content indexing and HTML to Markdown for content migration or CMS workflows, plus text_to_html for the reverse direction.

Both functions use Python's stdlib html.parser — no external dependencies like BeautifulSoup or lxml required.
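One way to build HTML-to-text on the stdlib html.parser is sketched below. This is a hypothetical sketch, not the library's code: the class and function names are made up, and both the decision to drop <title> text and the line breaks at a small set of block-level tags are assumptions chosen to match the html_to_text example in this section.

```python
from html.parser import HTMLParser

# Tags whose text is dropped entirely (assumption: <title> is treated as metadata).
SKIP_TAGS = {"script", "style", "title"}
# Tags that start or end a line of output (an illustrative, incomplete list).
BLOCK_TAGS = {"p", "div", "br", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}


class TextExtractor(HTMLParser):
    """Hypothetical sketch: collect visible text, skipping SKIP_TAGS content."""

    def __init__(self) -> None:
        # convert_charrefs=True (the default) decodes &amp;, &lt;, etc. in text.
        super().__init__(convert_charrefs=True)
        self.parts: list[str] = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.parts.append(data)


def html_to_plain_text(markup: str) -> str:
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()  # flush any buffered trailing text
    # Collapse runs of whitespace inside each line and drop blank lines.
    lines = (" ".join(line.split()) for line in "".join(parser.parts).splitlines())
    return "\n".join(line for line in lines if line)
```

Tracking a skip depth rather than a boolean keeps the parser state consistent if skipped elements ever appear nested.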

| Conversion | Use Case | Behavior |
|------------|----------|----------|
| HTML to Text | Search indexing, content extraction, text analysis | Strips all tags, decodes entities, ignores <script>/<style> |
| HTML to Markdown | CMS migration, content republishing, documentation conversion | Handles p, h1-h6, a, strong/b, em/i, ul/ol/li, code, pre, br, img |
| Text to HTML | Plain text formatting, email body generation | Wraps paragraphs in <p>, converts single newlines to <br> |

from peasy_document import html_to_text, html_to_markdown, text_to_html

# Strip HTML to plain text — useful for search indexing and content analysis
result = html_to_text("""
<html>
<head><title>Page</title></head>
<body>
  <h1>Welcome</h1>
  <p>This is a <strong>formatted</strong> document with &amp; entities.</p>
  <script>alert('ignored')</script>
</body>
</html>
""")
print(result.content)
# Welcome
# This is a formatted document with & entities.

# Convert HTML to Markdown — preserves links, emphasis, headings, and lists
result = html_to_markdown("""
<h1>Document Title</h1>
<p>Visit <a href="https://example.com">our site</a> for <strong>more info</strong>.</p>
<ul>
  <li>First item</li>
  <li>Second item</li>
</ul>
""")
print(result.content)
# # Document Title
# Visit [our site](https://example.com) for **more info**.
# - First item
# - Second item

# Convert plain text to HTML paragraphs — double newlines become <p> tags
result = text_to_html("First paragraph.\n\nSecond paragraph.\nWith a line break.")
print(result.content)
# <p>First paragraph.</p>
# <p>Second paragraph.<br>With a line break.</p>
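The paragraph rules for text_to_html (blank lines start a new <p>, single newlines become <br>) fit in a few lines of stdlib code. text_to_paragraph_html is a hypothetical name, and escaping &, < and > is an assumption this sketch makes so that plain text cannot inject markup; the library's own escaping behavior is not documented here.

```python
import html
import re


def text_to_paragraph_html(text: str) -> str:
    """Hypothetical sketch: blank lines separate <p> blocks, newlines become <br>."""
    if not text.strip():
        return ""
    # One or more blank (or whitespace-only) lines separate paragraphs.
    blocks = re.split(r"\n\s*\n", text.strip())
    paragraphs = []
    for block in blocks:
        # Escape &, <, > in each line, then join the lines with <br>.
        lines = [html.escape(line, quote=False) for line in block.splitlines()]
        paragraphs.append("<p>" + "<br>".join(lines) + "</p>")
    return "\n".join(paragraphs)
```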

Learn more: PeasyDocument · Developer Docs

CSV and JSON Transformation

CSV (Comma-Separated Values) and JSON (JavaScript Object Notation) are the two most common data interchange formats. CSV, defined in RFC 4180, represents tabular data with rows and columns. JSON, specified in RFC 8259, represents structured data as nested objects and arrays. Converting between these two formats is one of the most frequent tasks in data processing pipelines, API integrations, and ETL workflows.

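peasy-document handles both directions using Python's stdlib csv and json modules, so no pandas or external data libraries are required. The csv module does not infer types, so CSV values come through as JSON strings ("30", not 30), as the Quick Start output shows.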
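As a rough stdlib-only sketch of the CSV-to-JSON direction, csv.DictReader already yields one dict per data row keyed by the header row. The function name csv_text_to_json is hypothetical, and compact json.dumps output is an assumption (the library may pretty-print).

```python
import csv
import io
import json


def csv_text_to_json(text: str, delimiter: str = ",") -> str:
    """Hypothetical sketch: one JSON object per data row, keyed by the header row."""
    # DictReader takes its field names from the first row; values stay strings.
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return json.dumps([dict(row) for row in reader])
```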

| Direction | Input Format | Output Format | Key Features |
|-----------|--------------|---------------|--------------|
| CSV to JSON | RFC 4180 CSV with header row | JSON array of objects | Custom delimiters, header-keyed objects |
| JSON to CSV | JSON array of objects | CSV with auto-detected headers | Handles inconsistent keys across objects |

from peasy_document import csv_to_json, json_to_csv

# CSV to JSON — each row becomes a JSON object keyed by header values
result = csv_to_json("name,role,team\nAlice,Engineer,Backend\nBob,Designer,Frontend")
print(result.content)
# [
#   {"name": "Alice", "role": "Engineer", "team": "Backend"},
#   {"name": "Bob", "role": "Designer", "team": "Frontend"}
# ]

# Roundtrip: JSON back to CSV preserves column order
result = json_to_csv(result.content)
print(result.content)
# name,role,team
# Alice,Engineer,Backend
# Bob,Designer,Frontend

# Tab-separated values (TSV) — pass any single-character delimiter
result = csv_to_json("name\tage\nAlice\t30", delimiter="\t")

# Handles inconsistent keys gracefully — union of all keys becomes the header
result = json_to_csv('[{"a": 1, "b": 2}, {"b": 3, "c": 4}]')
# → a,b,c header with empty cells where keys are missing
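The union-of-keys behavior for JSON to CSV can be approximated with csv.DictWriter. This is a hypothetical sketch (json_records_to_csv is not a library function): it orders columns by first appearance, which matches the a,b,c header in the example above, fills missing keys with empty cells, and uses "\n" line endings, which may differ from what the library emits.

```python
import csv
import io
import json


def json_records_to_csv(text: str) -> str:
    """Hypothetical sketch: the union of all keys, in first-seen order, is the header."""
    records = json.loads(text)
    if not isinstance(records, list) or not all(isinstance(r, dict) for r in records):
        raise ValueError("expected a JSON array of objects")
    if not records:
        return ""
    # dict.fromkeys keeps insertion order, giving an ordered union of keys.
    fieldnames = list(dict.fromkeys(key for record in records for key in record))
    buffer = io.StringIO()
    # restval="" writes an empty cell where a record lacks a key;
    # lineterminator="\n" replaces the csv module's default "\r\n".
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()
```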

Learn more: PeasyDocument · RFC 4180 CSV Standard

JSON to YAML Conversion

YAML (YAML Ain't Markup Language) is widely used for configuration files — Kubernetes manifests, Docker Compose files, CI/CD pipelines, and infrastructure-as-code tools all rely on YAML's human-readable format. Converting JSON to YAML is a common need when moving between API responses (JSON) and configuration files (YAML).

peasy-document implements JSON-to-YAML conversion with a recursive pure-Python renderer. No PyYAML dependency is required. The converter handles nested objects, arrays, strings, numbers, booleans, and null values. Special characters in strings are automatically quoted per the YAML 1.2 specification.

| YAML Feature | Supported | Notes |
|--------------|-----------|-------|
| Nested objects | Yes | Indented with 2 spaces |
| Arrays | Yes | Block sequence style (- item) |
| Strings with special chars | Yes | Auto-quoted (":", "#", brackets, etc.) |
| Reserved words | Yes | true, false, null, yes, no are quoted when used as strings (yes/no are booleans under YAML 1.1, which PyYAML still follows) |
| Numbers and booleans | Yes | Rendered without quotes |

from peasy_document import json_to_yaml

# Convert a JSON config object to YAML — handles nested structures
result = json_to_yaml('{"server": {"host": "localhost", "port": 8080}, "debug": true}')
print(result.content)
# server:
#   host: localhost
#   port: 8080
# debug: true

# Arrays render as YAML block sequences
result = json_to_yaml('{"tags": ["python", "yaml", "json"], "count": 3}')
print(result.content)
# tags:
#   - python
#   - yaml
#   - json
# count: 3

# Special characters in values are auto-quoted for YAML safety
result = json_to_yaml('{"url": "https://example.com:8080/path#section"}')
print(result.content)
# url: "https://example.com:8080/path#section"
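A pure-Python recursive renderer in the spirit described above might look like the following. It is a sketch under stated assumptions: the function names are hypothetical, and the quoting rules (reserved words, numeric-looking strings, YAML indicator characters, control characters) are a conservative approximation rather than the library's exact rule set. Quoted strings reuse json.dumps, because a JSON string literal is also a valid YAML double-quoted scalar.

```python
import json
import re

# Strings a YAML parser could read as booleans/null (YAML 1.1 and 1.2 forms);
# an assumed, conservative list, not the library's exact rules.
RESERVED = {"true", "false", "null", "yes", "no", "on", "off", "~", ""}
# Control chars, YAML indicator characters, and leading/trailing whitespace or dashes.
NEEDS_QUOTES = re.compile(r"[\x00-\x1f:#\[\]{},&*!|>'\"%@`]|^[-?\s]|\s$")
NUMBER_LIKE = re.compile(r"^[-+]?(\d+\.?\d*|\.\d+)([eE][-+]?\d+)?$")


def scalar(value) -> str:
    if value is None:
        return "null"
    if isinstance(value, bool):  # test bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return json.dumps(value)
    text = str(value)
    if text.lower() in RESERVED or NEEDS_QUOTES.search(text) or NUMBER_LIKE.match(text):
        return json.dumps(text, ensure_ascii=False)
    return text


def is_block(value) -> bool:
    # Non-empty objects/arrays become indented blocks; everything else stays inline.
    return isinstance(value, (dict, list)) and len(value) > 0


def inline(value) -> str:
    if isinstance(value, dict):
        return "{}"
    if isinstance(value, list):
        return "[]"
    return scalar(value)


def render(value, indent: int) -> list[str]:
    pad = "  " * indent
    lines: list[str] = []
    if isinstance(value, dict):
        for key, item in value.items():
            head = f"{pad}{scalar(str(key))}:"
            if is_block(item):
                lines.append(head)
                lines.extend(render(item, indent + 1))
            else:
                lines.append(f"{head} {inline(item)}")
    else:  # non-empty list: block sequence
        for item in value:
            if is_block(item):
                lines.append(f"{pad}-")
                lines.extend(render(item, indent + 1))
            else:
                lines.append(f"{pad}- {inline(item)}")
    return lines


def json_text_to_yaml(text: str) -> str:
    data = json.loads(text)
    if not is_block(data):
        return inline(data) + "\n"
    return "\n".join(render(data, 0)) + "\n"
```

Quoting numeric-looking strings such as "42" keeps them strings when the YAML is parsed back, at the cost of slightly noisier output.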

Learn more: PeasyDocument · YAML 1.2 Specification

Table Formatting and Rendering

Tabular data can be rendered in multiple output formats depending on the target platform — Markdown tables for documentation, HTML tables for web pages, or structured TableData objects for programmatic access. peasy-document provides three rendering paths from CSV input, all using Python's stdlib csv module.

| Function | Output | Use Case |
|----------|--------|----------|
| csv_to_table() | TableData dataclass | Programmatic access to headers, rows, dimensions |
| csv_to_markdown() | Pipe-aligned Markdown table | GitHub README, documentation, Jupyter notebooks |
| csv_to_html() | <table> with <thead>/<tbody> | Web pages, email templates, reports |

from peasy_document import csv_to_table, csv_to_markdown, csv_to_html

# Parse CSV into structured TableData — access headers, rows, and dimensions
table = csv_to_table("Name,Age,City\nAlice,30,NYC\nBob,25,LA")
print(table.headers)       # ['Name', 'Age', 'City']
print(table.row_count)     # 2
print(table.column_count)  # 3
print(table.rows[0])       # ['Alice', '30', 'NYC']

# Render as Markdown table with aligned columns
result = csv_to_markdown("Name,Age,City\nAlice,30,NYC\nBob,25,LA")
print(result.content)
# | Name  | Age | City |
# | ----- | --- | ---- |
# | Alice | 30  | NYC  |
# | Bob   | 25  | LA   |

# Render as HTML table with proper thead/tbody structure
result = csv_to_html("Name,Age\nAlice,30")
print(result.content)
# <table>
#   <thead>
#     <tr>
#       <th>Name</th>
#       <th>Age</th>
#     </tr>
#   </thead>
#   <tbody>
#     <tr>
#       <td>Alice</td>
#       <td>30</td>
#     </tr>
#   </tbody>
# </table>
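The column padding in the csv_to_markdown output above can be reproduced by measuring the widest cell in each column. csv_text_to_markdown is a hypothetical name; the three-dash minimum separator width and the escaping of literal | characters are assumptions of this sketch.

```python
import csv
import io


def csv_text_to_markdown(text: str, delimiter: str = ",") -> str:
    """Hypothetical sketch: render CSV as a pipe table padded to column widths."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    if not rows:
        return ""
    width = len(rows[0])
    # Give every row exactly one cell per header (pad short rows, trim long ones),
    # and escape literal pipes so they don't split cells.
    cells = [[c.replace("|", "\\|") for c in (row + [""] * width)[:width]] for row in rows]
    # Each column is as wide as its widest cell, with a 3-dash minimum separator.
    widths = [max(3, *(len(row[i]) for row in cells)) for i in range(width)]

    def fmt(row: list[str]) -> str:
        return "| " + " | ".join(c.ljust(w) for c, w in zip(row, widths)) + " |"

    separator = "| " + " | ".join("-" * w for w in widths) + " |"
    return "\n".join([fmt(cells[0]), separator, *(fmt(row) for row in cells[1:])])
```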

Learn more: PeasyDocument · OpenAPI Spec

Command-Line Interface

Install with CLI support: pip install "peasy-document[cli]"

The CLI exposes 6 conversion commands. All commands write to stdout by default — use -o / --output to write to a file.

# Convert Markdown to HTML
peasy-document md-to-html README.md -o output.html

# Strip HTML to plain text
peasy-document html-to-text page.html

# Convert CSV to JSON
peasy-document csv-to-json data.csv -o data.json

# Convert JSON array to CSV
peasy-document json-to-csv records.json -o records.csv

# CSV to Markdown table
peasy-document csv-to-markdown data.csv

# HTML to Markdown
peasy-document html-to-markdown page.html -o page.md

| Command | Description | Options |
|---------|-------------|---------|
| md-to-html | Convert Markdown file to HTML | -o OUTPUT |
| html-to-text | Strip HTML tags, extract plain text | -o OUTPUT |
| csv-to-json | Convert CSV to JSON array of objects | -o OUTPUT, -d DELIMITER |
| json-to-csv | Convert JSON array to CSV | -o OUTPUT |
| csv-to-markdown | Render CSV as Markdown table | -o OUTPUT, -d DELIMITER |
| html-to-markdown | Convert HTML to Markdown | -o OUTPUT |

API Reference

Conversion Functions

| Function | Input | Output | Dependencies |
|----------|-------|--------|--------------|
| markdown_to_html(source, *, extensions=None) | Markdown | HTML | markdown library |
| html_to_text(source) | HTML | Plain text | stdlib only |
| html_to_markdown(source) | HTML | Markdown | stdlib only |
| text_to_html(source) | Plain text | HTML | stdlib only |
| csv_to_json(source, *, delimiter=",") | CSV | JSON | stdlib only |
| json_to_csv(source) | JSON | CSV | stdlib only |
| csv_to_table(source, *, delimiter=",") | CSV | TableData | stdlib only |
| csv_to_markdown(source, *, delimiter=",") | CSV | Markdown table | stdlib only |
| csv_to_html(source, *, delimiter=",") | CSV | HTML table | stdlib only |
| json_to_yaml(source) | JSON | YAML | stdlib only |

All functions accept TextInput (str | bytes | Path) and return ConversionResult or TableData.

Types

| Type | Description | Fields |
|------|-------------|--------|
| TextInput | Union type alias | str \| bytes \| Path |
| ConversionResult | Frozen dataclass: conversion output with metadata | content: str, source_format: str, target_format: str, source_size: int, target_size: int |
| TableData | Frozen dataclass: structured table representation | headers: list[str], rows: list[list[str]], row_count: int, column_count: int |
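For readers who want to mirror these types in their own code, the field list translates directly into frozen dataclasses. This sketch is based only on the types table; make_result is a hypothetical helper, and measuring sizes as UTF-8 byte lengths is an assumption consistent with the "bytes of input" comment in the Quick Start.

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class ConversionResult:
    """Sketch of the documented fields; not the library's actual class."""
    content: str
    source_format: str
    target_format: str
    source_size: int
    target_size: int


@dataclass(frozen=True)
class TableData:
    headers: list[str]
    rows: list[list[str]]
    row_count: int
    column_count: int


def make_result(source: str, content: str, source_format: str, target_format: str) -> ConversionResult:
    """Hypothetical helper: sizes are UTF-8 byte lengths (an assumption)."""
    return ConversionResult(
        content=content,
        source_format=source_format,
        target_format=target_format,
        source_size=len(source.encode("utf-8")),
        target_size=len(content.encode("utf-8")),
    )


result = make_result("# Hello", "<h1>Hello</h1>", "markdown", "html")
print(result.source_size, result.target_size)  # 7 14
try:
    result.content = "edited"  # frozen dataclasses reject attribute assignment
except FrozenInstanceError:
    print("ConversionResult is immutable")
```

frozen=True is shallow: it blocks reassigning attributes, but the lists inside a TableData can still be mutated in place, so treat them as read-only by convention.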


Also Available

| Platform | Install | Link |
|----------|---------|------|
| TypeScript / npm | npm install peasy-document | npm |
| Go | go get github.com/peasytools/peasy-document-go | pkg.go.dev |
| Rust | cargo add peasy-document | crates.io |
| Ruby | gem install peasy-document | RubyGems |
| MCP | uvx --from "peasy-document[mcp]" python -m peasy_document.mcp_server | Config |

Peasy Developer Tools

Part of the Peasy open-source developer tools ecosystem.

| Package | PyPI | npm | Description |
|---------|------|-----|-------------|
| peasy-pdf | PyPI | npm | PDF merge, split, compress, 21 operations (peasypdf.com) |
| peasy-image | PyPI | npm | Image resize, crop, convert, compress, 20 operations (peasyimage.com) |
| peasytext | PyPI | npm | Text case, slugify, word count, encoding (peasytext.com) |
| peasy-css | PyPI | npm | CSS gradients, shadows, flexbox, grid generators (peasycss.com) |
| peasy-compress | PyPI | npm | ZIP, TAR, gzip, brotli archive operations (peasytools.com) |
| peasy-document | PyPI | npm | Markdown, HTML, CSV, JSON conversions (peasyformats.com) |
| peasy-audio | PyPI | npm | Audio convert, trim, merge, normalize (peasyaudio.com) |
| peasy-video | PyPI | npm | Video trim, resize, GIF conversion (peasyvideo.com) |

License

MIT



Download files


Source Distribution

peasy_document-0.2.0.tar.gz (538.2 kB)

Uploaded Source

Built Distribution


peasy_document-0.2.0-py3-none-any.whl (16.5 kB)

Uploaded Python 3

File details

Details for the file peasy_document-0.2.0.tar.gz.

File metadata

  • Download URL: peasy_document-0.2.0.tar.gz
  • Upload date:
  • Size: 538.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.10.9 (macOS)

File hashes

Hashes for peasy_document-0.2.0.tar.gz
| Algorithm | Hash digest |
|-----------|-------------|
| SHA256 | 7426fa8e44f9f869e0bf8c19eade50cb164fa5e7c1616f9f0e5c427a532f4d8a |
| MD5 | 7b5c6ffc3661893764ae060ca382bf46 |
| BLAKE2b-256 | fc5c633112d87a1a8a831a97635e853e97041aa4b7dea6b03f395fe2074b3a7d |


File details

Details for the file peasy_document-0.2.0-py3-none-any.whl.

File metadata

  • Download URL: peasy_document-0.2.0-py3-none-any.whl
  • Upload date:
  • Size: 16.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.10.9 (macOS)

File hashes

Hashes for peasy_document-0.2.0-py3-none-any.whl
| Algorithm | Hash digest |
|-----------|-------------|
| SHA256 | d345f7cff45eac172b3c4eda5de1690e27ad62a8d51b2c99108a582e12422e3a |
| MD5 | 8b8f24bf9905c7c6fdfaf588f0a8bd28 |
| BLAKE2b-256 | d83861f47ac26cd1d633e311a39855a5238b6cb1eb5b205554d972daf2592904 |

