Document conversion library — Markdown, HTML, CSV, and JSON transformations with zero heavyweight dependencies.
Project description
peasy-document
Pure Python document conversion library with 10 conversion functions across 6 formats — Markdown, HTML, CSV, JSON, YAML, and plain text. Convert between document formats with frozen dataclass results, full conversion metadata, and only one lightweight dependency (markdown). Handles strings, bytes, and file paths uniformly through a single TextInput type, so you never have to worry about I/O boilerplate.
Built from the document conversion engine behind PeasyDocument, which provides interactive browser-based tools for Markdown to HTML conversion, CSV to JSON transformation, and HTML to Markdown extraction. The library covers 10 conversion paths with sub-millisecond performance for typical documents.
Try the interactive tools at peasytools.com — document conversion for Markdown, HTML, CSV, JSON, and YAML formats.
Table of Contents
- Install
- Quick Start
- What You Can Do
- Command-Line Interface
- API Reference
- Learn More About Document Conversion
- Also Available
- Peasy Developer Tools
- License
Install
# Core library (only markdown dependency)
pip install peasy-document
# With CLI support
pip install "peasy-document[cli]"
# Everything
pip install "peasy-document[all]"
Quick Start
from peasy_document import markdown_to_html, csv_to_json, html_to_text
# Convert Markdown to HTML with tables, code highlighting, and TOC support
result = markdown_to_html("# Hello World\n\nThis is **bold** text.")
print(result.content)
# <h1>Hello World</h1>
# <p>This is <strong>bold</strong> text.</p>
# Convert CSV data to JSON array of objects
result = csv_to_json("name,age\nAlice,30\nBob,25")
print(result.content)
# [{"name": "Alice", "age": "30"}, {"name": "Bob", "age": "25"}]
# Strip HTML to plain text — removes all tags and decodes entities
result = html_to_text("<h1>Title</h1><p>Hello & welcome.</p>")
print(result.content)
# Title
# Hello & welcome.
All functions return frozen dataclasses with conversion metadata — source format, target format, and byte sizes before and after conversion:
# Every ConversionResult carries metadata about the transformation
result = markdown_to_html("# Hello")
print(result.source_format) # "markdown"
print(result.target_format) # "html"
print(result.source_size) # 7 (bytes of input)
print(result.target_size) # 18 (bytes of output)
What You Can Do
Markdown to HTML Conversion
Markdown is the de facto standard for developer documentation, README files, and technical writing. Defined by the CommonMark specification, Markdown provides a lightweight syntax that maps cleanly to HTML. peasy-document uses the battle-tested Python-Markdown library under the hood, with sensible defaults that cover the most common use cases out of the box.
| Feature | Extension | Enabled by Default |
|---|---|---|
| Pipe tables | tables |
Yes |
| Fenced code blocks | fenced_code |
Yes |
| Syntax highlighting | codehilite |
Yes |
| Table of contents | toc |
Yes |
| Custom extensions | Pass any Python-Markdown extension | Via extensions= kwarg |
from peasy_document import markdown_to_html
# Convert Markdown with default extensions: tables, fenced_code, codehilite, toc
result = markdown_to_html("""
# API Documentation
| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | /users | List users |
| POST | /users | Create user |
```python
import requests
response = requests.get("/users")
\```
""")
# Override extensions for minimal output (tables + toc only)
result = markdown_to_html("content", extensions=["tables", "toc"])
Accepts str, bytes, or Path objects — read from files or process raw data without boilerplate:
from pathlib import Path
# Read Markdown from a file and convert to HTML
result = markdown_to_html(Path("README.md"))
# Process binary content from an HTTP response or database blob
result = markdown_to_html(b"# Binary input works too")
Learn more: PeasyDocument · CommonMark Specification
HTML Processing and Extraction
HTML is the backbone of the web, but extracting useful content from HTML documents often requires stripping tags, decoding entities (& to &, < to <), and ignoring non-content elements like <script> and <style> blocks. peasy-document provides two extraction paths: HTML to plain text for content indexing, and HTML to Markdown for content migration or CMS workflows.
Both functions use Python's stdlib html.parser — no external dependencies like BeautifulSoup or lxml required.
| Conversion | Use Case | Tags Handled |
|---|---|---|
| HTML to Text | Search indexing, content extraction, text analysis | Strips all tags, decodes entities, ignores <script>/<style> |
| HTML to Markdown | CMS migration, content republishing, documentation conversion | p, h1-h6, a, strong/b, em/i, ul/ol/li, code, pre, br, img |
| Text to HTML | Plain text formatting, email body generation | Wraps paragraphs in <p>, converts single newlines to <br> |
from peasy_document import html_to_text, html_to_markdown, text_to_html
# Strip HTML to plain text — useful for search indexing and content analysis
result = html_to_text("""
<html>
<head><title>Page</title></head>
<body>
<h1>Welcome</h1>
<p>This is a <strong>formatted</strong> document with & entities.</p>
<script>alert('ignored')</script>
</body>
</html>
""")
print(result.content)
# Welcome
# This is a formatted document with & entities.
# Convert HTML to Markdown — preserves links, emphasis, headings, and lists
result = html_to_markdown("""
<h1>Document Title</h1>
<p>Visit <a href="https://example.com">our site</a> for <strong>more info</strong>.</p>
<ul>
<li>First item</li>
<li>Second item</li>
</ul>
""")
print(result.content)
# # Document Title
# Visit [our site](https://example.com) for **more info**.
# - First item
# - Second item
# Convert plain text to HTML paragraphs — double newlines become <p> tags
result = text_to_html("First paragraph.\n\nSecond paragraph.\nWith a line break.")
print(result.content)
# <p>First paragraph.</p>
# <p>Second paragraph.<br>With a line break.</p>
Learn more: PeasyDocument · Developer Docs
CSV and JSON Transformation
CSV (Comma-Separated Values) and JSON (JavaScript Object Notation) are the two most common data interchange formats. CSV, defined in RFC 4180, represents tabular data with rows and columns. JSON, specified in RFC 8259, represents structured data as nested objects and arrays. Converting between these two formats is one of the most frequent tasks in data processing pipelines, API integrations, and ETL workflows.
peasy-document handles both directions using Python's stdlib csv and json modules — no pandas or external data libraries required.
| Direction | Input Format | Output Format | Key Features |
|---|---|---|---|
| CSV to JSON | RFC 4180 CSV with header row | JSON array of objects | Custom delimiters, header-keyed objects |
| JSON to CSV | JSON array of objects | CSV with auto-detected headers | Handles inconsistent keys across objects |
from peasy_document import csv_to_json, json_to_csv
# CSV to JSON — each row becomes a JSON object keyed by header values
result = csv_to_json("name,role,team\nAlice,Engineer,Backend\nBob,Designer,Frontend")
print(result.content)
# [
# {"name": "Alice", "role": "Engineer", "team": "Backend"},
# {"name": "Bob", "role": "Designer", "team": "Frontend"}
# ]
# Roundtrip: JSON back to CSV preserves column order
result = json_to_csv(result.content)
print(result.content)
# name,role,team
# Alice,Engineer,Backend
# Bob,Designer,Frontend
# Tab-separated values (TSV) — pass any single-character delimiter
result = csv_to_json("name\tage\nAlice\t30", delimiter="\t")
# Handles inconsistent keys gracefully — union of all keys becomes the header
result = json_to_csv('[{"a": 1, "b": 2}, {"b": 3, "c": 4}]')
# → a,b,c header with empty cells where keys are missing
Learn more: PeasyDocument · RFC 4180 CSV Standard
JSON to YAML Conversion
YAML (YAML Ain't Markup Language) is widely used for configuration files — Kubernetes manifests, Docker Compose files, CI/CD pipelines, and infrastructure-as-code tools all rely on YAML's human-readable format. Converting JSON to YAML is a common need when moving between API responses (JSON) and configuration files (YAML).
peasy-document implements JSON-to-YAML conversion with a recursive pure-Python renderer. No PyYAML dependency is required. The converter handles nested objects, arrays, strings, numbers, booleans, and null values. Special characters in strings are automatically quoted per the YAML 1.2 specification.
| YAML Feature | Supported | Notes |
|---|---|---|
| Nested objects | Yes | Indented with 2 spaces |
| Arrays | Yes | Block sequence style (- item) |
| Strings with special chars | Yes | Auto-quoted (":", "#", brackets, etc.) |
| Reserved words | Yes | true, false, null, yes, no are quoted when used as strings |
| Numbers and booleans | Yes | Rendered without quotes |
from peasy_document import json_to_yaml
# Convert a JSON config object to YAML — handles nested structures
result = json_to_yaml('{"server": {"host": "localhost", "port": 8080}, "debug": true}')
print(result.content)
# server:
# host: localhost
# port: 8080
# debug: true
# Arrays render as YAML block sequences
result = json_to_yaml('{"tags": ["python", "yaml", "json"], "count": 3}')
print(result.content)
# tags:
# - python
# - yaml
# - json
# count: 3
# Special characters in values are auto-quoted for YAML safety
result = json_to_yaml('{"url": "https://example.com:8080/path#section"}')
print(result.content)
# url: "https://example.com:8080/path#section"
Learn more: PeasyDocument · YAML 1.2 Specification
Table Formatting and Rendering
Tabular data can be rendered in multiple output formats depending on the target platform — Markdown tables for documentation, HTML tables for web pages, or structured TableData objects for programmatic access. peasy-document provides three rendering paths from CSV input, all using Python's stdlib csv module.
| Function | Output | Use Case |
|---|---|---|
csv_to_table() |
TableData dataclass |
Programmatic access to headers, rows, dimensions |
csv_to_markdown() |
Pipe-aligned Markdown table | GitHub README, documentation, Jupyter notebooks |
csv_to_html() |
<table> with <thead>/<tbody> |
Web pages, email templates, reports |
from peasy_document import csv_to_table, csv_to_markdown, csv_to_html
# Parse CSV into structured TableData — access headers, rows, and dimensions
table = csv_to_table("Name,Age,City\nAlice,30,NYC\nBob,25,LA")
print(table.headers) # ['Name', 'Age', 'City']
print(table.row_count) # 2
print(table.column_count) # 3
print(table.rows[0]) # ['Alice', '30', 'NYC']
# Render as Markdown table with aligned columns
result = csv_to_markdown("Name,Age,City\nAlice,30,NYC\nBob,25,LA")
print(result.content)
# | Name | Age | City |
# | ----- | --- | ---- |
# | Alice | 30 | NYC |
# | Bob | 25 | LA |
# Render as HTML table with proper thead/tbody structure
result = csv_to_html("Name,Age\nAlice,30")
print(result.content)
# <table>
# <thead>
# <tr>
# <th>Name</th>
# <th>Age</th>
# </tr>
# </thead>
# <tbody>
# <tr>
# <td>Alice</td>
# <td>30</td>
# </tr>
# </tbody>
# </table>
Learn more: PeasyDocument · OpenAPI Spec
Command-Line Interface
Install with CLI support: pip install "peasy-document[cli]"
The CLI exposes 6 conversion commands. All commands write to stdout by default — use -o / --output to write to a file.
# Convert Markdown to HTML
peasy-document md-to-html README.md -o output.html
# Strip HTML to plain text
peasy-document html-to-text page.html
# Convert CSV to JSON
peasy-document csv-to-json data.csv -o data.json
# Convert JSON array to CSV
peasy-document json-to-csv records.json -o records.csv
# CSV to Markdown table
peasy-document csv-to-markdown data.csv
# HTML to Markdown
peasy-document html-to-markdown page.html -o page.md
| Command | Description | Options |
|---|---|---|
md-to-html |
Convert Markdown file to HTML | -o OUTPUT |
html-to-text |
Strip HTML tags, extract plain text | -o OUTPUT |
csv-to-json |
Convert CSV to JSON array of objects | -o OUTPUT, -d DELIMITER |
json-to-csv |
Convert JSON array to CSV | -o OUTPUT |
csv-to-markdown |
Render CSV as Markdown table | -o OUTPUT, -d DELIMITER |
html-to-markdown |
Convert HTML to Markdown | -o OUTPUT |
API Reference
Conversion Functions
| Function | Input | Output | Dependencies |
|---|---|---|---|
markdown_to_html(source, *, extensions=None) |
Markdown | HTML | markdown library |
html_to_text(source) |
HTML | Plain text | stdlib only |
html_to_markdown(source) |
HTML | Markdown | stdlib only |
text_to_html(source) |
Plain text | HTML | stdlib only |
csv_to_json(source, *, delimiter=",") |
CSV | JSON | stdlib only |
json_to_csv(source) |
JSON | CSV | stdlib only |
csv_to_table(source, *, delimiter=",") |
CSV | TableData |
stdlib only |
csv_to_markdown(source, *, delimiter=",") |
CSV | Markdown table | stdlib only |
csv_to_html(source, *, delimiter=",") |
CSV | HTML table | stdlib only |
json_to_yaml(source) |
JSON | YAML | stdlib only |
All functions accept TextInput (str | bytes | Path) and return ConversionResult or TableData.
Types
| Type | Description | Fields |
|---|---|---|
TextInput |
Union type alias | str | bytes | Path |
ConversionResult |
Frozen dataclass — conversion output with metadata | content: str, source_format: str, target_format: str, source_size: int, target_size: int |
TableData |
Frozen dataclass — structured table representation | headers: list[str], rows: list[list[str]], row_count: int, column_count: int |
Learn More About Document Conversion
- Tools: Markdown to HTML · CSV to JSON · HTML to Markdown · All Tools
- Guides: Markdown Guide · CSV Processing Guide · All Guides
- Glossary: Markdown · YAML · All Terms
- Formats: Markdown · CSV · JSON · All Formats
- API: REST API Docs · OpenAPI Spec
Also Available
| Platform | Install | Link |
|---|---|---|
| TypeScript / npm | npm install peasy-document |
npm |
| Go | go get github.com/peasytools/peasy-document-go |
pkg.go.dev |
| Rust | cargo add peasy-document |
crates.io |
| Ruby | gem install peasy-document |
RubyGems |
| MCP | uvx --from "peasy-document[mcp]" python -m peasy_document.mcp_server |
Config |
Peasy Developer Tools
Part of the Peasy open-source developer tools ecosystem.
| Package | PyPI | npm | Description |
|---|---|---|---|
| peasy-pdf | PyPI | npm | PDF merge, split, compress, 21 operations — peasypdf.com |
| peasy-image | PyPI | npm | Image resize, crop, convert, compress, 20 operations — peasyimage.com |
| peasytext | PyPI | npm | Text case, slugify, word count, encoding — peasytext.com |
| peasy-css | PyPI | npm | CSS gradients, shadows, flexbox, grid generators — peasycss.com |
| peasy-compress | PyPI | npm | ZIP, TAR, gzip, brotli archive operations — peasytools.com |
| peasy-document | PyPI | npm | Markdown, HTML, CSV, JSON conversions — peasyformats.com |
| peasy-audio | PyPI | npm | Audio convert, trim, merge, normalize — peasyaudio.com |
| peasy-video | PyPI | npm | Video trim, resize, GIF conversion — peasyvideo.com |
License
MIT
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file peasy_document-0.2.0.tar.gz.
File metadata
- Download URL: peasy_document-0.2.0.tar.gz
- Upload date:
- Size: 538.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.10.9 {"installer":{"name":"uv","version":"0.10.9","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"macOS","version":null,"id":null,"libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
7426fa8e44f9f869e0bf8c19eade50cb164fa5e7c1616f9f0e5c427a532f4d8a
|
|
| MD5 |
7b5c6ffc3661893764ae060ca382bf46
|
|
| BLAKE2b-256 |
fc5c633112d87a1a8a831a97635e853e97041aa4b7dea6b03f395fe2074b3a7d
|
File details
Details for the file peasy_document-0.2.0-py3-none-any.whl.
File metadata
- Download URL: peasy_document-0.2.0-py3-none-any.whl
- Upload date:
- Size: 16.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.10.9 {"installer":{"name":"uv","version":"0.10.9","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"macOS","version":null,"id":null,"libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
d345f7cff45eac172b3c4eda5de1690e27ad62a8d51b2c99108a582e12422e3a
|
|
| MD5 |
8b8f24bf9905c7c6fdfaf588f0a8bd28
|
|
| BLAKE2b-256 |
d83861f47ac26cd1d633e311a39855a5238b6cb1eb5b205554d972daf2592904
|