Convert JSON to BeautifulSoup HTML/XML structures
jsoup
A Python library that converts JSON structures into BeautifulSoup HTML/XML trees. The inverse of bs2json — build HTML from dictionaries with full support for attributes, comments, doctypes, and nested elements.
Python 3.8+ | Only dependency: beautifulsoup4
Table of Contents
| Section | Description |
|---|---|
| Installation | How to install |
| Quick Start | Basic usage |
| Input Format | How JSON maps to HTML |
| Features | Attributes, lists, comments, empty elements, doctypes |
| bs2json Roundtrip | Using bs2json output as jsoup input |
| Options | Custom labels, duplicate attributes, char refs |
| API Reference | JsonTreeBuilder, install() |
| Contributing | How to contribute |
Installation
pip install -U jsoup
Quick Start
from jsoup import JsonTreeBuilder
from bs4 import BeautifulSoup
json = {
    "body": {
        "h1": {"attrs": {"class": "title"}, "text": "Hello World"},
        "p": "This is a paragraph.",
        "br": None,
        "ul": {
            "li": ["Item 1", "Item 2", "Item 3"]
        }
    }
}
soup = BeautifulSoup(json, builder=JsonTreeBuilder)
print(soup.prettify())
Output:
<body>
<h1 class="title">
Hello World
</h1>
<p>
This is a paragraph.
</p>
<br/>
<ul>
<li>Item 1</li>
<li>Item 2</li>
<li>Item 3</li>
</ul>
</body>
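The Quick Start input is a Python dict, so `None` stands in for JSON `null`. If the structure arrives as JSON text, it has to be decoded first. The following is a minimal sketch using only the standard-library `json` module; the sample text and the names `raw` and `tree` are illustrative and not part of jsoup. Note that a variable named `json`, as in the example above, would shadow that module in the same scope.

```python
import json

# JSON text as it might arrive from a file or an API (sample data for illustration).
raw = '{"body": {"h1": "Hello World", "br": null, "ul": {"li": ["Item 1", "Item 2"]}}}'

# json.loads maps JSON null to Python None and keeps the key order of the text.
tree = json.loads(raw)

print(tree["body"]["br"])   # None
print(list(tree["body"]))   # ['h1', 'br', 'ul']
```

The resulting `tree` has the same shape as the dict in the Quick Start example and could be passed to `BeautifulSoup` in the same way.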
Input Format
| JSON | HTML |
|---|---|
| `{"p": "text"}` | `<p>text</p>` |
| `{"br": None}` | `<br/>` |
| `{"p": {"attrs": {"class": "x"}, "text": "hello"}}` | `<p class="x">hello</p>` |
| `{"li": ["a", "b", "c"]}` | `<li>a</li><li>b</li><li>c</li>` |
| `{"comment": "note"}` | `<!--note-->` |
| `{"doctype": "html"}` | `<!DOCTYPE html>` |
| `{"div": {"children": [{"p": "a"}, {"p": "b"}]}}` | `<div><p>a</p><p>b</p></div>` |
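To make the mapping rules concrete, here is a rough standalone sketch that renders the same dict shapes to an HTML string using only the standard library. It is not jsoup's implementation: the function names `render` and `render_tag` are hypothetical, and the order in which text, `children`, and nested tags are emitted is an assumption made for this sketch.

```python
from html import escape


def render(node):
    """Render a dict that follows the mapping table to an HTML string."""
    return "".join(render_tag(name, value) for name, value in node.items())


def render_tag(name, value):
    if name == "comment":
        return f"<!--{value}-->"
    if name == "doctype":
        return f"<!DOCTYPE {value}>"
    if isinstance(value, list):      # list -> one tag per item
        return "".join(render_tag(name, item) for item in value)
    if value is None:                # None -> empty element
        return f"<{name}/>"
    if isinstance(value, str):       # string -> text content
        return f"<{name}>{escape(value, quote=False)}</{name}>"
    # dict -> attributes, then text, ordered children, and nested tags
    # (this emission order is an assumption of the sketch)
    attrs = "".join(
        f' {key}="{escape(str(val))}"' for key, val in value.get("attrs", {}).items()
    )
    text = escape(value.get("text", ""), quote=False)
    children = "".join(render(child) for child in value.get("children", []))
    nested = "".join(
        render_tag(key, val)
        for key, val in value.items()
        if key not in ("attrs", "text", "children")
    )
    return f"<{name}{attrs}>{text}{children}{nested}</{name}>"


print(render({"p": {"attrs": {"class": "x"}, "text": "hello"}}))
# <p class="x">hello</p>
print(render({"div": {"children": [{"p": "a"}, {"p": "b"}]}}))
# <div><p>a</p><p>b</p></div>
```

The sketch only covers the rows of the table; jsoup itself builds a BeautifulSoup tree rather than a string.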
Features
Attributes
Attributes are passed via the attrs key:
json = {
"a": {"attrs": {"href": "/home", "class": "nav"}, "text": "Home"},
"img": {"attrs": {"src": "photo.jpg", "alt": "Photo"}}
}
Produces:
<a class="nav" href="/home">Home</a>
<img alt="Photo" src="photo.jpg"/>
Lists (Multiple Same Tags)
A list value creates multiple tags with the same name:
json = {"ul": {"li": ["Apple", "Banana", "Cherry"]}}
Produces:
<ul><li>Apple</li><li>Banana</li><li>Cherry</li></ul>
List items can also be dicts with nested content:
json = {"ul": {"li": [
    "Simple item",
    {"text": "Item with link", "a": {"attrs": {"href": "/"}, "text": "click"}}
]}}
Comments
json = {
    "body": {
        "comment": "This is a comment",
        "p": "Visible text"
    }
}
# Produces: <!--This is a comment--><p>Visible text</p>
Empty Elements
Use None for self-closing tags:
json = {"body": {"br": None, "hr": None}}
# Produces: <body><br/><hr/></body>
Doctypes
json = {
    "doctype": "html",
    "html": {"body": {"p": "content"}}
}
Nested Structures
Nesting works naturally:
json = {
    "html": {
        "head": {"title": "My Page"},
        "body": {
            "header": {
                "nav": {"ul": {"li": [
                    {"a": {"attrs": {"href": "/"}, "text": "Home"}},
                    {"a": {"attrs": {"href": "/about"}, "text": "About"}}
                ]}}
            },
            "main": {"h1": "Welcome", "p": "Content here"},
            "footer": {"p": "Copyright 2026"}
        }
    }
}
bs2json Roundtrip
jsoup understands the children key from bs2json's ordered output, enabling roundtrip conversion:
from bs2json import BS2Json
from bs4 import BeautifulSoup
from jsoup import JsonTreeBuilder
# HTML -> JSON (bs2json)
html = "<html><body><h1>Title</h1><p>Text</p><h1>Another</h1></body></html>"
json_data = BS2Json(html).convert()
# {'html': {'body': {'children': [{'h1': 'Title'}, {'p': 'Text'}, {'h1': 'Another'}]}}}
# JSON -> HTML (jsoup)
soup = BeautifulSoup(json_data, builder=JsonTreeBuilder)
print(soup.prettify())
# Markup (shown here without prettify()'s line breaks and indentation):
# <html><body><h1>Title</h1><p>Text</p><h1>Another</h1></body></html>
The children key preserves element order, including elements with attributes:
json = {
    "table": {
        "attrs": {"id": "data"},
        "children": [
            {"tr": {"children": [{"th": "Name"}, {"th": "Score"}]}},
            {"tr": {"children": [{"td": "Alice"}, {"td": "95"}]}}
        ]
    }
}
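The reason an ordered children list is needed can be shown with plain Python dicts, independent of jsoup. A dict cannot hold the same key twice, so a sequence such as h1, p, h1 from the roundtrip example cannot be written with tag names as keys. The variable names below are illustrative only.

```python
# A dict literal with a repeated key keeps only the last value for that key.
flat = {"h1": "Title", "p": "Text", "h1": "Another"}
print(flat)    # {'h1': 'Another', 'p': 'Text'}

# A list value keeps both headings but loses their position relative to <p>.
grouped = {"h1": ["Title", "Another"], "p": "Text"}

# A children list keeps every element and the original document order.
ordered = {"children": [{"h1": "Title"}, {"p": "Text"}, {"h1": "Another"}]}
order = [next(iter(child)) for child in ordered["children"]]
print(order)   # ['h1', 'p', 'h1']
```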
Options
Using install() for Cleaner Syntax
Register jsoup so you can use "jsoup" as a parser string:
from jsoup import install
install()
from bs4 import BeautifulSoup
soup = BeautifulSoup({"p": "hello"}, "jsoup")
Custom Label Names
Override the default key names for attributes, text, and children:
json = {"p": {"@": {"class": "x"}, "#text": "hello"}}
soup = BeautifulSoup(json, builder=JsonTreeBuilder,
                     attr_name='@', text_name='#text')
# <p class="x">hello</p>
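One way to think about custom labels is as a renaming of keys: `"@"` plays the role of `"attrs"` and `"#text"` the role of `"text"`. The sketch below performs that renaming by hand on a plain dict. The helper `relabel` is hypothetical and not part of jsoup, which is configured through `attr_name` and `text_name` as shown above; the sketch only illustrates the equivalence.

```python
def relabel(node, labels):
    """Return a copy of node with custom label keys renamed to the default names."""
    if isinstance(node, list):
        return [relabel(item, labels) for item in node]
    if not isinstance(node, dict):
        return node
    out = {}
    for key, value in node.items():
        new_key = labels.get(key, key)
        if new_key == "attrs" and isinstance(value, dict):
            # Attribute dicts hold plain name/value pairs, so copy them untouched.
            out[new_key] = dict(value)
        else:
            out[new_key] = relabel(value, labels)
    return out


custom = {"p": {"@": {"class": "x"}, "#text": "hello"}}
print(relabel(custom, {"@": "attrs", "#text": "text"}))
# {'p': {'attrs': {'class': 'x'}, 'text': 'hello'}}
```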
Duplicate Attributes
Control how duplicate attribute keys are handled when attrs is a list of dicts:
json = {"p": {"attrs": [{"class": "a"}, {"class": "b"}], "text": "hello"}}
# Replace (default): last value wins
soup = BeautifulSoup(json, builder=JsonTreeBuilder, on_duplicate_attribute="replace")
# Ignore: first value wins
soup = BeautifulSoup(json, builder=JsonTreeBuilder, on_duplicate_attribute="ignore")
# Callable: custom merge logic
def merge(attrs, name, value):
    attrs[name] += " " + value
soup = BeautifulSoup(json, builder=JsonTreeBuilder, on_duplicate_attribute=merge)
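The three strategies can be traced on plain dicts without building a tree. The sketch below is an approximation of the behavior described above, not jsoup's code: `merge_attrs` and `join_with_space` are hypothetical names, and the callable receives the merged dict, the attribute name, and the new value, matching the `merge` signature in the example.

```python
def merge_attrs(attr_dicts, on_duplicate="replace"):
    """Merge a list of attribute dicts, resolving repeated names by strategy."""
    merged = {}
    for attrs in attr_dicts:
        for name, value in attrs.items():
            if name not in merged:
                merged[name] = value                 # first occurrence is always kept
            elif callable(on_duplicate):
                on_duplicate(merged, name, value)    # custom merge logic
            elif on_duplicate == "replace":
                merged[name] = value                 # last value wins
            elif on_duplicate == "ignore":
                pass                                 # first value wins
            else:
                raise ValueError(f"unknown strategy: {on_duplicate!r}")
    return merged


def join_with_space(attrs, name, value):
    attrs[name] += " " + value


pairs = [{"class": "a"}, {"class": "b"}]
print(merge_attrs(pairs))                    # {'class': 'b'}
print(merge_attrs(pairs, "ignore"))          # {'class': 'a'}
print(merge_attrs(pairs, join_with_space))   # {'class': 'a b'}
```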
Character References
Special characters in text are escaped as HTML entities automatically:
json = {"p": "1<2 && 2>1"}
soup = BeautifulSoup(json, builder=JsonTreeBuilder)
# <p>1&lt;2 &amp;&amp; 2&gt;1</p>
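The escaping itself can be reproduced with the standard library. This sketch uses `html.escape` only to show which characters are replaced by entities; it does not claim that jsoup calls this function internally.

```python
from html import escape, unescape

text = "1<2 && 2>1"

# Replace &, < and > with entities; quote=False leaves quotation marks alone.
escaped = escape(text, quote=False)
print(escaped)              # 1&lt;2 &amp;&amp; 2&gt;1
print(f"<p>{escaped}</p>")  # <p>1&lt;2 &amp;&amp; 2&gt;1</p>

# Unescaping recovers the original text.
print(unescape(escaped) == text)   # True
```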
API Reference
JsonTreeBuilder
A BeautifulSoup TreeBuilder that accepts JSON dicts as input.
from jsoup import JsonTreeBuilder
soup = BeautifulSoup(json_data, builder=JsonTreeBuilder, **options)
Options (passed as kwargs to BeautifulSoup):
| Option | Default | Description |
|---|---|---|
| `attr_name` | `"attrs"` | JSON key for element attributes |
| `text_name` | `"text"` | JSON key for text content |
| `children_name` | `"children"` | JSON key for ordered children list |
| `on_duplicate_attribute` | `"replace"` | How to handle duplicate attrs: `"replace"`, `"ignore"`, or a callable |
| `convert_charref` | `True` | Whether to escape HTML entities |
install()
Register JsonTreeBuilder so "jsoup" can be used as a parser string:
from jsoup import install
install(debug=False)
After calling install():
soup = BeautifulSoup(json_data, "jsoup")
Contributing
See CONTRIBUTING.md for development setup, versioning guide, and how to submit changes.