Convert JSON to BeautifulSoup HTML/XML structures

jsoup

A Python library that converts JSON structures into BeautifulSoup HTML/XML trees. It is the inverse of bs2json: build HTML from dictionaries, with full support for attributes, comments, doctypes, and nested elements.

Python 3.8+ | Only dependency: beautifulsoup4


Table of Contents

Section            Description
Installation       How to install
Quick Start        Basic usage
Input Format       How JSON maps to HTML
Features           Attributes, lists, comments, empty elements, doctypes
bs2json Roundtrip  Using bs2json output as jsoup input
Options            Custom labels, duplicate attributes, char refs
API Reference      JsonTreeBuilder, install()
Contributing       How to contribute

Installation

pip install -U jsoup

Quick Start

from jsoup import JsonTreeBuilder
from bs4 import BeautifulSoup

json = {
    "body": {
        "h1": {"attrs": {"class": "title"}, "text": "Hello World"},
        "p": "This is a paragraph.",
        "br": None,
        "ul": {
            "li": ["Item 1", "Item 2", "Item 3"]
        }
    }
}

soup = BeautifulSoup(json, builder=JsonTreeBuilder)
print(soup.prettify())

Output:

<body>
 <h1 class="title">
  Hello World
 </h1>
 <p>
  This is a paragraph.
 </p>
 <br/>
 <ul>
  <li>
   Item 1
  </li>
  <li>
   Item 2
  </li>
  <li>
   Item 3
  </li>
 </ul>
</body>

Input Format

JSON                                                 HTML
{"p": "text"}                                        <p>text</p>
{"br": None}                                         <br/>
{"p": {"attrs": {"class": "x"}, "text": "hello"}}    <p class="x">hello</p>
{"li": ["a", "b", "c"]}                              <li>a</li><li>b</li><li>c</li>
{"comment": "note"}                                  <!--note-->
{"doctype": "html"}                                  <!DOCTYPE html>
{"div": {"children": [{"p": "a"}, {"p": "b"}]}}      <div><p>a</p><p>b</p></div>
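The mapping in the table above can be sketched in plain Python. The render and render_tag functions below are hypothetical helpers written only for illustration: they are not jsoup's implementation, and they produce a string rather than a BeautifulSoup tree. The sketch assumes the default attrs, text, and children labels and does not special-case void elements such as img.

```python
from html import escape

# Reserved keys, mirroring the default labels described in this README.
ATTRS, TEXT, CHILDREN = "attrs", "text", "children"


def render(node):
    """Render a dict of tag names to values as an HTML string."""
    return "".join(render_tag(name, value) for name, value in node.items())


def render_tag(name, value):
    """Render one key/value pair according to the Input Format table."""
    if name == "comment":
        return f"<!--{value}-->"
    if name == "doctype":
        return f"<!DOCTYPE {value}>"
    if isinstance(value, list):
        # A list value repeats the tag once per item.
        return "".join(render_tag(name, item) for item in value)
    if value is None:
        # None marks an empty, self-closing element.
        return f"<{name}/>"
    if isinstance(value, str):
        return f"<{name}>{escape(value, quote=False)}</{name}>"
    # Otherwise a dict: optional attrs, text, ordered children, nested tags.
    attrs = "".join(
        f' {key}="{escape(str(val))}"'
        for key, val in value.get(ATTRS, {}).items()
    )
    inner = escape(value.get(TEXT, ""), quote=False)
    inner += "".join(render(child) for child in value.get(CHILDREN, []))
    nested = {k: v for k, v in value.items() if k not in (ATTRS, TEXT, CHILDREN)}
    inner += render(nested)
    return f"<{name}{attrs}>{inner}</{name}>"


print(render({"ul": {"li": ["Apple", "Banana"]}}))
# <ul><li>Apple</li><li>Banana</li></ul>
```

Each branch corresponds to one row of the table, so the sketch can double as a checklist when reading unfamiliar input.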

Features

Attributes

Attributes are passed via the attrs key:

json = {
    "a": {"attrs": {"href": "/home", "class": "nav"}, "text": "Home"},
    "img": {"attrs": {"src": "photo.jpg", "alt": "Photo"}}
}

Produces:

<a class="nav" href="/home">Home</a>
<img alt="Photo" src="photo.jpg"/>

Lists (Multiple Same Tags)

A list value creates multiple tags with the same name:

json = {"ul": {"li": ["Apple", "Banana", "Cherry"]}}

Produces:

<ul><li>Apple</li><li>Banana</li><li>Cherry</li></ul>

List items can also be dicts with nested content:

json = {"ul": {"li": [
    "Simple item",
    {"text": "Item with link", "a": {"attrs": {"href": "/"}, "text": "click"}}
]}}

Comments

json = {
    "body": {
        "comment": "This is a comment",
        "p": "Visible text"
    }
}
# Produces: <body><!--This is a comment--><p>Visible text</p></body>

Empty Elements

Use None for self-closing tags:

json = {"body": {"br": None, "hr": None}}
# Produces: <body><br/><hr/></body>
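
If the input arrives as JSON text rather than as a Python literal, the standard json module decodes null to None, which is the value used above for empty elements. A minimal sketch (the JSON string is made up for illustration):

```python
import json

# Hypothetical JSON document; null marks the empty elements.
text = '{"body": {"br": null, "hr": null}}'
data = json.loads(text)

print(data)  # {'body': {'br': None, 'hr': None}}
```

The resulting dict has the same shape as the literal in the example above.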

Doctypes

json = {
    "doctype": "html",
    "html": {"body": {"p": "content"}}
}

Nested Structures

Nested dicts map to nested elements:

json = {
    "html": {
        "head": {"title": "My Page"},
        "body": {
            "header": {
                "nav": {"ul": {"li": [
                    {"a": {"attrs": {"href": "/"}, "text": "Home"}},
                    {"a": {"attrs": {"href": "/about"}, "text": "About"}}
                ]}}
            },
            "main": {"h1": "Welcome", "p": "Content here"},
            "footer": {"p": "Copyright 2026"}
        }
    }
}

bs2json Roundtrip

jsoup understands the children key from bs2json's ordered output, enabling roundtrip conversion:

from bs2json import BS2Json
from bs4 import BeautifulSoup
from jsoup import JsonTreeBuilder

# HTML -> JSON (bs2json)
html = "<html><body><h1>Title</h1><p>Text</p><h1>Another</h1></body></html>"
json_data = BS2Json(html).convert()
# {'html': {'body': {'children': [{'h1': 'Title'}, {'p': 'Text'}, {'h1': 'Another'}]}}}

# JSON -> HTML (jsoup)
soup = BeautifulSoup(json_data, builder=JsonTreeBuilder)
print(soup)
# <html><body><h1>Title</h1><p>Text</p><h1>Another</h1></body></html>

The children key preserves element order, including elements with attributes:

json = {
    "table": {
        "attrs": {"id": "data"},
        "children": [
            {"tr": {"children": [{"th": "Name"}, {"th": "Score"}]}},
            {"tr": {"children": [{"td": "Alice"}, {"td": "95"}]}}
        ]
    }
}
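
The reason an ordered children list is needed can be shown with the standard json module alone: an object cannot hold the same key twice, so repeated tags that are interleaved with other tags collapse when written as plain keys. The snippet below is a sketch of that limitation and does not use jsoup or bs2json.

```python
import json

# Plain keys: the second "h1" overwrites the first, so one heading is lost.
flat = json.loads('{"h1": "Title", "p": "Text", "h1": "Another"}')
print(flat)  # {'h1': 'Another', 'p': 'Text'}

# A children list keeps every element, in document order.
ordered = json.loads(
    '{"children": [{"h1": "Title"}, {"p": "Text"}, {"h1": "Another"}]}'
)
tags = [next(iter(child)) for child in ordered["children"]]
print(tags)  # ['h1', 'p', 'h1']
```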

Options

Using install() for Cleaner Syntax

Register jsoup so you can use "jsoup" as a parser string:

from jsoup import install
install()

from bs4 import BeautifulSoup
soup = BeautifulSoup({"p": "hello"}, "jsoup")

Custom Label Names

Override the default key names for attributes, text, and children:

json = {"p": {"@": {"class": "x"}, "#text": "hello"}}
soup = BeautifulSoup(json, builder=JsonTreeBuilder,
                     attr_name='@', text_name='#text')
# <p class="x">hello</p>

Duplicate Attributes

Control how duplicate attribute keys are handled when attrs is a list of dicts:

json = {"p": {"attrs": [{"class": "a"}, {"class": "b"}], "text": "hello"}}

# Replace (default): last value wins
soup = BeautifulSoup(json, builder=JsonTreeBuilder, on_duplicate_attribute="replace")

# Ignore: first value wins
soup = BeautifulSoup(json, builder=JsonTreeBuilder, on_duplicate_attribute="ignore")

# Callable: custom merge logic
def merge(attrs, name, value):
    attrs[name] += " " + value

soup = BeautifulSoup(json, builder=JsonTreeBuilder, on_duplicate_attribute=merge)
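
To make the three strategies concrete, here is a plain-Python sketch of how a list of attribute dicts could be folded into a single dict. The resolve_attrs function is a hypothetical helper for illustration, not part of jsoup; only the merge callable and its (attrs, name, value) signature come from the example above.

```python
def resolve_attrs(attr_dicts, on_duplicate="replace"):
    """Fold a list of attribute dicts into one dict (illustrative only)."""
    resolved = {}
    for attr in attr_dicts:
        for name, value in attr.items():
            if name not in resolved:
                resolved[name] = value
            elif on_duplicate == "replace":
                resolved[name] = value  # last value wins
            elif on_duplicate == "ignore":
                continue  # first value wins
            elif callable(on_duplicate):
                on_duplicate(resolved, name, value)  # custom merge logic
            else:
                raise ValueError(f"unknown strategy: {on_duplicate!r}")
    return resolved


def merge(attrs, name, value):
    attrs[name] += " " + value


pairs = [{"class": "a"}, {"class": "b"}]
print(resolve_attrs(pairs))            # {'class': 'b'}
print(resolve_attrs(pairs, "ignore"))  # {'class': 'a'}
print(resolve_attrs(pairs, merge))     # {'class': 'a b'}
```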

Character References

Special characters such as <, >, and & are escaped as HTML entities automatically:

json = {"p": "1<2 && 2>1"}
soup = BeautifulSoup(json, builder=JsonTreeBuilder)
# <p>1&lt;2 &amp;&amp; 2&gt;1</p>
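
The same character mapping can be reproduced with the standard library, which is handy for checking expected output in tests. This sketch only illustrates the escaping itself; it makes no claim about how jsoup or BeautifulSoup implement it internally.

```python
from html import escape, unescape

raw = "1<2 && 2>1"
escaped = escape(raw, quote=False)  # quote=False leaves " and ' untouched

print(escaped)  # 1&lt;2 &amp;&amp; 2&gt;1

# Unescaping restores the original text.
print(unescape(escaped) == raw)  # True
```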

API Reference

JsonTreeBuilder

A BeautifulSoup TreeBuilder that accepts JSON dicts as input.

from bs4 import BeautifulSoup
from jsoup import JsonTreeBuilder
soup = BeautifulSoup(json_data, builder=JsonTreeBuilder, **options)

Options (passed as kwargs to BeautifulSoup):

Option                  Default     Description
attr_name               "attrs"     JSON key for element attributes
text_name               "text"      JSON key for text content
children_name           "children"  JSON key for ordered children list
on_duplicate_attribute  "replace"   How to handle duplicate attrs: "replace", "ignore", or callable
convert_charref         True        Whether to escape HTML entities

install()

Register JsonTreeBuilder so "jsoup" can be used as a parser string:

from jsoup import install
install(debug=False)

After calling install():

soup = BeautifulSoup(json_data, "jsoup")

Contributing

See CONTRIBUTING.md for development setup, versioning guide, and how to submit changes.
