Convert bs4 Tags into JSON

bs2json

A lightweight Python library that converts BeautifulSoup4 HTML elements into structured JSON. Parse any HTML and get clean, traversable dictionaries — preserving document order, with full control over comments, whitespace, and label naming.

Python 3.8+ | Only dependency: beautifulsoup4

Table of Contents

Section	Description
Installation	How to install
Quick Start	Basic usage example
Output Format	How HTML maps to JSON
Conversion	Converting tags, multiple tags, from BeautifulSoup
Options	group_by_tag, comments, whitespace, labels, config
Output	Save to file, pretty print
Advanced Usage	Context manager, callable, extension mode
API Reference	BS2Json methods, ConversionConfig fields
Contributing	How to contribute

Installation

pip install -U bs2json

Quick Start

from bs2json import BS2Json

html = """
<html>
<head><title>My Page</title></head>
<body>
    <h1>Welcome</h1>
    <p class="intro">Hello <b>world</b></p>
    <a href="/link1">Link 1</a>
    <a href="/link2">Link 2</a>
</body>
</html>
"""

converter = BS2Json(html)
result = converter.convert()
converter.prettify()

Output Format

Elements preserve their original document order. The JSON structure follows these rules:

HTML	JSON
`<h1>text</h1>`	`{"h1": "text"}`
`<p class="x">text</p>`	`{"p": {"attrs": {"class": ["x"]}, "text": "text"}}`
`<div><h1>A</h1><p>B</p></div>`	`{"div": {"children": [{"h1": "A"}, {"p": "B"}]}}`
`<a href="/">link</a>`	`{"a": {"attrs": {"href": "/"}, "text": "link"}}`
`<!-- note -->`	`{"comment": "<!-- note -->"}`

Single text child stays simple: {"tag": "text"}
Multiple children use: {"tag": {"children": [...]}}
Attributes appear under the "attrs" key
Mixed content (text + tags) preserves order in children

Full output example

{'html': {'head': {'title': 'My Page'},
          'body': {'children': [{'h1': 'Welcome'},
                                {'p': {'attrs': {'class': ['intro']},
                                       'children': [{'text': 'Hello'},
                                                    {'b': 'world'}]}},
                                {'a': {'attrs': {'href': '/link1'},
                                       'text': 'Link 1'}},
                                {'a': {'attrs': {'href': '/link2'},
                                       'text': 'Link 2'}}]}}}
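
Because the result is made of plain dicts and lists, it can be walked with ordinary Python. The following is a minimal sketch, not part of bs2json: `iter_tag` is a hypothetical helper that visits the structure shown above and yields every value stored under a given tag name, skipping attribute dicts. It assumes the default `attrs` label.

```python
def iter_tag(node, tag, attrs_key="attrs"):
    """Yield every value stored under `tag`, depth-first in document order."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == attrs_key:
                continue  # attribute dicts describe an element, they are not elements
            if key == tag:
                yield value
            yield from iter_tag(value, tag, attrs_key)
    elif isinstance(node, list):
        for item in node:
            yield from iter_tag(item, tag, attrs_key)


# The dict from the full output example above, copied literally.
result = {'html': {'head': {'title': 'My Page'},
                   'body': {'children': [
                       {'h1': 'Welcome'},
                       {'p': {'attrs': {'class': ['intro']},
                              'children': [{'text': 'Hello'}, {'b': 'world'}]}},
                       {'a': {'attrs': {'href': '/link1'}, 'text': 'Link 1'}},
                       {'a': {'attrs': {'href': '/link2'}, 'text': 'Link 2'}}]}}}

hrefs = [a['attrs']['href'] for a in iter_tag(result, 'a')]
print(hrefs)  # ['/link1', '/link2']
```

If custom labels are in use, pass the matching name as `attrs_key`.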

Conversion

Convert Specific Tags

converter = BS2Json(html)

# By tag name
converter.convert('body')

# By CSS class
converter.convert(class_='intro')

# By attribute
converter.convert('a', href='/link1')
# {'a': {'attrs': {'href': '/link1'}, 'text': 'Link 1'}}

Convert Multiple Tags

converter = BS2Json(html)

# As a list of individual results
converter.convert_all('a')
# [{'a': {'attrs': {'href': '/link1'}, 'text': 'Link 1'}},
#  {'a': {'attrs': {'href': '/link2'}, 'text': 'Link 2'}}]

# Grouped by tag name: one dict per tag name, still returned inside a list
converter.convert_all('a', join=True)
# [{'a': [{'attrs': {'href': '/link1'}, 'text': 'Link 1'},
#         {'attrs': {'href': '/link2'}, 'text': 'Link 2'}]}]

From BeautifulSoup Objects

You can pass an existing BeautifulSoup object or Tag instead of raw HTML:

from bs4 import BeautifulSoup

soup = BeautifulSoup(html, 'html.parser')

# From a soup object
BS2Json(soup).convert()

# From a specific tag
BS2Json(soup.find('body')).convert()

# Convert on the fly, without passing a source to the constructor
converter = BS2Json()
converter.convert(soup.body)

Options

Group by Tag Name

By default, elements preserve document order. Use group_by_tag=True to group siblings by tag name — useful when you don't care about order and want quick access by tag:

html = '<html><body><h3>First</h3><p>Text</p><h3>Second</h3></body></html>'

# Default: preserves document order
BS2Json(html).convert()
# {'html': {'body': {'children': [{'h3': 'First'}, {'p': 'Text'}, {'h3': 'Second'}]}}}

# Grouped: siblings merged by tag name
BS2Json(html, group_by_tag=True).convert()
# {'html': {'body': {'h3': ['First', 'Second'], 'p': 'Text'}}}
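
To make the difference between the two shapes concrete, here is a small sketch that turns the ordered `children` list from the first output into the grouped dict from the second. `group_children` is a hypothetical name used for illustration only. It is not how bs2json is implemented, just one way to reproduce the documented result.

```python
from collections import defaultdict


def group_children(children):
    """Merge an ordered list of single-key dicts into one dict keyed by tag name."""
    buckets = defaultdict(list)
    for child in children:
        for tag, value in child.items():
            buckets[tag].append(value)
    # A tag seen once keeps its bare value; repeated tags become a list.
    return {tag: values[0] if len(values) == 1 else values
            for tag, values in buckets.items()}


ordered = [{'h3': 'First'}, {'p': 'Text'}, {'h3': 'Second'}]
grouped = group_children(ordered)
print(grouped)  # {'h3': ['First', 'Second'], 'p': 'Text'}
```

Note what is lost: once grouped, there is no way to tell that the paragraph sat between the two headings.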

Comments

comment_html = '<html><body><!-- TODO --><p>text</p></body></html>'

# Included by default
BS2Json(comment_html).convert()
# {'html': {'body': {'children': [{'comment': '<!-- TODO -->'}, {'p': 'text'}]}}}

# Exclude comments
BS2Json(comment_html, include_comments=False).convert()
# {'html': {'body': {'p': 'text'}}}

Whitespace

ws_html = '<html><body><p>  hello  </p></body></html>'

# Stripped by default
BS2Json(ws_html).convert()
# {'html': {'body': {'p': 'hello'}}}

# Preserve whitespace
BS2Json(ws_html, strip=False).convert()
# {'html': {'body': {'p': '  hello  '}}}

Custom Labels

Change the JSON key names for attributes, text content, and comments:

converter = BS2Json('<html><body><p class="x">hello</p></body></html>')
converter.labels(attrs='attributes', text='content', comment='notes')
result = converter.convert()
# {'html': {'body': {'p': {'attributes': {'class': ['x']}, 'content': 'hello'}}}}

Or via constructor:

BS2Json(html, attr_name='@', text_name='#text', comment_name='#comment')

Configuration Object

All options are stored in a ConversionConfig dataclass, accessible and modifiable at any time:

from bs2json import BS2Json, ConversionConfig

converter = BS2Json(html, strip=False)
print(converter.config)
# ConversionConfig(attr_name='attrs', text_name='text', comment_name='comment',
#                  include_comments=True, strip=False, group_by_tag=False)

# Modify config directly
converter.config.group_by_tag = True
converter.config.include_comments = False
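
Since the configuration is described as a dataclass, the standard `dataclasses` tools should apply to it. The sketch below uses a stand-in class, `ConfigSketch`, that only mirrors the field names and defaults listed in this README. It is not the real `ConversionConfig`. It shows how `dataclasses.replace` derives a modified copy without mutating the original.

```python
from dataclasses import dataclass, replace


@dataclass
class ConfigSketch:
    """Stand-in mirroring the documented ConversionConfig fields and defaults."""
    attr_name: str = "attrs"
    text_name: str = "text"
    comment_name: str = "comment"
    include_comments: bool = True
    strip: bool = True
    group_by_tag: bool = False


base = ConfigSketch(strip=False)

# replace() builds a new instance; `base` keeps its values.
variant = replace(base, group_by_tag=True, include_comments=False)

print(base)
print(variant)
```

Copying rather than mutating is useful when one set of options is shared between several conversions.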

Output

Save to File

converter = BS2Json(html)
converter.convert()

# Save to JSON file (pretty-printed, 4-space indent)
converter.save('output.json')

# Save compact
converter.save('compact.json', prettify=False)

# Custom indent
converter.save('indented.json', indent=2)

# Save to a file-like object
import io
buf = io.StringIO()
converter.save(buf)
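
For comparison, this is roughly what the pretty-printed and compact variants look like when produced with the standard `json` module on a file-like object. Whether `save()` calls `json.dump` internally, and which separators it uses in compact mode, are assumptions. The sketch only illustrates the two output styles.

```python
import io
import json

# A small conversion result in the documented shape.
result = {"html": {"body": {"p": "hello"}}}

# Pretty-printed with a 4-space indent.
pretty = io.StringIO()
json.dump(result, pretty, indent=4)

# Compact: no indent, everything on one line.
compact = io.StringIO()
json.dump(result, compact)

print(pretty.getvalue())
print(compact.getvalue())  # {"html": {"body": {"p": "hello"}}}
```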

Pretty Print

converter = BS2Json(html)
converter.convert()
converter.prettify()  # prints to stdout

Advanced Usage

Context Manager and Callable

# Use as context manager
with BS2Json(html) as converter:
    result = converter.convert()

# Use as callable (shortcut for .convert())
converter = BS2Json(html)
result = converter()
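
Both forms rely on standard Python protocols: `__enter__`/`__exit__` for the `with` statement and `__call__` for the callable shortcut. The class below is a hypothetical illustration of those protocols, not bs2json's source. Its `convert()` returns a placeholder dict, and the cleanup in `__exit__` is an assumption.

```python
class ConverterSketch:
    """Toy converter that supports `with` blocks and direct calls."""

    def __init__(self, source):
        self.source = source
        self.last_obj = None

    def convert(self):
        # Placeholder "conversion": record the length of the source string.
        self.last_obj = {"length": len(self.source)}
        return self.last_obj

    def __call__(self, *args, **kwargs):
        # Calling the instance is a shortcut for .convert().
        return self.convert(*args, **kwargs)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.last_obj = None  # hypothetical cleanup
        return False  # do not swallow exceptions


with ConverterSketch("<p>hi</p>") as conv:
    via_call = conv()

print(via_call)  # {'length': 9}
```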

Extension Mode

Monkey-patch .to_json() directly onto every BeautifulSoup Tag element:

from bs4 import BeautifulSoup
from bs2json import install, remove

install()

soup = BeautifulSoup(html, 'html.parser')

# Now every tag has .to_json()
soup.find('body').to_json()
soup.find('a').to_json(include_comments=False, strip=False)

remove()  # clean up when done
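
Monkey-patching here means attaching a method to a class at runtime, so that existing and future instances gain it, and deleting it again afterwards. The sketch below shows the mechanism on a local stand-in `Tag` class rather than on `bs4.element.Tag`. The function bodies are assumptions for illustration and do not reflect bs2json's actual `install()` and `remove()`.

```python
class Tag:
    """Stand-in for a parsed element; not bs4's Tag class."""

    def __init__(self, name, text):
        self.name = name
        self.text = text


def _to_json(self, strip=True):
    # Simplified conversion: a single text child maps to {"tag": "text"}.
    text = self.text.strip() if strip else self.text
    return {self.name: text}


def install():
    """Attach to_json() to the class so every instance gains it."""
    Tag.to_json = _to_json


def remove():
    """Undo install(); safe to call when nothing is installed."""
    if "to_json" in vars(Tag):
        del Tag.to_json


tag = Tag("p", "  hello  ")  # created before the patch is installed

install()
stripped = tag.to_json()
unstripped = tag.to_json(strip=False)
remove()
still_patched = hasattr(tag, "to_json")

print(stripped)       # {'p': 'hello'}
print(unstripped)     # {'p': '  hello  '}
print(still_patched)  # False
```

Because the patch is global to the class, pairing `install()` with `remove()` keeps the change from leaking into unrelated code.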

API Reference

BS2Json

Method	Description
`BS2Json(soup, features, *, include_comments, strip, group_by_tag, **kwargs)`	Initialize from HTML string, Tag, or BeautifulSoup object
`.convert(element=None, json=None, *, inplace=False, **kwargs)`	Convert a single tag to a dict
`.convert_all(elements=None, lst=None, *, join=False, **kwargs)`	Convert multiple tags to a list of dicts
`.labels(attrs=..., text=..., comment=...)`	Change JSON key names
`.save(file, /, mode='w', *, prettify=True, indent=4)`	Save last result to file path or file object
`.prettify()`	Pretty-print last result to stdout
`.config`	`ConversionConfig` dataclass with all options
`.last_obj`	Result of the most recent conversion
`.soup`	The underlying BeautifulSoup object

ConversionConfig

Field	Default	Description
`attr_name`	`"attrs"`	JSON key for element attributes
`text_name`	`"text"`	JSON key for text content
`comment_name`	`"comment"`	JSON key for HTML comments
`include_comments`	`True`	Whether to include HTML comments
`strip`	`True`	Strip leading/trailing whitespace from text
`group_by_tag`	`False`	Group siblings by tag name instead of preserving order

Contributing

See CONTRIBUTING.md for development setup, versioning guide, and how to submit changes.

Release history

Version	Release date
0.3.0 (this version)	Mar 17, 2026
0.2.0	Mar 17, 2026
0.1.2	Feb 13, 2023
0.0.3	Jan 2, 2022
0.0.1	Mar 7, 2020
0.0.0.2	Oct 11, 2019
0.0.0.1	Oct 11, 2019
