A simple wrapper for storing parsed HTML content

Project description

HTML Serializer Parser

Based on the html-to-json library (https://pypi.org/project/html-to-json/), this library extends its functionality with an additional layer of information: a query selector for every node, a list of all query selectors, and a choice of return formats (as a list, as a tree dictionary, and/or as a dict). It also adds a specific property to every node.
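
To make "a query selector for every node" concrete, here is a minimal sketch that uses only the Python standard library. It is not this library's implementation: the class SelectorCollector, the function collect_selectors, and the choice of nth-of-type selectors are assumptions made for illustration. The library's actual selector format may differ.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not stay on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class SelectorCollector(HTMLParser):
    """Collects one CSS-style selector path per element (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.stack = []            # selector segments of the currently open ancestors
        self.child_counts = [{}]   # per-depth counters: tag -> siblings seen so far
        self.selectors = []        # one full selector per element, in document order

    def handle_starttag(self, tag, attrs):
        counts = self.child_counts[-1]
        counts[tag] = counts.get(tag, 0) + 1
        segment = f"{tag}:nth-of-type({counts[tag]})"
        self.selectors.append(" > ".join(self.stack + [segment]))
        if tag not in VOID_TAGS:
            self.stack.append(segment)
            self.child_counts.append({})

    def handle_endtag(self, tag):
        # Only pop when the closing tag matches the innermost open element.
        if self.stack and self.stack[-1].startswith(tag + ":"):
            self.stack.pop()
            self.child_counts.pop()


def collect_selectors(html):
    """Return a selector for every element found in the HTML string."""
    parser = SelectorCollector()
    parser.feed(html)
    parser.close()
    return parser.selectors


if __name__ == "__main__":
    for selector in collect_selectors("<div><p>a</p><p>b</p></div>"):
        print(selector)
```

Each selector is built from the chain of open ancestors, so it can be passed to a CSS engine to find the same node again. Mismatched closing tags are simply ignored in this sketch.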

How to run it

import json
from html2json.parser import ParserOptions, html2json


if __name__ == "__main__":
    # You can use an HTML file, a raw HTML string or an endpoint
    FILE_DIR = "./PATH_TO_YOUR_FILES/index.html"

    # Read the file with a context manager so the handle is always closed
    with open(FILE_DIR, encoding="utf-8") as f:
        html_content = f.read()

    output = html2json(
        input_path=html_content,
        options=ParserOptions.parser_factory(
            store_as_list=True,
            store_as_dict=True,
            store_as_tree_dict=True
        ),
        raw_content=False,
    )
    # Returns a dict with the following keys:
    # as_list, as_dict, as_tree_dict, query_selectors
    json_output = json.dumps(output, ensure_ascii=True, indent=2)
    with open("data.json", "w") as o:
        o.write(json_output)
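
The comment in the example names four keys in the returned dict. The description does not say whether a key is still present when its store_* option is disabled, so a small check before reading the result can be useful. The helper below is a hypothetical sketch and not part of the library; only the four key names come from the example above.

```python
# Key names taken from the comment in the example above.
EXPECTED_KEYS = ("as_list", "as_dict", "as_tree_dict", "query_selectors")


def missing_keys(output):
    """Return the expected keys that are absent from the parser output."""
    return [key for key in EXPECTED_KEYS if key not in output]


if __name__ == "__main__":
    # Stand-in dict for illustration; real output comes from html2json(...)
    sample = {"as_list": [], "query_selectors": []}
    print(missing_keys(sample))
```

Calling missing_keys(output) right after html2json makes a missing representation visible early, instead of surfacing later as a KeyError.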

TODOS

  • Improve Readme
  • Add docstrings
  • Include more tests

Download files

Source Distribution

bs4_html2json-0.0.2.tar.gz (6.2 kB)

Uploaded Source

Built Distribution

bs4_html2json-0.0.2-py3-none-any.whl (5.8 kB)

Uploaded Python 3

File details

Details for the file bs4_html2json-0.0.2.tar.gz.

File metadata

  • Download URL: bs4_html2json-0.0.2.tar.gz
  • Upload date:
  • Size: 6.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.12.3

File hashes

Hashes for bs4_html2json-0.0.2.tar.gz
Algorithm Hash digest
SHA256 b529536bc4a472f428c1423abceb0988576d04c4e690b8a99095695e5bf6934a
MD5 9871f254acfba6ca21f947a0568792e4
BLAKE2b-256 326f7e60c3bcb090f885cda58bf8bf535e7462db57f3e2aaea909d3aae38fb98

File details

Details for the file bs4_html2json-0.0.2-py3-none-any.whl.

File hashes

Hashes for bs4_html2json-0.0.2-py3-none-any.whl
Algorithm Hash digest
SHA256 866326c4421a906a85258d6a8f1272f37d3ac917d112b99f3a63dee7f783d8e6
MD5 8f15d038025472f2f26438a8fd638100
BLAKE2b-256 880d198a82e229c4df14e07791155135b9eed3b06aceac40ef4be90eedeec02d
