HTML to JSON

Convert HTML and/or HTML tables to JSON.

Installation

pip install html-to-json-enhanced

Usage

HTML to JSON

import html_to_json_enhanced

html_string = """<head>
    <title>Test site</title>
    <meta charset="UTF-8"></head>"""
output_json = html_to_json_enhanced.convert(html_string)
print(output_json)

When calling html_to_json_enhanced.convert, you can skip capturing the text values of the HTML elements by passing the keyword argument capture_element_values=False. Likewise, you can skip capturing the elements' attributes by passing capture_element_attributes=False.
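A similar result can be approximated on output that was already converted with the defaults. The sketch below assumes the two flags simply leave out the "_value" and "_attributes" keys that appear in this README's example output. drop_keys is a hypothetical helper, not part of the library, and the converted dict is hand-written in the shape of that example output rather than produced by a real call.

```python
def drop_keys(node, keys):
    """Return a copy of a converted structure without the given keys."""
    if isinstance(node, dict):
        return {k: drop_keys(v, keys) for k, v in node.items() if k not in keys}
    if isinstance(node, list):
        return [drop_keys(item, keys) for item in node]
    return node


# Hand-written stand-in for the default output (assumed shape).
converted = {
    "head": [{
        "title": [{"_value": "Test site"}],
        "meta": [{"_attributes": {"charset": "UTF-8"}}],
    }]
}

# Roughly what capture_element_values=False would be expected to leave.
without_values = drop_keys(converted, {"_value"})
# Roughly what capture_element_attributes=False would be expected to leave.
without_attributes = drop_keys(converted, {"_attributes"})

print(without_values)
print(without_attributes)
```

Passing the flags to convert directly is the simpler route. A helper like this is only useful when the full output is needed for one purpose and a reduced view for another.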

Example

Example input:

<head>
    <title>Floyd Hightower's Projects</title>
    <meta charset="UTF-8">
    <meta name="description" content="Floyd Hightower&#39;s Projects">
    <meta name="keywords" content="projects,fhightower,Floyd,Hightower">
</head>

Example output:

{
    "head": [
    {
        "title": [
        {
            "_value": "Floyd Hightower's Projects"
        }],
        "meta": [
        {
            "_attributes":
            {
                "charset": "UTF-8"
            }
        },
        {
            "_attributes":
            {
                "name": "description",
                "content": "Floyd Hightower's Projects"
            }
        },
        {
            "_attributes":
            {
                "name": "keywords",
                "content": "projects,fhightower,Floyd,Hightower"
            }
        }]
    }]
}
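Every element name maps to a list of dicts, so reading a value means indexing through those lists. The sketch below walks the example output above, copied in as a literal so that it runs without the library. meta_content is a hypothetical helper, not part of the package.

```python
# The example output above, copied in as a Python literal.
converted = {
    "head": [{
        "title": [{"_value": "Floyd Hightower's Projects"}],
        "meta": [
            {"_attributes": {"charset": "UTF-8"}},
            {"_attributes": {"name": "description",
                             "content": "Floyd Hightower's Projects"}},
            {"_attributes": {"name": "keywords",
                             "content": "projects,fhightower,Floyd,Hightower"}},
        ],
    }]
}


def meta_content(converted, name):
    """Return the content of the first <meta> tag with this name, or None."""
    for head in converted.get("head", []):
        for meta in head.get("meta", []):
            attributes = meta.get("_attributes", {})
            if attributes.get("name") == name:
                return attributes.get("content")
    return None


title = converted["head"][0]["title"][0]["_value"]
keywords = meta_content(converted, "keywords").split(",")

print(title)
print(keywords)
```

Using .get with defaults keeps the lookup safe for elements that have no attributes, such as the title entry here.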

HTML Tables to JSON

In addition to converting HTML to JSON, this library can also intelligently convert HTML tables to JSON.

Currently, this library can handle three types of tables:

A. Those with table headers in the first row
B. Those with table headers in the first column
C. Those without table headers

Tables of type A and B are diagrammed below:

[Diagram: tables with the headers in the first row or the headers in the first column]
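To make the three table types concrete, here is one plausible way to tell them apart. This is a sketch only: the library's actual detection logic is not described in this README, and both classify_table and the (tag, text) row representation are assumptions made for illustration.

```python
def classify_table(rows):
    """Classify a table as "A", "B" or "C".

    rows is a list of rows; each row is a list of (tag, text) pairs,
    where tag is "th" or "td". This representation is hypothetical.
    """
    if not rows or not rows[0]:
        return "C"
    # Type A: every cell in the first row is a header cell.
    if all(tag == "th" for tag, _ in rows[0]):
        return "A"
    # Type B: the first cell of every row is a header cell.
    if all(row and row[0][0] == "th" for row in rows):
        return "B"
    # Type C: no recognizable header row or header column.
    return "C"


row_headers = [
    [("th", "#"), ("th", "Malware")],
    [("td", "25548"), ("td", "DarkComet")],
]
column_headers = [
    [("th", "#"), ("td", "25548")],
    [("th", "Malware"), ("td", "DarkComet")],
]
no_headers = [
    [("td", "25548"), ("td", "DarkComet")],
]

print(classify_table(row_headers))
print(classify_table(column_headers))
print(classify_table(no_headers))
```

The first-row check runs first, so a table whose first row consists entirely of header cells counts as type A even if its first column also holds header cells.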

Example

This code:

import html_to_json_enhanced

html_string = """<table>
    <tr>
        <th>#</th>
        <th>Malware</th>
        <th>MD5</th>
        <th>Date Added</th>
    </tr>

    <tr>
        <td>25548</td>
        <td><a href="/stats/DarkComet/">DarkComet</a></td>
        <td><a href="/config/034a37b2a2307f876adc9538986d7b86">034a37b2a2307f876adc9538986d7b86</a></td>
        <td>July 9, 2018, 6:25 a.m.</td>
    </tr>

    <tr>
        <td>25547</td>
        <td><a href="/stats/DarkComet/">DarkComet</a></td>
        <td><a href="/config/706eeefbac3de4d58b27d964173999c3">706eeefbac3de4d58b27d964173999c3</a></td>
        <td>July 7, 2018, 6:25 a.m.</td>
    </tr></table>"""
tables = html_to_json_enhanced.convert_tables(html_string)
print(tables)

will produce this output:

[
    [
        {
            "#": "25548",
            "Malware": "DarkComet",
            "MD5": "034a37b2a2307f876adc9538986d7b86",
            "Date Added": "July 9, 2018, 6:25 a.m."
        }, {
            "#": "25547",
            "Malware": "DarkComet",
            "MD5": "706eeefbac3de4d58b27d964173999c3",
            "Date Added": "July 7, 2018, 6:25 a.m."
        }
    ]
]
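The result is a list of tables, each of which is a list of row records. The sketch below shows some typical post-processing. It assumes convert_tables returns plain Python lists and dicts shaped like the output above. The data is copied in as a literal so the snippet runs without the library, and the variable names are illustrative.

```python
import json

# The output above, copied in as a Python literal.
tables = [
    [
        {
            "#": "25548",
            "Malware": "DarkComet",
            "MD5": "034a37b2a2307f876adc9538986d7b86",
            "Date Added": "July 9, 2018, 6:25 a.m.",
        },
        {
            "#": "25547",
            "Malware": "DarkComet",
            "MD5": "706eeefbac3de4d58b27d964173999c3",
            "Date Added": "July 7, 2018, 6:25 a.m.",
        },
    ]
]

# Flatten all tables into one list of rows.
rows = [row for table in tables for row in table]

# Index the rows by their MD5 column for direct lookup.
by_md5 = {row["MD5"]: row for row in rows}

# Serialize the structure to a JSON string.
json_text = json.dumps(tables, indent=4)

print(by_md5["706eeefbac3de4d58b27d964173999c3"]["Date Added"])
print(json_text)
```

All cell values in the output above are strings, so numeric columns such as "#" need an explicit int() conversion before sorting or arithmetic.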

Credits

This package was created with Cookiecutter and fhightower's Python project template.
