HTML to JSON

Convert HTML and/or HTML tables to JSON.

Installation

pip install html-to-json

Usage

HTML to JSON

import html_to_json

html_string = """<head>
    <title>Test site</title>
    <meta charset="UTF-8"></head>"""
output_json = html_to_json.convert(html_string)
print(output_json)

By default, html_to_json.convert captures both the text values and the attributes of each element. To skip the text values, pass the keyword argument capture_element_values=False. To skip the element attributes, pass capture_element_attributes=False.

Example

Example input:

<head>
    <title>Floyd Hightower's Projects</title>
    <meta charset="UTF-8">
    <meta name="description" content="Floyd Hightower&#39;s Projects">
    <meta name="keywords" content="projects,fhightower,Floyd,Hightower">
</head>

Example output:

{
    "head": [
    {
        "title": [
        {
            "_value": "Floyd Hightower's Projects"
        }],
        "meta": [
        {
            "_attributes":
            {
                "charset": "UTF-8"
            }
        },
        {
            "_attributes":
            {
                "name": "description",
                "content": "Floyd Hightower's Projects"
            }
        },
        {
            "_attributes":
            {
                "name": "keywords",
                "content": "projects,fhightower,Floyd,Hightower"
            }
        }]
    }]
}
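
The nesting in the output above (each tag name maps to a list of dictionaries, with text under "_value" and attributes under "_attributes") can be walked with ordinary dictionary and list access. The following is a minimal sketch that assumes the output has exactly the shape shown in the example; the structure is pasted in as a literal so the snippet runs without the package installed, and the variable names (converted, named_meta) are illustrative only.

```python
# Literal copy of the example output above, so this runs on its own.
converted = {
    "head": [
        {
            "title": [{"_value": "Floyd Hightower's Projects"}],
            "meta": [
                {"_attributes": {"charset": "UTF-8"}},
                {
                    "_attributes": {
                        "name": "description",
                        "content": "Floyd Hightower's Projects",
                    }
                },
                {
                    "_attributes": {
                        "name": "keywords",
                        "content": "projects,fhightower,Floyd,Hightower",
                    }
                },
            ],
        }
    ]
}

# Every tag name maps to a list, so index [0] selects the first <head>.
head = converted["head"][0]

# Element text is stored under the "_value" key.
page_title = head["title"][0]["_value"]

# Map each named <meta> tag to its content; tags without a "name"
# attribute (such as the charset declaration) are skipped.
named_meta = {
    meta["_attributes"]["name"]: meta["_attributes"]["content"]
    for meta in head["meta"]
    if "name" in meta.get("_attributes", {})
}

print(page_title)
print(named_meta["keywords"].split(","))
```

Because repeated tags such as meta are grouped into one list, filtering on the attributes is usually more robust than relying on a fixed list position.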

HTML Tables to JSON

import html_to_json

html_string = """<table class="table table-striped table-bordered table-hover">
    <tr>
        <th>#</th>
        <th>Malware</th>
        <th>MD5</th>
        <th>Date Added</th>
    </tr>

    <tr>
        <td>25548</td>
        <td><a href="/stats/DarkComet/">DarkComet</a></td>
        <td><a href="/config/034a37b2a2307f876adc9538986d7b86">034a37b2a2307f876adc9538986d7b86</a></td>
        <td>July 9, 2018, 6:25 a.m.</td>
    </tr>

    <tr>
        <td>25547</td>
        <td><a href="/stats/DarkComet/">DarkComet</a></td>
        <td><a href="/config/706eeefbac3de4d58b27d964173999c3">706eeefbac3de4d58b27d964173999c3</a></td>
        <td>July 7, 2018, 6:25 a.m.</td>
    </tr></table>"""
tables = html_to_json.convert_tables(html_string)
print(tables)

Currently, this package can handle tables with the headers in the first row or in the first column.

Example

Example input:

<table class="table table-striped table-bordered table-hover">
    <tr>
        <th>#</th>
        <th>Malware</th>
        <th>MD5</th>
        <th>Date Added</th>
    </tr>

    <tr>
        <td>25548</td>
        <td><a href="/stats/DarkComet/">DarkComet</a></td>
        <td><a href="/config/034a37b2a2307f876adc9538986d7b86">034a37b2a2307f876adc9538986d7b86</a></td>
        <td>July 9, 2018, 6:25 a.m.</td>
    </tr>

    <tr>
        <td>25547</td>
        <td><a href="/stats/DarkComet/">DarkComet</a></td>
        <td><a href="/config/706eeefbac3de4d58b27d964173999c3">706eeefbac3de4d58b27d964173999c3</a></td>
        <td>July 7, 2018, 6:25 a.m.</td>
    </tr>
</table>

Example output:

[
    [
        {
            '#': '25548',
            'Malware': 'DarkComet',
            'MD5': '034a37b2a2307f876adc9538986d7b86',
            'Date Added': 'July 9, 2018, 6:25 a.m.'
        }, {
            '#': '25547',
            'Malware': 'DarkComet',
            'MD5': '706eeefbac3de4d58b27d964173999c3',
            'Date Added': 'July 7, 2018, 6:25 a.m.'
        }
    ]
]
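
The result is a list of tables, and each table is a list of row dictionaries keyed by the header text. As a rough sketch of working with that shape (assuming it matches the example output above; the data is pasted in as a literal, and md5_by_id is an illustrative name), the rows can be indexed by a column or serialized with the standard json module:

```python
import json

# Literal copy of the example output above: one table with two rows.
tables = [
    [
        {
            "#": "25548",
            "Malware": "DarkComet",
            "MD5": "034a37b2a2307f876adc9538986d7b86",
            "Date Added": "July 9, 2018, 6:25 a.m.",
        },
        {
            "#": "25547",
            "Malware": "DarkComet",
            "MD5": "706eeefbac3de4d58b27d964173999c3",
            "Date Added": "July 7, 2018, 6:25 a.m.",
        },
    ]
]

# The outer list holds one entry per table found in the HTML.
first_table = tables[0]

# Build a lookup from the "#" column to the "MD5" column.
md5_by_id = {row["#"]: row["MD5"] for row in first_table}
print(md5_by_id["25548"])

# Rows are plain dictionaries of strings, so they serialize directly.
print(json.dumps(first_table, indent=2))
```

All cell values in the example are strings, so numeric columns such as "#" need an explicit int() conversion before any arithmetic or numeric sorting.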

Credits

This package was created with Cookiecutter and fhightower's Python project template.
