Convert HTML to JSON.
HTML to JSON
Convert HTML and/or HTML tables to JSON.
Installation
pip install html-to-json
Usage
HTML to JSON
import html_to_json
html_string = """<head>
<title>Test site</title>
<meta charset="UTF-8"></head>"""
output_json = html_to_json.convert(html_string)
print(output_json)
When calling the html_to_json.convert function, you can skip capturing the text values of the HTML elements by passing the keyword argument capture_element_values=False. You can likewise skip capturing the attributes of the elements by passing capture_element_attributes=False.
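As a minimal sketch of the two keyword arguments described above, the helper below passes both of them at once so that only the element hierarchy is kept. The name convert_structure_only is hypothetical and not part of the library, and combining both arguments in a single call is an assumption based on the description above.

```python
def convert_structure_only(html_string):
    """Return the element hierarchy of html_string without text or attributes.

    Hypothetical helper for illustration; it is not part of html-to-json.
    """
    # Imported inside the function so that defining this helper does not
    # require the third-party html-to-json package to be installed.
    import html_to_json

    return html_to_json.convert(
        html_string,
        capture_element_values=False,      # do not record element text
        capture_element_attributes=False,  # do not record element attributes
    )
```

Calling convert_structure_only(html_string) with the html_string from the usage snippet above should then return a structure that contains the tag names only.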
Example
Example input:
<head>
<title>Floyd Hightower's Projects</title>
<meta charset="UTF-8">
<meta name="description" content="Floyd Hightower's Projects">
<meta name="keywords" content="projects,fhightower,Floyd,Hightower">
</head>
Example output:
{
    "head": [
        {
            "title": [
                {
                    "_value": "Floyd Hightower's Projects"
                }
            ],
            "meta": [
                {
                    "_attributes": {
                        "charset": "UTF-8"
                    }
                },
                {
                    "_attributes": {
                        "name": "description",
                        "content": "Floyd Hightower's Projects"
                    }
                },
                {
                    "_attributes": {
                        "name": "keywords",
                        "content": "projects,fhightower,Floyd,Hightower"
                    }
                }
            ]
        }
    ]
}
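To show how the converted structure can be read back, here is a small sketch that works on the example output above, copied in as a Python literal so that it runs without the library. Each tag name maps to a list of elements, text sits under "_value", and attributes sit under "_attributes". The function named_meta_content is a hypothetical helper, not part of html-to-json.

```python
# Structure copied from the example output above.
converted = {
    "head": [
        {
            "title": [{"_value": "Floyd Hightower's Projects"}],
            "meta": [
                {"_attributes": {"charset": "UTF-8"}},
                {"_attributes": {"name": "description",
                                 "content": "Floyd Hightower's Projects"}},
                {"_attributes": {"name": "keywords",
                                 "content": "projects,fhightower,Floyd,Hightower"}},
            ],
        }
    ]
}


def named_meta_content(converted):
    """Map the name of each named <meta> element to its content attribute."""
    result = {}
    for head in converted.get("head", []):
        for meta in head.get("meta", []):
            attrs = meta.get("_attributes", {})
            if "name" in attrs:  # skips <meta charset="...">, which has no name
                result[attrs["name"]] = attrs.get("content")
    return result


# Every tag name holds a list, so index into it before reading "_value".
title = converted["head"][0]["title"][0]["_value"]
meta_by_name = named_meta_content(converted)

print(title)
print(meta_by_name["keywords"].split(","))
```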
HTML Tables to JSON
In addition to converting HTML to JSON, this library can also intelligently convert HTML tables to JSON.
Currently, this library can handle three types of tables:
A. Those with table headers in the first row
B. Those with table headers in the first column
C. Those without table headers
Tables of type A and B are diagrammed below:
Example
This code:
import html_to_json
html_string = """<table>
<tr>
<th>#</th>
<th>Malware</th>
<th>MD5</th>
<th>Date Added</th>
</tr>
<tr>
<td>25548</td>
<td><a href="/stats/DarkComet/">DarkComet</a></td>
<td><a href="/config/034a37b2a2307f876adc9538986d7b86">034a37b2a2307f876adc9538986d7b86</a></td>
<td>July 9, 2018, 6:25 a.m.</td>
</tr>
<tr>
<td>25547</td>
<td><a href="/stats/DarkComet/">DarkComet</a></td>
<td><a href="/config/706eeefbac3de4d58b27d964173999c3">706eeefbac3de4d58b27d964173999c3</a></td>
<td>July 7, 2018, 6:25 a.m.</td>
</tr></table>"""
tables = html_to_json.convert_tables(html_string)
print(tables)
will produce this output:
[
    [
        {
            "#": "25548",
            "Malware": "DarkComet",
            "MD5": "034a37b2a2307f876adc9538986d7b86",
            "Date Added": "July 9, 2018, 6:25 a.m."
        },
        {
            "#": "25547",
            "Malware": "DarkComet",
            "MD5": "706eeefbac3de4d58b27d964173999c3",
            "Date Added": "July 7, 2018, 6:25 a.m."
        }
    ]
]
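Because each table with a header row comes back as a list of dictionaries keyed by header, the result can be passed to the standard library's csv module. The sketch below copies the output above in as a literal so that it runs without the library; table_to_csv is a hypothetical helper, and it assumes that every row of a table shares the keys of the first row.

```python
import csv
import io

# Output copied from the example above: one list per table, one dict per row.
tables = [
    [
        {
            "#": "25548",
            "Malware": "DarkComet",
            "MD5": "034a37b2a2307f876adc9538986d7b86",
            "Date Added": "July 9, 2018, 6:25 a.m.",
        },
        {
            "#": "25547",
            "Malware": "DarkComet",
            "MD5": "706eeefbac3de4d58b27d964173999c3",
            "Date Added": "July 7, 2018, 6:25 a.m.",
        },
    ]
]


def table_to_csv(rows):
    """Serialize a list of row dictionaries to CSV text."""
    if not rows:
        return ""
    buffer = io.StringIO()
    # Column order follows the key order of the first row.
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = table_to_csv(tables[0])
print(csv_text)
```

Values that contain commas, such as the dates here, are quoted by the csv module automatically.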
Credits
This package was created with Cookiecutter and fhightower's Python project template.