
Turns your BeautifulSoup4 soup into a Python dictionary or JSON


soup2dict

BeautifulSoup4 to Python dictionary converter




Why

It's nice to have a convenient way to turn your soup into a dict.

Installation

Install the package with pip or poetry:

pip install soup2dict
poetry add soup2dict

Example

import simplejson
from bs4 import BeautifulSoup

from soup2dict import convert

html_doc = """
<html>
hei
<head>
    <title>The Dormouse's story</title>
    <title>bob</title>
</head>
<body>
    <p class="title">The <b>Dormouse's story</b></p>
    <p class="story">Once upon a time there were three little sisters;
    and their names were
    <a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
    <a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
    <a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
    and they lived at the bottom of a well.</p>

    <p class="story">...</p>
"""


# Create soup from html_doc data
soup = BeautifulSoup(html_doc, 'html.parser')

# Convert it to a dictionary with convert()
dict_result = convert(soup)

with open('output.json', 'w') as output_file:
    output_file.write(
        simplejson.dumps(dict_result, indent=2),
    )

Output

{
  "html": [
    {
      "#text": "hei The Dormouse's story bob The Dormouse's story Once upon a time there were three little sisters; and their names were Elsie , Lacie and Tillie ; and they lived at the bottom of a well. ...",
      "navigablestring": [
        "hei"
      ],
      "head": [
        {
          "#text": "The Dormouse's story bob",
          "title": [
            {
              "#text": "The Dormouse's story",
              "navigablestring": [
                "The Dormouse's story"
              ]
            },
            {
              "#text": "bob",
              "navigablestring": [
                "bob"
              ]
            }
          ]
        }
      ],
      "body": [
        {
          "#text": "The Dormouse's story Once upon a time there were three little sisters; and their names were Elsie , Lacie and Tillie ; and they lived at the bottom of a well. ...",
          "p": [
            {
              "@class": [
                "title"
              ],
              "#text": "The Dormouse's story",
              "navigablestring": [
                "The"
              ],
              "b": [
                {
                  "#text": "Dormouse's story",
                  "navigablestring": [
                    "Dormouse's story"
                  ]
                }
              ]
            },
            {
              "@class": [
                "story"
              ],
              "#text": "Once upon a time there were three little sisters; and their names were Elsie , Lacie and Tillie ; and they lived at the bottom of a well.",
              "navigablestring": [
                "Once upon a time there were three little sisters;\n    and their names were",
                ",",
                "and",
                ";\n    and they lived at the bottom of a well."
              ],
              "a": [
                {
                  "@href": "http://example.com/elsie",
                  "@class": [
                    "sister"
                  ],
                  "@id": "link1",
                  "#text": "Elsie",
                  "navigablestring": [
                    "Elsie"
                  ]
                },
                {
                  "@href": "http://example.com/lacie",
                  "@class": [
                    "sister"
                  ],
                  "@id": "link2",
                  "#text": "Lacie",
                  "navigablestring": [
                    "Lacie"
                  ]
                },
                {
                  "@href": "http://example.com/tillie",
                  "@class": [
                    "sister"
                  ],
                  "@id": "link3",
                  "#text": "Tillie",
                  "navigablestring": [
                    "Tillie"
                  ]
                }
              ]
            },
            {
              "@class": [
                "story"
              ],
              "#text": "...",
              "navigablestring": [
                "..."
              ]
            }
          ]
        }
      ]
    }
  ]
}
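
Reading the output above: each tag name maps to a list of child dicts, attributes are stored under keys prefixed with `@`, and the combined text of an element sits under `#text`. The sketch below shows one way to navigate that structure. To stay self-contained it uses a hand-copied fragment of the output instead of calling convert(), and `collect_links` is a hypothetical helper written for this illustration, not part of soup2dict.

```python
# A hand-copied fragment of the output above. In practice this would be
# the value returned by convert(soup).
dict_result = {
    "html": [
        {
            "body": [
                {
                    "p": [
                        {"@class": ["title"], "#text": "The Dormouse's story"},
                        {
                            "@class": ["story"],
                            "a": [
                                {"@href": "http://example.com/elsie", "@id": "link1", "#text": "Elsie"},
                                {"@href": "http://example.com/lacie", "@id": "link2", "#text": "Lacie"},
                                {"@href": "http://example.com/tillie", "@id": "link3", "#text": "Tillie"},
                            ],
                        },
                    ],
                },
            ],
        },
    ],
}


def collect_links(node):
    """Hypothetical helper: recursively gather (text, href) pairs."""
    links = []
    if isinstance(node, dict):
        # Attributes are stored under "@"-prefixed keys.
        if "@href" in node:
            links.append((node.get("#text", ""), node["@href"]))
        for value in node.values():
            links.extend(collect_links(value))
    elif isinstance(node, list):
        # Every tag name maps to a list of child elements.
        for item in node:
            links.extend(collect_links(item))
    return links


# Direct access: tags are lists, so index into them before going deeper.
body = dict_result["html"][0]["body"][0]
first_paragraph_text = body["p"][0]["#text"]

links = collect_links(dict_result)

print(first_paragraph_text)
for text, href in links:
    print(text, href)
```

Because every tag maps to a list, even a single child needs an index (`["body"][0]`). A recursive walk like the one above avoids hard-coding the path when the document layout is not known in advance.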
