bs4 to pd.DataFrame

Project description

Tested against Windows / Python 3.11 / Anaconda

pip install bs42frame

Parse HTML content and extract information using BeautifulSoup.

parse_html takes HTML content as input, parses it with BeautifulSoup, and extracts
the HTML structure, tag attributes, tag text, and the BeautifulSoup object for each
element found in the HTML.
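The flattening described above can be approximated in a few lines of BeautifulSoup and pandas. The sketch below is not bs42frame's implementation: html_to_frame is a hypothetical name, and the sketch assumes one row per (element, attribute) pair, elements numbered in document order, and the built-in "html.parser" backend. It omits the aa_attrs and aa_soup columns for brevity.

```python
from bs4 import BeautifulSoup
import pandas as pd


def html_to_frame(html: str) -> pd.DataFrame:
    """Hypothetical sketch: one row per (element, attribute) pair."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    # find_all(True) yields every Tag in document order
    for index, tag in enumerate(soup.find_all(True)):
        text = tag.get_text(strip=True)
        # Elements without attributes still get one row, with empty key/value
        attrs = tag.attrs.items() or [("", "")]
        for key, value in attrs:
            # Multi-valued attributes such as class are returned as lists
            if isinstance(value, list):
                value = " ".join(value)
            rows.append(
                {
                    "aa_tag": tag.name,
                    "aa_text": text,
                    "aa_old_index": index,
                    "aa_key": key,
                    "aa_value": value,
                }
            )
    return pd.DataFrame(rows)
```

For example, html_to_frame('<ul><li class="a b" role="x">Hi</li></ul>') yields one row for the ul and two rows (class, role) sharing the li's index.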

Args:
	html (str, bytes, or file path): The HTML content to parse, given either directly
		as a string or bytes, or as a path to a file, which the function will
		attempt to read.
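One way to accept a string, bytes, or a file path is sketched below. read_html_input is a hypothetical helper, not part of bs42frame, and it assumes that a str naming an existing file is a path while any other str is markup. Bytes and file contents are returned undecoded so that BeautifulSoup can detect the encoding itself.

```python
import os
from pathlib import Path
from typing import Union


def read_html_input(html: Union[str, bytes, os.PathLike]) -> Union[str, bytes]:
    """Hypothetical helper: normalise str / bytes / file path to markup."""
    if isinstance(html, (bytes, bytearray)):
        # Raw bytes: pass through and let BeautifulSoup sniff the encoding
        return bytes(html)
    if isinstance(html, os.PathLike) or (
        isinstance(html, str) and os.path.isfile(html)
    ):
        # An existing file path: read the raw bytes of the file
        return Path(html).read_bytes()
    # Any other string is treated as the markup itself
    return html
```

The result can be passed straight to BeautifulSoup(..., "html.parser"). The isfile check is a heuristic: a string that happens to name an existing file is read rather than parsed.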

Returns:
	pandas.DataFrame: A DataFrame containing the extracted information from the HTML.
		The DataFrame columns include 'aa_tag' (HTML tag name), 'aa_attrs' (list of tag
		attributes), 'aa_text' (text content of the tag), 'aa_soup' (BeautifulSoup object
		for the tag), 'aa_old_index' (original index of the tag), 'aa_key' (attribute
		key), and 'aa_value' (attribute value).
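Because each attribute gets its own row, rows sharing an aa_old_index describe one element. Assuming aa_key holds the attribute name and aa_value its value, as listed above, pandas' pivot can turn this long layout into one row per element. The data below is made up for illustration and is not bs42frame output.

```python
import pandas as pd

# Made-up rows in the long layout: one row per attribute,
# rows belonging to the same element share aa_old_index
long_df = pd.DataFrame(
    {
        "aa_tag": ["a", "a", "li"],
        "aa_old_index": [10, 10, 11],
        "aa_key": ["href", "class", "role"],
        "aa_value": ["/repos", "btn", "presentation"],
    }
)

# One row per element, one column per attribute name;
# attributes an element lacks become NaN
wide_df = long_df.pivot(
    index=["aa_old_index", "aa_tag"], columns="aa_key", values="aa_value"
).reset_index()
```

pivot raises ValueError if the same (element, attribute) pair occurs twice; pivot_table(..., aggfunc="first") tolerates such duplicates.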

Example:
	from bs42frame import parse_html
	df = parse_html(
		html=r"C:\Users\hansc\Downloads\Your Repositories.mhtml"
	)
	#      aa_tag            aa_text                                          aa_soup  aa_old_index                     aa_key             aa_value
	# 1000   span  Import repository  [\r\n                Import repository\r\n\r\n]           274       ActionListItem-label                class
	# 1001     li                                                                  []           275               presentation                 role
	# 1002     li                                                                  []           275                       true          aria-hidden
	# 1003     li                                                                  []           275                       true  data-view-component
	# 1004     li                                                                  []           275  ActionList-sectionDivider                class


Download files

Source Distribution

bs42frame-0.10.tar.gz (40.8 kB)

Uploaded Source

Built Distribution

bs42frame-0.10-py3-none-any.whl (41.9 kB)

Uploaded Python 3

File details

Details for the file bs42frame-0.10.tar.gz.

File metadata

  • Download URL: bs42frame-0.10.tar.gz
  • Upload date:
  • Size: 40.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.5

File hashes

Hashes for bs42frame-0.10.tar.gz
Algorithm Hash digest
SHA256 fd7e4c0fc2ba629c9c469b91a493804d2725f5f2551ed5aa590c5355863e1b15
MD5 ce753f03b2ac81b63215e91dcbcbcb80
BLAKE2b-256 c1c12df87bb9a9239f70f9903d7b869b87dc0cf8e3f69e169f226194cd61b19d
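A downloaded file can be checked against the digests above with Python's standard hashlib module. This is a minimal sketch; the default expected value is the SHA256 listed for bs42frame-0.10.tar.gz, and sha256_of and verify are illustrative names.

```python
import hashlib

# SHA256 digest listed above for bs42frame-0.10.tar.gz
EXPECTED_SHA256 = "fd7e4c0fc2ba629c9c469b91a493804d2725f5f2551ed5aa590c5355863e1b15"


def sha256_of(path, chunk_size=1 << 16):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED_SHA256):
    """True if the file's SHA256 matches the expected hex digest."""
    return sha256_of(path) == expected.lower()
```

For an intact download saved in the current directory, verify("bs42frame-0.10.tar.gz") should return True. pip can also enforce hashes itself in hash-checking mode, but then every requirement, including dependencies, needs a hash.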

File details

Details for the file bs42frame-0.10-py3-none-any.whl.

File metadata

  • Download URL: bs42frame-0.10-py3-none-any.whl
  • Upload date:
  • Size: 41.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.5

File hashes

Hashes for bs42frame-0.10-py3-none-any.whl
Algorithm Hash digest
SHA256 7c4234c1825976ac585ade207c760d4ceaf9a754204eae40c804aa9e6fe6311d
MD5 360f08f3f952a4295cf726df21555203
BLAKE2b-256 15fbdf54789707ec5e59a5dac17d5e45e5347ed016ab76ba3282b22e93adfbcc
