ow
functions that work on soup
To install: pip install ow
Overview
The ow package provides a collection of utilities designed to facilitate the manipulation and querying of HTML/XML structures using BeautifulSoup. It includes functions to navigate through the structure, extract information, and even open HTML content in a web browser for debugging or inspection purposes.
Features
- Navigational Utilities: Traverse through the HTML tree structure to find parent elements or specific paths.
- HTML Content Handling: Save HTML tags to a file and view them in Firefox, aiding in debugging and visualization.
- Data Extraction: Extract the text of a specified tag and optionally apply a text transformation to it.
- Batch Element Retrieval: Retrieve multiple elements by following a path whose steps can each be a string, list, or dictionary.
Installation
Install the package using pip:
pip install ow
Usage Examples
Finding the Root Parent of a Tag
To find the root parent of a BeautifulSoup tag:
from bs4 import BeautifulSoup
from ow import root_parent
soup = BeautifulSoup("<div><span>Example</span></div>", "html.parser")
span_tag = soup.find('span')
root = root_parent(span_tag)
print(root)  # Prints: <div><span>Example</span></div>
Open a Tag in Firefox
To open a tag's HTML content in Firefox for debugging:
from bs4 import BeautifulSoup
from ow import open_tag_in_firefox
soup = BeautifulSoup('<div><span>Open me in Firefox</span></div>', 'html.parser')
span_tag = soup.find('span')
open_tag_in_firefox(span_tag)
Adding Text to a Parse Dictionary
Extract text from a specified tag and add it to a dictionary, optionally applying a text transformation:
from bs4 import BeautifulSoup
from ow import add_text_to_parse_dict
soup = BeautifulSoup('<div><p id="para"> Some text </p></div>', 'html.parser')
parse_dict = {}
add_text_to_parse_dict(soup, parse_dict, key='paragraph', name='p', attrs={'id': 'para'}, text_transform=str.strip)
print(parse_dict) # Outputs: {'paragraph': 'Some text'}
Getting Elements by Path
Retrieve elements from a BeautifulSoup object by specifying a path:
from bs4 import BeautifulSoup
from ow import get_elements
soup = BeautifulSoup('<div><p>First</p><p>Second</p></div>', 'html.parser')
elements = get_elements(soup, ['p'])
print([e.text for e in elements]) # Outputs: ['First', 'Second']
Function Documentation
root_parent(s)
Returns the furthest ancestor of a BeautifulSoup tag.
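One way to read "furthest ancestor" is to follow `.parent` links until none remain. The sketch below does that with plain BeautifulSoup calls; `furthest_ancestor` is a hypothetical stand-in written for illustration, and ow's own implementation may differ.

```python
from bs4 import BeautifulSoup


def furthest_ancestor(tag):
    """Follow .parent links upward until there is no parent left."""
    node = tag
    while node.parent is not None:
        node = node.parent
    return node


soup = BeautifulSoup("<div><span>Example</span></div>", "html.parser")
span_tag = soup.find("span")
root = furthest_ancestor(span_tag)

# In bs4 the top of the tree is the BeautifulSoup object itself,
# whose name is "[document]".
print(root.name)
```

Under this reading, the root of a parsed document is the BeautifulSoup object rather than the outermost tag, so printing it shows the markup of the whole document.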
open_tag_in_firefox(tag)
Saves the HTML of a BeautifulSoup tag to a temporary file and opens it in Firefox.
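The save-then-open idea can be sketched with the standard library alone. This is an illustration under assumptions, not ow's code: `save_html_to_temp_file` and `open_in_browser` are hypothetical helpers, and the sketch opens the default browser rather than Firefox specifically.

```python
import tempfile
import webbrowser
from pathlib import Path


def save_html_to_temp_file(html):
    """Write an HTML string to a new temporary .html file and return its path."""
    with tempfile.NamedTemporaryFile(
        "w", suffix=".html", delete=False, encoding="utf-8"
    ) as f:
        f.write(html)
    return Path(f.name)


def open_in_browser(path):
    """Open a local file in the default browser (hypothetical helper)."""
    return webbrowser.open(path.as_uri())


# str(tag) on a BeautifulSoup tag yields markup like this string.
html_path = save_html_to_temp_file("<span>Open me in Firefox</span>")
print(html_path)
# open_in_browser(html_path)  # uncomment to view the file
```

To target Firefox rather than the default browser, `webbrowser.get("firefox")` may work where Firefox is installed; it raises `webbrowser.Error` when the browser cannot be located.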
add_text_to_parse_dict(soup, parse_dict, key, name, attrs, text_transform)
Finds a tag in the given BeautifulSoup object soup by name and attrs, extracts its text, applies a text_transform function, and adds it to parse_dict under key.
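The steps in that description (find, extract, transform, store) can be sketched as follows. `add_text` is a hypothetical re-implementation for illustration only; in particular, skipping the key when no tag matches is an assumption, not documented ow behavior.

```python
from bs4 import BeautifulSoup


def add_text(soup, parse_dict, key, name, attrs=None, text_transform=None):
    """Find one tag, take its text, optionally transform it, store it under key."""
    tag = soup.find(name, attrs=attrs or {})
    if tag is None:
        return  # assumption: leave the dictionary untouched when nothing matches
    text = tag.get_text()
    if text_transform is not None:
        text = text_transform(text)
    parse_dict[key] = text


soup = BeautifulSoup('<div><p id="para"> Some text </p></div>', "html.parser")
parse_dict = {}
add_text(
    soup, parse_dict, key="paragraph", name="p",
    attrs={"id": "para"}, text_transform=str.strip,
)
add_text(soup, parse_dict, key="missing", name="h1")  # no <h1>, so no entry
print(parse_dict)
```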
get_element(node, path_to_element)
Retrieves an element from a BeautifulSoup node by following a specified path. The path can be a string, list, or dictionary describing how to find the element.
get_elements(nodes, path_to_element)
Recursively retrieves elements from a node or list of nodes in a BeautifulSoup object by following a list of paths. Each path can be a string, list, or dictionary that specifies how to find the elements.
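To make the path idea concrete, here is a minimal sketch that treats each step as arguments to BeautifulSoup's `find_all`. The mapping (string as tag name, dictionary as keyword arguments, list as positional arguments) is an assumed convention, and `find_all_for_step` and `collect` are hypothetical names; ow's actual rules may differ.

```python
from bs4 import BeautifulSoup


def find_all_for_step(node, step):
    """Interpret one path step as arguments to find_all (assumed convention)."""
    if isinstance(step, str):
        return node.find_all(step)    # tag name
    if isinstance(step, dict):
        return node.find_all(**step)  # keyword arguments
    return node.find_all(*step)       # positional arguments


def collect(nodes, path):
    """Apply each step of path in turn to every node matched so far."""
    if not isinstance(nodes, list):
        nodes = [nodes]
    for step in path:
        nodes = [hit for node in nodes for hit in find_all_for_step(node, step)]
    return nodes


soup = BeautifulSoup(
    '<ul class="a"><li>One</li><li>Two</li></ul><ul class="b"><li>Three</li></ul>',
    "html.parser",
)
items = collect(soup, [{"name": "ul", "attrs": {"class": "a"}}, "li"])
print([item.get_text() for item in items])
```

Each step narrows the search to descendants of the previous matches, which is what lets a short path express a nested query.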
Contributing
Contributions to the ow package are welcome. Please include examples and expected outcomes in any pull request or issue.
Download files
File details
Details for the file ow-0.0.6.tar.gz.
File metadata
- Download URL: ow-0.0.6.tar.gz
- Upload date:
- Size: 7.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.10.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4d642d0d7b021602071d462fd674ad57537a5cbca94af4350bf968bb1bd2b8f8 |
| MD5 | b74880b14956b6c1ea1bb0ad8015ed0a |
| BLAKE2b-256 | 5bc4a189d64843328ccd6eb287e300c8799f6c7b495d703ab8827d0485e01929 |
File details
Details for the file ow-0.0.6-py3-none-any.whl.
File metadata
- Download URL: ow-0.0.6-py3-none-any.whl
- Upload date:
- Size: 8.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.10.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 446af67f6b78e0c94874b94cd3a213fc4d568e5a3ef686b0b27ae1d914ef06ee |
| MD5 | 82bc039bb1d1ee3db04f3663ed815e45 |
| BLAKE2b-256 | 2a2d95176fb38d8eba19c2333ec724ca3d07e2cfec0e7d04c0f0fc82b84e8927 |