functions that work on soup

Project description

ow

To install: pip install ow

Overview

The ow package is a collection of utilities for manipulating and querying HTML/XML structures parsed with BeautifulSoup. It includes functions to navigate the tree, extract text, and open HTML content in a web browser for debugging or inspection.

Features

Navigational Utilities: Traverse through the HTML tree structure to find parent elements or specific paths.
HTML Content Handling: Save HTML tags to a file and view them in Firefox, aiding in debugging and visualization.
Data Extraction: Simplify the extraction of text from specified tags and automatically apply text transformations.
Batch Element Retrieval: Retrieve multiple elements based on complex path specifications, supporting both simple and nested queries.

Installation

Install the package using pip:

pip install ow

Usage Examples

Finding the Root Parent of a Tag

To find the root parent of a BeautifulSoup tag:

from bs4 import BeautifulSoup
from ow import root_parent

soup = BeautifulSoup("<div><span>Example</span></div>", "html.parser")
span_tag = soup.find('span')
root = root_parent(span_tag)
print(root)  # Outputs the div tag

Open a Tag in Firefox

To open a tag's HTML content in Firefox for debugging:

from bs4 import BeautifulSoup
from ow import open_tag_in_firefox

soup = BeautifulSoup('<div><span>Open me in Firefox</span></div>', 'html.parser')
span_tag = soup.find('span')
open_tag_in_firefox(span_tag)

Adding Text to a Parse Dictionary

Extract text from a specified tag and add it to a dictionary, optionally applying a text transformation:

from bs4 import BeautifulSoup
from ow import add_text_to_parse_dict

soup = BeautifulSoup('<div><p id="para"> Some text </p></div>', 'html.parser')
parse_dict = {}
add_text_to_parse_dict(
    soup,
    parse_dict,
    key='paragraph',
    name='p',
    attrs={'id': 'para'},
    text_transform=str.strip,
)
print(parse_dict)  # Outputs: {'paragraph': 'Some text'}

Getting Elements by Path

Retrieve elements from a BeautifulSoup object by specifying a path:

from bs4 import BeautifulSoup
from ow import get_elements

soup = BeautifulSoup('<div><p>First</p><p>Second</p></div>', 'html.parser')
elements = get_elements(soup, ['p'])
print([e.text for e in elements])  # Outputs: ['First', 'Second']

Function Documentation

`root_parent(s)`

Returns the furthest ancestor of a BeautifulSoup tag.
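
The idea of a "furthest ancestor" can be illustrated without the package. The following is a minimal sketch, not ow's actual implementation: it assumes only that each node exposes a `.parent` attribute that is `None` at the top of the tree, which holds for BeautifulSoup objects. `Node` and `furthest_ancestor` are hypothetical names used for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical stand-in for a BeautifulSoup tag: a name and a parent link."""
    name: str
    parent: Optional["Node"] = None


def furthest_ancestor(node):
    """Follow .parent links upward and return the last node reached.

    Assumption: the top of the tree has parent None (true for bs4 objects).
    A node without a parent is returned unchanged.
    """
    while node.parent is not None:
        node = node.parent
    return node


document = Node("[document]")
div = Node("div", parent=document)
span = Node("span", parent=div)
print(furthest_ancestor(span).name)  # [document]
```

In a real BeautifulSoup tree the top-most node is the `BeautifulSoup` object itself, whose name is `[document]`, rather than the outermost tag. Printing that object shows the whole parsed markup, so for a document with a single outer `<div>` the printed text looks like the div.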

`open_tag_in_firefox(tag)`

Saves the HTML of a BeautifulSoup tag to a temporary file and opens it in Firefox.
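
The save-then-open pattern described here can be sketched with the standard library alone. This is an illustration under stated assumptions, not ow's code: it works on an HTML string (with BeautifulSoup, `str(tag)` gives a tag's markup), and it opens the system default browser through `webbrowser.open` rather than Firefox specifically. `save_html_to_temp_file` and `open_html_in_browser` are hypothetical names.

```python
import tempfile
import webbrowser
from pathlib import Path


def save_html_to_temp_file(html: str) -> Path:
    """Write html to a new temporary .html file and return its path."""
    with tempfile.NamedTemporaryFile(
        "w", suffix=".html", delete=False, encoding="utf-8"
    ) as f:
        f.write(html)
    return Path(f.name)


def open_html_in_browser(html: str) -> Path:
    """Save html to a temporary file, then ask the default browser to open it."""
    path = save_html_to_temp_file(html)
    webbrowser.open(path.as_uri())
    return path


# Only the saving step runs here, so no browser window is opened.
saved = save_html_to_temp_file("<span>Open me</span>")
print(saved.read_text(encoding="utf-8"))  # <span>Open me</span>
```

The file is created with `delete=False` so that it still exists when the browser loads it; cleaning it up afterwards is left to the caller. Where Firefox is registered with the `webbrowser` module, `webbrowser.get("firefox")` may be used to select it explicitly.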

`add_text_to_parse_dict(soup, parse_dict, key, name, attrs, text_transform)`

Finds a tag in the given BeautifulSoup object soup by name and attrs, extracts its text, applies a text_transform function, and adds it to parse_dict under key.
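
The find, transform, store sequence might look roughly like the sketch below. It is not the package's implementation. It assumes that `soup` offers `find(name, attrs=...)` and that the found tag offers `get_text()`, as BeautifulSoup objects do. The choice to leave the dictionary unchanged when no tag is found is also an assumption; ow may behave differently. `add_text`, `FakeTag`, and `FakeSoup` are hypothetical names, and the two stand-in classes exist only so the sketch runs without bs4 installed.

```python
from typing import Callable, Optional


def add_text(soup, parse_dict: dict, key: str, name: str,
             attrs: Optional[dict] = None,
             text_transform: Callable[[str], str] = lambda s: s) -> None:
    """Find one tag, transform its text, and store the result under key.

    Assumption: a missing tag leaves parse_dict unchanged.
    """
    tag = soup.find(name, attrs=attrs or {})
    if tag is not None:
        parse_dict[key] = text_transform(tag.get_text())


class FakeTag:
    """Hypothetical stand-in for a bs4 tag."""
    def __init__(self, text):
        self._text = text

    def get_text(self):
        return self._text


class FakeSoup:
    """Hypothetical stand-in for a parsed document holding one <p id="para">."""
    def find(self, name, attrs=None):
        if name == "p" and (attrs or {}) == {"id": "para"}:
            return FakeTag(" Some text ")
        return None


parse_dict = {}
add_text(FakeSoup(), parse_dict, key="paragraph", name="p",
         attrs={"id": "para"}, text_transform=str.strip)
print(parse_dict)  # {'paragraph': 'Some text'}
```

Because the function mutates `parse_dict` in place, several calls with different keys can accumulate fields from one document into a single dictionary.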

`get_element(node, path_to_element)`

Retrieves an element from a BeautifulSoup node by following a specified path. The path can be a string, list, or dictionary describing how to find the element.
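
The exact path semantics are not spelled out in this document. The sketch below shows one plausible reading, stated as an assumption: a string is a tag name, a dictionary holds keyword arguments for `find`, and a list is a sequence of such steps applied in turn. `Elem` and `follow_path` are hypothetical names; `Elem` is a tiny stand-in whose `find` mimics the `name` and `attrs` arguments of BeautifulSoup's `find`.

```python
class Elem:
    """Hypothetical mini element: a name, attributes, and child elements."""
    def __init__(self, name, attrs=None, children=()):
        self.name = name
        self.attrs = attrs or {}
        self.children = list(children)

    def find(self, name=None, attrs=None):
        """Depth-first search for the first descendant matching name and attrs."""
        for child in self.children:
            name_ok = name is None or child.name == name
            attrs_ok = all(child.attrs.get(k) == v
                           for k, v in (attrs or {}).items())
            if name_ok and attrs_ok:
                return child
            found = child.find(name, attrs)
            if found is not None:
                return found
        return None


def follow_path(node, path):
    """Resolve path against node; return None once a step finds nothing."""
    if node is None:
        return None
    if isinstance(path, str):    # assumed: a tag name
        return node.find(path)
    if isinstance(path, dict):   # assumed: keyword arguments for find
        return node.find(**path)
    for step in path:            # assumed: a list of steps, applied in order
        node = follow_path(node, step)
        if node is None:
            return None
    return node


tree = Elem("[document]", children=[
    Elem("div", {"id": "main"}, [Elem("p", children=[Elem("b")])]),
])
print(follow_path(tree, ["div", {"name": "p"}, "b"]).name)  # b
print(follow_path(tree, ["div", "table"]))                  # None
```

Stopping at the first failed step avoids calling `find` on `None`, which would otherwise raise an `AttributeError` partway through a path.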

`get_elements(nodes, path_to_element)`

Recursively retrieves elements from a node or list of nodes in a BeautifulSoup object by following a list of paths. Each path can be a string, list, or dictionary that specifies how to find the elements.

Contributing

Contributions to the ow package are welcome. Please include examples and expected outcomes in any pull request or issue.

Project details

Release history

0.0.6 (this version): Jun 16, 2025
0.0.5: May 19, 2025
0.0.4: Oct 10, 2022
0.0.3: Oct 3, 2022
0.0.2: Jan 6, 2021

Download files

Source Distribution

ow-0.0.6.tar.gz (7.9 kB)

Uploaded Jun 16, 2025 Source

Built Distribution

ow-0.0.6-py3-none-any.whl (8.1 kB)

Uploaded Jun 16, 2025 Python 3

File details

Details for the file ow-0.0.6.tar.gz.

File metadata

Download URL: ow-0.0.6.tar.gz
Upload date: Jun 16, 2025
Size: 7.9 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.10.13

File hashes

Hashes for ow-0.0.6.tar.gz
Algorithm	Hash digest
SHA256	`4d642d0d7b021602071d462fd674ad57537a5cbca94af4350bf968bb1bd2b8f8`
MD5	`b74880b14956b6c1ea1bb0ad8015ed0a`
BLAKE2b-256	`5bc4a189d64843328ccd6eb287e300c8799f6c7b495d703ab8827d0485e01929`

File details

Details for the file ow-0.0.6-py3-none-any.whl.

File metadata

Download URL: ow-0.0.6-py3-none-any.whl
Upload date: Jun 16, 2025
Size: 8.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.10.13

File hashes

Hashes for ow-0.0.6-py3-none-any.whl
Algorithm	Hash digest
SHA256	`446af67f6b78e0c94874b94cd3a213fc4d568e5a3ef686b0b27ae1d914ef06ee`
MD5	`82bc039bb1d1ee3db04f3663ed815e45`
BLAKE2b-256	`2a2d95176fb38d8eba19c2333ec724ca3d07e2cfec0e7d04c0f0fc82b84e8927`
