Extracts all text results from an XPath query on a parsel Selector object.
parsel_text
parsel_text is a Python library that simplifies extracting text from HTML or XML documents using XPath queries on parsel Selector objects. It provides a simple interface for retrieving the text and, optionally, fixing mojibake (text garbled by encoding errors).
Installation
To install parsel_text, use pip:
pip install parsel_text
DeepWiki Docs: https://deepwiki.com/carlosplanchon/parsel_text
Usage
Function: get_xpath_text
This is the main function of the library, designed to extract all text results from an XPath query on a parsel Selector object.
Parameters
- parsel_sel (parsel.Selector): The parsel Selector object from which to extract text.
- xpath (str): The XPath query string to specify the text extraction path.
- fix_mojibake (bool, optional): A flag to indicate whether to fix mojibake issues in the extracted text. Default is True.
Returns
str: A string containing the concatenated text results from the specified XPath query.
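To illustrate the kind of damage the fix_mojibake flag is meant to address, here is a minimal sketch that uses only the Python standard library. It does not call parsel_text and is not the library's implementation, which may handle many more cases. The helper names make_mojibake and undo_latin1_mojibake are hypothetical and cover just one common case: UTF-8 bytes that were wrongly decoded as Latin-1.

```python
def make_mojibake(text: str) -> str:
    """Simulate mojibake: encode as UTF-8, then wrongly decode as Latin-1."""
    return text.encode("utf-8").decode("latin-1")


def undo_latin1_mojibake(text: str) -> str:
    """Reverse the UTF-8-read-as-Latin-1 case (hypothetical helper).

    If the round trip is not possible, the text is assumed to be fine
    and is returned unchanged.
    """
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


original = "café"
garbled = make_mojibake(original)
repaired = undo_latin1_mojibake(garbled)

print(garbled)   # cafÃ©
print(repaired)  # café
```

Text that is already correct passes through this sketch unchanged, because the Latin-1 to UTF-8 round trip either fails and is skipped, or reproduces the same plain ASCII characters.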
Example
Here's a simple example of how to use the get_xpath_text function:
from parsel import Selector
from parsel_text import get_xpath_text
html_content = """
<html>
<body>
<div id="content">
<p>Hello, world!</p>
<p>Welcome to the parsel_text library.</p>
</div>
</body>
</html>
"""
# Create a parsel Selector object
selector = Selector(text=html_content)
# Define the XPath query
xpath_query = "//div[@id='content']/p//text()"
# Extract text using the get_xpath_text function
extracted_text = get_xpath_text(parsel_sel=selector, xpath=xpath_query)
print(extracted_text)
Output
Hello, world!
Welcome to the parsel_text library.
Contributing
Contributions are welcome! If you find a bug or have a feature request, please open an issue on the GitHub repository.
License
This project is licensed under the MIT License. See the LICENSE file for details.
File details
Details for the file parsel_text-1.2.tar.gz.
File metadata
- Download URL: parsel_text-1.2.tar.gz
- Upload date:
- Size: 4.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 6b773d563865bbd29eea3a6a80ae4028e84494b2e833d5dbb3ef0d31d3d19702 |
| MD5 | 949aa2564e7de72a55a14fba4ebbeecc |
| BLAKE2b-256 | b7e48cd4b707fd281ab236634c62548eb1931d677cf297711e6ec1b904c1e3c2 |
File details
Details for the file parsel_text-1.2-py3-none-any.whl.
File metadata
- Download URL: parsel_text-1.2-py3-none-any.whl
- Upload date:
- Size: 4.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 5ff474658fab0c3e8f3fb0eeb6be7d1d19edd20e8bddbaf9f1d521ddfe29ac18 |
| MD5 | 7e4075d91ece25bb67e0b00a2e4320c8 |
| BLAKE2b-256 | e3c3a6e9ab975bdd55eb961b972efbe30816d280de631053d2907a6fd9118021 |