
Extracts all text results from an XPath query on a parsel Selector object.

parsel_get_selector_text

parsel_get_selector_text is a Python library that simplifies extracting text from HTML or XML documents by running XPath queries on parsel Selector objects. It provides a straightforward interface for retrieving the matched text and optionally fixing mojibake (text garbled by encoding errors).

Installation

To install parsel_get_selector_text, use pip:

pip install parsel_get_selector_text

Usage

Function: parsel_sel_get_text

This is the main function of the library, designed to extract all text results from an XPath query on a parsel Selector object.

Parameters

  • parsel_sel (parsel.Selector): The parsel Selector object from which to extract text.
  • xpath (str): The XPath query that selects the text to extract.
  • fix_mojibake (bool, optional): Whether to repair mojibake in the extracted text. Defaults to True.

Returns

  • str: A string containing the concatenated text results from the specified XPath query.
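
To make the fix_mojibake flag more concrete, here is a minimal standard-library sketch of how mojibake arises and how one simple case can be reversed. It is an illustration only, not the library's implementation: the helper name undo_latin1_mojibake is hypothetical, and it handles just the common case of UTF-8 bytes that were wrongly decoded as Latin-1.

```python
# Illustration only: this is NOT how parsel_get_selector_text is implemented.
original = "café"

# UTF-8 bytes wrongly decoded as Latin-1 produce garbled text (mojibake).
garbled = original.encode("utf-8").decode("latin-1")


def undo_latin1_mojibake(text: str) -> str:
    """Hypothetical helper: reverse a UTF-8-read-as-Latin-1 mix-up when possible."""
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # The text was not this kind of mojibake; return it unchanged.
        return text


repaired = undo_latin1_mojibake(garbled)
print(garbled)   # garbled form of "café"
print(repaired)  # original text restored
```

Real-world mojibake can involve other encoding pairs or several layers of mis-decoding, which is why a dedicated repair step is useful and why it can be turned off when the input is known to be clean.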

Example

Here's a simple example of how to use the parsel_sel_get_text function:

from parsel import Selector
from parsel_get_selector_text import parsel_sel_get_text

html_content = """
<html>
  <body>
    <div id="content">
      <p>Hello, world!</p>
      <p>Welcome to the parsel_get_selector_text library.</p>
    </div>
  </body>
</html>
"""

# Create a parsel Selector object
selector = Selector(text=html_content)

# Define the XPath query
xpath_query = "//div[@id='content']/p//text()"

# Extract text using the parsel_sel_get_text function
extracted_text = parsel_sel_get_text(parsel_sel=selector, xpath=xpath_query)

print(extracted_text)

Output

Hello, world!
Welcome to the parsel_get_selector_text library.

Contributing

Contributions are welcome! If you find a bug or have a feature request, please open an issue on the GitHub repository.

License

This project is licensed under the MIT License. See the LICENSE file for details.
