Extracts all text results from an XPath query on a parsel Selector object.

Project description

parsel_text

parsel_text is a Python library that simplifies extracting text from HTML or XML documents using XPath queries on parsel Selector objects. It provides a straightforward interface to extract the matched text and, optionally, fix mojibake (text garbled by encoding errors).

Installation

To install parsel_text, use pip:

pip install parsel_text

DeepWiki Docs: https://deepwiki.com/carlosplanchon/parsel_text

Usage

Function: get_xpath_text

This is the main function of the library, designed to extract all text results from an XPath query on a parsel Selector object.

Parameters

  • parsel_sel (parsel.Selector): The parsel Selector object from which to extract text.
  • xpath (str): The XPath query string to specify the text extraction path.
  • fix_mojibake (bool, optional): A flag to indicate whether to fix mojibake issues in the extracted text. Default is True.

Returns

  • str: A string containing the concatenated text results from the specified XPath query.
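
To make the fix_mojibake parameter concrete, here is a minimal standard-library sketch of what mojibake is and how one specific case can be reversed. It does not use parsel_text and is not a description of how the library repairs text internally; the sample string and variable names are chosen purely for illustration.

```python
# Illustration only: this is NOT parsel_text's implementation.
original = "café"

# Simulate mojibake: the UTF-8 bytes are wrongly decoded as Latin-1.
garbled = original.encode("utf-8").decode("latin-1")
print(garbled)  # cafÃ©

# Undo that specific mix-up by reversing the two steps.
repaired = garbled.encode("latin-1").decode("utf-8")
print(repaired)  # café
```

This round trip only works when the wrong codec is known in advance; a general-purpose fixer has to detect which mix-up occurred, which is presumably why the library exposes the behavior as a single flag.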

Example

Here's a simple example of how to use the get_xpath_text function:

from parsel import Selector
from parsel_text import get_xpath_text

html_content = """
<html>
  <body>
    <div id="content">
      <p>Hello, world!</p>
      <p>Welcome to the parsel_text library.</p>
    </div>
  </body>
</html>
"""

# Create a parsel Selector object
selector = Selector(text=html_content)

# Define the XPath query
xpath_query = "//div[@id='content']/p//text()"

# Extract text using the get_xpath_text function
extracted_text = get_xpath_text(parsel_sel=selector, xpath=xpath_query)

print(extracted_text)

Output

Hello, world!
Welcome to the parsel_text library.
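
For readers who want to see the underlying idea without installing anything, the following sketch collects and joins the text of matched elements using only the standard library. It is a stand-in, not parsel_text's implementation: xml.etree.ElementTree supports only a subset of XPath, collect_text is a hypothetical helper, and the newline separator is an assumption chosen to mirror the output shown above.

```python
import xml.etree.ElementTree as ET

doc = """
<html>
  <body>
    <div id="content">
      <p>Hello, world!</p>
      <p>Welcome to the <b>parsel_text</b> library.</p>
    </div>
  </body>
</html>
"""

# ElementTree needs well-formed markup; strip the leading newline first.
root = ET.fromstring(doc.strip())


def collect_text(root, path, sep="\n"):
    """Hypothetical helper: join the text of every element matched by `path`.

    itertext() walks nested tags too, so inline markup such as <b> is kept.
    """
    return sep.join("".join(el.itertext()) for el in root.findall(path))


text = collect_text(root, ".//div[@id='content']/p")
print(text)
```

Real-world HTML is rarely well-formed XML, which is one reason to prefer a parsel Selector over ElementTree for scraping.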

Contributing

Contributions are welcome! If you find a bug or have a feature request, please open an issue on the GitHub repository.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Download files

Source Distribution

parsel_text-1.2.tar.gz (4.1 kB view details)

Uploaded Source

Built Distribution

parsel_text-1.2-py3-none-any.whl (4.3 kB view details)

Uploaded Python 3

File details

Details for the file parsel_text-1.2.tar.gz.

File metadata

  • Download URL: parsel_text-1.2.tar.gz
  • Size: 4.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.2

File hashes

Hashes for parsel_text-1.2.tar.gz

Algorithm    Hash digest
SHA256       6b773d563865bbd29eea3a6a80ae4028e84494b2e833d5dbb3ef0d31d3d19702
MD5          949aa2564e7de72a55a14fba4ebbeecc
BLAKE2b-256  b7e48cd4b707fd281ab236634c62548eb1931d677cf297711e6ec1b904c1e3c2

File details

Details for the file parsel_text-1.2-py3-none-any.whl.

File metadata

  • Download URL: parsel_text-1.2-py3-none-any.whl
  • Size: 4.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.2

File hashes

Hashes for parsel_text-1.2-py3-none-any.whl

Algorithm    Hash digest
SHA256       5ff474658fab0c3e8f3fb0eeb6be7d1d19edd20e8bddbaf9f1d521ddfe29ac18
MD5          7e4075d91ece25bb67e0b00a2e4320c8
BLAKE2b-256  e3c3a6e9ab975bdd55eb961b972efbe30816d280de631053d2907a6fd9118021
