
Extracts all text results from an XPath query on a parsel Selector object.


parsel_text

parsel_text is a Python library that simplifies extracting text from HTML or XML documents using XPath queries on parsel Selector objects. It provides a straightforward interface for obtaining that text and, optionally, repairing mojibake (garbled text caused by encoding issues).

Installation

To install parsel_text, use pip:

pip install parsel_text

Usage

Function: parsel_sel_get_text

This is the main function of the library. It extracts all text results from an XPath query on a parsel Selector object.

Parameters

  • parsel_sel (parsel.Selector): The parsel Selector object from which to extract text.
  • xpath (str): The XPath query that selects the text to extract.
  • fix_mojibake (bool, optional): Whether to repair mojibake in the extracted text. Default is True.

Returns

  • str: A string containing the concatenated text results from the specified XPath query.
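
To illustrate the kind of problem the fix_mojibake flag is meant to address, the sketch below shows how mojibake typically arises and one common way to reverse it using only the standard library. The helper name repair_mojibake and the Latin-1 round trip are assumptions made for illustration; the description above does not say how parsel_text performs the repair internally.

```python
def repair_mojibake(text: str) -> str:
    """Hypothetical helper: undo UTF-8 bytes that were wrongly decoded as Latin-1."""
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # The text was not this kind of mojibake; leave it untouched.
        return text


original = "café"
# Simulate the encoding mistake: UTF-8 bytes read as if they were Latin-1.
garbled = original.encode("utf-8").decode("latin-1")

print(garbled)                   # cafÃ©
print(repair_mojibake(garbled))  # café
print(repair_mojibake("plain"))  # plain
```

Text that is already correct passes through unchanged, because the round trip either reproduces it or raises an error that the helper catches.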

Example

Here's a simple example of how to use the parsel_sel_get_text function:

from parsel import Selector
from parsel_text import parsel_sel_get_text

html_content = """
<html>
  <body>
    <div id="content">
      <p>Hello, world!</p>
      <p>Welcome to the parsel_text library.</p>
    </div>
  </body>
</html>
"""

# Create a parsel Selector object
selector = Selector(text=html_content)

# Define the XPath query
xpath_query = "//div[@id='content']/p//text()"

# Extract text using the parsel_sel_get_text function
extracted_text = parsel_sel_get_text(parsel_sel=selector, xpath=xpath_query)

print(extracted_text)

Output

Hello, world!
Welcome to the parsel_text library.
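
For readers curious about what a call like the one above involves, here is a rough sketch of a function with the same signature. It is not the library's source code: sketch_get_text, the newline separator, and the Latin-1 round trip are assumptions, and StubSelector is a hypothetical stand-in so the block runs without parsel installed. The only parsel behaviour relied on is that Selector.xpath(...) returns an object whose getall() method yields a list of strings.

```python
class StubSelectorList:
    """Hypothetical stand-in for the list that parsel's Selector.xpath() returns."""

    def __init__(self, items):
        self._items = list(items)

    def getall(self):
        return list(self._items)


class StubSelector:
    """Hypothetical stand-in for parsel.Selector with canned XPath results."""

    def __init__(self, results):
        self._results = results

    def xpath(self, query):
        return StubSelectorList(self._results.get(query, []))


def sketch_get_text(parsel_sel, xpath, fix_mojibake=True):
    """Assumed behaviour: join every text result, optionally repairing mojibake."""
    pieces = parsel_sel.xpath(xpath).getall()
    text = "\n".join(pieces)  # the separator is an assumption
    if fix_mojibake:
        try:
            # Assumed repair: UTF-8 bytes that were mis-decoded as Latin-1.
            text = text.encode("latin-1").decode("utf-8")
        except (UnicodeEncodeError, UnicodeDecodeError):
            pass  # not that kind of mojibake; keep the text unchanged
    return text


query = "//div[@id='content']/p//text()"
selector = StubSelector(
    {query: ["Hello, world!", "Welcome to the parsel_text library."]}
)
print(sketch_get_text(selector, query))
```

With parsel installed, a real Selector could be passed in place of StubSelector, since the sketch only calls xpath() and getall().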

Contributing

Contributions are welcome! If you find a bug or have a feature request, please open an issue on the GitHub repository.

License

This project is licensed under the MIT License. See the LICENSE file for details.
