Skip to main content

A toolkit for convenient and expressive XPath operations based on lxml.

Project description

xpath-kit

PyPI Version License: MIT Python Versions

xpath-kit is a powerful Python library that provides a fluent, object-oriented, and Pythonic interface for building and executing XPath queries on top of lxml. It transforms complex, error-prone XPath string composition into a highly readable and maintainable chain of objects and methods.

Say goodbye to messy, hard-to-read XPath strings:

div[@id="main" and contains(@class, "content")]/ul/li[position()=1]

And say hello to a more intuitive and IDE-friendly way of writing queries:

E.div[(A.id == "main") & A.class_.contains("content")] / E.ul / E.li[1]


✨ Features

  • 🐍 Fluent & Pythonic Interface: Chain methods and operators (/, //, [], &, |, ==, >) to build complex XPath expressions naturally using familiar Python logic.
  • 💡 Smart Builders: Use E (elements), A (attributes), and F (functions) for a highly readable syntax with excellent IDE autocompletion support.
  • 📖 Superb Readability & Maintainability: Complex queries become self-documenting. It's easier to understand, debug, and modify your selectors.
  • 💪 Powerful Predicate Logic: Easily create sophisticated predicates for attributes, text, and functions. Gracefully handle multi-class selections with any(), all(), and none().
  • 🔩 Convenient DOM Manipulation: The result objects are powerful wrappers around lxml elements, allowing for easy DOM traversal and manipulation (e.g., append, remove, parent, next_sibling).
  • 🔒 Fully Type-Hinted: The entire library is fully type-hinted for an unmatched developer experience and static analysis with modern IDEs.
  • ⚙️ HTML & XML Support: Seamlessly parse both document types with html() and xml() entry points.

🚀 Installation

Install xpath-kit from PyPI using pip:

pip install xpath-kit

The library requires lxml as a dependency, which will be installed automatically.


🏁 Quick Start

Here's a simple example of how to use xpath-kit to parse a piece of HTML and extract information.

from xpathkit import html, E, A, F

html_content = """
<html>
  <body>
    <div id="main">
      <h2>Article Title</h2>
      <p>This is the first paragraph.</p>
      <ul class="item-list">
        <li class="item active">Item 1</li>
        <li class="item">Item 2</li>
        <li class="item disabled">Item 3</li>
      </ul>
    </div>
  </body>
</html>
"""

# 1. Parse the HTML content
root = html(html_content)

# 2. Build a query to find the <li> element with both "item" and "active" classes
# XPath: .//ul[contains(@class, "item-list")]/li[contains(@class, "item") and contains(@class, "active")]
query = E.ul[A.class_.contains("item-list")] / E.li[A.class_.all("item", "active")]

# 3. Execute the query and get a single element
active_item = root.descendant(query)

# Print its content and attributes
print(f"Tag: {active_item.tag}")
print(f"Text: {active_item.string()}")
print(f"Class attribute: {active_item['class']}")

# --- Output ---
# Tag: li
# Text: Item 1
# Class attribute: item active

# 4. Build a more complex query: find all <li> elements whose class does NOT contain 'disabled'
# XPath: .//li[not(contains(@class, "disabled"))]
query_enabled = E.li[F.not_(A.class_.contains("disabled"))]

# 5. Execute the query and process the list of results
enabled_items = root.descendants(query_enabled)
item_texts = enabled_items.map(lambda item: item.string())
print(f"\nEnabled items: {item_texts}")

# --- Output ---
# Enabled items: ['Item 1', 'Item 2']

📚 Core Concepts

1. Parsing Entrypoints

Use the html() or xml() functions to start. They accept a string, bytes, or a file path.

from xpathkit import html, xml

# Parse an HTML string
root_html = html("<div><p>Hello</p></div>")

# Parse an XML file
root_xml = xml(path="data.xml")

2. The Smart Builders (E, A, F)

These are the heart of xpath-kit, making expression building effortless.

  • E (Element): Builds element nodes. E.g., E.div, E.a, or custom tags E["my-tag"].
  • A (Attribute): Builds attribute nodes within predicates. E.g., A.id, A.href, or custom attributes A["data-id"].
  • F (Function): Builds XPath functions. E.g., F.contains(), F.not_(), F.position(), or any custom function: F["name"](arg1, ...).

Note: Since class and for are reserved keywords in Python, use a trailing underscore: A.class_ and A.for_.

3. Path Selection (/ and //)

Use the division operators to define relationships between elements.

  • /: Selects a direct child.
  • //: Selects a descendant at any level.
# Selects a <p> that is a direct child of a <div>
# XPath: div/p
query_child = E.div / E.p

# Selects an <a> that is a descendant of the <body>
# XPath: body//a
query_descendant = E.body // E.a

You can also use a string directly after an element for simple cases:

# Equivalent to E.div / E.span
query = E.div / "span"

This is convenient for simple queries without predicates or attributes.

4. Predicates ([])

Use square brackets [] on an element to add filtering conditions. This is where xpath-kit truly shines.

Attribute Predicates with A

# Find a div with id="main"
# XPath: //div[@id="main"]
query = E.div[A.id == "main"]

# Find an <a> that has an href attribute
# XPath: //a[@href]
query_has_href = E.a[A.href]

# Find an <li> whose class contains "item" but NOT "disabled"
# XPath: //li[contains(@class,"item") and not(contains(@class,"disabled"))]
query = E.li[A.class_.contains("item") & F.not_(A.class_.contains("disabled"))]

Text/Value Predicates

To query against the string value of a node (.), import the dot class.

from xpathkit import dot

# Find an <h1> whose text is exactly "Welcome"
# XPath: //h1[.="Welcome"]
query = E.h1[dot() == "Welcome"]

# Find a <p> whose text contains the word "paragraph"
# XPath: //p[contains(., "paragraph")]
query_contains = E.p[dot().contains("paragraph")]

Functional Predicates with F

Use F to call any standard XPath function inside a predicate.

# Select the first list item
# XPath: //li[position()=1]
query_first = E.li[F.position() == 1]

# Select the last list item
# XPath: //li[last()]
query_last = E.li[F.last()]

Combining Predicates with & and |

  • &: Logical and
  • |: Logical or
# Find an <a> with href="/home" AND a target attribute
# XPath: //a[@href="/home" and @target]
query_and = E.a[(A.href == "/home") & A.target]

# Find a <div> with id="sidebar" OR class="nav"
# XPath: //div[@id="sidebar" or contains(@class,"nav")]
query_or = E.div[(A.id == "sidebar") | A.class_.contains("nav")]

Important: Due to Python's operator precedence, it's highly recommended to wrap combined conditions in parentheses ().

Positional Predicates

Use integers (1-based) or negative integers (from the end) directly.

# Select the second <li>
# XPath: //li[2]
query = E.li[2]

# Select the last <li> (equivalent to F.last())
# XPath: //li[last()]
query_last = E.li[-1]

5. Working with Results

  • .child()/.descendant() return a single XPathElement.
  • .children()/.descendants() return an Union[XPathElementList, str, float, bool, List[str]].

XPathElement (Single Result)

  • .tag: The element's tag name (e.g., 'div').
  • .attr: A dictionary of all attributes.
  • element['name']: Access an attribute directly.
  • .string(): Get the concatenated text of the element and all its children (string(.)).
  • .text(): Get a list of only the element's direct text nodes (./text()).
  • .parent(): Get the parent element.
  • .next_sibling() / .prev_sibling(): Get adjacent sibling elements.
  • .xpath(query): Execute a raw string or a constructed query within the context of this element.

XPathElementList (Multiple Results)

  • .one(): Ensures the list contains exactly one element and returns it; otherwise, raises an error.
  • .first() / .last(): Get the first or last element; raises an error if the list is empty.
  • len(element_list): Get the number of elements.
  • .filter(func): Filter the list based on a function.
  • .map(func): Apply a function to each element and return a list of the results.
  • Can be iterated over directly: for e in my_list: ...
  • Supports slicing and indexing: my_list[0], my_list[-1]

6. DOM Manipulation

Modify the document tree with ease.

from xpathkit import XPathElement, E, A

# Assuming 'root' is a parsed XPathElement
# Find the <ul> element
ul = root.descendant(E.ul)

# Create and append a new <li>
new_li = XPathElement.create("li", attr={"class": "new-item"}, text="Item 4")
ul.append(new_li)

# Remove an element
item_to_remove = ul.child(E.li[A.class_.contains("disabled")])
if item_to_remove:
    ul.remove(item_to_remove)

# Print the modified HTML
print(root.tostring())

📄 License

This project is licensed under the MIT License. See the LICENSE file for details.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

xpath_kit-0.3.11.tar.gz (20.8 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

xpath_kit-0.3.11-py3-none-any.whl (14.7 kB view details)

Uploaded Python 3

File details

Details for the file xpath_kit-0.3.11.tar.gz.

File metadata

  • Download URL: xpath_kit-0.3.11.tar.gz
  • Upload date:
  • Size: 20.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.5

File hashes

Hashes for xpath_kit-0.3.11.tar.gz
Algorithm Hash digest
SHA256 717afbf3f44b627a6a61f04d83991d037d7a1693f688788094f347b815dbe9c3
MD5 1e50fabf8fcaa2b8c5cf46a78df93e96
BLAKE2b-256 9d443dd82488e430dfd12dfc6ba1260f39fb74ef793af33e1d2a03c74d6e6cff

See more details on using hashes here.

File details

Details for the file xpath_kit-0.3.11-py3-none-any.whl.

File metadata

  • Download URL: xpath_kit-0.3.11-py3-none-any.whl
  • Upload date:
  • Size: 14.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.5

File hashes

Hashes for xpath_kit-0.3.11-py3-none-any.whl
Algorithm Hash digest
SHA256 677fedabfdb3c3a5d18d06e5437f9498bcce2e5abdce3eeb2606570c7e127891
MD5 658c41af4d8f1c00bd8177a460dea238
BLAKE2b-256 eeabb1a803626c7e1e49c656a5e82a3ce1bc5af8ce3d8138f57d00af3db576eb

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page