These details have not been verified by PyPI

Development Status
- 4 - Beta
Intended Audience
- Developers
License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
Topic
- Internet :: WWW/HTTP
- Software Development :: Libraries :: Python Modules
Typing
- Typed

Project description

Domselect

Domselect provides a high-level API for working with the structure of an HTML document using one of several HTML processing backends. To work with an HTML document, you create a so-called selector object from the raw content of the document. That selector is bound to the root node of the HTML structure. You can then call the selector's methods to build other selectors bound to nested parts of the HTML structure.

A selector object extracts low-level nodes from the DOM constructed by the HTML processing backend and wraps them in a high-level selector interface. If you need to, you can always access the low-level node stored in a selector object.
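The wrapping idea can be sketched with the standard library's xml.etree.ElementTree standing in for a real backend. This is not Domselect's code: MiniSelector, its node attribute and its from_content/find methods are hypothetical names chosen to mirror the description, and ElementTree understands only a limited XPath subset rather than CSS.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET


class MiniSelector:
    """Hypothetical high-level wrapper around a raw ElementTree node."""

    def __init__(self, node: ET.Element) -> None:
        # The raw low-level node stays reachable through this attribute.
        self.node = node

    @classmethod
    def from_content(cls, content: str) -> MiniSelector:
        # Let the stand-in "backend" build the tree, then bind to its root node.
        return cls(ET.fromstring(content))

    def find(self, query: str) -> list[MiniSelector]:
        # Wrap every raw node matched by the query into a new selector.
        return [MiniSelector(node) for node in self.node.findall(query)]


root = MiniSelector.from_content("<ul><li>one</li><li>two</li></ul>")
items = root.find("li")
print([item.node.text for item in items])  # ['one', 'two']
```

Each call returns new wrapper objects, while the parsed tree itself is built only once by the backend.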

Selector Backends

The Domselect library provides these selectors:

- LexborSelector, powered by the selectolax and lexbor libraries. The type of the raw node is selectolax.lexbor.LexborNode. The query language is CSS.
- LxmlSelector, powered by the lxml library. The type of the raw node is lxml.html.HtmlElement. The query language is CSS.

Creating Selectors

Suppose we have the following HTML document:

HTML = "<div>test</div>"

To create a lexbor selector from the content of an HTML document:

from domselect import LexborSelector
sel = LexborSelector.from_content(HTML)

You can also create a selector from a raw node:

from domselect import LexborSelector
from selectolax.lexbor import LexborHTMLParser
node = LexborHTMLParser(HTML).css_first("div")
sel = LexborSelector(node)

The same goes for the lxml backend. Here is an example of creating an lxml selector from a raw node:

from domselect import LxmlSelector
from lxml.html import fromstring
node = fromstring(HTML)
sel = LxmlSelector(node)

Node Traversal Methods

Methods that return selectors return selectors of the same type as the current one, i.e. a LexborSelector returns other LexborSelectors and an LxmlSelector returns other LxmlSelectors. The query language is CSS.

Method find(query: str) returns a list of selectors bound to the raw nodes found by the CSS query.

Method first(query: str) returns None or a selector bound to the first raw node found by the CSS query.

There are similar find_raw and first_raw methods, which work the same way but return low-level raw nodes, i.e. they do not wrap the found nodes in the selector interface.

Method parent() returns a selector bound to the parent of the current selector's raw node.

Method exists(query: str) returns a boolean flag indicating whether any node was found by the CSS query.

Method first_contains(query: str, pattern: str[, default: None]) returns a selector bound to the first raw node that is found by the CSS query and contains the text given by the pattern parameter. If no such node is found, NodeNotFoundError is raised. You can pass the optional default=None parameter to return None when the node is not found.
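A rough model of several traversal methods, again using a hypothetical ElementTree-based MiniSelector rather than Domselect's real classes (NodeNotFoundError here is a local stand-in). Two assumptions are worth flagging: first_contains treats pattern as a plain substring of the node's full text, and the default parameter uses a private sentinel so the method can tell "no default given" apart from default=None.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from typing import Any

_MISSING: Any = object()  # sentinel: caller passed no default at all


class NodeNotFoundError(Exception):
    """Hypothetical stand-in for Domselect's NodeNotFoundError."""


class MiniSelector:
    def __init__(self, node: ET.Element, root: ET.Element | None = None) -> None:
        self.node = node
        # ElementTree nodes have no parent pointer, so remember the document root.
        self.root = node if root is None else root

    def _wrap(self, node: ET.Element) -> MiniSelector:
        # New selectors are of the same type and share the same root.
        return MiniSelector(node, self.root)

    def find_raw(self, query: str) -> list[ET.Element]:
        return self.node.findall(query)

    def find(self, query: str) -> list[MiniSelector]:
        return [self._wrap(node) for node in self.find_raw(query)]

    def exists(self, query: str) -> bool:
        return self.node.find(query) is not None

    def parent(self) -> MiniSelector:
        # Build a child -> parent map by walking the whole tree from the root.
        parents = {child: par for par in self.root.iter() for child in par}
        return self._wrap(parents[self.node])

    def first_contains(
        self, query: str, pattern: str, default: Any = _MISSING
    ) -> MiniSelector | None:
        for node in self.node.iterfind(query):
            # Assumption: "contains" means a plain substring match on the full text.
            if pattern in "".join(node.itertext()):
                return self._wrap(node)
        if default is _MISSING:
            raise NodeNotFoundError(f"no {query!r} node contains {pattern!r}")
        return default


root = MiniSelector(ET.fromstring("<div><p>Price: 10</p><p>Stock: 3</p></div>"))
stock = root.first_contains("p", "Stock")
print(stock.node.text)                                  # Stock: 3
print(stock.parent().node.tag)                          # div
print(root.exists("span"))                              # False
print(root.first_contains("p", "Color", default=None))  # None
```

The root reference is only needed because ElementTree lacks parent links; lexbor and lxml nodes can reach their parent directly.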

Node Properties Methods

Method attr(name: str[, default: None|str]) returns the value of the node's attribute with the given name. If the node does not have such an attribute, AttributeNotFoundError is raised. If you pass the optional default: None|str parameter, the method returns that value (None or a str) instead when the attribute does not exist.

Method text([strip: bool]) returns the text content of the current node and all its sub-nodes. By default, whitespace (spaces, tabs and line breaks) is stripped from the beginning and end of the returned text. You can turn off stripping by passing the strip=False parameter.

Method tag() returns the tag name of the raw node to which the current selector is bound.
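The property methods, including the sentinel-based default handling of attr(), might be modelled as below. MiniSelector and AttributeNotFoundError are hypothetical stand-ins built on xml.etree.ElementTree, not Domselect's implementation.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from typing import Any

_MISSING: Any = object()  # sentinel distinguishing "no default" from default=None


class AttributeNotFoundError(Exception):
    """Hypothetical stand-in for Domselect's AttributeNotFoundError."""


class MiniSelector:
    def __init__(self, node: ET.Element) -> None:
        self.node = node

    def attr(self, name: str, default: Any = _MISSING) -> str | None:
        if name in self.node.attrib:
            return self.node.attrib[name]
        if default is _MISSING:
            raise AttributeNotFoundError(name)
        return default  # None or a fallback str chosen by the caller

    def text(self, strip: bool = True) -> str:
        # Text of this node and all its sub-nodes (tails of children included).
        content = "".join(self.node.itertext())
        # With no arguments, str.strip() removes spaces, tabs and line breaks.
        return content.strip() if strip else content

    def tag(self) -> str:
        return self.node.tag


sel = MiniSelector(ET.fromstring('<a href="/home">\n  Home <b>page</b>\t</a>'))
print(sel.tag())                         # a
print(sel.attr("href"))                  # /home
print(sel.attr("title", default="n/a"))  # n/a
print(repr(sel.text()))                  # 'Home page'
print(repr(sel.text(strip=False)))       # '\n  Home page\t'
```

A sentinel object is used instead of a None default so that an explicit default=None can still mean "return None" rather than "raise".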

Traversal and Properties Methods

These methods combine two operations: searching for a node by query and doing something with the found node. They are helpful when you want to get the text or an attribute of a node that might not exist, because they let you return a reasonable default value when the node is not found. In contrast, with a call chain like first(...).text() you cannot return a default value from text(), because first() will raise an exception if the node is not found.

Method first_attr(query: str, name: str[, default: None|str]) returns the value of the attribute with the given name of the node found by the given query. If the node does not have such an attribute, AttributeNotFoundError is raised. If no node is found by the query, NodeNotFoundError is raised. If you pass the optional default: None|str parameter, the method returns that value (None or a str) instead of raising these exceptions.

Method first_text(query: str[, default: None|str, strip: bool]) returns the text content of the raw node (and all its sub-nodes) found by the given query. If the node is not found, NodeNotFoundError is raised. Use the optional default: None|str parameter to return None or a str instead of raising the exception. You can control text stripping with the strip parameter (see the description of the text() method).
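The benefit of the combined methods can be sketched with the same kind of hypothetical ElementTree-based MiniSelector (not Domselect's code; both exception classes are local stand-ins). A single default covers both "node not found" and "attribute missing", so the caller needs no try/except around a chained lookup.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from typing import Any

_MISSING: Any = object()  # sentinel: caller passed no default at all


class NodeNotFoundError(Exception):
    """Hypothetical stand-in for Domselect's NodeNotFoundError."""


class AttributeNotFoundError(Exception):
    """Hypothetical stand-in for Domselect's AttributeNotFoundError."""


class MiniSelector:
    def __init__(self, node: ET.Element) -> None:
        self.node = node

    def first_text(
        self, query: str, default: Any = _MISSING, strip: bool = True
    ) -> str | None:
        node = self.node.find(query)
        if node is None:
            if default is _MISSING:
                raise NodeNotFoundError(query)
            return default
        content = "".join(node.itertext())
        return content.strip() if strip else content

    def first_attr(self, query: str, name: str, default: Any = _MISSING) -> str | None:
        node = self.node.find(query)
        if node is not None and name in node.attrib:
            return node.attrib[name]
        if default is not _MISSING:
            # One default covers both failure cases: no node, or no attribute.
            return default
        if node is None:
            raise NodeNotFoundError(query)
        raise AttributeNotFoundError(name)


doc = MiniSelector(
    ET.fromstring('<article><h1> Title </h1><a href="/next">next</a></article>')
)
print(doc.first_text("h1"))                          # Title
print(repr(doc.first_text("h2", default="")))        # ''
print(doc.first_attr("a", "href"))                   # /next
print(doc.first_attr("a", "rel", default=None))      # None
print(doc.first_attr("img", "src", default="none"))  # none
```

Without a default, the two failure cases still raise distinct exceptions, so strict callers can tell a missing node from a missing attribute.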

Project details

Release history

- 0.0.6 (Aug 31, 2025)
- 0.0.5 (Aug 31, 2025)
- 0.0.4 (Aug 31, 2025)
- 0.0.3 (Aug 31, 2025), this version
- 0.0.2 (Aug 31, 2025)

Download files

Source Distribution

domselect-0.0.3.tar.gz (10.4 kB)

Uploaded Aug 31, 2025 Source

File details

Details for the file domselect-0.0.3.tar.gz.

File metadata

Download URL: domselect-0.0.3.tar.gz
Upload date: Aug 31, 2025
Size: 10.4 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.0

File hashes

Hashes for domselect-0.0.3.tar.gz
Algorithm	Hash digest
SHA256	`07d268d650545e3d86fc8a59eb1c31e8749b3862982b25d3f7039fe31f0804ea`
MD5	`80140c4ab189794d7b851770d6e055fa`
BLAKE2b-256	`a5820c43f4986ccf23eb773dff77f994780eb1c0284256655b89593268b84c2b`
