XPath filter of HTML files
Project description
xpath-filter
Filter HTML files using xpath mappings.
Installation
Install xpath-filter using pip:
pip install xpath-filter
Usage
Import the xpath_filter function from the xpath_filter module. Some use cases follow.
Filtering an HTML file
>>> from xpath_filter import xpath_filter
>>> xpaths = {
...     'article': {
...         'xpath': '//div[@class="article"]',
...         'matches': 'all',
...         'elements': {
...             'author': './@data-author',
...             'content': './p/text()'
...         }
...     }
... }
>>> xpath_filter('index.html', xpaths)
Result
{'article': [{'author': 'Ana', 'content': 'Awesome'}, {'author': 'Bob', 'content': 'Bad'}]}
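The README does not show the contents of index.html. As a minimal sketch, a page like the one below would be consistent with the XPath expressions and the result above. The markup and the write_sample helper are assumptions made for illustration; they are not part of xpath-filter.

```python
from pathlib import Path

# Hypothetical sample page: the markup is an assumption chosen to match
# the XPath expressions ('//div[@class="article"]', './@data-author',
# './p/text()') used in the example above.
SAMPLE_HTML = """<!DOCTYPE html>
<html>
  <body>
    <div class="article" data-author="Ana"><p>Awesome</p></div>
    <div class="article" data-author="Bob"><p>Bad</p></div>
  </body>
</html>
"""


def write_sample(path="index.html"):
    """Write the sample page to disk and return its path (hypothetical helper)."""
    target = Path(path)
    target.write_text(SAMPLE_HTML, encoding="utf-8")
    return target
```

Calling write_sample() creates index.html in the current directory, so the examples in this section have a file to read.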
Filtering an HTML file from a YAML xpaths definition
File at "xpaths.yml":
article:
  xpath: //div[@class="article"]
  matches: first
  elements:
    author: './@data-author'
    content: ./p/text()
Code:
>>> xpath_filter('index.html', 'xpaths.yml')
Result
{'article': [{'author': 'Ana', 'content': 'Awesome'}, {'author': 'Bob', 'content': 'Bad'}]}
Simplified filtering
When only the xpath of an HTML element is defined, only its first match is returned and no inner elements are searched.
>>> xpath_filter('index.html', {'article': '//div[@class="article"]'})
>>> xpath_filter('index.html', {'article': '//div[@class="article"]/p/text()'})
Result
{'article': <Element div at 0x1f08369ea80>}
{'article': 'Awesome'}
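The difference between returning only the first match and returning every match can be sketched with the standard library alone. This is not how xpath-filter is implemented. It uses xml.etree.ElementTree, which supports only a subset of XPath and requires well-formed markup, and the sample page and the extract helper are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample markup (an assumption, not from the README).
PAGE = """<html><body>
<div class="article" data-author="Ana"><p>Awesome</p></div>
<div class="article" data-author="Bob"><p>Bad</p></div>
</body></html>"""

root = ET.fromstring(PAGE)

# First match only: find() stops at the first element that matches.
first = root.find(".//div[@class='article']")

# All matches: findall() collects every matching element in document order.
every = root.findall(".//div[@class='article']")


def extract(div):
    """Read an attribute and the text of the child <p> (hypothetical helper)."""
    return {"author": div.get("data-author"), "content": div.findtext("p")}


first_result = extract(first)
all_results = [extract(div) for div in every]

print(first_result)  # {'author': 'Ana', 'content': 'Awesome'}
print(all_results)
```

Here first_result is a single dictionary, while all_results is a list with one dictionary per matching div.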
Download files
Source Distribution
xpath_filter-1.0.0.tar.gz (7.5 kB)
Built Distribution
File details
Details for the file xpath_filter-1.0.0.tar.gz.
File metadata
- Download URL: xpath_filter-1.0.0.tar.gz
- Upload date:
- Size: 7.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.11.3
File hashes
Algorithm | Hash digest
---|---
SHA256 | 22b18abe3bdef26184a890f2179427f3d1709de22bbd21baf45368b146e8ffb1
MD5 | 264c768e82fd45360565365eae7b415d
BLAKE2b-256 | 129a53fdb6ff0c878629d058ae08e3b98fdcad12903b96eb49d885b1dcb54bd3
File details
Details for the file xpath_filter-1.0.0-py3-none-any.whl.
File metadata
- Download URL: xpath_filter-1.0.0-py3-none-any.whl
- Upload date:
- Size: 8.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.11.3
File hashes
Algorithm | Hash digest
---|---
SHA256 | 85564984b77476fc7242f9a21cd88338d6783e3365c8ed7bd2befe68d4897d35
MD5 | 08d1adc7126008368afe3ab46e1f0e78
BLAKE2b-256 | 30ed13a7d4c548c97db78ddb7df145b5408690ca60cb4b9609c5717736e279dc