XPath filter of HTML files
xpath-filter
Filter HTML files using xpath mappings.
Installation
Install xpath-filter using pip:
pip install xpath-filter
Usage
Import the xpath_filter function from the xpath_filter module. Some use cases follow.
Filtering HTML file
>>> from xpath_filter import xpath_filter
>>> xpaths = {
...     'article': {
...         'xpath': '//div[@class="article"]',
...         'matches': 'all',
...         'elements': {
...             'author': './@data-author',
...             'content': './p/text()'
...         }
...     }
... }
>>> xpath_filter('index.html', xpaths)
Result
{'article': [{'author': 'Ana', 'content': 'Awesome'}, {'author': 'Bob', 'content': 'Bad'}]}
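To make the mapping easier to follow, here is a minimal sketch of what the 'xpath' and 'elements' entries select. It is not the xpath-filter implementation: it uses only the standard library's xml.etree.ElementTree, which supports a subset of XPath, so the './@data-author' and './p/text()' steps are replaced by .get() and .findtext(). The SAMPLE_HTML document is a hypothetical stand-in for index.html, chosen to be consistent with the example above.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for index.html (an assumption, not from the project).
SAMPLE_HTML = """<html><body>
<div class="article" data-author="Ana"><p>Awesome</p></div>
<div class="article" data-author="Bob"><p>Bad</p></div>
</body></html>"""

root = ET.fromstring(SAMPLE_HTML)

# 'xpath' with 'matches': 'all' corresponds to collecting every matching div.
# Each entry in 'elements' is then evaluated relative to that div.
articles = [
    {
        "author": div.get("data-author"),   # stands in for './@data-author'
        "content": div.findtext("p"),       # stands in for './p/text()'
    }
    for div in root.findall(".//div[@class='article']")
]

print({"article": articles})
```

Each dictionary in the list corresponds to one matched div, with one key per entry in 'elements'.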
Filtering HTML file from a YAML xpaths definition.
File at "xpaths.yml":
article:
  xpath: //div[@class="article"]
  matches: all
  elements:
    author: './@data-author'
    content: ./p/text()
Code:
>>> xpath_filter('index.html', 'xpaths.yml')
Result
{'article': [{'author': 'Ana', 'content': 'Awesome'}, {'author': 'Bob', 'content': 'Bad'}]}
Simplified filtering
By defining only the XPath of an HTML element, only its first match is returned and no inner elements are searched.
>>> xpath_filter('index.html', {'article': '//div[@class="article"]'})
>>> xpath_filter('index.html', {'article': '//div[@class="article"]/p/text()'})
Result
{'article': <Element div at 0x1f08369ea80>}
{'article': 'Awesome'}
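The first-match behaviour can be illustrated with a short sketch, again using only the standard library rather than xpath-filter itself. SAMPLE_HTML is a hypothetical stand-in for index.html, and find() plays the role of "return the first match only"; how the library implements this internally is not stated here.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for index.html (an assumption, not from the project).
SAMPLE_HTML = """<html><body>
<div class="article" data-author="Ana"><p>Awesome</p></div>
<div class="article" data-author="Bob"><p>Bad</p></div>
</body></html>"""

root = ET.fromstring(SAMPLE_HTML)

# find() returns only the first element that matches, or None if none does.
first_article = root.find(".//div[@class='article']")

# Text of the first <p> child of that element, analogous to '/p/text()'.
first_text = first_article.findtext("p")

print(first_article.tag)
print(first_text)
```

The second article ("Bob") is never visited, which mirrors the single-element and single-string results shown above.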