
XPath filter of HTML files


xpath-filter


Extract data from HTML files using mappings of names to XPath expressions.

Installation

Install xpath-filter using pip:

pip install xpath-filter

Usage

Import the xpath_filter function from the xpath_filter module. Some use cases follow.

Filtering an HTML file

>>> from xpath_filter import xpath_filter
>>> xpaths = {
...     'article': {
...         'xpath': '//div[@class="article"]',
...         'matches': 'all',
...         'elements': {
...             'author': './@data-author',
...             'content': './p/text()'
...         }
...     }
... }
>>> xpath_filter('index.html', xpaths)

Result

{'article': [{'author': 'Ana', 'content': 'Awesome'}, {'author': 'Bob', 'content': 'Bad'}]}
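To make the mapping semantics concrete, here is a minimal sketch of how such a mapping could be evaluated. It is not xpath-filter's implementation: it swaps in the standard library's xml.etree.ElementTree, which supports only a subset of XPath, and emulates a trailing text() or @attribute step by hand. The helper names select and apply_mapping are hypothetical, as is the treatment of "matches" values other than "all" (assumed here to mean first match only).

```python
import xml.etree.ElementTree as ET

# Stand-in for index.html. ElementTree only parses well-formed XML,
# so real-world HTML would need a proper HTML parser.
HTML = """<html><body>
<div class="article" data-author="Ana"><p>Awesome</p></div>
<div class="article" data-author="Bob"><p>Bad</p></div>
</body></html>"""

# Same mapping as in the example above.
XPATHS = {
    "article": {
        "xpath": '//div[@class="article"]',
        "matches": "all",
        "elements": {
            "author": "./@data-author",
            "content": "./p/text()",
        },
    }
}


def select(node, xpath):
    """Return every match of `xpath` relative to `node` (hypothetical helper).

    ElementTree supports only a subset of XPath, so a trailing
    `/text()` or `/@attr` step is emulated by hand.
    """
    if xpath.startswith("//"):
        xpath = "." + xpath  # ElementTree rejects absolute paths on an element
    head, _, last = xpath.rpartition("/")
    if last == "text()":
        return [el.text for el in node.findall(head)]
    if last.startswith("@"):
        return [el.get(last[1:]) for el in node.findall(head)]
    return node.findall(xpath)


def apply_mapping(node, xpaths):
    """Hypothetical evaluator for a {name: spec} mapping like XPATHS."""
    result = {}
    for name, spec in xpaths.items():
        if isinstance(spec, str):
            # Plain string: keep only the first match.
            matches = select(node, spec)
            result[name] = matches[0] if matches else None
            continue
        matches = select(node, spec["xpath"])
        if "elements" in spec:
            # Inner xpaths are evaluated relative to each outer match.
            matches = [apply_mapping(m, spec["elements"]) for m in matches]
        if spec.get("matches") == "all":
            result[name] = matches
        else:
            # Assumption: any other value means "first match only".
            result[name] = matches[0] if matches else None
    return result


print(apply_mapping(ET.fromstring(HTML), XPATHS))
```

Running it prints the same structure as the result above. Evaluating the inner xpaths relative to each outer match is what pairs each author with the text of their own article.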

Filtering an HTML file using a YAML xpaths definition

Contents of "xpaths.yml":

article:
    xpath: '//div[@class="article"]'
    matches: all
    elements:
        author: './@data-author'
        content: './p/text()'

Code:

>>> xpath_filter('index.html', 'xpaths.yml')

Result

{'article': [{'author': 'Ana', 'content': 'Awesome'}, {'author': 'Bob', 'content': 'Bad'}]}

Simplified filtering

When a key maps directly to an XPath string, only the first match is returned and no inner elements are searched.

>>> xpath_filter('index.html', {'article': '//div[@class="article"]'})
>>> xpath_filter('index.html', {'article': '//div[@class="article"]/p/text()'})

Result

{'article': <Element div at 0x1f08369ea80>}
{'article': 'Awesome'}
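The first-match behavior resembles the difference between find() and findall() in the standard library's xml.etree.ElementTree: find() stops at the first matching element. The following is only an illustration of that idea, not how xpath-filter works internally. It assumes a small well-formed stand-in for index.html, and because ElementTree cannot select text() directly, the text is read from the matched element instead. Note that ElementTree prints elements as <Element 'div' at 0x...>, which differs slightly from the result shown above.

```python
import xml.etree.ElementTree as ET

# Stand-in document (well-formed XML, since ElementTree is not an HTML parser).
root = ET.fromstring(
    "<html><body>"
    '<div class="article" data-author="Ana"><p>Awesome</p></div>'
    '<div class="article" data-author="Bob"><p>Bad</p></div>'
    "</body></html>"
)

# find() returns only the first matching element, or None if nothing matches.
first_article = root.find('.//div[@class="article"]')

# ElementTree has no text() step, so take .text from the first matching <p>.
first_text = root.find('.//div[@class="article"]/p').text

print(first_article.get("data-author"))  # Ana
print(first_text)  # Awesome
```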



Download files


Source Distribution

xpath_filter-1.0.1.tar.gz (7.5 kB)

Built Distribution

xpath_filter-1.0.1-py3-none-any.whl (8.2 kB)
