XPath filter of HTML files

xpath-filter

Filter HTML files using xpath mappings.

Installation

Install xpath-filter using pip:

pip install xpath-filter

Usage

Import the xpath_filter function from the xpath_filter module (from xpath_filter import xpath_filter). The examples below cover the main use cases.

Filtering an HTML file

>>> xpaths = {
...     'article': {
...         'xpath': '//div[@class="article"]',
...         'matches': 'all',
...         'elements': {
...             'author': './@data-author',
...             'content': './p/text()'
...         }
...     }
... }
>>> xpath_filter('index.html', xpaths)

Result

{'article': [{'author': 'Ana', 'content': 'Awesome'}, {'author': 'Bob', 'content': 'Bad'}]}
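
The README does not show the contents of index.html. As a rough sketch, the page below is a hypothetical input that would be consistent with the result above. The markup, the SAMPLE_HTML name, and the write_sample and preview helpers are assumptions made for illustration, not part of xpath-filter. The preview function only approximates the 'article' mapping with the standard library so the sample can be checked without installing anything.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

# Hypothetical page contents; the README does not show the real index.html.
SAMPLE_HTML = """<html>
  <body>
    <div class="article" data-author="Ana"><p>Awesome</p></div>
    <div class="article" data-author="Bob"><p>Bad</p></div>
  </body>
</html>
"""


def write_sample(directory):
    """Write the sample page into `directory` and return its path."""
    path = Path(directory) / "index.html"
    path.write_text(SAMPLE_HTML, encoding="utf-8")
    return path


def preview(path):
    """Approximate the 'article' mapping using only the standard library."""
    root = ET.parse(str(path)).getroot()
    return [
        # './@data-author' becomes an attribute lookup, './p/text()' the <p> text.
        {"author": div.get("data-author"), "content": div.findtext("p")}
        for div in root.findall(".//div[@class='article']")
    ]


with tempfile.TemporaryDirectory() as tmp:
    rows = preview(write_sample(tmp))

print(rows)
```

The sample is deliberately well-formed XML so that ElementTree can parse it; real HTML pages are often not, which is one reason to use a dedicated HTML/XPath tool instead.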

Filtering an HTML file from a YAML xpaths definition

File at "xpaths.yml":

article:
    xpath: //div[@class="article"]
    matches: first
    elements:
        author: './@data-author'
        content: ./p/text()

Code:

>>> xpath_filter('index.html', 'xpaths.yml')

Result

{'article': [{'author': 'Ana', 'content': 'Awesome'}, {'author': 'Bob', 'content': 'Bad'}]}

Simplified filtering

When a key is mapped directly to an xpath string, only the first match is returned and no inner elements are searched.

>>> xpath_filter('index.html', {'article': '//div[@class="article"]'})
>>> xpath_filter('index.html', {'article': '//div[@class="article"]/p/text()'})

Result

{'article': <Element div at 0x1f08369ea80>}
{'article': 'Awesome'}
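
One way to read the shorthand is as an abbreviation of the full mapping form with matches set to first and no inner elements. The sketch below spells that reading out. The normalize helper is hypothetical and is not part of xpath-filter, and the README does not state whether the library itself accepts an empty elements mapping, so treat this as a conceptual illustration only.

```python
def normalize(mapping):
    """Expand shorthand entries (hypothetical helper, not part of xpath-filter)."""
    expanded = {}
    for name, spec in mapping.items():
        if isinstance(spec, str):
            # Shorthand: a bare XPath string is read as "first match only,
            # no inner elements" (an assumption based on the README's wording).
            expanded[name] = {"xpath": spec, "matches": "first", "elements": {}}
        else:
            # Full form: copy it so the caller's dict is left untouched.
            expanded[name] = dict(spec)
    return expanded


short = {"article": '//div[@class="article"]'}
full = normalize(short)
print(full)
```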
