
XPath filter of HTML files


xpath-filter


Filter HTML files using xpath mappings.

Installation

Install xpath-filter using pip:

pip install xpath-filter

Usage

Import the xpath_filter function from the xpath_filter module. The examples below show some typical use cases.

Filtering an HTML file

>>> from xpath_filter import xpath_filter
>>> xpaths = {
...     'article': {
...         'xpath': '//div[@class="article"]',
...         'matches': 'all',
...         'elements': {
...             'author': './@data-author',
...             'content': './p/text()'
...         }
...     }
... }
>>> xpath_filter('index.html', xpaths)

Result

{'article': [{'author': 'Ana', 'content': 'Awesome'}, {'author': 'Bob', 'content': 'Bad'}]}
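The README does not show index.html. As an illustration only, the sketch below assumes a hypothetical index.html with two article divs that would produce a result like the one above, and evaluates the same mapping with Python's standard xml.etree.ElementTree instead of xpath-filter. ElementTree supports only a subset of XPath (no text() or @attribute steps), so the hypothetical select helper handles those trailing steps by hand, and the markup must be well-formed XML to parse.

```python
import xml.etree.ElementTree as ET

# Hypothetical index.html content (an assumption; the README does not show it).
INDEX_HTML = """<html><body>
  <div class="article" data-author="Ana"><p>Awesome</p></div>
  <div class="article" data-author="Bob"><p>Bad</p></div>
</body></html>"""


def select(node, path):
    """Hypothetical helper: evaluate a small XPath subset with ElementTree.

    A trailing text() or @attribute step is handled here; the rest of the
    path is delegated to ElementTree's own limited XPath support.
    """
    if path.endswith("/text()"):
        return [el.text for el in select(node, path[: -len("/text()")])]
    if "/@" in path:
        base, attr = path.rsplit("/@", 1)
        return [el.get(attr) for el in select(node, base)]
    if path == ".":
        return [node]
    if path.startswith("//"):
        path = "." + path  # ElementTree needs a relative ".//" path on elements
    return node.findall(path)


xpaths = {
    "article": {
        "xpath": '//div[@class="article"]',
        "matches": "all",
        "elements": {"author": "./@data-author", "content": "./p/text()"},
    }
}

root = ET.fromstring(INDEX_HTML)
spec = xpaths["article"]
# 'matches': 'all' -> one dictionary per matching div; each inner xpath is
# evaluated relative to that div and its first hit is kept.
result = {
    "article": [
        {name: select(div, sub)[0] for name, sub in spec["elements"].items()}
        for div in select(root, spec["xpath"])
    ]
}
print(result)
# {'article': [{'author': 'Ana', 'content': 'Awesome'}, {'author': 'Bob', 'content': 'Bad'}]}
```

The keys of each inner dictionary come from the mapping's elements entry, and the inner xpaths start with "./" because they are evaluated relative to each matched div rather than to the whole document.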

Filtering an HTML file with a YAML xpaths definition

Contents of "xpaths.yml":

article:
    xpath: '//div[@class="article"]'
    matches: all
    elements:
        author: './@data-author'
        content: './p/text()'

Code:

>>> xpath_filter('index.html', 'xpaths.yml')

Result

{'article': [{'author': 'Ana', 'content': 'Awesome'}, {'author': 'Bob', 'content': 'Bad'}]}

Simplified filtering

When a key is mapped directly to an XPath string instead of a dictionary, only its first match is returned and no inner elements are searched.

>>> xpath_filter('index.html', {'article': '//div[@class="article"]'})
>>> xpath_filter('index.html', {'article': '//div[@class="article"]/p/text()'})

Result

{'article': <Element div at 0x1f08369ea80>}
{'article': 'Awesome'}
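The shorthand behaves like asking for the first match only. As a stdlib-only illustration (xml.etree.ElementTree, not xpath-filter, and a hypothetical two-article index.html that the README does not show), find returns the first match while findall returns all of them. ElementTree's Element repr also differs from the one shown above, which appears to come from lxml.

```python
import xml.etree.ElementTree as ET

# Hypothetical index.html content (an assumption; the README does not show it).
INDEX_HTML = """<html><body>
  <div class="article" data-author="Ana"><p>Awesome</p></div>
  <div class="article" data-author="Bob"><p>Bad</p></div>
</body></html>"""

root = ET.fromstring(INDEX_HTML)

# find() stops at the first match, mirroring the shorthand form;
# findall() returns every match, closer to 'matches': 'all'.
first_div = root.find('.//div[@class="article"]')
all_divs = root.findall('.//div[@class="article"]')

# ElementTree has no text() step, so read .text from the first <p> instead.
first_text = root.find('.//div[@class="article"]/p').text

print({"article": first_div})   # prints an Element (its address varies)
print({"article": first_text})  # {'article': 'Awesome'}
print(len(all_divs))            # 2
```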



Download files


Source Distribution

xpath_filter-1.0.1.tar.gz (7.5 kB)

Uploaded Source

Built Distribution

xpath_filter-1.0.1-py3-none-any.whl (8.2 kB)

Uploaded Python 3

File details

Details for the file xpath_filter-1.0.1.tar.gz.

File metadata

  • Download URL: xpath_filter-1.0.1.tar.gz
  • Upload date:
  • Size: 7.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.3

File hashes

Hashes for xpath_filter-1.0.1.tar.gz
Algorithm Hash digest
SHA256 625273246a4b97980e6bfdf769b9277c64c05864b6f51ee5bca24ae2adc1b373
MD5 933c7443096965901ca15eb1ac2f7a84
BLAKE2b-256 4a7f9c62aaede6a3600b8e94e75bcbeadc560a046a6e0cc8b82c4c0e672a227a


File details

Details for the file xpath_filter-1.0.1-py3-none-any.whl.

File hashes

Hashes for xpath_filter-1.0.1-py3-none-any.whl
Algorithm Hash digest
SHA256 c84c6c811675fc8ec4bb3c92a5ffe55af5400e2706841e752534742d98d20a8c
MD5 0b2516fd51dd78778be0204da0b9e74f
BLAKE2b-256 47968af617a83bf5a9e519ea8d97d892cd829da66885da262482c8d72c92db81

