Page Parser Utils For scraping, List index update
Project description
Parse Utilities (ParseUtils)
This package helps you extract a Python dict from HTML/XML content.
Installation
pip install parse-utils
Usage
from parse_utils.page_parser import PageParser
html_data = '''
<html>
<head><title>This is title</title></head>
<body>
<p id="header">This is header id</p>
<p class="content">This is content</p>
</body>
</html>
'''
config = {
    'header': ['//p[@id="header"]/text()'],
    'content': ['//p[@class="content"]'],
}
pparser = PageParser(html_data)
item = pparser.extract_dict(config)
print(item)
Output will be:
{'header': 'This is header id', 'content': 'This is content'}
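To make the config-driven idea concrete, here is a minimal standard-library sketch of how a mapping of field names to XPath lists could be turned into a dict. It is not the package's implementation: `extract_dict_sketch` is a hypothetical helper, it relies on the limited XPath subset of `xml.etree.ElementTree` (so no `text()` step, and the markup must be well-formed), and treating each list as "first matching expression wins" is an assumption about what the lists mean.

```python
import xml.etree.ElementTree as ET


def extract_dict_sketch(markup, config):
    """Hypothetical stand-in for the idea behind extract_dict (not the library's code)."""
    root = ET.fromstring(markup)
    item = {}
    for field, paths in config.items():
        item[field] = None  # stays None when no expression matches
        for path in paths:
            node = root.find(path)
            if node is not None:
                # Collect all text inside the matched element.
                item[field] = "".join(node.itertext()).strip()
                break  # assumption: the first matching expression wins
    return item


markup = """<html>
<head><title>This is title</title></head>
<body>
<p id="header">This is header id</p>
<p class="content">This is content</p>
</body>
</html>"""

config = {
    # The first expression matches nothing, so the second one is used.
    "header": ['.//p[@id="missing"]', './/p[@id="header"]'],
    "content": ['.//p[@class="content"]'],
}

item = extract_dict_sketch(markup, config)
print(item)
```

For real-world HTML, which is rarely well-formed XML, a tolerant parser would be needed in place of `ElementTree`.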
Download files
Source Distribution
parse-utils-0.5.tar.gz (2.0 kB)
File details
Details for the file parse-utils-0.5.tar.gz.
File metadata
- Download URL: parse-utils-0.5.tar.gz
- Upload date:
- Size: 2.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.14.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.35.0 CPython/3.6.8
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 9c4daa0d1b7617d15293a2d96094e4ed361b4e3850aaafa7bfa8f84de448f218 |
| MD5 | 1afc55c44d46f1c6aa23f368e7d2588b |
| BLAKE2b-256 | 45d0b939c6c57abc2838ce446f75e6d315cbad84893b50026536e0c7d9e5bd4f |
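The published digests can be used to check a downloaded archive. The sketch below uses only `hashlib` from the standard library; the helper names `sha256_of_file` and `matches_published_digest` are illustrative, and the local path to the archive is whatever you saved it as.

```python
import hashlib

# SHA256 digest listed above for parse-utils-0.5.tar.gz.
EXPECTED_SHA256 = "9c4daa0d1b7617d15293a2d96094e4ed361b4e3850aaafa7bfa8f84de448f218"


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path, expected=EXPECTED_SHA256):
    """True when the file's SHA256 equals the expected hex digest."""
    return sha256_of_file(path) == expected.lower()
```

Calling `matches_published_digest("parse-utils-0.5.tar.gz")` on the downloaded archive should return `True` if the file is intact.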