Enhanced version of parsel, extracting data from HTML and XML using complex rules
Features
Magic g method: extract items by complex rules
Apply filters to a value
x instance: many helper methods and filters
Plus all the standard features of parsel
>>> from parselx import SelectorX
>>> sel = SelectorX("""<html>
<body>
<h1>Hello, Parselx!</h1>
<ul>
<li><a href="http://example.com">Link 1</a></li>
<li><a href="http://scrapy.org">Link 2</a></li>
</ul>
</body>
</html>""")
>>>
>>> sel.g('h1')
'Hello, Parselx!'
>>> sel.g('h1 | reverse')
'!xlesraP ,olleH'
>>> sel.g('[ul li a]')
['Link 1', 'Link 2']
>>> sel.g({'title':['h1', lambda s: s.upper()], 'links':'[a @href]'})
{'title': 'HELLO, PARSELX!', 'links': ['http://example.com', 'http://scrapy.org']}
>>> sel.g('[ul li a @href| map:slice,7,-4]')
['example', 'scrapy']
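The filter results above can be reproduced in plain Python, which may help when reading a rule string. The sketch below only illustrates what the outputs suggest: it does not call parselx and is not its implementation, and the helper names `reverse` and `map_slice` are hypothetical, chosen to match the filter names in the examples.

```python
# Plain-Python equivalents of the results shown in the examples above.
# Assumption: this mirrors the observed outputs only; it is not parselx code.


def reverse(value):
    """Return the string read backwards, like the `reverse` filter result."""
    return value[::-1]


def map_slice(values, start, stop):
    """Slice every item of a list, like the `map:slice,7,-4` result."""
    return [v[start:stop] for v in values]


# The text and href values taken from the sample HTML document.
title = "Hello, Parselx!"
links = ["http://example.com", "http://scrapy.org"]

reversed_title = reverse(title)   # compare with sel.g('h1 | reverse')
hosts = map_slice(links, 7, -4)   # drops 'http://' and the 4-character suffix
item = {"title": title.upper(), "links": links}  # compare with the dict rule

print(reversed_title)
print(hosts)
print(item)
```

Note that a fixed slice of `7,-4` only works here because both URLs start with `http://` and end with a four-character suffix such as `.com` or `.org`; other URLs would need a different rule.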
Installation
$ pip install parselx