Enhanced version of parsel, extracting data from HTML and XML using complex rules
Features
Magic g method: extract items using complex rules
Filters: apply filters to a value by appending them to the rule
The x instance: provides many helper methods and filters
Plus all the standard features of parsel
>>> from parselx import SelectorX
>>> sel = SelectorX("""<html>
... <body>
... <h1>Hello, Parselx!</h1>
... <ul>
... <li><a href="http://example.com">Link 1</a></li>
... <li><a href="http://scrapy.org">Link 2</a></li>
... </ul>
... </body>
... </html>""")
>>>
>>> sel.g('h1::text')
'Hello, Parselx!'
>>> sel.g('[ul li a::text]')
['Link 1', 'Link 2']
>>> sel.g({'title':['h1::text', lambda s: s.upper()], 'links':'[a::attr(href)]'})
{'title': 'HELLO, PARSELX!', 'links': ['http://example.com', 'http://scrapy.org']}
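The rule shapes used above (a plain selector string, a bracketed string, a list with a trailing filter, and a dict) compose recursively. The sketch below shows one possible way such rules could be interpreted. It is NOT parselx's implementation: the semantics (bracketed string returns all matches, plain string returns the first match, list means "rule followed by filters applied in order", dict maps keys to sub-rules) are assumptions inferred only from the examples, and evaluate, fake_match and FAKE_MATCHES are hypothetical names. The selector engine is stubbed out with canned results, whereas in parselx the matching would be done by parsel.

```python
from typing import Any, Callable, Dict, List

# Hypothetical type: a function returning every match for one selector query.
# In parselx the matching would be done by parsel; here it is stubbed out.
Matcher = Callable[[str], List[str]]


def evaluate(rule: Any, match: Matcher) -> Any:
    """Evaluate a g-style rule. Semantics are ASSUMED from the README examples."""
    if isinstance(rule, dict):
        # Dict rule: evaluate every value and keep the same keys.
        return {key: evaluate(sub_rule, match) for key, sub_rule in rule.items()}
    if isinstance(rule, list):
        # List rule (assumed): the first item is the rule, the rest are filters
        # applied left to right to the extracted value.
        if not rule:
            raise ValueError("a list rule needs at least a selector")
        value = evaluate(rule[0], match)
        for apply_filter in rule[1:]:
            value = apply_filter(value)
        return value
    if isinstance(rule, str):
        if rule.startswith("[") and rule.endswith("]"):
            # Bracketed rule: return all matches as a new list.
            return list(match(rule[1:-1]))
        # Plain rule: return the first match, or None if nothing matched.
        results = match(rule)
        return results[0] if results else None
    raise TypeError(f"unsupported rule type: {type(rule).__name__}")


# Hypothetical canned results standing in for a real selector engine,
# matching the HTML document used in the example above.
FAKE_MATCHES: Dict[str, List[str]] = {
    "h1::text": ["Hello, Parselx!"],
    "ul li a::text": ["Link 1", "Link 2"],
    "a::attr(href)": ["http://example.com", "http://scrapy.org"],
}


def fake_match(query: str) -> List[str]:
    """Return the canned matches for a query (empty list if unknown)."""
    return FAKE_MATCHES.get(query, [])


print(evaluate({"title": ["h1::text", str.upper], "links": "[a::attr(href)]"}, fake_match))
# {'title': 'HELLO, PARSELX!', 'links': ['http://example.com', 'http://scrapy.org']}
```

Because dict and list rules recurse into evaluate, filters can wrap bracketed rules as well (under this reading, a filter after '[...]' receives the whole list), and dict rules can nest to build structured items.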
Installation
To install, use pipenv (or pip):
$ pipenv install parselx
$ pip install parselx
Download files
Source distribution: parselx-0.0.2.tar.gz (4.8 kB)