Parselx: an enhanced version of parsel for extracting data from HTML and XML using complex rules.
Features
Magic g method: extract items using complex rules
Filters: transform an extracted value with a pipe, e.g. 'h1 | reverse'
x instance: many helper methods and filters
Plus all the standard features of parsel
Usage
>>> from parselx import SelectorX
>>> sel = SelectorX("""<html>
<body>
<h1>Hello, Parselx!</h1>
<ul>
<li><a href="http://example.com">Link 1</a></li>
<li><a href="http://scrapy.org">Link 2</a></li>
</ul>
</body>
</html>""")
>>>
>>> sel.g('h1')
'Hello, Parselx!'
>>> sel.g('h1 | reverse')
'!xlesraP ,olleH'
>>> sel.g('[ul li a]')
['Link 1', 'Link 2']
>>> sel.g({'title':['h1', lambda s: s.upper()], 'links':'[a @href]'})
{'title': 'HELLO, PARSELX!', 'links': ['http://example.com', 'http://scrapy.org']}
>>> sel.g('[ul li a @href| map:slice,7,-4]')
['example', 'scrapy']
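To make the filter syntax above more concrete, here is a minimal plain-Python sketch of what the `reverse` and `map:slice,7,-4` filters appear to compute, judging only from the outputs shown in the session. The helper names `reverse` and `map_slice` are hypothetical and exist only for illustration; this is not parselx's implementation, and the real filters may handle edge cases differently.

```python
# Assumed semantics, inferred from the example outputs above.
# These helpers are illustrative stand-ins, not part of parselx.

def reverse(value: str) -> str:
    """Stand-in for the `reverse` filter: return the string backwards."""
    return value[::-1]


def map_slice(values, start, stop):
    """Stand-in for `map:slice,start,stop`: slice each item as item[start:stop]."""
    return [item[start:stop] for item in values]


title = "Hello, Parselx!"
links = ["http://example.com", "http://scrapy.org"]

reversed_title = reverse(title)   # same text as sel.g('h1 | reverse')
hosts = map_slice(links, 7, -4)   # drops the leading "http://" and the 4-char suffix

print(reversed_title)
print(hosts)
```

Read this way, `slice,7,-4` strips the seven characters of `http://` from the front and the last four characters (`.com`, `.org`) from the end, which is why the rule only gives clean results for URLs of that exact shape.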
Installation
$ pip install parselx
Source distribution: parselx-0.0.4.tar.gz (5.1 kB)