Parsel is a library to extract data from HTML and XML using XPath and CSS selectors
Project description
Free software: BSD license
Documentation: https://parsel.readthedocs.org.
Features
Extract text using CSS or XPath selectors
Regular expression helper methods
Example:
>>> from parsel import Selector
>>> sel = Selector(text=u"""<html>
...     <body>
...         <h1>Hello, Parsel!</h1>
...         <ul>
...             <li><a href="http://example.com">Link 1</a></li>
...             <li><a href="http://scrapy.org">Link 2</a></li>
...         </ul>
...     </body>
... </html>""")
>>>
>>> sel.css('h1::text').extract_first()
u'Hello, Parsel!'
>>>
>>> sel.css('h1::text').re(r'\w+')
[u'Hello', u'Parsel']
>>>
>>> for e in sel.css('ul > li'):
...     print(e.xpath('.//a/@href').extract_first())
http://example.com
http://scrapy.org
History
0.9.4 (2015-08-10)
Try workaround for travis-ci/dpl#253
0.9.3 (2015-08-07)
Add base_url argument
0.9.2 (2015-08-07)
Rename module unified -> selector and promote root attribute
Add create_root_node function
0.9.1 (2015-08-04)
Setup Sphinx build and docs structure
Build universal wheels
Rename some leftovers from package extraction
0.9.0 (2015-07-30)
First release on PyPI.
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
parsel-0.9.4.tar.gz (26.8 kB)
Built Distribution
Hashes for parsel-0.9.4-py2.py3-none-any.whl
Algorithm | Hash digest |
---|---|
SHA256 | cfadf751d3a34c1f29dcfd077d55cd89389e4017c655217421712d06d85b16dc |
MD5 | 3c022ee8e1db7daba9cc4bb65b668ecc |
BLAKE2b-256 | a8395b125374fba3c36d1bc88618c5f65efc2451cef20c1f57903de38a88c38f |