Parsel is a library to extract data from HTML and XML using XPath and CSS selectors
Project description
Free software: BSD license
Documentation: https://parsel.readthedocs.org.
Features
Extract text using CSS or XPath selectors
Regular expression helper methods
Example:
>>> from parsel import Selector
>>> sel = Selector(text=u"""<html>
...     <body>
...         <h1>Hello, Parsel!</h1>
...         <ul>
...             <li><a href="http://example.com">Link 1</a></li>
...             <li><a href="http://scrapy.org">Link 2</a></li>
...         </ul>
...     </body>
...     </html>""")
>>>
>>> sel.css('h1::text').extract_first()
u'Hello, Parsel!'
>>>
>>> sel.css('h1::text').re(r'\w+')
[u'Hello', u'Parsel']
>>>
>>> for e in sel.css('ul > li'):
...     print(e.xpath('.//a/@href').extract_first())
http://example.com
http://scrapy.org
History
1.4.0 (2018-02-08)
Selector and SelectorList can’t be pickled because pickling/unpickling doesn’t work for lxml.html.HtmlElement; parsel now raises TypeError explicitly instead of allowing pickle to silently produce wrong output. This is technically backwards-incompatible if you’re using Python < 3.6.
1.3.1 (2017-12-28)
Fix artifact uploads to pypi.
1.3.0 (2017-12-28)
has-class XPath extension function;
parsel.xpathfuncs.set_xpathfunc is a simplified way to register XPath extensions;
Selector.remove_namespaces now removes namespace declarations;
Python 3.3 support is dropped;
make htmlview command for easier Parsel docs development.
CI: PyPy installation is fixed; parsel now runs tests for PyPy3 as well.
1.2.0 (2017-05-17)
Add SelectorList.get and SelectorList.getall methods as aliases for SelectorList.extract_first and SelectorList.extract respectively
Add default value parameter to SelectorList.re_first method
Add Selector.re_first method
Add replace_entities argument on .re() and .re_first() to turn off replacing of character entity references
Bug fix: detect a None result from lxml parsing and fall back to an empty document
Rearrange XML/HTML examples in the selectors usage docs
Travis CI:
Test against Python 3.6
Test against PyPy using “Portable PyPy for Linux” distribution
1.1.0 (2016-11-22)
Change default HTML parser to lxml.html.HTMLParser, which makes it easier to use some HTML-specific features
Add css2xpath function to translate CSS to XPath
Add support for ad-hoc namespace declarations
Add support for XPath variables
Documentation improvements and updates
1.0.3 (2016-07-29)
Add BSD-3-Clause license file
Re-enable PyPy tests
Integrate py.test runs with setuptools (needed for Debian packaging)
Changelog is now called NEWS
1.0.2 (2016-04-26)
Fix bug in exception handling causing original traceback to be lost
Added docstrings and other doc fixes
1.0.1 (2015-08-24)
Updated PyPI classifiers
Added docstrings for csstranslator module and other doc fixes
1.0.0 (2015-08-22)
Documentation fixes
0.9.6 (2015-08-14)
Updated documentation
Extended test coverage
0.9.5 (2015-08-11)
Support for extending SelectorList
0.9.4 (2015-08-10)
Try workaround for travis-ci/dpl#253
0.9.3 (2015-08-07)
Add base_url argument
0.9.2 (2015-08-07)
Rename module unified -> selector and promote root attribute
Add create_root_node function
0.9.1 (2015-08-04)
Setup Sphinx build and docs structure
Build universal wheels
Rename some leftovers from package extraction
0.9.0 (2015-07-30)
First release on PyPI.
Hashes for parsel-1.4.0-py2.py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 1a9ac0c1db8175547e1732be57ced2a2dc0714590f6b249d022ad25d918ef923
MD5 | ff99af7fbf3b71311de5c5a480ad8f12
BLAKE2b-256 | bcb42fd37d6f6a7e35cbc4c2613a789221ef1109708d5d4fb9fd5f6f721a43c9