XML/HTML scraper using XPath queries.
Copyright (C) 2014-2018 H. Turgut Uyar <email@example.com>
Piculet is a module for extracting data from XML or HTML documents using XPath queries. It consists of a single source file with no dependencies beyond the standard library, so it is easy to integrate into applications. It also provides a command line interface.
Piculet has been tested with Python 2.7, Python 3.4+, PyPy2 5.7+, and PyPy3 5.7+. You can install the latest version using pip:
pip install piculet
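To give a feel for XPath-driven extraction, here is a minimal sketch that uses only the standard library's `xml.etree.ElementTree`. It does not use Piculet's own API. The names `DOCUMENT`, `RULES`, and `extract` are hypothetical and exist only for this illustration. Each rule pairs an XPath query with a reducing function that turns the matched text nodes into one value.

```python
import xml.etree.ElementTree as ET

# Sample document (made up for this illustration).
DOCUMENT = """
<html>
  <head><title>The Shining</title></head>
  <body>
    <span class="year">1980</span>
    <ul class="genres"><li>Horror</li><li>Drama</li></ul>
  </body>
</html>
"""

# Hypothetical rule format: key -> (XPath query, reducer over the matched texts).
# ElementTree supports only a subset of XPath, which is enough for this sketch.
RULES = {
    "title": (".//title", "".join),
    "year": (".//span[@class='year']", lambda texts: int("".join(texts))),
    "genres": (".//ul[@class='genres']/li", list),
}


def extract(document, rules):
    """Apply each rule to the document and collect the reduced values."""
    root = ET.fromstring(document.strip())
    result = {}
    for key, (path, reducer) in rules.items():
        texts = [node.text or "" for node in root.findall(path)]
        if texts:  # keys with no matching nodes are left out
            result[key] = reducer(texts)
    return result


data = extract(DOCUMENT, RULES)
print(data)
```

Keeping the queries in a plain data structure, separate from the code that evaluates them, is the general idea behind declarative scraping specifications. See Piculet's own documentation for its actual specification format.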
- Dropped support for Python 3.3.
- Fixes for handling Unicode data in HTML for Python 2.
- Added registry for preprocessors.
- Support for writing specifications in YAML.
- Added a class-based API for writing specifications.
- Added predefined transformation functions.
- Removed callables from specification maps. Use the new API instead.
- Added support for registering new reducers and transformers.
- Added support for defining sections in a document.
- Refactored XPath evaluation method in order to parse path expressions once.
- Preprocessing will be done only once when the tree is built.
- Concatenation is now the default reducing operation.
- Added “--version” option to command line arguments.
- Added option to force the use of lxml’s HTML builder.
- Fixed the error where non-truthy values would be excluded from the result.
- Added support for transforming node text during preprocess.
- Added separate preprocessing function to API.
- Renamed the “join” reducer as “concat”.
- Renamed the “foreach” keyword for keys as “section”.
- Removed some low level debug messages to substantially increase speed.
- Removed the caching feature.
- Added helper function for getting cache hash keys of URLs.
- Added optional value transformations.
- Added support for custom reducer callables.
- Added command-line option for scraping documents from local files.
- Added support for Python 2.7.
- Fixed lxml support.
- First release on PyPI.
|Filename (size)||File type||Python version||Upload date|
|piculet-1.0b7-py2.py3-none-any.whl (13.9 kB)||Wheel||py2.py3||Mar 21, 2018|
|piculet-1.0b7.tar.gz (32.8 kB)||Source||None||Mar 21, 2018|