XML/HTML scraper using XPath queries.

Project Description

Copyright (C) 2014-2018 H. Turgut Uyar <uyar@tekir.org>

Piculet is a module for extracting data from XML or HTML documents using XPath queries. It consists of a single source file with no dependencies other than the standard library, which makes it very easy to integrate into applications. It also provides a command line interface.
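
To give a feel for what XPath-based extraction looks like, the following is a minimal sketch that uses only the standard library's xml.etree.ElementTree. It is not Piculet's API: the extract function, the RULES mapping, and the sample document are all hypothetical names made up for this illustration. It only shows the general idea of mapping result keys to XPath queries and concatenating the selected texts.

```python
import xml.etree.ElementTree as ET

# Sample document, made up for this illustration.
DOCUMENT = """
<html>
  <head><title>The Shining</title></head>
  <body>
    <span class="year">1980</span>
    <ul class="genres">
      <li>Horror</li>
      <li>Drama</li>
    </ul>
  </body>
</html>
"""

# Hypothetical mapping from result keys to XPath queries
# (restricted to the XPath subset that ElementTree supports).
RULES = {
    "title": ".//title",
    "year": ".//span[@class='year']",
    "genres": ".//ul[@class='genres']/li",
}


def extract(document, rules):
    """Apply each XPath query to the document and concatenate the matched texts."""
    root = ET.fromstring(document.strip())
    result = {}
    for key, path in rules.items():
        texts = [node.text or "" for node in root.findall(path)]
        if texts:  # keys whose query matches nothing are left out
            result[key] = "".join(texts)
    return result


data = extract(DOCUMENT, RULES)
print(data)
```

Real-world HTML is often not well-formed XML, so a sketch like this only works on clean input. For the specification format and functions Piculet actually provides, see the documentation linked below.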

PyPI: https://pypi.python.org/pypi/piculet/
Repository: https://bitbucket.org/uyar/piculet
Documentation: https://piculet.readthedocs.io/

Piculet has been tested with Python 2.7, Python 3.4+, PyPy2 5.7+, and PyPy3 5.7+. You can install the latest version using pip:

pip install piculet

History

1.0b7 (2018-03-21)

  • Dropped support for Python 3.3.
  • Fixes for handling Unicode data in HTML for Python 2.
  • Added registry for preprocessors.

1.0b6 (2018-01-17)

  • Support for writing specifications in YAML.

1.0b5 (2018-01-16)

  • Added a class-based API for writing specifications.
  • Added predefined transformation functions.
  • Removed callables from specification maps. Use the new API instead.
  • Added support for registering new reducers and transformers.
  • Added support for defining sections in a document.
  • Refactored XPath evaluation method in order to parse path expressions once.
  • Preprocessing will be done only once when the tree is built.
  • Concatenation is now the default reducing operation.

1.0b4 (2018-01-02)

  • Added “--version” option to command line arguments.
  • Added option to force the use of lxml’s HTML builder.
  • Fixed the error where non-truthy values would be excluded from the result.
  • Added support for transforming node text during preprocess.
  • Added separate preprocessing function to API.
  • Renamed the “join” reducer as “concat”.
  • Renamed the “foreach” keyword for keys as “section”.
  • Removed some low level debug messages to substantially increase speed.

1.0b3 (2017-07-25)

  • Removed the caching feature.

1.0b2 (2017-06-16)

  • Added helper function for getting cache hash keys of URLs.

1.0b1 (2017-04-26)

  • Added optional value transformations.
  • Added support for custom reducer callables.
  • Added command-line option for scraping documents from local files.

1.0a2 (2017-04-04)

  • Added support for Python 2.7.
  • Fixed lxml support.

1.0a1 (2016-08-24)

  • First release on PyPI.

Download files

Filename                             Size     File type  Python version  Upload date
piculet-1.0b7-py2.py3-none-any.whl   13.9 kB  Wheel      py2.py3         Mar 21, 2018
piculet-1.0b7.tar.gz                 32.8 kB  Source     None            Mar 21, 2018
