A kitchen for (beautiful) soup

visaplan.kitchen

This package works with “soup”, i.e. the trees which the well-known beautifulsoup4 package creates from parsed HTML or XML sources. The same could probably be accomplished by using lxml directly, but that would likely have been more difficult, so it is left to another package.

Features

  • spoons module, for tackling “soup”, e.g.

    • has_any_class (a filter function to check for one of the given classes)

  • forks module (named mainly for historical reasons; for poking around in the soup), e.g. extract_linktext, convert_dimension_styles

  • ids module, for creation of new ids for HTML elements

    • id_factory:

      new_id = id_factory(...)
      id = new_id(prefix)
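
The two-step pattern above (build a factory once, then call it for each new id) can be illustrated with a minimal, standalone sketch. It is not the package's implementation: the name make_id_factory, its parameters existing_ids and start, and the "prefix-number" id format are all assumptions made for this example; the real id_factory arguments are elided above and documented in the doctests.

```python
import itertools


def make_id_factory(existing_ids=(), start=1):
    """Return a function new_id(prefix) that never returns the same id twice.

    Hypothetical stand-in for illustration; this is NOT the signature of
    visaplan.kitchen's id_factory.
    """
    used = set(existing_ids)   # ids already present in the document
    counters = {}              # one independent counter per prefix

    def new_id(prefix):
        counter = counters.setdefault(prefix, itertools.count(start))
        while True:
            candidate = "%s-%d" % (prefix, next(counter))
            if candidate not in used:
                used.add(candidate)
                return candidate

    return new_id


new_id = make_id_factory(existing_ids=["fig-1"])
first = new_id("fig")    # "fig-1" is already taken, so this is "fig-2"
second = new_id("fig")   # "fig-3"
other = new_id("table")  # "table-1"
```

Keeping the set of used ids inside the closure means callers only pass a prefix and cannot accidentally produce duplicates within one document.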

Tests remark

The modules are documented and tested by doctests. However, the doctests currently don’t run completely because of import problems; see the issue tracker.

Help is appreciated.

Examples

This add-on can be seen in action at the following sites:

Documentation

For now, the functions are documented by doctests.

Installation

Install visaplan.kitchen by adding it to your buildout:

[buildout]

...

eggs =
    visaplan.kitchen

and then running bin/buildout

Contribute

Support

If you are having issues, please let us know via the issue tracker mentioned above.

License

The project is licensed under the GPLv2.

To Do

  • .extract module:

    • implement head(words=N) constraint

    • Create a generic word count facility? (modeled after the wc program; it would count words, characters, and probably lines as well)
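
One possible shape for such a word count facility is sketched below, using only the standard library. Nothing like this exists in the package yet: the names Counts and count_text are hypothetical, and the sketch assumes it is given plain text that has already been extracted from the soup.

```python
from collections import namedtuple

# Hypothetical result type; field order follows the wc program's output.
Counts = namedtuple("Counts", ["lines", "words", "chars"])


def count_text(text):
    """Count lines, words and characters of a plain-text string, like wc."""
    lines = text.count("\n")    # wc counts newline characters
    words = len(text.split())   # whitespace-separated tokens
    chars = len(text)           # characters, not bytes
    return Counts(lines, words, chars)


counts = count_text("A kitchen for\nbeautiful soup\n")
# counts.lines == 2, counts.words == 5, counts.chars == 29
```

A words=N constraint for head could reuse the same whitespace-based notion of a word, so that both features agree on what they count.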

Contributors

Changelog

1.0.5 (2024-04-09)

New Features:

  • .extract.head supports the verbose option to aid processing of multiple fields; code example included.

Improvements:

  • Added a doctest for .extract.head: yes, we accept text/plain as well.

Miscellaneous:

  • .extract._head_kwargs: when injecting the default fuzz value, an additionally given words restriction is now ignored; only the chars restriction is needed.

[tobiasherp]

1.0.4 (2023-12-21)

Bugfixes:

  • .spoons.stripped_soup raised an IndexError when called with empty content.

[tobiasherp]

1.0.3 (2022-09-20)

New Features:

  • New function .spoons.generate_image_infos

[tobiasherp]

1.0.2 (2021-10-27)

Improvements:

  • Imports sorted by isort

New Features:

  • New extract module to create extracts of HTML text (e.g. a head, containing the first NN visible characters)

Requirements:

[tobiasherp]

1.0.1 (2020-02-25)

  • Python 3 compatibility (python-modernize) [tobiasherp]

1.0 (2018-09-17)

  • Initial release. [tobiasherp]
