A kitchen for (beautiful) soup
visaplan.kitchen
This package tackles “soup”, i.e. trees which the well-known beautifulsoup4 package creates from parsed HTML or XML sources. The same could probably be accomplished by using lxml directly, but that would likely have been more difficult, so it is left to another package.
Features
spoons module, for tackling “soup”, e.g.
has_any_class (a filter function to check for one of the given classes)
forks module (named mainly for historical reasons; for poking around in the soup), e.g. extract_linktext, convert_dimension_styles
ids module, for creation of new ids for HTML elements
id_factory:
    new_id = id_factory(...)
    id = new_id(prefix)
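The two-step usage of id_factory shown above (create a factory once, then ask it for ids by prefix) can be illustrated with a minimal sketch. This is not the package's implementation: the name make_id_factory, its existing_ids argument and the "prefix-N" numbering scheme are assumptions made for illustration only, since the real arguments of id_factory are elided here.

```python
from collections import defaultdict


def make_id_factory(existing_ids=()):
    """Return a function which creates ids that were not used before.

    Hypothetical stand-in for id_factory; the real arguments are not
    documented in this README.
    """
    used = set(existing_ids)     # ids already present in the document
    counters = defaultdict(int)  # last number tried for each prefix

    def new_id(prefix):
        # Count upwards until we hit an id which is still free
        while True:
            counters[prefix] += 1
            candidate = "%s-%d" % (prefix, counters[prefix])
            if candidate not in used:
                used.add(candidate)
                return candidate

    return new_id


new_id = make_id_factory(existing_ids=["fig-1", "fig-2"])
first = new_id("fig")    # "fig-3", because fig-1 and fig-2 are taken
second = new_id("fig")   # "fig-4"
other = new_id("table")  # "table-1"
```

Keeping the set of used ids inside the closure means that callers only need to pass the prefix; whether the real id_factory collects existing ids from the soup itself is not stated here.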
Tests remark
The modules are documented and tested by doctests. However, they currently don’t fully work because of import problems; see the issue tracker.
Help is appreciated.
Examples
This add-on can be seen in action at the following sites:
Documentation
For now, the functions are documented by doctests.
Installation
Install visaplan.kitchen by adding it to your buildout:
    [buildout]
    ...
    eggs =
        visaplan.kitchen
and then running bin/buildout
Contribute
Issue Tracker: https://github.com/visaplan/kitchen/issues
Source Code: https://github.com/visaplan/kitchen
Support
If you are having issues, please let us know via the issue tracker mentioned above.
License
The project is licensed under the GPLv2.
To Do
.extract module:
implement head(words=N) constraint
Create generic wordcount facility? (after the wc program; count words, characters, and probably lines as well)
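As a rough idea of what such a wordcount facility could look like, here is a minimal sketch that counts lines, words and characters of a plain string, loosely following wc. The names wordcount and Counts are hypothetical and not part of the package, and the sketch works on plain text only; extracting the visible text from the soup would be a separate step.

```python
from collections import namedtuple

# Hypothetical result type; the fields are ordered like the wc output
Counts = namedtuple("Counts", "lines words chars")


def wordcount(text):
    """Count lines, words and characters of a string, roughly like wc."""
    return Counts(
        lines=text.count("\n"),   # like wc, count newline characters
        words=len(text.split()),  # words are separated by whitespace
        chars=len(text),          # characters, not bytes
    )


counts = wordcount("A kitchen for\n(beautiful) soup\n")
# counts == Counts(lines=2, words=5, chars=31)
```

A head(words=N) constraint could probably build on the same whitespace-based notion of a word, but that is only a guess about a possible design.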
Contributors
Tobias Herp, tobias.herp@visaplan.com
Changelog
1.0.5 (2024-04-09)
New Features:
.extract.head supports the verbose option to aid processing of multiple fields; code example included.
Improvements:
Added a doctest for .extract.head: yes, we accept text/plain as well.
Miscellaneous:
.extract._head_kwargs: when injecting the fuzz default value, we now ignore a words restriction which may be given additionally; only the chars restriction is needed.
[tobiasherp]
1.0.4 (2023-12-21)
Bugfixes:
.spoons.stripped_soup raised an IndexError when called with empty content; this is fixed now.
[tobiasherp]
1.0.3 (2022-09-20)
New Features:
New function .spoons.generate_image_infos
[tobiasherp]
1.0.2 (2021-10-27)
Improvements:
Imports sorted by isort
New Features:
New extract module to create extracts of HTML text (e.g. a head, containing the first NN visible characters)
Requirements:
lxml v3.7.0+ (collect_ids argument)
six explicitly required
visaplan.tools v1.3.7+
[tobiasherp]
1.0.1 (2020-02-25)
Python 3 compatibility (python-modernize) [tobiasherp]
1.0 (2018-09-17)
Initial release. [tobiasherp]