Project description
Introduction
- transmogrify.htmlcontentextractor
This blueprint extracts the title, description and body from HTML, either via XPath or by automatic cluster analysis.
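Here is a minimal sketch of the XPath route. It assumes well-formed XHTML input and uses Python's standard-library `xml.etree.ElementTree`, which supports only a subset of XPath. The `RULES` mapping and the `extract` function are hypothetical illustrations, not the blueprint's actual configuration options or API. The automatic cluster-analysis route is not shown.

```python
import xml.etree.ElementTree as ET

# Hypothetical rules: field name -> path in ElementTree's limited XPath subset.
RULES = {
    "title": ".//title",
    "description": ".//meta[@name='description']",
    "body": ".//div[@id='content']",
}


def extract(html, rules=RULES):
    """Return a dict of the fields that the given paths find in well-formed XHTML."""
    root = ET.fromstring(html)
    item = {}
    for field, path in rules.items():
        el = root.find(path)
        if el is None:
            continue  # field not present on this page; leave it unset
        if el.tag == "meta":
            # <meta> carries its value in the content attribute, not as text
            item[field] = el.get("content", "")
        else:
            # join all text inside the element, excluding text after its closing tag
            item[field] = "".join(el.itertext()).strip()
    return item


page = """<html><head><title>Welcome</title>
<meta name="description" content="A short summary"/></head>
<body><div id="nav">Home</div>
<div id="content"><p>Hello <b>world</b></p></div></body></html>"""

print(extract(page))
# {'title': 'Welcome', 'description': 'A short summary', 'body': 'Hello world'}
```

Real-world HTML is often not well-formed XML. A more robust version would parse with an HTML-tolerant parser such as `lxml.html`, which also accepts full XPath expressions.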
Changelog
1.0b4 (2011-02-06)
handle ‘/text()’ in XPaths
new ‘optionaltext’ rule format
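In XPath, a trailing `/text()` selects an element's direct text nodes rather than the element itself. An extractor that expects elements therefore has to treat such expressions specially. The sketch below shows one way to do this with the standard library, which does not support `text()` itself. `find_field` and `direct_text` are hypothetical helpers, and this is not necessarily how the blueprint handles such paths.

```python
import xml.etree.ElementTree as ET

TEXT_SUFFIX = "/text()"


def direct_text(el):
    """Concatenate an element's own text nodes (what XPath's text() selects)."""
    # el.text precedes the first child; each child's tail follows that child
    parts = [el.text or ""] + [child.tail or "" for child in el]
    return "".join(parts)


def find_field(root, path):
    """Return the matched element, or its direct text if path ends in /text()."""
    if path.endswith(TEXT_SUFFIX):
        el = root.find(path[: -len(TEXT_SUFFIX)])
        return None if el is None else direct_text(el)
    return root.find(path)


root = ET.fromstring("<div><p>Hello <b>bold</b> world</p></div>")
print(repr(find_field(root, ".//p/text()")))  # 'Hello  world'
```

The text inside `<b>` is excluded because it belongs to a child element, not to `<p>` directly.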
1.0b3 (2010-12-13)
simpler autogenerated XPath
better logging
1.0b2 (2010-11-09)
put a condition on the autofinder so it can be turned off
1.0b1 (2010-11-03)
ignore already-found items; better debugging [“Dylan Jay”]
skip templates if item already parsed [“Dylan Jay”]
print automatically found XPaths [“Dylan Jay”]
make text fields strip tail text [“Vitaliy Podoba”]
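In ElementTree-style APIs such as lxml and Python's `xml.etree.ElementTree`, an element's `tail` is the text that follows its closing tag. That text belongs to the surrounding content rather than to the element. Below is a sketch of serializing a field without its tail using the standard library. `to_html_without_tail` is a hypothetical helper name, not part of the blueprint.

```python
import xml.etree.ElementTree as ET


def to_html_without_tail(el):
    """Serialize an element's markup, leaving out the text after its closing tag."""
    saved = el.tail
    el.tail = None  # temporarily detach the tail so it is not serialized
    try:
        return ET.tostring(el, encoding="unicode")
    finally:
        el.tail = saved  # restore the tree unchanged


root = ET.fromstring("<div><p>Body <b>text</b></p> trailing</div>")
p = root.find("p")
print(repr(p.tail))             # ' trailing'
print(to_html_without_tail(p))  # <p>Body <b>text</b></p>
```

With lxml, `etree.tostring(el, with_tail=False)` gives the same result directly.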
1.0dev (2010-03-22)
split the auto templatefinder out to its own blueprint [“Dylan Jay”]
Download files
Source Distribution
Hashes for transmogrify.htmlcontentextractor-1.0b4.zip
Algorithm | Hash digest
---|---
SHA256 | 39c6e1d90f10d5cdeee56ca026a6eb9693f1ad612d3f649aab4a88147945a6ae
MD5 | ebc7034460f778191f04138b475a7c33
BLAKE2b-256 | c3eed0c1e7bd62bc84b3b86d54ea2e4743094e87d40201cd1a6c9f2c12c2c1a1