Clean up the HTML formatting problems introduced by pasting content from MSWord into Plone's RichText fields.
This product cleans up the HTML formatting problems that are introduced by pasting content from MSWord into Plone’s RichText fields.
Every time an object is created or edited, the HTML in its RichText fields will be sanitized.
HTML sanitizing is turned on by default for all Archetypes objects, but it can be turned off on a per-object basis by checking a box in the ‘settings’ fieldset of the default edit view.
This product provides an event subscriber for all BaseContent Archetypes objects that will clean up the HTML of all the RichText fields for each object.
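The subscriber’s job can be sketched in plain Python. This is a minimal illustration, not the product’s code: `FakeField`, `FakeObject`, `rich_text_fields` and `sanitize_on_save` are hypothetical stand-ins for Archetypes objects and the real handler, and the rule “a field is rich text if its type is `text`” is an assumption. Only the `cleanWordPastedText` flag name comes from the changelog. Treating a missing flag as “cleaning enabled” is also an assumption, based on the changelog’s “enable the cleaner by default” and “don’t fail if the field isn’t present” entries.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class FakeField:
    """Hypothetical stand-in for an Archetypes field."""
    name: str
    type: str  # assumption: "text" marks a RichText-style field


@dataclass
class FakeObject:
    """Hypothetical stand-in for a content object and its stored field values."""
    schema: List[FakeField]
    values: Dict[str, str] = field(default_factory=dict)
    # Per-object opt-out flag (name taken from the changelog).
    cleanWordPastedText: bool = True


def rich_text_fields(obj) -> List[FakeField]:
    """Return the fields to clean (assumed rule: type == "text")."""
    return [f for f in obj.schema if f.type == "text"]


def sanitize_on_save(obj, sanitize: Callable[[str], str]) -> None:
    """Body of a create/edit subscriber: clean every rich-text field in place."""
    # Assumption: a missing flag means "cleaning enabled", so objects without
    # the field are still cleaned instead of raising AttributeError.
    if not getattr(obj, "cleanWordPastedText", True):
        return  # sanitizing has been turned off for this object
    for f in rich_text_fields(obj):
        raw = obj.values.get(f.name)
        if raw:  # skip empty or unset fields
            obj.values[f.name] = sanitize(raw)


# Usage with a toy sanitizer that only removes one Word class attribute.
doc = FakeObject(
    schema=[FakeField("title", "string"), FakeField("text", "text")],
    values={"title": "<b>Hi</b>", "text": "<p class='MsoNormal'>Hi</p>"},
)
sanitize_on_save(doc, lambda html: html.replace(" class='MsoNormal'", ""))
print(doc.values["text"])  # <p>Hi</p>
```

The sanitizer is passed in as a callable so the traversal logic stays independent of whichever HTML cleaner does the actual work.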
The HTML is cleaned and sanitized mainly with the lxml library (http://codespeak.net/lxml/lxmlhtml.html), by way of the htmllaundry package written by Wichert Akkerman.
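To give a feel for what “cleaning up” Word HTML means, here is a small stand-in written with only Python’s standard `html.parser` module. It is not the htmllaundry or lxml API and is far less thorough. The `WordCleaner` class, the `clean_word_html` function and the attribute whitelist are assumptions chosen for illustration. The sketch drops Office-namespaced tags such as `<o:p>`, strips `class`/`style` attributes carrying `Mso*` formatting, discards comments (where Word hides its conditional `<!--[if gte mso 9]>` blocks) and skips the contents of `<style>` and `<script>` elements.

```python
from html import escape
from html.parser import HTMLParser


class WordCleaner(HTMLParser):
    """Re-serialize HTML, dropping common MS Word paste residue (illustrative only)."""

    # Whitelist: attributes kept per tag; all others (class, style, lang, ...) go.
    KEEP_ATTRS = {"a": {"href", "title"}, "img": {"src", "alt"}}
    VOID_TAGS = {"br", "hr", "img"}
    SKIP_CONTENT = {"style", "script"}  # drop these elements and their text

    def __init__(self):
        super().__init__(convert_charrefs=True)  # text arrives already unescaped
        self.out = []
        self.skip = 0  # > 0 while inside <style> or <script>

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_CONTENT:
            self.skip += 1
            return
        if self.skip or ":" in tag:  # Office tags such as <o:p>, <w:WordDocument>
            return
        keep = self.KEEP_ATTRS.get(tag, set())
        parts = []
        for name, value in attrs:
            if name in keep:
                parts.append(' %s="%s"' % (name, escape(value or "", quote=True)))
        self.out.append("<" + tag + "".join(parts) + ">")

    def handle_endtag(self, tag):
        if tag in self.SKIP_CONTENT:
            self.skip = max(0, self.skip - 1)
            return
        if self.skip or ":" in tag or tag in self.VOID_TAGS:
            return
        self.out.append("</" + tag + ">")

    def handle_data(self, data):
        if not self.skip:
            self.out.append(escape(data, quote=False))

    # handle_comment is deliberately not overridden, so comments (including
    # Word's <!--[if gte mso 9]> ... <![endif]--> blocks) are dropped.


def clean_word_html(markup):
    """Return markup with Word residue removed, per the whitelist above."""
    parser = WordCleaner()
    parser.feed(markup)
    parser.close()
    return "".join(parser.out)


word = ('<p class="MsoNormal" style="margin:0cm">Hello <b>world</b>'
        '<o:p></o:p></p><!--[if gte mso 9]><xml></xml><![endif]-->')
print(clean_word_html(word))  # <p>Hello <b>world</b></p>
```

A whitelist (keep only known-good attributes) is used here rather than a blacklist because Word emits many vendor-specific attributes that are hard to enumerate exhaustively.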
This product does not need to be installed via quick_installer or the Plone control panel.
Just add it to your buildout or install it with easy_install.
- Use Wichert Akkerman’s htmllaundry utilities for HTML sanitizing. (jcbrand)
- Only create a new version if at_edit would not create one anyway. (thomasw)
- Don’t force target="_blank" on links. (thomasw)
- Set add_nofollow to False, since it seems to confuse the lxml.html parser. (thomasw)
- Added LinguaPlone’s generateMethods magic, so that the languageIndependent field is propagated to all translations when the canonical item is edited. (thomasw)
- Don’t strip ‘h1’ and ‘h2’ elements. (jcbrand)
- Added a more flexible encoding-detection mechanism. (pilz)
- I was too stupid to make a successful release; here we go again. (thomasw)
- Bugfix in the event handler: don’t fail if the cleanWordPastedText field isn’t present. (thomasw)
1.1 - 2009-10-27
- Add a new sanitize method and new helper methods. Thanks to Wichert.
- Enable the cleaner by default. (jcbrand)