Easily import static HTML websites into Plone.
Parse2Plone is an HTML parser (in the form of a Buildout recipe that creates a script for you) to easily get content from static HTML websites into Plone.
Warning
This is a Buildout recipe for use with Plone; by itself it does nothing. If you do not know what Plone is, please see: http://plone.org. If you do not know what Buildout is, please see: http://www.buildout.org/.
Getting started
Because it always drives me nuts when you have to dig for a recipe’s options, here they are:
    [import]
    recipe = parse2plone
    path = /Plone
    illegal_chars = _ .
    html_extensions = html
    image_extensions = gif jpg jpeg png
    file_extensions = mp3
    target_tags = a div h1 h2 p
    force = false
    publish = false
    slugify = false
    rename = false
The parameters listed above are configured with their default values. Edit these values if you would like to change the default behavior; they are (mostly) self-explanatory. Now you can just cut and paste to get started or keep reading if you would like to know more.
Explanation
Why did you create Parse2Plone when the following packages already exist:
Here are some reasons:
Because Parse2Plone is aimed at lowering the bar for folks who don’t already know (or want to know) what a “transmogrifier blueprint” is, but who are able to update their buildout.cfg file, run Buildout, and then run a single command, all without having to think too much.
collective.transmogrifier provides a framework for creating reusable pipes (whose definitions are called blueprints). Parse2Plone provides a single, non-reusable “pipe/blueprint”.
The author had an itch to scratch; it will be nice for him to be able to say “just go write a script” and then point to an example.
Transmogrifier and friends appear to be “developer’s tools”, while the author wants Parse2Plone to be an “end user’s tool”.
If you are a developer looking to create repeatable migrations, you probably want to be using collective.transmogrifier. If you are an end user that just wants to see your static website in Plone, then you might want to give Parse2Plone a try.
There is also this user comment, which captures the author’s sentiment:
Parse2Plone's release was very timely as I need either this or something very similar - and while I've no doubt I could make transmogrify do the job, it's a lot of work for a one-shot loading of legacy pages. -Derek Broughton
Installation
You can install Parse2Plone by editing your buildout.cfg file like so. First add an import section:
    [import]
    recipe = parse2plone
Then add the import section to the list of parts:
    [buildout]
    ...
    parts =
        ...
        import
Now run bin/buildout as usual.
Execution
You can run Parse2Plone like this:
$ bin/plone run bin/import /path/to/files
Demonstration
If you have a site in /var/www/html that contains the following:
    /var/www/html/index.html
    /var/www/html/about/index.html
You should run:
$ bin/plone run bin/import /var/www/html
And the following will be created:
Modification
Modifying the default behavior of parse2plone is easy; just use the command line options or add parameters to your buildout.cfg file. Both approaches allow customization of the same set of options, but the command line arguments will trump any settings found in your buildout.cfg file.
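The precedence rule can be pictured as a three-layer merge: built-in defaults, then buildout.cfg parameters, then command line arguments. The sketch below is only an illustration of that idea, not Parse2Plone's actual code; `resolve_settings`, `DEFAULTS`, and the sample dictionaries are hypothetical names, and the sketch assumes that an option not given on the command line shows up as `None`.

```python
# Hypothetical sketch of option precedence; not Parse2Plone's own code.
DEFAULTS = {"path": "/Plone", "html_extensions": ["html"], "publish": False}


def resolve_settings(defaults, buildout_options, cli_options):
    """Merge settings so that command line values win over buildout.cfg
    values, which in turn win over the built-in defaults."""
    settings = dict(defaults)
    settings.update(buildout_options)
    # Assumption: options the user did not pass on the command line are None,
    # so only explicitly supplied values override buildout.cfg.
    settings.update({k: v for k, v in cli_options.items() if v is not None})
    return settings


buildout_options = {"path": "/Plone/foo", "html_extensions": ["htm"]}
cli_options = {"path": "/Plone/bar", "html_extensions": None, "publish": None}
settings = resolve_settings(DEFAULTS, buildout_options, cli_options)
print(settings)
```

Here `path` comes from the command line, `html_extensions` from buildout.cfg, and `publish` from the defaults.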
Buildout options
You can configure the following parameters in your buildout.cfg file in the parse2plone recipe section.
Options
| Parameter | Default value | Description |
|---|---|---|
| path | /Plone | Specify an alternate location in the database for the import to occur. |
| illegal_chars | _ . | Specify illegal characters. parse2plone will ignore files that contain these characters. |
| html_extensions | html | Specify HTML file extensions. parse2plone will import HTML files with these extensions. |
| image_extensions | gif jpg jpeg png | Specify image file extensions. parse2plone will import image files with these extensions. |
| file_extensions | mp3 | Specify generic file extensions. parse2plone will import files with these extensions. |
| target_tags | a h1 h2 p | Specify target tags. parse2plone will parse the contents of the HTML tags listed. |
| force | false | Force create folders that do not exist. |
| publish | false | Publish newly created content. |
| slugify | false | “Slugify” content (see slugify.py). |
| rename | false | Rename content (see rename.py). |
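How the extension and illegal-character options might interact can be sketched in a few lines. This is a hypothetical illustration, not Parse2Plone's implementation: the `classify` function is invented here, and it assumes that a file counts as "illegal" when its name starts with one of the illegal characters (since every file name with an extension contains a dot, a literal "contains" check would skip everything).

```python
import os

# Default values taken from the options table; everything else is assumed.
ILLEGAL_CHARS = ["_", "."]
HTML_EXTENSIONS = ["html"]
IMAGE_EXTENSIONS = ["gif", "jpg", "jpeg", "png"]
FILE_EXTENSIONS = ["mp3"]


def classify(filename):
    """Return 'html', 'image', 'file', or None if the file is skipped."""
    name = os.path.basename(filename)
    # Assumption: skip names that start with an illegal character,
    # e.g. ".htaccess" or "_draft.html".
    if not name or name[0] in ILLEGAL_CHARS:
        return None
    extension = os.path.splitext(name)[1].lstrip(".")
    if extension in HTML_EXTENSIONS:
        return "html"
    if extension in IMAGE_EXTENSIONS:
        return "image"
    if extension in FILE_EXTENSIONS:
        return "file"
    # Anything with an unlisted extension is not imported.
    return None


for path in ["index.html", "logo.png", "talk.mp3", "_draft.html", "notes.txt"]:
    print(path, "->", classify(path))
```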
Example
Instead of accepting the default parse2plone behaviour, in your buildout.cfg file you may specify the following:
    [import]
    recipe = parse2plone
    path = /Plone/foo
    html_extensions = htm
    image_extensions = png
    target_tags = p
This will configure parse2plone to (only) import content from:
Images ending in .png
HTML files ending in .htm
Text within p tags
to:
A folder named /Plone/foo.
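To make "text within p tags" concrete, here is a minimal sketch of restricting extraction to a list of target tags. It uses the standard library's `html.parser` rather than lxml (which Parse2Plone itself depends on), and `TargetTagParser` and `extract_text` are hypothetical names chosen for this illustration, so treat it as an approximation of the idea rather than the recipe's behaviour.

```python
from html.parser import HTMLParser


class TargetTagParser(HTMLParser):
    """Collect the text found inside a configurable set of tags."""

    def __init__(self, target_tags):
        super().__init__()
        self.target_tags = set(target_tags)
        self.depth = 0    # number of target tags currently open
        self.chunks = []  # text collected so far

    def handle_starttag(self, tag, attrs):
        if tag in self.target_tags:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.target_tags and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        text = data.strip()
        # Keep text only while at least one target tag is open.
        if self.depth > 0 and text:
            self.chunks.append(text)


def extract_text(html, target_tags):
    parser = TargetTagParser(target_tags)
    parser.feed(html)
    parser.close()
    return parser.chunks


page = "<html><body><h1>Title</h1><p>First paragraph.</p><div>Sidebar</div></body></html>"
print(extract_text(page, ["p"]))  # prints ['First paragraph.']
```

With `target_tags = p`, the heading and the `div` content are left out; adding `h1` to the list would bring the title back.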
Command line options
The following parse2plone command line options are supported.
Options
'--path', '-p'
You can specify an alternate import path (‘/Plone’ by default) with --path or -p:
$ bin/plone run bin/import /path/to/files --path=/Plone/foo
'--html-extensions'
You can specify HTML file extensions with the --html-extensions option:
$ bin/plone run bin/import /path/to/files --html-extensions=htm
'--image-extensions'
You can specify image file extensions with the --image-extensions option:
$ bin/plone run bin/import /path/to/files --image-extensions=png
'--file-extensions'
You can specify generic file extensions with the --file-extensions option:
$ bin/plone run bin/import /path/to/files --file-extensions=pdf
'--force'
Force create folders that do not exist.
'--publish'
Publish newly created content.
'--slugify'
“Slugify” content (see slugify.py).
'--rename'
Rename content (see rename.py).
'--help'
You can ask parse2plone to tell you about its available options with the --help option:
    $ bin/plone run bin/import -h
    Usage: import [options]

    Options:
      -h, --help            show this help message and exit
      -p PATH, --path=PATH  Path to Plone site object or sub-folder
      --html-extensions=HTML_EXTENSIONS
                            Specify HTML file extensions
      --illegal-chars=ILLEGAL_CHARS
                            Specify characters to ignore
      --image-extensions=IMAGE_EXTENSIONS
                            Specify image file extensions
      --file-extensions=FILE_EXTENSIONS
                            Specify generic file extensions
      --target-tags=TARGET_TAGS
                            Specify HTML tags to parse
      --force               Force creation of folders
      --publish             Optionally publish newly created content
      --slugify             Optionally "slugify" content (see slugify.py)
      --rename=RENAME       Optionally rename content (see rename.py)
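The help text above has the layout that Python's `optparse` module produces. As a rough sketch of how a few of these options could be declared, the following uses option names and help strings copied from the help output; the `build_parser` function, the `dest` names, and the default values are assumptions made for illustration and may differ from Parse2Plone's real code.

```python
from optparse import OptionParser


def build_parser():
    # Option names and help strings come from the --help output;
    # defaults and dest names are assumptions.
    parser = OptionParser(prog="import", usage="%prog [options]")
    parser.add_option("-p", "--path", dest="path", default="/Plone",
                      help="Path to Plone site object or sub-folder")
    parser.add_option("--html-extensions", dest="html_extensions",
                      default="html", help="Specify HTML file extensions")
    parser.add_option("--force", dest="force", action="store_true",
                      default=False, help="Force creation of folders")
    parser.add_option("--publish", dest="publish", action="store_true",
                      default=False,
                      help="Optionally publish newly created content")
    return parser


# Positional arguments (the directory to import) end up in `args`.
options, args = build_parser().parse_args(
    ["/path/to/files", "--path=/Plone/foo", "--html-extensions=htm", "--publish"]
)
print(options.path, options.html_extensions, options.publish, args)
```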
Example
Instead of accepting the default parse2plone behaviour, on the command line you may specify the following:
    $ bin/plone run bin/import /path/to/files -p /Plone/foo --html-extensions=htm \
        --image-extensions=png --target-tags=p
This will configure parse2plone to (only) import content from:
Images ending in .png
HTML files ending in .htm
Text within p tags
to:
A Plone site folder named /Plone/foo.
Consternation
Here are some troubleshooting tips.
Compiling lxml
Parse2Plone requires lxml, which in turn requires libxml2 and libxslt. If you do not have lxml installed “globally” (i.e. in your system Python’s site-packages directory), Buildout will try to install it for you. At that point lxml will look for the libxml2 and libxslt development libraries to build against; if they are not already installed on your system, the build (and therefore Buildout) will most likely fail.
Database access
Before running parse2plone, you must either stop your Plone site or use ZEO. Otherwise parse2plone will not be able to access the database.
Communication
Questions, comments, or concerns? Please e-mail: aclark@aclark.net.
History
0.9.2 (11/03/2010)
More doc fixes
0.9.1 (11/03/2010)
Doc fixes
0.9.0 (11/03/2010)
Fix regressions introduced (or unresolved as of) 0.8.2. Thanks Derek Broughton for the bug report(s)
Many fixes to convert_parameter_values() method which converts recipe parameters to arguments passed to main()
Fix slugify feature
0.8.2 (11/02/2010)
Add rename feature
Fix regressions introduced in 0.8.1
0.8.1 (10/29/2010)
Refactor options/parameters functionality to universally support _SETTINGS dict
Add “slugify” feature
Doc fixes
Add support to optionally publish content after creation
Add support for generic file import
0.8 (10/27/2010)
Support the importing of content to folders within the Plone site object
0.7 (10/25/2010)
Documentation fixes
0.6 (10/25/2010)
Support customization via recipe parameters and command line arguments
0.5 (10/22/2010)
Revert ‘Add Plone to install_requires’
0.4 (10/22/2010)
Add ‘Plone’ to install_requires
0.3 (10/22/2010)
Another setuptools fix
0.2 (10/22/2010)
Setuptools fix
0.1 (10/21/2010)
Initial release