Migrate old Plone or Google Sites content to current Plone 4
ILRT Content Migrator
Ed Crewe, ILRT at University of Bristol, October 2014
The core functionality should allow migration between any Plone versions.
The XML format should allow migration to most other CMS, assuming any modern CMS should cope with XML content import, one way or another.
There is also a tool to migrate exported plone sites to Google sites or vice versa, see http://sites.google.com. This covers either moving legacy plone sites to free hosting, or increasing the customisation level of a site beyond the remit of Google sites. Either way I hope this tool may prove mutually beneficial.
TODO: Add import/export handling of XML format to the CMIS schema for more generic standard compliance. Plus maybe Wordpress export format, etc.
See http://bitbucket.org/edcrewe/ilrt.contentmigrator for mercurial source repository, issue tracker etc.
NOTE: Export should work (to some extent) for all plone versions.
Updated from the plone 3.* version 0.6 to the plone 4.* compatible version 1.6. There were no functional changes between 0.6 and 1.6 in terms of the plone export / import, just extra version compatibility tweaks and an update of the test suite.
Versions 1.7 - 1.13: further feature tweaks and XML compatibility changes.
Tested with Plone from 2.0 to 4.3.3
This egg and the companion Product it contains were written to migrate content from pre-Archetypes plone 2.0 sites (or later) to current plone.
The ilrt.contentmigrator egg extends the generic setup content import system to handle binary files and custom content. Hence a fully populated site can be generated from file system content held in a profiles structure folder.
The egg follows the paradigm of the existing generic setup, but adds workflow state to the properties metadata. It also adds .ini files for each binary content item so that these can have all their associated metadata imported and exported.
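To illustrate the idea of a metadata file sitting beside each binary item, here is a minimal standalone sketch using only the Python standard library. The file naming (logo.png.ini), the use of a DEFAULT section and the keys title and review_state are assumptions made for illustration, not the egg's actual format. It uses the Python 3 module name configparser (the module is called ConfigParser on the Python 2 that Plone 4 runs on).

```python
import configparser
import os
import tempfile


def write_sidecar(binary_path, metadata):
    """Write metadata for a binary file to a sibling .ini file (hypothetical layout)."""
    parser = configparser.ConfigParser()
    parser["DEFAULT"] = metadata
    sidecar_path = binary_path + ".ini"
    with open(sidecar_path, "w") as handle:
        parser.write(handle)
    return sidecar_path


def read_sidecar(binary_path):
    """Read the sibling .ini file back into a plain dict."""
    parser = configparser.ConfigParser()
    parser.read(binary_path + ".ini")
    return dict(parser.defaults())


# Throwaway folder standing in for a structure folder.
folder = tempfile.mkdtemp()
image_path = os.path.join(folder, "logo.png")
with open(image_path, "wb") as handle:
    handle.write(b"\x89PNG")  # placeholder bytes, not a real image

write_sidecar(image_path, {"title": "Site logo", "review_state": "published"})
sidecar = read_sidecar(image_path)
```

Keeping the metadata in a separate text file means the binary itself is never modified, which is the property the paragraph above relies on.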
It contains a companion old-style plone product. This can be dropped into the Products folder in an old plone site. The site gains a portal_exportcontent tool. Running the export from this tool exports the content to a structure folder in the var directory ready for using to populate a current plone site, and hence migrate the content.
The code arose from the need to migrate a large number of obsolete plone sites. Having researched the issue, I found that most tools assumed a plone version from within the last few years, where Archetypes, Five, Marshall and XML, or in-place content migration, are viable.
Instead the code applies the methodology discussed in Andreas Jung’s blog posting ‘Plone migration fails - doing content-migration only’.
Using the Content Migrator
Copy ilrt/contentmigrator/ContentMigrator to the Products directory of the old plone site. Restart and you should have a ‘Content Migrator Tool’ listed in the right hand content drop down. Pick this and add it to the portal.
There will be a new portal_exportcontent tool in your site. Select this and choose the Export content tab. Click export and wait whilst your site becomes files in var/zope/structure. If you only wish to export a subsection of your site then specify the path in the textbox at the top of the page.
Go to your new plone install. Add ilrt.contentmigrator to your buildout config eggs and zcml sections then run bin/buildout.
To do a full import you must first install ilrt.contentmigrator via the quick installer.
Add a plone site if you are not importing to an existing one. Go to the ZMI via http://host/Plone/manage and click on the portal_quickinstaller tool. Select Content Migrator Tool, check its box and click Install. You should then have /Plone/portal_setupcontent available. Click on that to access the migrator interface.
Copy (or symlink) the exported structure folder to a profile folder, either in the ilrt.contentmigrator egg or in the main theme egg for your new plone site, and restart, e.g. ilrt.contentmigrator/ilrt/contentmigrator/profiles/import/structure
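The copy or symlink step can of course be done by hand, but as a rough sketch it could be scripted as below. The function name place_structure and the temporary directories are hypothetical stand-ins; substitute your real export folder and profile folder.

```python
import os
import shutil
import tempfile


def place_structure(exported, profile_dir, symlink=False):
    """Copy (or symlink) an exported structure folder into a profile folder."""
    target = os.path.join(profile_dir, "structure")
    if os.path.lexists(target):
        raise OSError("refusing to overwrite existing %s" % target)
    os.makedirs(profile_dir, exist_ok=True)
    if symlink:
        os.symlink(os.path.abspath(exported), target)
    else:
        shutil.copytree(exported, target)
    return target


# Demonstration with throwaway directories standing in for real paths.
work = tempfile.mkdtemp()
exported = os.path.join(work, "var", "structure")
os.makedirs(os.path.join(exported, "news"))
with open(os.path.join(exported, "front-page"), "w") as handle:
    handle.write("<p>Welcome</p>")

profile = os.path.join(work, "profiles", "import")
target = place_structure(exported, profile)
```

A symlink avoids duplicating a large export on disk, whereas a copy leaves the profile self-contained if the original export is later removed.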
Export Formats - CSV, XML, HTML
Note that although there are three export formats only the default CSV format works for import into a newer Plone site.
(CSV format may perhaps better be called YAML these days, although maybe not strictly compliant to the YAML 1.2 specification)
The XML export format was added as a more universal format for use by migration tools into other CMS. The HTML dump was added for archival purposes.
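The changelog mentions CDATA handling for HTML bodies in the XML export. As an illustration of why that matters, the sketch below shows one common way to wrap HTML in CDATA safely when the HTML itself contains a CDATA terminator. This is a general XML technique and not necessarily how the egg does it; the element names item and body are made up for the example.

```python
import xml.etree.ElementTree as ET


def cdata(text):
    """Wrap text in a CDATA section, splitting any embedded ']]>' terminator."""
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


# An HTML body that already contains CDATA markers.
html_body = "<p>Nested <![CDATA[ marker ]]> inside a page</p>"
document = "<item><body>" + cdata(html_body) + "</body></item>"

# A standard XML parser should hand back the original HTML unchanged.
parsed_body = ET.fromstring(document).find("body").text
```

Splitting the terminator across two adjacent CDATA sections keeps the output well-formed without altering the text a consuming CMS will read.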
When the contentmigrator tool is installed, the content adapter for generic setup will be modified so that the content import step will now add all content and set workflow states.
Hence generic setup, ie Plone/portal_setup is required as the base for this tool.
When you go to the new portal_setupcontent tool you can run a further enhanced version of the generic setup content step that also sets up users, groups and memberdata and provides fuller logging to screen. In addition the tool provides access to the exporter so that you can re-export a site or a subfolder of its content.
If you wish to specify another path for the structure folder import, just adjust the directory in the profile that you are using, e.g. directory="c:\import" in profiles.zcml
If a default profile is used then generic setup will automatically create the content when the egg with the profile is reinstalled or selected for Plone site creation. Whereas if another profile is used (such as /import above) then it has to be manually selected first and then run via the setup tool or this migrator tool. For large content imports this is likely to be preferable.
Standard generic setup runs the adapter in CMFCore.exportimport.content which will only populate content for HTML documents, and no properties or workflow states will be added.
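For orientation, standard generic setup structure folders list each folder's children in a small CSV file named .objects, one row per child giving its id and portal type. The sketch below reads such a listing outside of Plone. It assumes that layout applies to your export as well, and the sample ids and types are invented for the demonstration.

```python
import csv
import os
import tempfile


def read_folder_listing(folder):
    """Return (object_id, portal_type) pairs from a folder's .objects file."""
    with open(os.path.join(folder, ".objects"), newline="") as handle:
        return [tuple(row) for row in csv.reader(handle) if row]


# Build a tiny stand-in structure folder to read back.
structure = tempfile.mkdtemp()
with open(os.path.join(structure, ".objects"), "w") as handle:
    handle.write("front-page,Document\nnews,Folder\n")

listing = read_folder_listing(structure)
```

A listing like this can be handy for checking what an export contains before starting a long import.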
The ilrt.contentmigrator modifies the generic setup site creation to do the following:
- Populate binary content formats and archetypes if matching ones are found.
- Use Marshall’s RFC822 marshaller to extract and apply the properties data.
- Apply workflow state transitions. NB: The workflow migration requires the ilrt.migrationtool egg.
- Translate old content types and add memberdata (see below)
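As background to the RFC822 step in the list above: an RFC822-style file holds properties as header lines, then a blank line, then the body. The sketch below parses such text with the Python standard library rather than with Marshall itself, and the header names and values shown are illustrative only, not taken from a real export.

```python
from email import message_from_string

# Hypothetical exported item: header names and values are illustrative only.
exported_item = (
    "id: front-page\n"
    "title: Welcome\n"
    "Content-Type: text/html\n"
    "\n"
    "<p>Hello from the old site</p>\n"
)

message = message_from_string(exported_item)

# Headers become properties, the payload is the main text of the item.
item_properties = {name.lower(): value for name, value in message.items()}
item_body = message.get_payload()
```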
Please note that the import takes much longer than the export. For example, a gigabyte of content might take only 5 minutes to export, but an hour to import!
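For rough planning, the figures quoted above can be scaled to your own site size. The sketch assumes that time grows linearly with content size, which is only an approximation, and the 2.5 GB site size is a made-up example.

```python
def estimate_minutes(size_gb, minutes_per_gb):
    """Linear estimate: duration grows in proportion to content size."""
    return size_gb * minutes_per_gb


EXPORT_MINUTES_PER_GB = 5    # figure quoted above
IMPORT_MINUTES_PER_GB = 60   # figure quoted above

site_size_gb = 2.5  # hypothetical site size
export_estimate = estimate_minutes(site_size_gb, EXPORT_MINUTES_PER_GB)
import_estimate = estimate_minutes(site_size_gb, IMPORT_MINUTES_PER_GB)

# How many times slower the import is than the export on these figures.
slowdown = IMPORT_MINUTES_PER_GB / EXPORT_MINUTES_PER_GB
```

On these figures the import is about twelve times slower than the export, so a 2.5 GB site would need roughly 12.5 minutes to export and 150 minutes to import.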
Content Types Translation
There is a mapping of old types to newer archetypes for old plone sites. Currently this just handles ‘Calendar Item’ to ‘Event’ and ‘Link’ to ‘ATLink’. It is in the ilrt/contentmigrator/ContentMigrator/config.py file. By modifying the TYPEMAP and NONATPROPS dictionaries of configuration data you can map other old custom types to new content, or even use it to migrate content from one new type to another.
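As a sketch of how such a type mapping can be applied, the example below assumes TYPEMAP is a plain dictionary from old portal type to new portal type. The real structure in config.py may differ, NONATPROPS is not shown because its shape is not described here, and the translate_type helper and the 'Press Release' entry are hypothetical.

```python
# Assumed shape: old portal_type -> new portal_type. The real TYPEMAP in
# config.py may be structured differently.
TYPEMAP = {
    "Calendar Item": "Event",
    "Link": "ATLink",
}


def translate_type(old_type, typemap=TYPEMAP):
    """Return the mapped type, or the original type when no mapping exists."""
    return typemap.get(old_type, old_type)


# Extending a copy of the mapping for a hypothetical custom type.
custom_map = dict(TYPEMAP)
custom_map["Press Release"] = "News Item"

translated = [translate_type(name, custom_map)
              for name in ("Calendar Item", "Document", "Press Release")]
```

Falling back to the original type name means content whose type already exists in the new site passes through untouched.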
The contentmigrator will also export and import zope held users, including passwords. It does so by generating the user, roles and groups data from GRUF or PAS based sites as generic setup xml files in the /structure/acl_users folder. Memberdata is saved as a csv file for each member in the portal_memberdata directory within acl_users.
Google sites import
The Google sites import runs outside of plone. It just requires command line python with Google’s gdata library installed. Please see ilrt.contentmigrator/ilrt/contentmigrator/google/README.txt. NB: Currently it doesn’t handle custom content types.
There is an option to export content as XML for importing into other systems. For reimporting into current Plone, stick with the default CSV format.
Google Data API import / export
Ed Crewe, Feb 2011
Utility import and export scripts for migrating plone sites to or from Google sites, or for migrating folders of content to or from Google docs.
Initial use case was to archive simple old plone sites which had no money for hosting anymore - so even a static dump would be more costly than exporting to a free Google site, which also still provides a working CMS.
Future use cases may be migrating simple Google sites to Plone when they need to grow substantially with respect to customisation and features.
Requires gdata from http://code.google.com/p/gdata-python-client/ to be installed into python 2.2 or later. Otherwise it is entirely independent of the plone installation since it just requires or creates the structure dump folder. So the google folder code is command-line only at this time.
Copy demo_config.py to google_config.py and edit it to point at your google account and site.
Plone to Google sites export
This first version just handles google site folders and pages.
NB: Google sites have two other content types, announcements (ie simple news / events) and lists (small data customizable content types).
- Install ilrt.contentmigrator (or ContentMigrator if plone 2) in your plone instance and run the export to a structure directory on the file system.
- Set up a google site and add your google credentials and the site details to the google_config.py file along with the path to the exported structure folder then run …
> python export_to_google.py
Google sites to Plone import
This first version just handles google site folders and pages.
- For the google site you wish to migrate to plone add the google credentials and the site details to the google_config.py file along with the path to where you want the structure folder created as IMPORT_FILES then run …
> python import_from_google.py
- Install ilrt.contentmigrator in your plone instance and run the import from the structure directory created on the file system by the previous step.
Google docs to Plone or vice versa
There is already an application for integrating google docs and plone at http://pypi.python.org/pypi/collective.googlesharing/1.0.0
However it may be useful to have a one off import / export tool too, but for the moment that may remain on my TODO list until I get a real need to do it.
Changelog for ilrt.contentmigrator
ilrt.contentmigrator - 1.14 Released - (2014-11-09)
- Fixed tests for slight changes in Plone templates
- Created new test account for Google site export - since old one locked
- Fixed relatedItems causing an unnecessary log warning on import
ilrt.contentmigrator - 1.13 Released - (2014-10-30)
- Fix the edge case of folders with content in them with id = ‘data’ not being binary objects
ilrt.contentmigrator - 1.12 Released - (2014-04-17)
- Add TEXT_GETTERS config of custom text methods to check for HTML rendering
- Add option to dump out content as HTML, with content type properties as meta tags. For use if the site needs to be generated as static HTML for archival etc. NB: This export is not a format for re-import.
- Specify Language all in the query, so foreign language content is included
- Make sure properties are reset if empty, so they don’t get wrongly copied to another object
ilrt.contentmigrator - 1.11 Released - (2013-07-22)
- Add test of textual content types to see if they are actually folders with hidden items (even if they have isPrincipiaFolderish = False). Export any hidden items found to the file system as filename.content
- Update egg dependencies to be compatible with Plone 4.2
ilrt.contentmigrator - 1.10 Released - (2013-05-03)
- Escape HTML body that already contains CDATA tags for XML format export
ilrt.contentmigrator - 1.9 Released - (2012-10-26)
- Close the CDATA tags for XML format export
ilrt.contentmigrator - 1.8 Released - (2012-01-04)
- Manifest failed to include docs directory
ilrt.contentmigrator - 1.7 Released - (2011-12-21)
- Added optional XML format export mode
- Refactored export to use add element function and doc iostream approach to more easily cater for format variations
- Make sure setText is used with text/html mimetype set
ilrt.contentmigrator - 1.6 Released - (2011-02-25)
- Tested against plone 4.0 - fixed tests and added more version compatibility
- Fixed issue with ZMI tabs to use consistent zcml
- Added Google Sites migration utility
ilrt.contentmigrator - 0.7 (2010-04-07 unreleased)
- Move code from svn to mercurial and make public on bitbucket
ilrt.contentmigrator - 0.6 (2010-02-10)
- Replace user export page template xml generation with DTML to be more compatible with old zope
- Fix adding of empty portal.REQUEST attribute causing error with site when exporting users from plone 2
- Use indexObject not reindexObject so modified date is preserved
- Document how to change import path
ilrt.contentmigrator - 0.5 (2009-11-20)
- Export any folderish object by default
- Make user export and reindexing optional
- Just log failed object deletions and continue
- Add the os.O_BINARY flag to all file writing to stop line ending tampering on Windows
- Check for all string types when doing export
ilrt.contentmigrator - 0.4
- Add conversion of old links to archetype links.
- 0.3 release was missing some files, doh!
[Jerry Van Baren]
ilrt.contentmigrator - 0.3
- Bug fixes of utils AT types conversion methods - setting empty dates to current date (hence expiring most content!) and lines fields not converted correctly to tuples.
ilrt.contentmigrator - 0.2
- Contains stand alone ContentMigrator product for exporting content from old plone sites
- Uses generic setup style exportimport/content for importing content
- Imports file and image content
- Sets workflow state of content (requires ilrt.migrationtool)
- Imports users, groups and roles
- Translates old calendar item type to ATEvent
ilrt.contentmigrator - 0.1 Unreleased
- Initial package structure.
TODO
- Add Google site announcements and list types export/import to Plone news and custom types
- Fix old plone quick installer addition for the ContentMigrator Product part of the egg
- Add ATReference handling for export / import like GSXML
- Fix AT file field handling to work if object also has default file data attribute
- Add portrait handling to memberdata export / import
- More old plone type translations?
- Adapt to use for old zope only based content, eg. add html filters to grab the body text from ZPT content.