
A Plone 4 product that generates image thumbnail previews of PDF files stored on ATFile based objects.


Introduction

PdfPeek is a Plone 4 add-on product that uses GNU Ghostscript to generate image thumbnail previews of PDF files uploaded to ATFile-based content objects.

When installed in a Plone 4.x site, this product automatically generates preview and thumbnail images of each page of uploaded PDF files and stores them as annotations on the content object containing the PDF file.
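To make the rendering step concrete, here is a minimal sketch of how a single PDF page could be turned into a PNG with the gs binary. It is not PdfPeek's actual transform code: the function name build_gs_command, the png16m output device and the 72 dpi default are assumptions chosen for illustration.

```python
def build_gs_command(pdf_path, output_path, page=1, resolution=72):
    """Return an argv list asking Ghostscript to render one PDF page as PNG.

    Hypothetical helper for illustration; PdfPeek's own transform may use
    different options.
    """
    return [
        "gs",
        "-q",                             # suppress the startup banner
        "-dSAFER",                        # restrict file access while rendering
        "-dBATCH",                        # exit when done instead of prompting
        "-dNOPAUSE",                      # do not pause between pages
        "-sDEVICE=png16m",                # 24-bit colour PNG output device
        "-r%d" % resolution,              # rendering resolution in dpi
        "-dFirstPage=%d" % page,          # first page to render
        "-dLastPage=%d" % page,           # last page to render (same page)
        "-sOutputFile=%s" % output_path,  # where the image is written
        pdf_path,
    ]


if __name__ == "__main__":
    # Print the command rather than running it, so the sketch works
    # even where gs is not installed.
    print(" ".join(build_gs_command("example.pdf", "page1.png")))
```

Passing the returned list to subprocess.check_call would run the conversion, provided gs is installed and the input file exists.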

Image generation from the PDF file is processed asynchronously so that the user does not have to wait for the images to be created in order to continue using the site, as the processing of large PDF files can take many minutes to complete.

When a file object is initialized or edited, PdfPeek checks whether a PDF file was uploaded. If so, a Ghostscript image conversion job is added to the pdfpeek job queue. If the uploaded file is not of content type ‘application/pdf’, an image removal job is added to the queue instead. The job queue is processed periodically by a cron job or a Zope clock server process. Image conversion jobs add the IPDF interface to the content object and store the resulting preview and thumbnail image for each page of the PDF as annotations on the content object itself. Image removal jobs remove the image annotations and the IPDF interface from the content object.

If a job fails, it is removed from the processing queue and appended to a list of failed jobs. If a job succeeds, it is removed from the processing queue and appended to a list of successfully completed jobs.
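The queue behaviour described above can be sketched roughly as follows. This is an in-memory illustration only: the class name JobQueue and its attributes are hypothetical, and the real PdfPeek queue lives inside the Plone site rather than in a plain Python object.

```python
from collections import deque


class JobQueue(object):
    """Toy job queue: each job is a callable taking no arguments."""

    def __init__(self):
        self.pending = deque()  # jobs waiting to be processed
        self.finished = []      # jobs that completed without error
        self.failed = []        # jobs that raised an exception

    def add(self, job):
        self.pending.append(job)

    def process(self):
        """Run every pending job once, filing it under finished or failed."""
        while self.pending:
            job = self.pending.popleft()  # removed from the queue either way
            try:
                job()
            except Exception:
                self.failed.append(job)
            else:
                self.finished.append(job)
```

A failed job is not retried in this sketch; it simply ends up in the failed list, mirroring the description above.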

PdfPeek ships with an example user interface that is turned on by default. This UI displays the thumbnail images of each page of the PDF file when a user views the content object in their browser. The example UI is not fully working yet and is meant only as an example. I don’t claim to be a JavaScript master.

A custom traverser is available to make it easy to access the images and previews directly, as well as to build custom views incorporating image previews of file content.

PdfPeek ships with a configlet that allows the site administrator to adjust the size of the generated preview and thumbnail images, as well as toggle the example user interface and default event handlers on and off.

Requires the GNU ghostscript gs binary to be available on the $PATH!
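A quick way to verify this requirement before installing is to look up the binary from Python. The sketch below assumes a Python 3 interpreter (shutil.which is not available on the Python 2 versions that Plone 4 runs on), and find_ghostscript is a hypothetical helper name, not part of PdfPeek.

```python
import shutil


def find_ghostscript(binary="gs"):
    """Return the full path of the Ghostscript binary, or None if not on $PATH."""
    return shutil.which(binary)


if __name__ == "__main__":
    path = find_ghostscript()
    if path is None:
        print("gs was not found on $PATH; PdfPeek will not be able to render PDFs")
    else:
        print("Found Ghostscript at %s" % path)
```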

Tested on POSIX-compliant systems such as Linux and Mac OS X 10.6. Untested on Windows systems. (Wouldn’t be surprised if it works, as long as you can install gs.)

As of version 0.17, Plone 3.x is no longer officially supported.

Installation

Via zc.buildout

The recommended way to use collective.pdfpeek is to install it via zc.buildout using the plone.recipe.zope2instance recipe. PdfPeek uses z3c.autoinclude to load its ZCML, so you don’t need a ZCML slug.

Add collective.pdfpeek to the list of eggs in the instance section of your buildout.cfg like so:

[instance]
...
eggs =
    ...
    collective.pdfpeek
    ...

Then re-run your buildout like so to activate the installation:

$ bin/buildout

Via setuptools

To install collective.pdfpeek into the global Python environment (or a virtualenv), using a traditional Zope 2 instance, you can do this:

  • If you are reading this, you have probably already run easy_install collective.pdfpeek. If not, find out how to install setuptools (and EasyInstall) here: http://peak.telecommunity.com/DevCenter/EasyInstall

  • If you are using Zope 2.9 (not 2.10), get pythonproducts and install it into your Zope instance via:

    python setup.py install --home /path/to/instance

  • Create a file called collective.pdfpeek-configure.zcml in the /path/to/instance/etc/package-includes directory. The file should only contain this:

    <include package="collective.pdfpeek" />

Configuration

Via zc.buildout

For automatic processing of the PdfPeek job queue, a simple cron script using curl or wget would suffice. It is nice, however, to keep all of the configuration for a project in your buildout. For this reason, a Zope clock server process is the recommended way to process the job queue automatically. To set one up, add the following snippet to the [instance] part of your buildout configuration:

[instance]
...
zope-conf-additional=
    # process the job queue every 5 seconds
    <clock-server>
       method /Plone/@@pdfpeek.utils/process_conversion_queue
       period 5
       user admin
       password admin
       host localhost
    </clock-server>
...

You will have to edit the above snippet to set the name of your Plone site, the admin username and password, and the hostname the instance is running on. You can also adjust the interval, in seconds, at which the clock server processes the queue.

Then re-run your buildout like so to activate the clock server:

$ bin/buildout

Via cron

Install wget.

Edit your crontab file and append the following line:

*/5 * * * * wget --user=admin --password=admin http://localhost:8080/Plone/@@pdfpeek.utils/process_conversion_queue

You will have to customize the above line with the hostname, port number, username, password and path of your Plone instance.

Save your crontab file and wget will now call the view method that triggers the processing of the pdf conversion queue every five minutes.
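If wget is not available, the same view can be called from a small Python script run by cron. The following is a sketch under assumptions: it targets Python 3, sends HTTP basic authentication up front, and build_queue_request is a hypothetical helper name. The URL, username and password are the placeholder values from the crontab line above.

```python
import base64
import urllib.request

# View path taken from the crontab example above.
QUEUE_VIEW = "/@@pdfpeek.utils/process_conversion_queue"


def build_queue_request(site_url, user, password):
    """Build an authenticated request for the queue-processing view."""
    url = site_url.rstrip("/") + QUEUE_VIEW
    credentials = ("%s:%s" % (user, password)).encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    request = urllib.request.Request(url)
    request.add_header("Authorization", "Basic " + token)
    return request


if __name__ == "__main__":
    # Placeholder values; replace with your own site URL and credentials.
    request = build_queue_request("http://localhost:8080/Plone", "admin", "admin")
    print("Would request %s" % request.full_url)
```

Passing the request to urllib.request.urlopen performs the actual call; the sketch only prints the URL so that it can be run without a live Plone instance.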

Changelog

0.19 (2010-4-8)

  • Modified transform to use cStringIO instead of StringIO, in the hopes of making things more efficient. [dbrenneman]

  • Modified conversion function to grab file data from object using getFile method, as this is the proper way of doing things… [dbrenneman]

0.18 (2010-2-26)

  • Fixed bug in reST rendering of changelog. [dbrenneman]

0.17 (2010-2-26)

  • Added wide variety of pdf files to run through the unit tests for the ghostscript image transform. [dbrenneman]

  • Added unit tests for low level ghostscript transform. [dbrenneman]

  • Refactored transform code to make class and method names make more sense. [dbrenneman]

  • Updated README, including instructions for configuring the clock server. [dbrenneman]

  • Added asynchronous processing queue for ghostscript transform jobs. [dbrenneman]

  • Updated functional doctests to work on Plone 4 with blobfile storage. [dbrenneman]

  • Updated functional doctests to test transform queue. [dbrenneman]

  • Updated documentation. [dbrenneman]

  • Added unit testing harness. [dbrenneman]

0.16 (2009-12-12)

  • Bugfix release. [dbrenneman]

0.15 (2009-12-12)

  • Added configurable preview and thumbnail sizes. [claytron]

  • reST police! Fixing up the docs so that they might get rendered correctly. [claytron]

0.13 (2009-11-12)

  • Refactored transform code to deal with encrypted pdf files better. [dbrenneman]

  • Made transform code more robust. [dbrenneman]

  • Added ability to toggle default event handler on and off. [dbrenneman]

0.12 (2009-10-25)

  • Bugfix release. [dbrenneman]

0.11 (2009-10-25)

  • Bugfix release. [dbrenneman]

0.10 (2009-10-25)

  • Added code to check for EOF at the end of the pdf file data string and to insert one if it is not there. Fixes many corrupt pdf files. [dbrenneman]

0.9 (2009-10-13)

  • Fixed another bug in the transform code to allow functioning with any filefield, as long as it is called file. [dbrenneman]

0.8 (2009-10-13)

  • Fixed a bug in the transform code to allow functioning with any filefield, as long as it is called file. [dbrenneman]

0.7 (2009-10-13)

  • Streamlined transform code. [dbrenneman]

  • Added ability to toggle the pdfpeek viewlet display on and off via configlet. [dbrenneman]

0.6 (2009-10-05)

  • Bugfix release. [dbrenneman]

0.5 (2009-10-05)

  • Added control panel configlet. [dbrenneman]

  • Removed unneeded xml files from uninstall profile. [dbrenneman]

  • Optimized transform. [dbrenneman]

  • Added storage of image thumbnail along with image, generated with PIL. [dbrenneman]

  • Changed annotation to store images in a dict instead of a list. [dbrenneman]

  • Changed event handler to listen on all AT based objects instead of ATFile. [dbrenneman]

  • Added custom pdfpeek icon for configlet. [dbrenneman]

  • Added custom traverser to allow easy access to the OFS.Image.Image() objects stored on IPDF objects. [dbrenneman]

  • Modified pdfpeek viewlet code to display images using the custom traverser. [dbrenneman]

  • Added custom scrollable gallery with tooltips using jQuery Tools to the pdfpeek viewlet for display. [dbrenneman]

0.4 (2009-10-01)

  • Refactored storage to use OFS.Image.Image() objects instead of storing the raw binary data in string format. [dbrenneman]

  • Refactored event handler object variable name. [dbrenneman]

  • Removed unneeded files from default GS Ext. profile. [dbrenneman]

  • Removed unneeded javascript files and associated images and css. [dbrenneman]

0.3 (2009-08-03)

  • Fixed parsing of pdf files with multiple pages. [piv]

0.1 (unreleased)

  • Initial release
