DocumentCloud's document viewer integrated into Plone.

Introduction

Produced by wildcardcorp.com

This package integrates DocumentCloud’s viewer and PDF processing into Plone.

Example viewer: https://www.documentcloud.org/documents/19864-goldman-sachs-internal-emails

Features

  • very nice document viewer

  • OCR

  • Searchable on OCR text

  • works with many different document types

  • plone.app.async integration with task monitor

  • lots of configuration options

  • PDF Album view for displaying groups of PDFs

Works with

Besides displaying PDFs, it will also display:

  • Word

  • Excel

  • PowerPoint

  • HTML

  • RTF

Install requirements

Async Integration

It is highly recommended to install and configure plone.app.async in combination with this package. Doing so runs all PDF conversion processes asynchronously, so users are not kept waiting for the conversion when they save files.

Settings

The product can be configured via a control panel item Document Viewer Settings.

Some interesting configuration options:

Storage Type

If you want to be able to serve your files via the Amazon cloud, this option lets you store the data in flat files that can be synced to another server.

Storage Location

Where on the server to store the files.

OCR

Use tesseract to scan the document for text. This process can be slow, so if your PDFs do not need to be OCR’d, you may disable it.

Auto Select Layout

For PDF files added to the site, automatically select the document viewer display.

Auto Convert

When PDF files are added or modified, automatically convert them.

Auto layout file types

File types that should automatically be converted to the document viewer display.

File storage integration

If you choose to use basic file storage instead of ZODB blob storage, there are a few things you’ll want to keep in mind.

  1. Use nginx to serve the files from the file system. This might require you to install a local nginx on the Plone server just for serving file storage. You can get creative with how your file storage is used, though.

  2. Because a Plone operation can be interrupted, and deleting a file on the operating system cannot be done within a transaction, no files are ever deleted automatically. However, there is an action you can call from a cron task to clean up your file storage directory. Just call the URL http://zeoinstace/plone/@@dvcleanup-filestorage.
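
The cleanup call described above could be scripted for cron along these lines. This is a minimal sketch, not part of the package: the helper names (`cleanup_url`, `build_request`, `run_cleanup`) are hypothetical, and sending HTTP basic auth credentials is an assumption in case the view requires a logged-in user. Only the view name `@@dvcleanup-filestorage` comes from the text above.

```python
import base64
import urllib.request

# View name taken from the documentation above.
CLEANUP_VIEW = "@@dvcleanup-filestorage"


def cleanup_url(site_url):
    """Return the cleanup view URL for the given Plone site URL."""
    return site_url.rstrip("/") + "/" + CLEANUP_VIEW


def build_request(site_url, username=None, password=None):
    """Build the request; basic auth is an assumption, not a documented need."""
    request = urllib.request.Request(cleanup_url(site_url))
    if username is not None:
        raw = "{}:{}".format(username, password).encode("utf-8")
        token = base64.b64encode(raw).decode("ascii")
        request.add_header("Authorization", "Basic " + token)
    return request


def run_cleanup(site_url, username=None, password=None, timeout=300):
    """Call the cleanup view and return the HTTP status code."""
    request = build_request(site_url, username, password)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

A cron job could then call `run_cleanup` with the address of one of your instances, for example once a night, and log the returned status code.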

Upgrading from page turner

If you currently have page turner installed, this project will supersede it. Your existing page turner views will keep working, but files added to the site from now on will no longer be converted to page turner.

To convert a single existing view, use the Document Viewer Convert button that appears on every page turner enabled file. Clicking it manually converts that file from page turner to document viewer.

To convert all existing views, go to portal_setup in the ZMI, open the upgrades tab, select collective.documentviewer and click to show old upgrades. There should be an upgrade-all step to run.

Upgrading from pdfpal

If you want to upgrade from pdfpal, it is recommended that you simply uninstall pdfpal.

Otherwise, document viewer will disable parts of pdfpal when the two are installed together. If you still want both installed, you’ll need to upgrade pdfpal to at least version 0.7b5 for it to work well alongside document viewer.

Also, version 0.7b6 has the best uninstall support, so if you’re going to uninstall the product, first upgrade your egg to 0.7b6.

TODO

  • check why some errors occur during async operations:
    • ConflictError: database conflict error (oid 0x4d10, class BTrees.IOBTree.IOBucket, serial this txn started with 0x0395f478bc2cb377 2012-04-21 03:36:44.103425, serial currently committed 0x0395f479b09de4cc 2012-04-21 03:37:41.394556)

    • ERROR ZODB.Connection Shouldn’t load state for 0x319d when the connection is closed

Changelog

1.5.0 (2012-04-29)

  • no changes

1.5.0b1 (2012-04-27)

  • be able to move jobs to front of queue

  • use portal_catalog instead of uid_catalog so security checks apply to resource urls.

1.4.2 (2012-04-24)

  • no changes, first final release

1.4.1b3 (2012-04-23)

  • create local catalog and index before syncing db to prevent conflict errors.

  • add redirect timeout to conversion info page

1.4.1b2 (2012-04-23)

  • make sure to close open file descriptors

  • Change “Original Document (PDF)” to “Original Document”

  • emit event after conversion

  • only show queue link if manager

  • convert button should work for files that do not have layout selected yet

  • use communicate instead of wait with popen in case output is large. Prevents deadlocks.
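
The deadlock mentioned in the last entry is a general property of `subprocess` pipes: if the child fills a pipe buffer while the parent is blocked in `wait()`, neither side can make progress. The sketch below shows the `communicate()` pattern in isolation. `run_command` is a hypothetical helper for illustration and is not the package's actual command runner.

```python
import subprocess


def run_command(args):
    """Run a command and return (returncode, stdout, stderr) as text."""
    process = subprocess.Popen(
        args, stdout=subprocess.PIPE, stderr=subprocess.PIPE
    )
    # communicate() reads both pipes until EOF while waiting for the
    # process to exit, so a child that writes a lot of output cannot
    # block on a full pipe the way it can with a bare wait().
    stdout, stderr = process.communicate()
    return (
        process.returncode,
        stdout.decode("utf-8", "replace"),
        stderr.decode("utf-8", "replace"),
    )
```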

1.4.1b1 (2012-04-23)

  • do not assume pdfpal is used along with pageturner on data conversion.

  • better command runner

  • track errors better and display them in interface if something happened during conversion

  • new file storage structure to prevent too many files from being in one directory

1.4b1 (2012-04-21)

  • fix full screen button when text or pages selected.

  • be able to customize batch size

1.4a2 (2012-04-20)

  • make sure to not use files with spaces

1.4a1 (2012-04-20)

  • be able to detect if pdf already has text in it and do not OCR it if it does.

1.3b2 (2012-04-20)

  • use jQuery instead of $()

1.3b1 (2012-04-20)

  • default OCR to being off since it’s pretty slow

  • better logging when looking for binary files

  • be able to override width of viewer

1.3a3 (2012-04-20)

  • fix uninstall [vangheem]

1.3a2 (2012-04-19)

  • fix async bug if it wasn’t installed [vangheem]

1.3a1 (2012-04-19)

  • make sure to initialize catalog after db sync for large PDFs. [vangheem]

  • better integrate with pdfpal and pageturner so it’s easy to upgrade from those products. [vangheem]

1.2a2 (2012-04-19)

  • fix setting custom quota for async queue [vangheem]

  • fix group view clear button [vangheem]

  • add support for alternative md5sum binary [vangheem]

1.2a1 (2012-04-19)

  • fix full screen page bug [vangheem]

  • better async integration with quota setting [vangheem]

  • View async queue for conversions [vangheem]

  • index ocr data in portal catalog [vangheem]

  • better pdf group view with search [vangheem]

  • handle large files better [vangheem]

  • check if file has already been converted by storing hash of the file to check against. [vangheem]

  • be able to remove document viewer conversion tasks [vangheem]

  • add ability to cleanup file storage files for deleted plone File objects. [vangheem]
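
The "already converted" check from the 1.2a1 entries above can be illustrated with a small sketch. It assumes the hash is an MD5 digest (the 1.2a2 entry mentions an md5sum binary) but computes it with Python's `hashlib` rather than an external binary. All function names here are hypothetical, and where the package actually stores the hash is not shown.

```python
import hashlib


def md5_of_chunks(chunks):
    """Return the hex MD5 digest of an iterable of byte chunks."""
    digest = hashlib.md5()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def file_md5(path, chunk_size=1024 * 1024):
    """Hash a file in fixed-size chunks so large PDFs never sit fully in memory."""
    with open(path, "rb") as handle:
        return md5_of_chunks(iter(lambda: handle.read(chunk_size), b""))


def needs_conversion(current_hash, stored_hash):
    """Convert when no hash was stored yet or the file content has changed."""
    return stored_hash is None or current_hash != stored_hash
```

Comparing against a stored digest means that re-saving a file with unchanged content does not trigger another conversion.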

1.1a1 (2012-04-18)

  • add pdf folder album view [vangheem]

  • fix async integration [vangheem]

1.0a2 (2012-04-17)

  • add control panel icon [vangheem]

  • fix uninstall procedure [vangheem]

  • changing image type does not cause existing ones to fail. [vangheem]

1.0a1 (2012-04-17)

  • Initial release
