
PDF Thumbnail generation, OCR indexing and extra views integrated with plone.app.async

Project description

Introduction

This package provides integrations for PDF-heavy Plone sites.

  • Generates thumbnails from PDFs

  • Adds a folder view for PDFs that displays the generated thumbnails

  • Adds OCR so PDF text can be indexed for search

  • Everything is configurable, so you can choose not to use thumbnail generation or OCR

  • Can create searchable PDFs with hOCR

  • Provides the @@async-monitor URL for monitoring asynchronous jobs that have yet to run
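The README does not describe how thumbnails are produced. Since Ghostscript is already a dependency, one plausible approach is to render only the first page of the PDF to a PNG. The sketch below is an assumption, not pdfpal's code: `thumbnail_command` and `generate_thumbnail` are hypothetical helpers, the 72 DPI default is an arbitrary choice, and the code is written in Python 3 for readability.

```python
import shutil
import subprocess


def thumbnail_command(pdf_path, png_path, resolution=72):
    """Build a Ghostscript argv that renders page 1 of a PDF to a PNG.

    Hypothetical helper; pdfpal's real thumbnail options may differ.
    """
    return [
        "gs",
        "-q",                    # suppress informational output
        "-dSAFER",               # restrict file operations requested by the PDF
        "-dBATCH",               # exit when done instead of waiting for input
        "-dNOPAUSE",             # do not pause between pages
        "-sDEVICE=png16m",       # 24-bit color PNG output
        "-r%d" % resolution,     # rendering resolution in DPI (small = thumbnail)
        "-dFirstPage=1",         # render the first page only
        "-dLastPage=1",
        "-sOutputFile=" + png_path,
        pdf_path,
    ]


def generate_thumbnail(pdf_path, png_path, resolution=72):
    """Run Ghostscript; raises if gs is missing or the conversion fails."""
    if shutil.which("gs") is None:
        raise RuntimeError("Ghostscript (gs) is not installed")
    subprocess.run(thumbnail_command(pdf_path, png_path, resolution), check=True)
```

Keeping the argv construction separate from execution makes the command easy to inspect or log before it runs.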

OCR

OCR requires Ghostscript and Tesseract to be installed. Use your package manager to install them:

# sudo apt-get install ghostscript tesseract-ocr
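A minimal sketch of an OCR pipeline built from these two tools, assuming Ghostscript renders each page to a grayscale TIFF and Tesseract turns each TIFF into plain text (`tesseract image outbase` writes `outbase.txt`). The helper names, the `page-%03d.tif` naming pattern, and the 300 DPI resolution are hypothetical; pdfpal's own implementation may differ.

```python
import glob
import os
import shutil
import subprocess


def render_pages_command(pdf_path, out_dir, resolution=300):
    """Ghostscript argv that writes one grayscale TIFF per page."""
    return [
        "gs", "-q", "-dSAFER", "-dBATCH", "-dNOPAUSE",
        "-sDEVICE=tiffgray",
        "-r%d" % resolution,
        # Ghostscript expands %03d to the page number: page-001.tif, ...
        "-sOutputFile=" + os.path.join(out_dir, "page-%03d.tif"),
        pdf_path,
    ]


def ocr_page_command(tiff_path):
    """Tesseract argv; the text lands in <tiff name without extension>.txt."""
    outbase = os.path.splitext(tiff_path)[0]
    return ["tesseract", tiff_path, outbase]


def ocr_pdf(pdf_path, work_dir):
    """Render every page, OCR it, and return the concatenated text."""
    for tool in ("gs", "tesseract"):
        if shutil.which(tool) is None:
            raise RuntimeError("%s is not installed" % tool)
    subprocess.run(render_pages_command(pdf_path, work_dir), check=True)
    texts = []
    for tiff in sorted(glob.glob(os.path.join(work_dir, "page-*.tif"))):
        subprocess.run(ocr_page_command(tiff), check=True)
        with open(os.path.splitext(tiff)[0] + ".txt", encoding="utf-8") as fh:
            texts.append(fh.read())
    return "\n".join(texts)
```

The returned text is the kind of content that would be written to the SearchableText index so the PDF becomes findable through site search.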

Searchable PDFs

Requires an SVN checkout of Tesseract version 3.0.1 or 3.0.0 with the hOCR configuration in place. See this thread for how to configure hOCR: http://ubuntuforums.org/showthread.php?t=1647350

In addition, you’ll need ExactImage and pdftk installed:

# sudo apt-get install exactimage pdftk
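One way these tools can fit together, sketched under assumptions: Tesseract 3.0x run with the `hocr` config writes hOCR markup to `<outbase>.html`, ExactImage's `hocr2pdf` reads that markup on stdin and writes a one-page PDF with the recognized text placed behind the page image, and `pdftk` concatenates the pages. The sketch assumes the page images already exist as TIFF files; the helper names are hypothetical and the commands are not taken from pdfpal's source.

```python
import os
import subprocess


def hocr_command(tiff_path):
    """Tesseract argv using the 'hocr' config (output: <outbase>.html)."""
    outbase = os.path.splitext(tiff_path)[0]
    return ["tesseract", tiff_path, outbase, "hocr"]


def hocr2pdf_command(tiff_path):
    """Return (argv, hocr_file): hocr2pdf reads the hOCR file on stdin."""
    base = os.path.splitext(tiff_path)[0]
    argv = ["hocr2pdf", "-i", tiff_path, "-o", base + ".pdf"]
    return argv, base + ".html"


def join_command(page_pdfs, output_pdf):
    """pdftk argv that concatenates per-page PDFs into one document."""
    return ["pdftk"] + list(page_pdfs) + ["cat", "output", output_pdf]


def make_searchable(tiff_paths, output_pdf):
    """OCR each page to hOCR, build per-page PDFs, then join them."""
    page_pdfs = []
    for tiff in tiff_paths:
        subprocess.run(hocr_command(tiff), check=True)
        argv, hocr_file = hocr2pdf_command(tiff)
        with open(hocr_file, "rb") as stdin:
            subprocess.run(argv, stdin=stdin, check=True)
        page_pdfs.append(argv[-1])  # the -o target of hocr2pdf
    subprocess.run(join_command(page_pdfs, output_pdf), check=True)
```

Newer Tesseract releases name the hOCR output `.hocr` instead of `.html`, so the extension in `hocr2pdf_command` would need adjusting for them.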

Plone 3

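  • Requires the hashlib package, which is not in the standard library of Python 2.4 (the Python version Plone 3 runs on)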
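The README does not say what hashlib is used for. A common pattern for code that must run on both Python 2.4 and later versions is to fall back to the old `md5` module when `hashlib` is missing. The sketch assumes the digest is used to fingerprint file data; `fingerprint` is a hypothetical helper.

```python
try:
    from hashlib import md5  # standard library since Python 2.5
except ImportError:
    # Python 2.4 (Plone 3) without the hashlib backport installed
    from md5 import new as md5


def fingerprint(data):
    """Return a hex digest that changes whenever the file data changes."""
    return md5(data).hexdigest()
```

Installing the hashlib backport on Plone 3 means the first import succeeds everywhere, so the fallback branch is only a safety net.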

Extra

You can queue up conversion of all PDF files at once by calling the @@queue-up-all URL.
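A sketch of triggering the view from a script rather than a browser, written in Python 3. It assumes the view is called on the Plone site root at a hypothetical `http://localhost:8080/Plone` and that the site accepts HTTP basic authentication for a user with sufficient permissions. The same helper works for the `@@async-monitor` view. `view_url` and `call_view` are hypothetical names.

```python
import base64
from urllib.request import Request, urlopen


def view_url(site_url, view_name):
    """Join a site URL and a browser view name such as 'queue-up-all'."""
    return site_url.rstrip("/") + "/@@" + view_name


def call_view(site_url, view_name, username, password):
    """GET the view with HTTP basic auth and return the response body."""
    request = Request(view_url(site_url, view_name))
    token = base64.b64encode(("%s:%s" % (username, password)).encode("utf-8"))
    request.add_header("Authorization", "Basic " + token.decode("ascii"))
    with urlopen(request) as response:
        return response.read().decode("utf-8", "replace")
```

For example, `call_view("http://localhost:8080/Plone", "queue-up-all", "admin", "secret")` would queue every PDF, after which the async monitor view shows the pending jobs.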

Changelog

0.7b1 ~ 2011-01-06

  • fixes for quality and size issues [vangheem]

0.6b2 ~ 2011-01-04

  • fix async monitor view to work with plone.app.async = 1.0, which changed the order of some args in the job. [vangheem]

0.6b1 ~ 2011-01-04

  • added ability to make PDFs searchable; if wc.pageturner is installed, it works seamlessly and the FlexPaper version is created from the searchable PDF.

0.5b5 ~ 2010-12-07

  • fix: plone.app.async was not imported conditionally

0.5b4 ~ 2010-12-06

  • better info on async monitor

  • only reindex SearchableText when doing OCR, so the object's modification date is not updated.

  • catch exceptions so files are not left behind after a failed conversion

  • add Colorbox to the PDF folder view

0.5b3 ~ 2010-12-02

  • add ability to queue up all PDF files

0.5b2 ~ 2010-12-02

  • fix async monitor view

0.5b1 ~ 2010-12-02

  • Initial release


Download files

Source distribution: wildcard.pdfpal-0.7b1.zip (63.2 kB)

File details

Details for the file wildcard.pdfpal-0.7b1.zip.

File metadata

  • Download URL: wildcard.pdfpal-0.7b1.zip
  • Size: 63.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for wildcard.pdfpal-0.7b1.zip

Algorithm     Hash digest
SHA256        faff5bf5860979fab5b1a5beb380599832a3ac5f8391a5be835a9b3b35dafa14
MD5           13e9b75cd02ad5ce4f2396faf91293a5
BLAKE2b-256   65a10277e2d36ee9f3eb7b91666c81531928d67fa8050a2d2a12d53e5b54cafe