
Introduction

This package provides some nice integrations for PDF-heavy web sites.

  • Generates thumbnails from PDFs
  • Adds a folder view for PDFs that uses the generated thumbnails
  • Adds OCR for PDF indexing
  • Everything is configurable, so you can choose not to use thumbnail generation or OCR
  • Ability to create searchable PDFs with HOCR
  • Use the @@async-monitor URL to monitor asynchronous jobs that have yet to run

OCR

OCR requires Ghostscript and Tesseract to be installed. Just use your package manager to install these packages:

# sudo apt-get install ghostscript tesseract-ocr

This will install Tesseract 2, not Tesseract 3.

Searchable PDFs

Searchable PDFs require an svn checkout of Tesseract version 3.01 or 3.00 with the hocr configuration in place. Take a look at this thread to find out how to configure hocr: http://ubuntuforums.org/showthread.php?t=1647350

In addition, you'll need exactimage and pdftk installed:

# sudo apt-get install exactimage pdftk libtiff-tools
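After running the install commands above, it can be handy to confirm that the external programs are actually on the PATH. The following is a minimal sketch, not part of this package. The executable names are assumptions about what the Debian/Ubuntu packages provide (gs from ghostscript, tesseract from tesseract-ocr, hocr2pdf from exactimage, pdftk from pdftk), and missing_tools is a hypothetical helper name.

```python
import shutil

# Assumed executable names; adjust them if your distribution differs.
OCR_TOOLS = ["gs", "tesseract"]
SEARCHABLE_PDF_TOOLS = ["hocr2pdf", "pdftk"]


def missing_tools(names, which=shutil.which):
    """Return the entries of `names` that cannot be found on PATH."""
    return [name for name in names if which(name) is None]


if __name__ == "__main__":
    checks = (("OCR", OCR_TOOLS), ("searchable PDFs", SEARCHABLE_PDF_TOOLS))
    for label, tools in checks:
        missing = missing_tools(tools)
        print(label, "- missing:", ", ".join(missing) if missing else "none")
```

The `which` parameter only exists so the lookup can be replaced in tests. Finding an executable says nothing about its version, so the Tesseract version requirement still has to be checked separately.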

If you are not using the latest Tesseract version, you will have to add this to your instance declaration:

environment-vars += AUTHORIZE_OLD_TESSERACT_VERSION true
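To verify that the setting reached the running process, the environment variable can be inspected from Python. This is only a sketch: old_tesseract_authorized is a hypothetical helper, and treating the value "true" case-insensitively is an assumption, not a description of how this package reads the variable.

```python
import os


def old_tesseract_authorized(environ=None):
    """Return True if AUTHORIZE_OLD_TESSERACT_VERSION is set to "true".

    The comparison is case-insensitive, which is an assumption made
    for this sketch rather than documented behaviour of the package.
    """
    if environ is None:
        environ = os.environ
    value = environ.get("AUTHORIZE_OLD_TESSERACT_VERSION", "")
    return value.strip().lower() == "true"


if __name__ == "__main__":
    print("old Tesseract authorized:", old_tesseract_authorized())
```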

Plone 3

  • Requires hashlib

Extra

You can convert all PDFs at once by calling the URL @@queue-up-all.
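The view names in this document are appended to a URL in the site. A minimal sketch of building such a URL follows. The base URL http://localhost:8080/Plone is a made-up example, view_url is a hypothetical helper, and whether the view should be called on the site root or on a folder is an assumption not stated here.

```python
def view_url(base_url, view_name):
    """Join a site or folder URL with a view name such as @@queue-up-all."""
    return base_url.rstrip("/") + "/" + view_name.lstrip("/")


if __name__ == "__main__":
    site = "http://localhost:8080/Plone"  # hypothetical site URL
    print(view_url(site, "@@queue-up-all"))
    print(view_url(site, "@@async-monitor"))
```

Opening the resulting URL in a browser while logged in to the site is likely the simplest way to call the view.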

Changelog

0.7b6 ~ 2012-04-20

  • fix uninstall [vangheem]

0.7b5 ~ 2012-04-19

  • do not run conversion if documentviewer is installed [vangheem]
  • add better uninstall support [vangheem]

0.7b4 ~ 2012-04-09

  • fix image url for album view. [vangheem]

0.7b3 ~ 2012-04-05

  • fix content type spec for thumbnail response [vangheem]
  • display image thumb urls in album view [vangheem]

0.7b2 ~ 2011-04-12

  • more checks on reading files [vangheem]
  • provide button to manually index document [vangheem]
  • add ability to split pdf up into multiple PDFs [vangheem]

0.7b1 ~ 2011-01-06

  • fixes for quality and size issues [vangheem]

0.6b2 ~ 2011-01-04

  • fix async monitor view to work with plone.app.async = 1.0. It changed the order of some args in the job. [vangheem]

0.6b1 ~ 2011-01-04

  • added ability to make PDFs searchable and make it work seamlessly if wc.pageturner is installed so flex paper is created with the searchable PDF version.

0.5b5 ~ 2010-12-07

  • did not conditionally import plone.app.async

0.5b4 ~ 2010-12-06

  • better info on async monitor
  • only reindex searchabletext when doing OCR so the modification date on the object does not get set.
  • make sure to catch exceptions so it doesn’t leave around files after a bad conversion
  • add colorbox for pdf folder view

0.5b3 ~ 2010-12-02

  • add ability to queue up all pdf files

0.5b2 ~ 2010-12-02

  • fix async monitor view

0.5b1 ~ 2010-12-02

  • Initial release

Download Files

wildcard.pdfpal-0.7b6.zip (90.4 kB), source distribution, uploaded Apr 20, 2012
