A library for generating previews (thumbnails, text or JSON overviews) of file-based content

Presentation

preview-generator is a library for generating previews (thumbnails, PDF, text and JSON overviews) of all your file-based content. This module gives you access to JPEG, PDF, text, HTML and JSON previews of virtually any kind of file. It also includes a cache mechanism, so you do not have to care about preview storage.

This module was created to delegate the responsibility of building previews of the files managed by Tracim.

Supported file formats

Here is an overview of supported file formats:

MIME type Extension
Images - based on Wand (ImageMagick)
application/postscript .ps
image/x-jg .art
image/x-ms-bmp .bmp
text/plain .ksh
image/x-canon-cr2 .cr2
image/x-canon-crw .crw
application/dicom .dcm
application/x-director .dcr
image/x-epson-erf .erf
image/gif .gif
text/x-chdr .h
text/html .htm
image/vnd.microsoft.icon .ico
application/x-info .info
image/x-jng .jng
image/jp2 .jp2
image/jpeg .jpeg
image/jpm .jpm
application/json .json
chemical/x-mopac-input .mop
image/x-nikon-nef .nef
image/x-olympus-orf .orf
application/font-sfnt .otf
image/x-portable-bitmap .pbm
image/pcx .pcx
chemical/x-pdb .pdb
application/pdf .pdf
application/x-font .pfa
image/x-portable-graymap .pgm
image/png .png
image/x-portable-anymap .pnm
image/x-portable-pixmap .ppm
image/x-photoshop .psd
image/x-cmu-raster .ras
image/x-rgb .rgb
application/x-silverlight .scr
text/scriptlet .sct
image/tiff .tiff
application/vnd.visio .vsd
image/vnd.wap.wbmp .wbmp
image/x-xbitmap .xbm
application/x-xcf .xcf
image/x-xpixmap .xpm
image/x-xwindowdump .xwd
Bitmap images - based on Pillow
image/png .png
application/postscript .ps
image/x-eps
Images - based on the convert command (ImageMagick)
application/postscript .ps
image/x-jg .art
image/x-ms-bmp .bmp
text/plain .ksh
image/x-canon-cr2 .cr2
image/x-canon-crw .crw
application/dicom .dcm
application/x-director .dcr
image/x-epson-erf .erf
image/gif .gif
text/x-chdr .h
text/html .htm
image/vnd.microsoft.icon .ico
application/x-info .info
image/x-jng .jng
image/jp2 .jp2
image/jpeg .jpeg
image/jpm .jpm
application/json .json
chemical/x-mopac-input .mop
image/x-nikon-nef .nef
image/x-olympus-orf .orf
application/font-sfnt .otf
image/x-portable-bitmap .pbm
image/pcx .pcx
chemical/x-pdb .pdb
application/pdf .pdf
application/x-font .pfa
image/x-portable-graymap .pgm
image/png .png
image/x-portable-anymap .pnm
image/x-portable-pixmap .ppm
image/x-photoshop .psd
image/x-cmu-raster .ras
image/x-rgb .rgb
application/x-silverlight .scr
text/scriptlet .sct
image/tiff .tiff
application/vnd.visio .vsd
image/vnd.wap.wbmp .wbmp
image/x-xbitmap .xbm
application/x-xcf .xcf
image/x-xpixmap .xpm
image/x-xwindowdump .xwd
Archive files
application/x-compressed
application/x-zip-compressed
application/zip .zip
multipart/x-zip
application/x-tar .tar
application/x-gzip
application/x-gtar .gtar
application/x-tgz
Vector images - based on Inkscape
image/svg+xml .svg
Documents - based on LibreOffice
image/wmf
application/x-hwp .hwp
application/x-aportisdoc
application/vnd.sun.xml.chart
application/vnd.ms-excel.sheet.binary.macroEnabled.12
application/docbook+xml
application/vnd.sun.xml.writer.global .sxg
image/x-xpixmap .xpm
application/x-gnumeric .gnumeric
application/vnd.apple.pages
image/x-emf
application/vnd.stardivision.calc .sdc
text/spreadsheet
application/mathml+xml
image/x-sgf
application/x-sony-bbeb
image/x-portable-graymap .pgm
application/wps-office.doc
application/x-starwriter
application/vnd.oasis.opendocument.spreadsheet .ods
application/clarisworks
application/vnd.sun.xml.impress .sxi
application/x-iwork-numbers-sffnumbers
application/vnd.ms-powerpoint.slide.macroEnabled.12
application/vnd.oasis.opendocument.text-master .odm
application/vnd.sun.xml.writer.template .stw
application/x-iwork-pages-sffpages
application/x-iwork-keynote-sffkey
application/vnd.oasis.opendocument.graphics-flat-xml .fodg
application/vnd.openxmlformats-officedocument.presentationml.slideshow .ppsx
application/x-abiword .abw
image/x-targa
application/xhtml+xml .xhtml
application/vnd.ms-excel .xls
image/x-photo-cd
application/vnd.stardivision.draw .sda
image/x-portable-bitmap .pbm
application/visio.drawing
application/vnd.oasis.opendocument.graphics .odg
image/vnd.adobe.photoshop
application/vnd.sun.xml.calc.template .stc
application/vnd.lotus-1-2-3
application/vnd.sun.xml.writer.web
application/vnd.oasis.opendocument.database
image/cgm
application/vnd.sun.xml.math .sxm
application/vnd.openxmlformats-officedocument.presentationml.template .potx
application/rtf .rtf
application/vnd.apple.keynote
image/x-wpg
application/vnd.ms-excel.template.macroEnabled.12
application/x-pagemaker
application/vnd.ms-powerpoint .ppt
application/x-mspublisher
application/vnd.visio .vsd
application/vnd.oasis.opendocument.presentation .odp
application/vnd.sun.xml.writer .sxw
application/wps-office.ppt
application/vnd.sun.xml.calc .sxc
image/x-pict
application/vnd.sun.xml.impress.template .sti
application/wps-office.pptx
image/x-sun-raster
image/x-freehand
application/prs.plucker
application/x-pocket-word
text/csv .csv
application/vnd.openxmlformats-officedocument.presentationml.presentation .pptx
image/x-wmf
application/vnd.sun.xml.draw .sxd
application/vnd.oasis.opendocument.presentation-flat-xml .fodp
text/html .htm
application/vnd.oasis.opendocument.graphics-template .otg
application/vnd.oasis.opendocument.spreadsheet-flat-xml .fods
application/vnd.corel-draw
application/x-qpro
application/vnd.ms-excel.sheet.macroEnabled.12
application/vnd.visio.xml
image/x-pcx
image/x-svm
application/vnd.ms-word.template.macroEnabled.12
application/vnd.oasis.opendocument.chart .odc
application/x-fictionbook+xml
application/msword .dot
application/vnd.oasis.opendocument.text .odt
application/vnd.ms-works
image/vnd.dxf
application/vnd.oasis.opendocument.text-web .oth
application/vnd.openxmlformats-officedocument.wordprocessingml.template .dotx
image/x-eps
application/vnd.stardivision.writer .sdw
text/rtf
application/vnd.oasis.opendocument.formula .odf
application/x-stardraw
application/vnd.openxmlformats-officedocument.spreadsheetml.sheet .xlsx
application/vnd.ms-powerpoint.presentation.macroEnabled.12
application/wordperfect5.1
image/emf
application/x-starcalc
application/vnd.oasis.opendocument.text-master-template .otm
application/vnd.oasis.opendocument.text-template .ott
application/vnd.palm
application/vnd.oasis.opendocument.base
application/wps-office.docx
application/x-t602
application/vnd.openxmlformats-officedocument.wordprocessingml.document .docx
image/x-xbitmap .xbm
application/vnd.oasis.opendocument.formula-template
application/vnd.oasis.opendocument.presentation-template .otp
application/vnd.oasis.opendocument.chart-template
image/x-met
application/macwriteii
application/x-dbase
image/tiff .tiff
application/vnd.oasis.opendocument.spreadsheet-template .ots
application/vnd.sun.xml.draw.template .std
application/wps-office.xls
application/vnd.wordperfect .wpd
application/vnd.ms-powerpoint.slideshow.macroEnabled.12
application/vnd.openxmlformats-officedocument.spreadsheetml.template .xltx
application/vnd.openxmlformats-officedocument.presentationml.slide .sldx
image/x-portable-pixmap .ppm
application/vnd.visio2013
image/x-cmx
application/vnd.sun.xml.base .odb
application/wps-office.xlsx
application/vnd.oasis.opendocument.text-flat-xml .fodt
image/x-cmu-raster .ras
application/vnd.apple.numbers
application/vnd.ms-powerpoint.template.macroEnabled.12
image/tif
application/vnd.lotus-wordpro
application/vnd.ms-word.document.macroEnabled.12
Plain text files
text/plain .ksh
text/html .htm
application/xml .xsl
application/javascript .js
PDF documents - based on PyPDF2
application/pdf .pdf

Installation

pip install preview-generator

Note about requirements: some system packages are needed to install the Python dependencies. If the pip install preview-generator command fails, try installing the zlib and libjpeg development packages. On Debian-based systems this can be done with the following command:

apt-get install zlib1g-dev libjpeg-dev

This package uses the following Python dependencies (this list is not exhaustive): wand, python-magic, pillow, PyPDF2.

Note: if you want to preview office files, ensure that LibreOffice is installed on your computer.

Usage

Here are some code examples.

Generate a thumbnail of an image file

from preview_generator.manager import PreviewManager
manager = PreviewManager('/tmp/cache/', create_folder=True)
thumbnail_file_path = manager.get_jpeg_preview('/home/user/Pictures/myfile.gif', height=100, width=200)
print('Preview created at path : ', thumbnail_file_path)

Generate a pdf preview of a libreoffice text document

from preview_generator.manager import PreviewManager
manager = PreviewManager('/tmp/cache/', create_folder=True)
pdf_file_path = manager.get_pdf_preview('/home/user/Documents/report.odt', page=2)
print('Preview created at path : ', pdf_file_path)

The preview manager

preview_manager = PreviewManager(cache_path)

args :

cache_path : a string holding the path of the directory where the cache files will be stored

create_folder : a boolean; when True, the manager will TRY to create the cache folder

returns :

a PreviewManager Object

The builders

Here is how they are meant to be used, assuming that cache_path is an existing directory.

For Office types into PDF :

preview_manager = PreviewManager(cache_path)
preview = preview_manager.get_pdf_preview(file_path,page=page_id)

-> Creates a PDF preview of an Office file

args :

file_path : a string holding the path of the file you want a preview of

page : the number (int) of the page you want. If it is not given, all the pages are returned. The first page is page 0

use_original_filename : a boolean indicating whether the original file name should appear in the preview name. True by default

returns :

the path of the generated PDF preview in the cache
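To give an idea of what the Office-to-PDF step involves, here is a minimal sketch that calls LibreOffice in headless mode through subprocess. It is not the library's own code: the helper names build_soffice_command and office_to_pdf are hypothetical, and the sketch assumes the soffice executable is on the PATH. The --headless, --convert-to and --outdir options are standard LibreOffice command-line options.

```python
import os
import subprocess
import typing


def build_soffice_command(file_path: str, output_dir: str) -> typing.List[str]:
    """Return a LibreOffice command line converting file_path to PDF.

    Hypothetical helper: assumes the LibreOffice binary is called 'soffice'.
    """
    return [
        'soffice',
        '--headless',            # run without any GUI
        '--convert-to', 'pdf',   # target format
        '--outdir', output_dir,  # where the PDF is written
        file_path,
    ]


def office_to_pdf(file_path: str, output_dir: str) -> str:
    """Convert an Office document to PDF and return the PDF path."""
    subprocess.run(build_soffice_command(file_path, output_dir), check=True)
    # LibreOffice names the output after the input file, with a .pdf extension.
    base_name = os.path.splitext(os.path.basename(file_path))[0]
    return os.path.join(output_dir, base_name + '.pdf')
```

With LibreOffice installed, office_to_pdf('/home/user/Documents/report.odt', '/tmp/cache') would write report.pdf into /tmp/cache. Extracting a single page, as the page argument does, would be a separate step on the resulting PDF.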

For images (GIF, BMP, PNG, JPEG, PDF) into JPEG :

preview_manager = PreviewManager(cache_path)
preview = preview_manager.get_jpeg_preview(file_path,height=1024,width=526)

-> Creates a JPEG preview of an image file, 1024 pixels high and 526 pixels wide

args :

file_path : a string holding the path of the file you want a preview of

height : height of the preview in pixels

width : width of the preview in pixels. If it is not given, the width will be the same as the height

use_original_filename : a boolean indicating whether the original file name should appear in the preview name. True by default

returns :

the path of the generated JPEG preview in the cache

Other conversions :

The principle is the same as above.

Zip to text, HTML or JSON : builds a list of the files contained in the archive, formatted as text, HTML or JSON.

Office to JPEG : builds a PDF from the Office file, then builds the JPEG from that PDF.

Text to text : essentially a copy of the file stored in the cache.
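The zip-to-text conversion can be pictured with the standard zipfile module alone. This is a sketch, not the library's implementation: zip_to_text is a hypothetical name, and the one-path-per-line output format is an assumption. It takes and returns byte streams, like the converters described in this README.

```python
import io
import typing
import zipfile


def zip_to_text(file_content: typing.BinaryIO) -> io.BytesIO:
    """List the files in a zip archive, one path per line (hypothetical format)."""
    with zipfile.ZipFile(file_content) as archive:
        names = archive.namelist()  # entries in archive order
    return io.BytesIO('\n'.join(names).encode('utf-8'))


if __name__ == '__main__':
    # Build a small archive in memory, then list its content.
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, 'w') as archive:
        archive.writestr('a.txt', 'hello')
        archive.writestr('docs/b.md', 'world')
    buffer.seek(0)
    print(zip_to_text(buffer).read().decode('utf-8'))
```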

Cache mechanism

Naming :

The name of the preview generated in the cache directory will be :

{file_name}-[{size}-]{file_md5sum}[-page{page}]{extension}

file_name = the name of the file you requested a preview of, without its extension.

size = the requested size of the preview (JPEG previews only), e.g. 512x512.

file_md5sum = the MD5 hash of the full path of the file. This avoids conflicts between files that have the same name but live in different directories.

page = the requested page, for PDF or Office document previews.

extension = the extension of the preview (.jpeg for a JPEG, .txt for a text, etc.)
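The naming scheme and the cache lookup can be sketched as follows. This is an illustration, not the library's code: build_preview_name and get_cached_preview are hypothetical names, the size is assumed to be a (width, height) tuple rendered as WIDTHxHEIGHT, and the page suffix follows the example outputs below (-page1).

```python
import hashlib
import os
import typing


def build_preview_name(
    file_path: str,
    extension: str,
    size: typing.Optional[typing.Tuple[int, int]] = None,
    page: typing.Optional[int] = None,
) -> str:
    """Build a cache file name: name-[size-]md5[-pageN]extension."""
    file_name = os.path.splitext(os.path.basename(file_path))[0]
    # MD5 of the full path, so that same-named files in different
    # directories get different cache entries.
    path_md5 = hashlib.md5(file_path.encode('utf-8')).hexdigest()
    name = file_name + '-'
    if size is not None:
        name += '{}x{}-'.format(size[0], size[1])  # assumed WIDTHxHEIGHT
    name += path_md5
    if page is not None:
        name += '-page{}'.format(page)
    return name + extension


def get_cached_preview(
    cache_path: str,
    preview_name: str,
    build: typing.Callable[[str], None],
) -> str:
    """Return the cached preview path, building it only on a cache miss."""
    preview_path = os.path.join(cache_path, preview_name)
    if not os.path.exists(preview_path):
        build(preview_path)  # hypothetical builder that writes the preview file
    return preview_path
```

Because the name only depends on the path, size and page, a second request for the same preview finds the existing file and skips the (possibly slow) build.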

Example :

These scripts :

GIF to JPEG :

import os
from preview_generator.manager import PreviewManager
current_dir = os.path.dirname(os.path.abspath(__file__)) + '/'

manager = PreviewManager(current_dir + 'cache', create_folder=True)
path_to_file = manager.get_jpeg_preview(
    file_path=current_dir + 'the_gif.gif',
    height=512,
    width=512,
)

print('Preview created at path : ', path_to_file)

will print

Preview created at path : the_gif-512x512-60dc9ef46936cc4fff2fe60bb07d4260.jpeg

ODT to JPEG :

import os
from preview_generator.manager import PreviewManager
current_dir = os.path.dirname(os.path.abspath(__file__)) + '/'

manager = PreviewManager(current_dir + 'cache', create_folder=True)
path_to_file = manager.get_jpeg_preview(
    file_path=current_dir + 'the_odt.odt',
    page=1,
    height=1024,
    width=1024,
)

print('Preview created at path : ', path_to_file)

will print

Preview created at path : the_odt-1024x1024-c8b37debbc45fa96466e5e1382f6bd2e-page1.jpeg

ZIP to Text :

import os
from preview_generator.manager import PreviewManager
current_dir = os.path.dirname(os.path.abspath(__file__)) + '/'

manager = PreviewManager(current_dir + 'cache', create_folder=True)
path_to_file = manager.get_text_preview(
    file_path=current_dir + 'the_zip.zip',
)

print('Preview created at path : ', path_to_file)

will print

Preview created at path : the_zip-a733739af8006558720be26c4dc5569a.txt

Adding new feature :

First of all, I’d be glad if you shared your new feature with everybody. If you want to, you can fork the project on GitHub (https://github.com/algoo/preview-generator), see Developer’s Kit, and submit your new features.

For example, to add a new preview builder that converts documents of type foo into JPEG, proceed as follows :

  • Warning: if you look at other builders to find out how to proceed, avoid the Office-to-something builders. They are a special case and could mislead you.
  • Create a new class FooPreviewBuilder in a file foo_preview.py in preview_generator/preview
  • Make it inherit from the appropriate PreviewBuilder class:
    • if it handles several pages, use class FooPreviewBuilder(PreviewBuilder)
    • if it handles a single page, use class FooPreviewBuilder(OnePagePreviewBuilder)
  • Define your own build_jpeg_preview(…) (here we want to convert foo into JPEG), following the same pattern as the other build_{type}_preview(…) methods
  • Inside this build_jpeg_preview(…), call a method file_converter.foo_to_jpeg(…)
  • Define your foo_to_jpeg(…) method in preview_generator/file_converter.py
    • its inputs must be a stream of bytes plus optional information such as a number of pages, a size, …
    • its output must also be a stream of bytes
  • You may need to redefine some methods, such as get_page_number() or exists_preview(), in your FooPreviewBuilder class
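Here is a self-contained sketch of the shape such a builder could take. Everything in it is hypothetical: OnePagePreviewBuilder below is a local stand-in for the library's base class (only so the sketch runs on its own), the build_text_preview signature is an assumption, and "foo" is imagined as UTF-8 text with one key=value pair per line. To stay within the standard library it converts foo into text rather than JPEG, but the split is the same: the builder handles files, the converter maps a byte stream to a byte stream.

```python
import io
import os
import typing


class OnePagePreviewBuilder:
    """Local stand-in for the library's base class (hypothetical)."""

    def get_page_number(self, file_path: str) -> int:
        return 1  # single-page builders always report one page


def foo_to_text(file_content: typing.BinaryIO) -> io.BytesIO:
    """Hypothetical converter: a byte stream in, a byte stream out.

    The preview keeps only the keys of the 'key=value' lines.
    """
    lines = file_content.read().decode('utf-8').splitlines()
    keys = [line.split('=', 1)[0].strip() for line in lines if '=' in line]
    return io.BytesIO('\n'.join(keys).encode('utf-8'))


class FooPreviewBuilder(OnePagePreviewBuilder):
    """Hypothetical builder turning foo files into text previews."""

    def build_text_preview(self, file_path: str, preview_path: str) -> str:
        with open(file_path, 'rb') as source:
            result = foo_to_text(source)  # conversion works on streams only
        with open(preview_path, 'wb') as target:
            target.write(result.read())
        return preview_path
```

Keeping the conversion in a stream-to-stream function makes it easy to test without touching the file system, as the converter contract above suggests.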

Developer’s Kit

Installation (dev) :

From scratch on a terminal :
  • create your project directory (we will call it “the_project”, but you can name it however you like) : mkdir the_project

  • cd the_project

  • git clone https://github.com/algoo/preview-generator

  • building your environment :
    • install the Python venv module : sudo apt install python3-venv
    • build your virtual environment (it is known to work with Python 3.4; other versions were not tested). The environment is called “myenv” here, but you can name it however you like: python3.4 -m venv myenv
    • activate it if it is not already active : source myenv/bin/activate (run deactivate to leave it)
  • install dependencies :

    • Exiftool - follow the instructions on the official website: https://sno.phy.queensu.ca/~phil/exiftool/
    • apt-get install zlib1g-dev
    • apt-get install libjpeg-dev
    • apt-get install python3-pythonmagick
    • apt-get install inkscape
    • apt-get install xvfb
    • apt-get install poppler-utils
    • apt-get install libfile-mimeinfo-perl
    • pip install wand
    • pip install Pillow
    • pip install PyPDF2
    • pip install python-magic
    • pip install pyexifinfo
    • pip install packaging
    • pip install xvfbwrapper
    • pip install pdf2image
    • pip install pathlib
    • if you use Python 3.5 or lower : pip install typing
# general dependencies
apt-get install zlib1g-dev libjpeg-dev python3-pythonmagick inkscape xvfb poppler-utils libfile-mimeinfo-perl
pip install wand Pillow PyPDF2 python-magic pyexifinfo packaging xvfbwrapper pdf2image pathlib
# Exiftool
wget https://sno.phy.queensu.ca/~phil/exiftool/Image-ExifTool-11.11.tar.gz
gzip -dc Image-ExifTool-11.11.tar.gz | tar -xf -
cd Image-ExifTool-11.11
perl Makefile.PL
sudo make install

If you need to preview Scribus .sla files, you will need Scribus >= 1.5. If it’s not available in your distribution, you can use an AppImage.

Download the latest AppImage from the official website https://www.scribus.net/downloads/unstable-branch/

mv /path/to/image/scribus-x.y.appimage /usr/local/bin/scribus
chmod +x /usr/local/bin/scribus

Running Pytest :

Pytest is a unit testing framework
  • pip install pytest
  • go into the “tests” directory : cd path/to/your/project/directory/tests
  • run py.test

License

MIT licensed. https://opensource.org/licenses/MIT


Download files

preview_generator-0.9.tar.gz (50.7 kB), source distribution
