A library for generating previews (thumbnails, text or json overviews) of file-based content
Presentation
preview-generator is a library for generating previews (thumbnails, pdf, text and json overviews) of all your file-based content. It gives you access to jpeg, pdf, text, html and json previews of virtually any kind of file. It also includes a cache mechanism, so you do not have to manage preview storage yourself.
The module was created to take over the job of building previews for the files managed by tracim.
Supported file formats
Here is an overview of supported file formats:
MIME type | Extension |
---|---|
Images - based on WAND (image magick) | |
application/postscript | .ps |
image/x-jg | .art |
image/x-ms-bmp | .bmp |
text/plain | .ksh |
image/x-canon-cr2 | .cr2 |
image/x-canon-crw | .crw |
application/dicom | .dcm |
application/x-director | .dcr |
image/x-epson-erf | .erf |
image/gif | .gif |
text/x-chdr | .h |
text/html | .htm |
image/vnd.microsoft.icon | .ico |
application/x-info | .info |
image/x-jng | .jng |
image/jp2 | .jp2 |
image/jpeg | .jpeg |
image/jpm | .jpm |
application/json | .json |
chemical/x-mopac-input | .mop |
image/x-nikon-nef | .nef |
image/x-olympus-orf | .orf |
application/font-sfnt | .otf |
image/x-portable-bitmap | .pbm |
image/pcx | .pcx |
chemical/x-pdb | .pdb |
application/pdf | |
application/x-font | .pfa |
image/x-portable-graymap | .pgm |
image/png | .png |
image/x-portable-anymap | .pnm |
image/x-portable-pixmap | .ppm |
image/x-photoshop | .psd |
image/x-cmu-raster | .ras |
image/x-rgb | .rgb |
application/x-silverlight | .scr |
text/scriptlet | .sct |
image/tiff | .tiff |
application/vnd.visio | .vsd |
image/vnd.wap.wbmp | .wbmp |
image/x-xbitmap | .xbm |
application/x-xcf | .xcf |
image/x-xpixmap | .xpm |
image/x-xwindowdump | .xwd |
Bitmap images - based on Pillow | |
image/png | .png |
application/postscript | .ps |
image/x-eps | |
Images - based on convert command (Image magick) | |
application/postscript | .ps |
image/x-jg | .art |
image/x-ms-bmp | .bmp |
text/plain | .ksh |
image/x-canon-cr2 | .cr2 |
image/x-canon-crw | .crw |
application/dicom | .dcm |
application/x-director | .dcr |
image/x-epson-erf | .erf |
image/gif | .gif |
text/x-chdr | .h |
text/html | .htm |
image/vnd.microsoft.icon | .ico |
application/x-info | .info |
image/x-jng | .jng |
image/jp2 | .jp2 |
image/jpeg | .jpeg |
image/jpm | .jpm |
application/json | .json |
chemical/x-mopac-input | .mop |
image/x-nikon-nef | .nef |
image/x-olympus-orf | .orf |
application/font-sfnt | .otf |
image/x-portable-bitmap | .pbm |
image/pcx | .pcx |
chemical/x-pdb | .pdb |
application/pdf | |
application/x-font | .pfa |
image/x-portable-graymap | .pgm |
image/png | .png |
image/x-portable-anymap | .pnm |
image/x-portable-pixmap | .ppm |
image/x-photoshop | .psd |
image/x-cmu-raster | .ras |
image/x-rgb | .rgb |
application/x-silverlight | .scr |
text/scriptlet | .sct |
image/tiff | .tiff |
application/vnd.visio | .vsd |
image/vnd.wap.wbmp | .wbmp |
image/x-xbitmap | .xbm |
application/x-xcf | .xcf |
image/x-xpixmap | .xpm |
image/x-xwindowdump | .xwd |
Archive files | |
application/x-compressed | |
application/x-zip-compressed | |
application/zip | .zip |
multipart/x-zip | |
application/x-tar | .tar |
application/x-gzip | |
application/x-gtar | .gtar |
application/x-tgz | |
Vector images - based on Inkscape | |
image/svg+xml | .svg |
Documents - based on LibreOffice | |
image/wmf | |
application/x-hwp | .hwp |
application/x-aportisdoc | |
application/vnd.sun.xml.chart | |
application/vnd.ms-excel.sheet.binary.macroEnabled.12 | |
application/docbook+xml | |
application/vnd.sun.xml.writer.global | .sxg |
image/x-xpixmap | .xpm |
application/x-gnumeric | .gnumeric |
application/vnd.apple.pages | |
image/x-emf | |
application/vnd.stardivision.calc | .sdc |
text/spreadsheet | |
application/mathml+xml | |
image/x-sgf | |
application/x-sony-bbeb | |
image/x-portable-graymap | .pgm |
application/wps-office.doc | |
application/x-starwriter | |
application/vnd.oasis.opendocument.spreadsheet | .ods |
application/clarisworks | |
application/vnd.sun.xml.impress | .sxi |
application/x-iwork-numbers-sffnumbers | |
application/vnd.ms-powerpoint.slide.macroEnabled.12 | |
application/vnd.oasis.opendocument.text-master | .odm |
application/vnd.sun.xml.writer.template | .stw |
application/x-iwork-pages-sffpages | |
application/x-iwork-keynote-sffkey | |
application/vnd.oasis.opendocument.graphics-flat-xml | .fodg |
application/vnd.openxmlformats-officedocument.presentationml.slideshow | .ppsx |
application/x-abiword | .abw |
image/x-targa | |
application/xhtml+xml | .xhtml |
application/vnd.ms-excel | .xls |
image/x-photo-cd | |
application/vnd.stardivision.draw | .sda |
image/x-portable-bitmap | .pbm |
application/visio.drawing | |
application/vnd.oasis.opendocument.graphics | .odg |
image/vnd.adobe.photoshop | |
application/vnd.sun.xml.calc.template | .stc |
application/vnd.lotus-1-2-3 | |
application/vnd.sun.xml.writer.web | |
application/vnd.oasis.opendocument.database | |
image/cgm | |
application/vnd.sun.xml.math | .sxm |
application/vnd.openxmlformats-officedocument.presentationml.template | .potx |
application/rtf | .rtf |
application/vnd.apple.keynote | |
image/x-wpg | |
application/vnd.ms-excel.template.macroEnabled.12 | |
application/x-pagemaker | |
application/vnd.ms-powerpoint | .ppt |
application/x-mspublisher | |
application/vnd.visio | .vsd |
application/vnd.oasis.opendocument.presentation | .odp |
application/vnd.sun.xml.writer | .sxw |
application/wps-office.ppt | |
application/vnd.sun.xml.calc | .sxc |
image/x-pict | |
application/vnd.sun.xml.impress.template | .sti |
application/wps-office.pptx | |
image/x-sun-raster | |
image/x-freehand | |
application/prs.plucker | |
application/x-pocket-word | |
text/csv | .csv |
application/vnd.openxmlformats-officedocument.presentationml.presentation | .pptx |
image/x-wmf | |
application/vnd.sun.xml.draw | .sxd |
application/vnd.oasis.opendocument.presentation-flat-xml | .fodp |
text/html | .htm |
application/vnd.oasis.opendocument.graphics-template | .otg |
application/vnd.oasis.opendocument.spreadsheet-flat-xml | .fods |
application/vnd.corel-draw | |
application/x-qpro | |
application/vnd.ms-excel.sheet.macroEnabled.12 | |
application/vnd.visio.xml | |
image/x-pcx | |
image/x-svm | |
application/vnd.ms-word.template.macroEnabled.12 | |
application/vnd.oasis.opendocument.chart | .odc |
application/x-fictionbook+xml | |
application/msword | .dot |
application/vnd.oasis.opendocument.text | .odt |
application/vnd.ms-works | |
image/vnd.dxf | |
application/vnd.oasis.opendocument.text-web | .oth |
application/vnd.openxmlformats-officedocument.wordprocessingml.template | .dotx |
image/x-eps | |
application/vnd.stardivision.writer | .sdw |
text/rtf | |
application/vnd.oasis.opendocument.formula | .odf |
application/x-stardraw | |
application/vnd.openxmlformats-officedocument.spreadsheetml.sheet | .xlsx |
application/vnd.ms-powerpoint.presentation.macroEnabled.12 | |
application/wordperfect5.1 | |
image/emf | |
application/x-starcalc | |
application/vnd.oasis.opendocument.text-master-template | .otm |
application/vnd.oasis.opendocument.text-template | .ott |
application/vnd.palm | |
application/vnd.oasis.opendocument.base | |
application/wps-office.docx | |
application/x-t602 | |
application/vnd.openxmlformats-officedocument.wordprocessingml.document | .docx |
image/x-xbitmap | .xbm |
application/vnd.oasis.opendocument.formula-template | |
application/vnd.oasis.opendocument.presentation-template | .otp |
application/vnd.oasis.opendocument.chart-template | |
image/x-met | |
application/macwriteii | |
application/x-dbase | |
image/tiff | .tiff |
application/vnd.oasis.opendocument.spreadsheet-template | .ots |
application/vnd.sun.xml.draw.template | .std |
application/wps-office.xls | |
application/vnd.wordperfect | .wpd |
application/vnd.ms-powerpoint.slideshow.macroEnabled.12 | |
application/vnd.openxmlformats-officedocument.spreadsheetml.template | .xltx |
application/vnd.openxmlformats-officedocument.presentationml.slide | .sldx |
image/x-portable-pixmap | .ppm |
application/vnd.visio2013 | |
image/x-cmx | |
application/vnd.sun.xml.base | .odb |
application/wps-office.xlsx | |
application/vnd.oasis.opendocument.text-flat-xml | .fodt |
image/x-cmu-raster | .ras |
application/vnd.apple.numbers | |
application/vnd.ms-powerpoint.template.macroEnabled.12 | |
image/tif | |
application/vnd.lotus-wordpro | |
application/vnd.ms-word.document.macroEnabled.12 | |
Plain text files | |
text/plain | .ksh |
text/html | .htm |
application/xml | .xsl |
application/javascript | .js |
PDF documents - based on PyPDF2 | |
application/pdf | |
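As a quick illustration of how the table can be read, the sketch below copies a handful of its rows into a plain dictionary and looks up which builder families list a given extension. `SAMPLE_ROWS` and `families_for_extension` are hypothetical names made up for this example, not part of preview-generator, and the rows are only a small excerpt of the table.

```python
from typing import Dict, List

# Excerpt of the table above: builder family -> {MIME type: extension}.
# Hypothetical structure for illustration; the library does not expose it.
SAMPLE_ROWS: Dict[str, Dict[str, str]] = {
    "Images - based on WAND (image magick)": {
        "image/png": ".png",
        "image/gif": ".gif",
    },
    "Bitmap images - based on Pillow": {
        "image/png": ".png",
        "application/postscript": ".ps",
    },
    "Archive files": {
        "application/zip": ".zip",
        "application/x-tar": ".tar",
    },
    "Vector images - based on Inkscape": {
        "image/svg+xml": ".svg",
    },
}


def families_for_extension(extension: str) -> List[str]:
    """Return the builder families whose sample rows list this extension."""
    wanted = extension.lower()
    return [
        family
        for family, rows in SAMPLE_ROWS.items()
        if wanted in rows.values()
    ]


print(families_for_extension(".png"))
print(families_for_extension(".svg"))
```

Several families can list the same extension (`.png` appears under both WAND and Pillow in this excerpt), which is why the helper returns a list.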
Installation
Dependencies:
apt-get install zlib1g-dev libjpeg-dev python3-pythonmagick inkscape xvfb poppler-utils libfile-mimeinfo-perl qpdf
At the moment there are issues with the exiftool package on Debian, so you’ll need to install it manually:
# Exiftool
wget https://sno.phy.queensu.ca/~phil/exiftool/Image-ExifTool-11.11.tar.gz
gzip -dc Image-ExifTool-11.11.tar.gz | tar -xf -
cd Image-ExifTool-11.11
perl Makefile.PL
sudo make install
After installing dependencies, you can install preview-generator using pip:
pip install preview-generator
Optional dependencies:
To handle previews for office documents you will need LibreOffice, if you don’t have it already:
apt-get install libreoffice
To check dependencies, you can run:
preview --check-dependencies
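If you prefer to look for the external tools from Python, a rough check can be sketched with the standard library. The executable names below are assumptions about what the packages listed above install (for example `pdftoppm` from poppler-utils and `mimetype` from libfile-mimeinfo-perl), and `missing_tools` is a hypothetical helper; this is not how `preview --check-dependencies` is implemented.

```python
import shutil
from typing import Callable, Iterable, List, Optional

# Assumed executable names for the packages listed above; adjust as needed.
EXPECTED_TOOLS = ("inkscape", "Xvfb", "pdftoppm", "qpdf", "mimetype", "exiftool")


def missing_tools(
    tools: Iterable[str] = EXPECTED_TOOLS,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the tools that cannot be found on the PATH."""
    return [tool for tool in tools if which(tool) is None]


missing = missing_tools()
if missing:
    print("Missing tools:", ", ".join(missing))
else:
    print("All expected tools were found.")
```

The `which` parameter exists only so the lookup can be replaced in tests; by default it is `shutil.which`.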
Usage
The following examples show typical usage.
Basic Usage
The most basic usage: create a jpeg from a png, with the default size of 256x256
from preview_generator.manager import PreviewManager
cache_path = '/tmp/preview_cache'
file_to_preview_path = '/tmp/an_image.png'
manager = PreviewManager(cache_path, create_folder=True)
path_to_preview_image = manager.get_jpeg_preview(file_to_preview_path)
Preview an image with a specific size
You can choose the size of the preview with the width and height parameters.
from preview_generator.manager import PreviewManager
cache_path = '/tmp/preview_cache'
file_to_preview_path = '/tmp/an_image.png'
manager = PreviewManager(cache_path, create_folder=True)
path_to_preview_image = manager.get_jpeg_preview(file_to_preview_path, width=1000, height=500)
Preview a pdf or an office document as a jpeg
from preview_generator.manager import PreviewManager
cache_path = '/tmp/preview_cache'
pdf_or_odt_to_preview_path = '/tmp/a_pdf.pdf'
manager = PreviewManager(cache_path, create_folder=True)
path_to_preview_image = manager.get_jpeg_preview(pdf_or_odt_to_preview_path)
By default, the preview is generated from the first page of the document. Use the page parameter to pick the page you want to preview.
Page numbers start at 0: to preview the second page of your document, pass `page=1`.
from preview_generator.manager import PreviewManager
cache_path = '/tmp/preview_cache'
pdf_or_odt_to_preview_path = '/tmp/a_pdf.pdf'
manager = PreviewManager(cache_path, create_folder=True)
path_to_preview_image = manager.get_jpeg_preview(pdf_or_odt_to_preview_path, page=1)
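Because `page` is zero-based, code that receives page numbers from users (who usually count from 1) has to subtract one. A minimal sketch, where `preview_user_page` is a hypothetical helper and `manager` is any object exposing `get_jpeg_preview(file_path, page=...)` as in the example above:

```python
def preview_user_page(manager, file_path: str, user_page: int) -> str:
    """Preview the page a user calls `user_page` (counting from 1).

    Hypothetical helper: converts the 1-based number to the 0-based
    `page` argument expected by get_jpeg_preview.
    """
    if user_page < 1:
        raise ValueError("user_page counts from 1")
    return manager.get_jpeg_preview(file_path, page=user_page - 1)
```

With a real `PreviewManager`, `preview_user_page(manager, '/tmp/a_pdf.pdf', 2)` requests the same preview as `page=1` in the example above.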
Generate a pdf preview of a libreoffice text document
from preview_generator.manager import PreviewManager
manager = PreviewManager('/tmp/cache/', create_folder=True)
pdf_file_path = manager.get_pdf_preview('/home/user/Documents/report.odt', page=2)
print('Preview created at path : ', pdf_file_path)
For Office types into PDF :
from preview_generator.manager import PreviewManager
cache_path = '/tmp/previews'
file_path = '/home/user/Documents/report.odt'  # office file to preview
page_id = 0  # first page
preview_manager = PreviewManager(cache_path, create_folder=True)
path_to_preview = preview_manager.get_pdf_preview(file_path, page=page_id)
-> Creates a pdf preview from an office file
args :
file_path : str, the path of the file you want to preview
page : int, the page you want to get. If omitted, all the pages are returned. The first page is page 0
returns :
str: path to the preview file
For images (GIF, BMP, PNG, JPEG, PDF) into jpeg :
from preview_generator.manager import PreviewManager
cache_path = '/tmp/previews'
file_path = '/tmp/an_image.png'  # image to preview
preview_manager = PreviewManager(cache_path, create_folder=True)
path_to_preview = preview_manager.get_jpeg_preview(file_path, height=1024, width=526)
-> Creates a jpeg preview from an image file, 1024 pixels high and 526 pixels wide
args :
file_path : str, the path of the file you want to preview
height : height of the preview in pixels
width : width of the preview in pixels. If omitted, width will be the same as height
returns :
str: path to the preview file
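A common use of the `height` and `width` arguments is to produce the same preview in several sizes. The following is a sketch only: `build_thumbnails` is a hypothetical helper, and it relies solely on the `get_jpeg_preview(file_path, height=..., width=...)` call shown above.

```python
from typing import Dict, Iterable, Tuple

Size = Tuple[int, int]  # (width, height) in pixels


def build_thumbnails(manager, file_path: str, sizes: Iterable[Size]) -> Dict[Size, str]:
    """Return a mapping {(width, height): path to the jpeg preview}."""
    previews = {}
    for width, height in sizes:
        previews[(width, height)] = manager.get_jpeg_preview(
            file_path, height=height, width=width
        )
    return previews
```

For example, `build_thumbnails(preview_manager, file_path, [(128, 128), (526, 1024)])` would request one small square preview and one preview with the dimensions used above.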
Other conversions :
The principle is the same as above
Zip to text or html : will build a list of files into text/html inside the json
Office to jpeg : will build the pdf out of the office file and then build the jpeg.
Text to text : mainly just a copy stored in the cache
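Since every conversion follows the same calling pattern, selecting the output type can be reduced to picking a method name. The sketch below is an illustration under assumptions: `get_preview` and `PREVIEW_METHODS` are hypothetical names, and only the three manager methods that appear in this document are mapped.

```python
# Output kind -> PreviewManager method name (only methods shown in this document).
PREVIEW_METHODS = {
    "jpeg": "get_jpeg_preview",
    "pdf": "get_pdf_preview",
    "text": "get_text_preview",
}


def get_preview(manager, file_path: str, kind: str, **options) -> str:
    """Call the manager method matching `kind` and return the preview path."""
    try:
        method_name = PREVIEW_METHODS[kind]
    except KeyError:
        raise ValueError("unsupported preview kind: {!r}".format(kind)) from None
    # Extra keyword arguments (page, height, width) are passed through unchanged.
    return getattr(manager, method_name)(file_path, **options)
```

With a real manager, `get_preview(manager, '/tmp/a_pdf.pdf', 'jpeg', page=1)` is equivalent to calling `manager.get_jpeg_preview('/tmp/a_pdf.pdf', page=1)` directly.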
Command Line
For test purposes, you can use preview from the command line, giving the file to preview as a parameter:
preview demo.pdf
Or multiple files:
preview *.pdf
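In `preview *.pdf` it is the shell that expands the wildcard before `preview` runs. When the command is started from Python without a shell, the pattern has to be expanded explicitly. A sketch under that assumption; `collect_files` and `run_preview` are hypothetical helpers, and `run_preview` only works where the `preview` command is installed.

```python
import glob
import subprocess
from typing import Iterable, List


def collect_files(patterns: Iterable[str]) -> List[str]:
    """Expand wildcard patterns into a sorted list of matching paths."""
    files = set()
    for pattern in patterns:
        files.update(glob.glob(pattern))
    return sorted(files)


def run_preview(files: List[str]) -> int:
    """Run `preview file1 file2 ...` and return its exit code."""
    if not files:
        return 0  # nothing to preview
    return subprocess.run(["preview", *files]).returncode
```

For instance, `run_preview(collect_files(['*.pdf']))` mirrors the shell command above.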
Cache mechanism
Naming :
The name of the preview generated in the cache directory will be :
- {file_name}-[{size}-]{file_md5sum}[({page})]{extension}
file_name = the name of the file you asked a preview for, without its extension.
size = the size you asked for. Only present for a jpeg preview.
file_md5sum = the md5sum of the entire path of the file. This avoids conflicts between files that have the same name but are in different directories.
page = the page you asked for, in case of a pdf or office document preview.
extension = the extension of the preview (.jpeg for a jpeg, .txt for a text, etc)
Example :
These scripts :
GIF to JPEG :
import os
from preview_generator.manager import PreviewManager
current_dir = os.path.dirname(os.path.abspath(__file__)) +'/'
manager = PreviewManager(current_dir + 'cache')
path_to_preview = manager.get_jpeg_preview(
file_path=current_dir + 'the_gif.gif',
height=512,
width=512,
)
print('Preview created at path : ', path_to_preview)
will print
Preview created at path : the_gif-512x512-60dc9ef46936cc4fff2fe60bb07d4260.jpeg
ODT to JPEG :
import os
from preview_generator.manager import PreviewManager
current_dir = os.path.dirname(os.path.abspath(__file__)) +'/'
manager = PreviewManager(current_dir + 'cache')
path_to_preview = manager.get_jpeg_preview(
file_path=current_dir + 'the_odt.odt',
page=1,
height=1024,
width=1024,
)
print('Preview created at path : ', path_to_preview)
will print
Preview created at path : the_odt-1024x1024-c8b37debbc45fa96466e5e1382f6bd2e-page1.jpeg
ZIP to Text :
import os
from preview_generator.manager import PreviewManager
current_dir = os.path.dirname(os.path.abspath(__file__)) +'/'
manager = PreviewManager(current_dir + 'cache')
path_to_file = manager.get_text_preview(
file_path=current_dir + 'the_zip.zip',
)
print('Preview created at path : ', path_to_file)
will print
Preview created at path : the_zip-a733739af8006558720be26c4dc5569a.txt
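The naming scheme described above can be sketched with the standard library. This is not the library's code: `sketch_preview_name` is a hypothetical function, the UTF-8 encoding of the path is an assumption, and the `-page{n}` suffix is modeled on the ODT example output rather than on the pattern line. It is not expected to reproduce the exact hashes printed above.

```python
import hashlib
import os
from typing import Optional


def sketch_preview_name(
    file_path: str,
    extension: str,
    size: Optional[str] = None,
    page: Optional[int] = None,
) -> str:
    """Build a cache file name following the pattern described above."""
    file_name = os.path.splitext(os.path.basename(file_path))[0]
    # md5 of the entire path, so same-named files in different
    # directories get different preview names.
    file_md5sum = hashlib.md5(file_path.encode("utf-8")).hexdigest()
    parts = [file_name]
    if size is not None:
        parts.append(size)
    parts.append(file_md5sum)
    if page is not None:
        # Assumption: "page{n}" suffix, as in the ODT example output.
        parts.append("page{}".format(page))
    return "-".join(parts) + extension


print(sketch_preview_name("/tmp/the_gif.gif", ".jpeg", size="512x512"))
```

Hashing the full path rather than the file name is what keeps `/a/report.odt` and `/b/report.odt` from overwriting each other in the cache.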
Adding new feature :
First of all, I’d be glad if you shared your new feature with everybody. If you want to, you can fork the project on github ( https://github.com/algoo/preview-generator) (see Developer’s Kit) and submit new features.
If you want to add a new preview builder to handle documents of type foo into jpeg (for example) here is how to proceed :
Warning If you need to look at other builders to find out how to proceed, avoid looking at any of the Office to something builders. They are a special case and could mislead you.
Create a new class FooPreviewBuilder in a file foo_preview.py in preview_generator/preview
Make it inherit from the appropriate PreviewBuilder class
if it handles several pages it will be class FooPreviewBuilder(PreviewBuilder)
for single page it will be class FooPreviewBuilder(OnePagePreviewBuilder)
…
define your own build_jpeg_preview(…) (in the case we want to make foo into jpeg) based on the same principle as other build_{type}_preview(…)
Inside this build_jpeg_preview(…) you will call a method file_converter.foo_to_jpeg(…)
Define your foo_to_jpeg(…) method in preview_generator.preview.file_converter.py
inputs must be a stream of bytes and optional information like a number of pages, a size, …
output must also be a stream of bytes
You may need to redefine some methods like get_page_number() or exists_preview() in your FooPreviewBuilder class
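The steps above can be outlined as follows. This is a structural sketch only: `PreviewBuilderStandIn` is a local stand-in defined here so that the example runs on its own, not the library's `PreviewBuilder`, and the method signature and the pass-through "conversion" are assumptions made for illustration. In a real builder, inherit from the library class and follow the signatures of the existing builders.

```python
import io
import os
from typing import BinaryIO, Optional, Tuple


def foo_to_jpeg(content: BinaryIO, size: Optional[Tuple[int, int]] = None) -> io.BytesIO:
    """Converter contract: a stream of bytes in, a stream of bytes out.

    Placeholder body: copies the input unchanged. A real converter would
    decode the foo data and encode a jpeg of the requested size.
    """
    return io.BytesIO(content.read())


class PreviewBuilderStandIn:
    """Local stand-in for the library's base class (illustration only)."""


class FooPreviewBuilder(PreviewBuilderStandIn):
    def build_jpeg_preview(
        self,
        file_path: str,
        preview_name: str,
        cache_path: str,
        size: Optional[Tuple[int, int]] = None,
    ) -> str:
        """Convert file_path with foo_to_jpeg and store the result in the cache."""
        with open(file_path, "rb") as source:
            result = foo_to_jpeg(source, size=size)
        preview_path = os.path.join(cache_path, preview_name + ".jpeg")
        with open(preview_path, "wb") as target:
            target.write(result.getvalue())
        return preview_path
```

Keeping the converter free of file paths (streams in, streams out) means it can be reused by several builders, while the builder alone decides where the result is stored in the cache.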
Developer’s Kit
Installation (dev) :
- From scratch on a terminal :
create your project directory (we will name it “the_project” but you can name it the way you want) : mkdir the_project
cd the_project
git clone https://github.com/algoo/preview-generator
- building your environment :
install python virtualenv builder : sudo apt install python3-venv
build your virtual env (env will be called “myenv”, you can name it the way you want): python3 -m venv myenv
activate it if it is not already active : source myenv/bin/activate (run deactivate to leave it)
install dependencies :
Exiftool - Follow instruction on the main website: https://sno.phy.queensu.ca/~phil/exiftool/
apt-get install zlib1g-dev
apt-get install libjpeg-dev
apt-get install python3-pythonmagick
apt-get install inkscape
apt-get install xvfb
apt-get install poppler-utils
apt-get install qpdf
apt-get install libfile-mimeinfo-perl
pip install wand
pip install Pillow
pip install PyPDF2
pip install python-magic
pip install pyexifinfo
pip install packaging
pip install xvfbwrapper
pip install pdf2image
pip install pathlib
if you use python 3.5 or older : pip install typing
# general dependencies
apt-get install zlib1g-dev libjpeg-dev python3-pythonmagick inkscape xvfb poppler-utils qpdf libfile-mimeinfo-perl
pip install wand Pillow PyPDF2 python-magic pyexifinfo packaging xvfbwrapper pdf2image pathlib
# Exiftool
wget https://sno.phy.queensu.ca/~phil/exiftool/Image-ExifTool-11.11.tar.gz
gzip -dc Image-ExifTool-11.11.tar.gz | tar -xf -
cd Image-ExifTool-11.11
perl Makefile.PL
sudo make install
If you need to preview scribus .sla files you will need scribus >= 1.5. If it’s not available in your distribution you can use an AppImage.
Download the latest AppImage from the official website https://www.scribus.net/downloads/unstable-branch/
mv /path/to/image/scribus-x.y.appimage /usr/local/bin/scribus
chmod +x /usr/local/bin/scribus
Code Convention :
When subclassing a generic abstract class, the convention is to prefix the subclass name with the name of the generic abstract class. For example:
ImagePreviewBuilderIMConvert(ImagePreviewBuilder)
Running Tests :
Pytest is the framework used for unit testing
pip install -e .[testing]
go into the “tests” directory : cd path/to/your/project/directory/tests
run pytest
Other checks :
Run mypy checks:
mypy --ignore-missing-imports --disallow-untyped-defs .
Code formatting using black:
black -l 100 preview_generator setup.py build_supported_mimetypes_table_rst.py tests
Sorting of imports:
isort tests/**/*.py preview_generator/**/*.py setup.py build_supported_mimetypes_table_rst.py
Flake8 check (unused imports, unused variables and many other checks):
flake8 preview_generator setup.py build_supported_mimetypes_table_rst.py tests
Contribute :
install preview_generator with dev dependencies (contains tests dependencies)
pip install -e '.[dev]'
install pre-commit hooks:
pre-commit install
Launch tests :
pytest
You can now commit and see whether pre-commit accepts your change.
License
MIT licensed. https://opensource.org/licenses/MIT