Parsing PDF files with PDFium
Project description
redstork
A PDF parsing library based on PDFium.
Requirements
- Python 3
Platform support:
- Fairly recent Linux (Ubuntu 18.04 or newer); older systems are not supported
- macOS 10.6 or newer
- Windows support is in the works
Installation
pip install redstork
Features
- Render a page, or an arbitrary rectangle of it, to an image at a configurable scale
- Update document metadata
- Update font encoding (for some PDF documents)
- Save the document to a file
Quick start
Download a sample PDF file from here
from redstork import Document, PageObject, Glyph
doc = Document('sample.pdf')
print('Number of pages:', len(doc))
>> Number of pages: 15
print('MediaBox of the first page is:', doc[0].media_box)
>> MediaBox of the first page is: (0.0, 0.0, 612.0, 792.0)
print('Rotation of the first page is:', doc[0].rotation)
>> Rotation of the first page is: 0
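The MediaBox is reported as (x0, y0, x1, y1) in PDF user-space units (points, 1/72 inch), and a rotation of 90 or 270 degrees swaps the width and height a viewer displays. A minimal sketch for turning those two values into a displayed page size; `page_size` and `to_inches` are hypothetical helpers, not part of redstork, and the rotation is assumed to be in degrees (PDFium's own FPDFPage_GetRotation returns quarter turns 0 to 3, so multiply by 90 if you see values like that).

```python
from typing import Tuple

POINTS_PER_INCH = 72.0  # PDF user space: 1 point = 1/72 inch


def page_size(media_box: Tuple[float, float, float, float],
              rotation_degrees: int = 0) -> Tuple[float, float]:
    """Return the displayed (width, height) of a page in points.

    media_box is (x0, y0, x1, y1); rotation_degrees must be a multiple of 90.
    """
    if rotation_degrees % 90 != 0:
        raise ValueError('rotation must be a multiple of 90 degrees')
    x0, y0, x1, y1 = media_box
    width, height = abs(x1 - x0), abs(y1 - y0)
    # A quarter turn (90 or 270 degrees) swaps the visible width and height.
    if (rotation_degrees // 90) % 2 == 1:
        width, height = height, width
    return width, height


def to_inches(size_pt: Tuple[float, float]) -> Tuple[float, float]:
    """Convert a (width, height) pair from points to inches."""
    return size_pt[0] / POINTS_PER_INCH, size_pt[1] / POINTS_PER_INCH


w, h = page_size((0.0, 0.0, 612.0, 792.0), 0)
print(w, h)               # 612.0 792.0
print(to_inches((w, h)))  # (8.5, 11.0)
```

For the first page of the sample this gives 612 x 792 points, i.e. 8.5 x 11 inches (US Letter).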
print('Document title:', doc.meta['Title'])
>> Document title: Red Stork
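Title is one of the standard keys of the PDF document information dictionary; others include Author, Subject, Keywords, Creator, Producer, CreationDate and ModDate. The two date entries are strings of the form D:YYYYMMDDHHmmSSOHH'mm', where every field after the year is optional. A sketch of a parser, assuming doc.meta hands these dates back as raw strings (the example above only shows Title); `parse_pdf_date` is a hypothetical helper, not a redstork function.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Optional

# D:YYYYMMDDHHmmSSOHH'mm' where every field after the year is optional
# and O is '+', '-' or 'Z' (UTC).
_PDF_DATE = re.compile(
    r"^(?:D:)?(\d{4})(\d{2})?(\d{2})?(\d{2})?(\d{2})?(\d{2})?"
    r"(?:([Zz+\-])(?:(\d{2})'?(?:(\d{2})'?)?)?)?$"
)


def parse_pdf_date(value: str) -> Optional[datetime]:
    """Parse a PDF date string; return None if it is malformed."""
    m = _PDF_DATE.match(value.strip())
    if m is None:
        return None
    year, month, day, hour, minute, second, sign, tz_h, tz_m = m.groups()
    try:
        tz = None  # no offset given: return a naive datetime
        if sign in ('Z', 'z'):
            tz = timezone.utc
        elif sign in ('+', '-'):
            offset = timedelta(hours=int(tz_h or 0), minutes=int(tz_m or 0))
            tz = timezone(offset if sign == '+' else -offset)
        return datetime(int(year), int(month or 1), int(day or 1),
                        int(hour or 0), int(minute or 0), int(second or 0),
                        tzinfo=tz)
    except ValueError:  # e.g. month 13, day 32 or an out-of-range offset
        return None
```

Missing fields default to the start of the period (January, day 1, midnight), which is how the PDF specification defines them.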
print('First page has', len(doc[0]), 'objects')
>> First page has 4 objects
doc[0].render('page-0.ppm', scale=2) # render page #1 as image
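The render call above writes page-0.ppm. Assuming the file is a standard netpbm image (most likely binary P6, whose header is a magic number, width, height and maximum value separated by whitespace, with optional # comments), its pixel size can be checked without an imaging library. `read_ppm_size` is a hypothetical helper; how redstork maps scale to pixels is not spelled out here, so compare the result with the page's MediaBox rather than assuming an exact ratio.

```python
import io
from typing import BinaryIO, Tuple, Union

_NETPBM_MAGIC = (b'P1', b'P2', b'P3', b'P4', b'P5', b'P6')


def _read_token(f: BinaryIO) -> bytes:
    """Read one whitespace-separated header token, skipping '#' comments."""
    token = b''
    while True:
        ch = f.read(1)
        if not ch:
            break  # end of data
        if ch == b'#':
            f.readline()  # a comment runs to the end of the line
            if token:
                break
            continue
        if ch.isspace():
            if token:
                break  # whitespace ends the current token
            continue  # skip whitespace before a token
        token += ch
    return token


def read_ppm_size(source: Union[str, bytes]) -> Tuple[int, int]:
    """Return (width, height) from a netpbm header, given a file path or raw bytes."""
    stream = io.BytesIO(source) if isinstance(source, bytes) else open(source, 'rb')
    with stream as f:
        magic = _read_token(f)
        if magic not in _NETPBM_MAGIC:
            raise ValueError(f'not a netpbm image: {magic!r}')
        width = int(_read_token(f))
        height = int(_read_token(f))
    return width, height
```

After rendering, read_ppm_size('page-0.ppm') returns the image size; only the header is read, so this stays cheap even for large renders.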
page = doc[0]
for o in page:
    if o.type == PageObject.OBJ_TYPE_TEXT:
        for code, _, _ in o:
            print(o.font[code], end='')
        print()
>> RedStork
>> Release0.0.1
>> Apr02,2020
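The text above comes out without spaces (Release0.0.1, Apr02,2020) because many PDF producers position each word with coordinates instead of emitting space characters. A rough heuristic rebuilds the spaces from horizontal gaps between glyphs. It assumes that the two values discarded in `for code, _, _ in o` are the glyph's x and y position, which the example does not confirm; `join_with_spaces` and its `gap_factor` threshold are hypothetical.

```python
from statistics import median
from typing import Iterable, List, Tuple


def join_with_spaces(glyphs: Iterable[Tuple[str, float, float]],
                     gap_factor: float = 1.8) -> str:
    """Rebuild a line of text from (char, x, y) tuples, inserting a space
    wherever the step to the next glyph is much wider than usual."""
    items: List[Tuple[str, float, float]] = list(glyphs)
    if len(items) < 2:
        return ''.join(ch for ch, _, _ in items)
    # Horizontal distance from each glyph origin to the next one.
    steps = [b[1] - a[1] for a, b in zip(items, items[1:])]
    typical = median(steps)
    out = [items[0][0]]
    for step, (ch, _, _) in zip(steps, items[1:]):
        if typical > 0 and step > gap_factor * typical:
            out.append(' ')  # unusually wide gap: assume a word break
        out.append(ch)
    return ''.join(out)
```

With redstork this would hypothetically be called as join_with_spaces((o.font[code], x, y) for code, x, y in o). Measuring origin-to-origin steps is crude: a wide glyph such as W can trigger a false space, and a sturdier version would compare each gap with the glyph's advance width from the font metrics.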
for fid, font in doc.fonts.items():
    print(font.short_name, fid)
>> NimbusSanL-Bold (36, 0)
>> NimbusSanL-BoldItal (37, 0)
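The keys of doc.fonts look like (object number, generation number) pairs, i.e. PDF indirect-object references; that reading is an assumption. Font names follow PostScript conventions: embedded subsets carry a six-uppercase-letter tag plus '+' (for example ABCDEF+NimbusSanL-Bold), and the style usually follows the last '-' (or ',' in some TrueType names such as Arial,Bold). A small sketch that splits a name into those parts, handy for grouping a document's fonts by family; `split_font_name` and `FontName` are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# Embedded font subsets are prefixed with six uppercase letters and '+'.
_SUBSET_TAG = re.compile(r'^([A-Z]{6})\+(.+)$')


class FontName(NamedTuple):
    subset_tag: Optional[str]  # e.g. 'ABCDEF', or None for a full font
    family: str                # e.g. 'NimbusSanL'
    style: Optional[str]       # e.g. 'BoldItal', or None if absent


def split_font_name(name: str) -> FontName:
    """Split a PDF font name into subset tag, family and style."""
    tag = None
    m = _SUBSET_TAG.match(name)
    if m:
        tag, name = m.group(1), m.group(2)
    # The style, if any, follows the last '-' (PostScript) or ',' (some TrueType names).
    cut = max(name.rfind('-'), name.rfind(','))
    if cut <= 0:
        return FontName(tag, name, None)
    return FontName(tag, name[:cut], name[cut + 1:] or None)
```

Both fonts listed above split to the family NimbusSanL, with styles Bold and BoldItal.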
# let's generate SVG markup for the first character on page 1
text_object = [o for o in page if o.type == PageObject.OBJ_TYPE_TEXT][0]  # first text object
charcode, _, _ = text_object[0]  # first character of the first text object
font = text_object.font  # this object's own font, not the leftover variable from the doc.fonts loop
glyph = font.load_glyph(charcode)
path, delayed_c = [], []  # SVG path commands; pending CURVETO points
for x, y, op, close in glyph:
    x, y = round(x, 3), round(y, 3)
    if op == Glyph.MOVETO:
        path.append(f'M {x} {y}')
    elif op == Glyph.LINETO:
        path.append(f'L {x} {y}')
    elif op == Glyph.CURVETO:
        delayed_c.append(f'{x} {y}')
        if len(delayed_c) == 3:  # a cubic Bezier segment needs three points
            path.append('C ' + ', '.join(delayed_c))
            delayed_c.clear()
    if close:
        path.append('Z')
path = ' '.join(path)
print('<svg><g fill="gray" transform="scale(100,-100)"><path d="' + path + '" /></g></svg>')
>> <svg><g fill="gray" transform="scale(100,-100)"><path d="M 0.291 0.289 L 0.463 0.289 C 0.52 0.289, ... L 0.318 0.414 Z" /></g></svg>
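The loop above can be packaged as a reusable function that also produces a standalone SVG file: a browser only renders a .svg document that declares the SVG namespace, and a viewBox keeps the scaled glyph on screen. A sketch, assuming each outline step is an (x, y, op, close) tuple as in the example; the opcode values are passed in, so no particular values of Glyph.MOVETO and friends are assumed. `outline_to_svg` is hypothetical, and its bounding box includes curve control points, so it may be slightly larger than the glyph itself.

```python
from typing import Iterable, List, Tuple

# One outline step as in the glyph loop above: (x, y, op, close).
Step = Tuple[float, float, object, bool]


def outline_to_svg(steps: Iterable[Step], moveto: object, lineto: object,
                   curveto: object, scale: float = 100.0) -> str:
    """Convert glyph outline steps into a standalone SVG document string."""
    items = list(steps)
    if not items:
        raise ValueError('empty outline')
    path: List[str] = []
    pending: List[str] = []  # CURVETO points arrive one by one; a cubic needs three
    for x, y, op, close in items:
        x, y = round(x, 3), round(y, 3)
        if op == moveto:
            path.append(f'M {x} {y}')
        elif op == lineto:
            path.append(f'L {x} {y}')
        elif op == curveto:
            pending.append(f'{x} {y}')
            if len(pending) == 3:
                path.append('C ' + ', '.join(pending))
                pending.clear()
        if close:
            path.append('Z')
    # Bounding box in output units; y is negated by the scale(s, -s) transform.
    xs = [s[0] * scale for s in items]
    ys = [-s[1] * scale for s in items]
    view_box = f'{min(xs):g} {min(ys):g} {max(xs) - min(xs):g} {max(ys) - min(ys):g}'
    d = ' '.join(path)
    return (f'<svg xmlns="http://www.w3.org/2000/svg" viewBox="{view_box}">'
            f'<g fill="gray" transform="scale({scale:g},{-scale:g})">'
            f'<path d="{d}" /></g></svg>')
```

With the objects from the example, the call would be outline_to_svg(font.load_glyph(charcode), Glyph.MOVETO, Glyph.LINETO, Glyph.CURVETO), and the returned string can be written straight to a .svg file.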
API docs
Built Distributions
Hashes for redstork-0.0.41-py3-none-manylinux1_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | 913609b8ce86e30167fa19aa39f7b89d1e4688c59189ccf2e91b8753484d8fe9
MD5 | cb83845d9cad43bd8592f13e1c381e4c
BLAKE2b-256 | 360ed4aa12f9b37b359b7866c826a7be3e38ec9dadc5057a59a5cba32bd2d743
Hashes for redstork-0.0.41-py3-none-macosx_10_9_intel.whl
Algorithm | Hash digest
---|---
SHA256 | 4687f549f56cdcd07d999a8b3c6ac496f44184a55113fb12ca1fb303db3be8f0
MD5 | 8c354496ed9a2d7d6593b4d3d9b9ea7d
BLAKE2b-256 | bdd932f8d01d9234894d8a7cad6455d824e6b5393884a80296f24b86eba8b1ef