Parsing PDF files with PDFium
redstork
PDF parsing library based on PDFium.
Requirements
- Python 3
Platform support:
- Fairly recent Linux (Ubuntu 18.04 or better). Older systems are not supported.
- macOS 10.6 or better
- Windows support is in the works
Installation
pip install redstork
Features
- Render a page, or an arbitrary rectangle of it, to an image at a configurable scale
- Update document metadata
- Update font encoding (for some PDF documents)
- Save document to a file
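To get a feel for what a configurable scale means for the rendering feature, here is a minimal sketch that estimates the pixel size of the output. It assumes that one PDF point maps to `scale` pixels and that fractional sizes are rounded up; redstork's exact rounding is not documented here, and `rendered_size` is a hypothetical helper, not part of the library.

```python
import math


def rendered_size(box, scale=1.0):
    """Estimate (width, height) in pixels for a box given in PDF points.

    Assumption: one PDF point maps to `scale` pixels, rounded up.
    `box` is (x0, y0, x1, y1), the same layout as a page MediaBox.
    """
    x0, y0, x1, y1 = box
    width = math.ceil(abs(x1 - x0) * scale)
    height = math.ceil(abs(y1 - y0) * scale)
    return width, height


letter_page = (0.0, 0.0, 612.0, 792.0)  # MediaBox of a US Letter page
print(rendered_size(letter_page, scale=2))  # (1224, 1584)
```

The same helper works for an arbitrary rectangle: pass the rectangle's corner coordinates instead of the full MediaBox.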
Quick start
Download a sample PDF file and save it as sample.pdf.
from redstork import Document, PageObject, Glyph
doc = Document('sample.pdf')
print('Number of pages:', len(doc))
>> Number of pages: 15
print('MediaBox of the first page is:', doc[0].media_box)
>> MediaBox of the first page is: (0.0, 0.0, 612.0, 792.0)
print('Rotation of the first page is:', doc[0].rotation)
>> Rotation of the first page is: 0
print('Document title:', doc.meta['Title'])
>> Document title: Red Stork
print('First page has', len(doc[0]), 'objects')
>> First page has 4 objects
doc[0].render('page-0.ppm', scale=2) # render page #1 as image
page = doc[0]
for o in page:
    if o.type == PageObject.OBJ_TYPE_TEXT:
        for code, _, _ in o:
            print(o.font[code], end='')
        print()
>> RedStork
>> Release0.0.1
>> Apr02,2020
for fid, font in doc.fonts.items():
    print(font.short_name, fid)
>> NimbusSanL-Bold (36, 0)
>> NimbusSanL-BoldItal (37, 0)
# let's generate an SVG file of the first letter on page 1
text_object = [o for o in page if o.type == PageObject.OBJ_TYPE_TEXT][0]  # first text object
charcode, _, _ = text_object[0]  # first character of the first text object
# use the text object's own font rather than whatever font a previous loop left behind
# (assumes o.font is the same kind of font object as the values of doc.fonts)
glyph = text_object.font.load_glyph(charcode)
path, delayed_c = [], []
for x, y, op, close in glyph:
    x, y = round(x, 3), round(y, 3)
    if op == Glyph.MOVETO:
        path.append(f'M {x} {y}')
    elif op == Glyph.LINETO:
        path.append(f'L {x} {y}')
    elif op == Glyph.CURVETO:
        # a cubic curve needs three points; collect them before emitting 'C'
        delayed_c.append(f'{x} {y}')
        if len(delayed_c) == 3:
            path.append('C ' + ', '.join(delayed_c))
            delayed_c.clear()
    if close:
        path.append('Z')
path = ' '.join(path)
print('<svg><g fill="gray" transform="scale(100,-100)"><path d="' + path + '" /></g></svg>')
>> <svg><g fill="gray" transform="scale(100,-100)"><path d="M 0.291 0.289 L 0.463 0.289 C 0.52 0.289, ... L 0.318 0.414 Z" /></g></svg>
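The glyph-to-SVG loop above can be wrapped in a reusable function. The sketch below runs without redstork installed: `MOVETO`, `LINETO` and `CURVETO` are stand-in constants with made-up values (with the library, use `Glyph.MOVETO` and friends), and `glyph_to_svg` is a hypothetical helper name. It also adds an `xmlns` attribute and a `viewBox`, assuming glyph coordinates fall roughly inside the unit square, so that the result opens as a standalone file.

```python
# Stand-in operation codes. With redstork installed, compare against
# Glyph.MOVETO, Glyph.LINETO and Glyph.CURVETO instead; the numeric
# values used here are made up for illustration.
MOVETO, LINETO, CURVETO = 0, 1, 2


def glyph_to_svg(segments, scale=100):
    """Build an SVG document string from (x, y, op, close) tuples."""
    path, curve_points = [], []
    for x, y, op, close in segments:
        x, y = round(x, 3), round(y, 3)
        if op == MOVETO:
            path.append(f'M {x} {y}')
        elif op == LINETO:
            path.append(f'L {x} {y}')
        elif op == CURVETO:
            # A cubic Bezier segment needs three points before it is emitted.
            curve_points.append(f'{x} {y}')
            if len(curve_points) == 3:
                path.append('C ' + ', '.join(curve_points))
                curve_points.clear()
        if close:
            path.append('Z')
    d = ' '.join(path)
    # The y axis is flipped because PDF glyph space points up, SVG points down.
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" '
        f'viewBox="0 {-scale} {scale} {scale}">'
        f'<g fill="gray" transform="scale({scale},{-scale})">'
        f'<path d="{d}" /></g></svg>'
    )


# A triangle standing in for real glyph data.
triangle = [
    (0.0, 0.0, MOVETO, False),
    (1.0, 0.0, LINETO, False),
    (0.5, 1.0, LINETO, True),
]
print(glyph_to_svg(triangle))
```

Writing the returned string to a file such as `glyph.svg` gives an image that most browsers can display directly.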
API docs
Download files
Source Distributions
No source distribution files available for this release.
Built Distributions
File details
Details for the file redstork-0.0.41-py3-none-manylinux1_x86_64.whl.
File metadata
- Download URL: redstork-0.0.41-py3-none-manylinux1_x86_64.whl
- Upload date:
- Size: 6.9 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.15.0 pkginfo/1.5.0.1 requests/2.18.4 setuptools/44.1.1 requests-toolbelt/0.9.1 tqdm/4.47.0 CPython/2.7.17
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 913609b8ce86e30167fa19aa39f7b89d1e4688c59189ccf2e91b8753484d8fe9 |
| MD5 | cb83845d9cad43bd8592f13e1c381e4c |
| BLAKE2b-256 | 360ed4aa12f9b37b359b7866c826a7be3e38ec9dadc5057a59a5cba32bd2d743 |
File details
Details for the file redstork-0.0.41-py3-none-macosx_10_9_intel.whl.
File metadata
- Download URL: redstork-0.0.41-py3-none-macosx_10_9_intel.whl
- Upload date:
- Size: 24.1 kB
- Tags: Python 3, macOS 10.9+ Intel (x86-64, i386)
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.15.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/41.4.0 requests-toolbelt/0.9.1 tqdm/4.47.0 CPython/2.7.17
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4687f549f56cdcd07d999a8b3c6ac496f44184a55113fb12ca1fb303db3be8f0 |
| MD5 | 8c354496ed9a2d7d6593b4d3d9b9ea7d |
| BLAKE2b-256 | bdd932f8d01d9234894d8a7cad6455d824e6b5393884a80296f24b86eba8b1ef |
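The SHA256 digests listed for each wheel can be used to check a manually downloaded file. The sketch below uses only the standard library `hashlib` module; `sha256_of` and `matches` are hypothetical helper names, and the file is assumed to be in the current directory.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected_hex):
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, `matches('redstork-0.0.41-py3-none-manylinux1_x86_64.whl', '913609b8ce86e30167fa19aa39f7b89d1e4688c59189ccf2e91b8753484d8fe9')` should return `True` for an intact download of the Linux wheel.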