PDF parser and analyzer
yapdfminer (Yet Another PDFMiner fork)
PDFMiner is a great Python tool that was apparently abandoned by its original author, Yusuke Shinyama, in 2016. Since then it has been forked and re-forked time and time again, but never maintained for long.
Goals
I created this fork to better serve the PDF analysis requirements of my own project:
- Apply several pull requests that are languishing on the original repository and that fix bugs I ran into.
- Target Python 3.7. There will be no attempt to maintain backwards compatibility with older versions of Python.
- Generate a smaller distribution package (I'm running on AWS Lambda, where RAM is at a premium), at the cost of dropping support for Chinese, Japanese and Korean PDFs.
If you require Asian language support, it should be simple enough to re-enable it by building with the resource files in `cmaprsrc`.
Other than the issues mentioned above, I do strive to make this library drop-in compatible with the original PDFMiner, including for example the package name (which pdfminer3 had changed).
Lineage:
- This is a fork of gwk/pdfminer3.
- gwk/pdfminer3 was forked from pdfminer/pdfminer.six.
- pdfminer.six was forked from the original pdfminer.
About
PDFMiner is a pure Python tool for extracting information from PDF documents.
Its focus is on PDF content retrieval and analysis.
Please refer to the original repo for more information: https://github.com/euske
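To illustrate the kind of content retrieval described above, here is a minimal sketch of extracting text from a PDF. It assumes this fork keeps the classic PDFMiner module layout (`pdfminer.pdfinterp`, `pdfminer.converter`, `pdfminer.layout`, `pdfminer.pdfpage`), as the drop-in compatibility goal suggests. It also assumes `TextConverter` accepts a text buffer such as `StringIO` here. The function name `extract_text` and the file name in the usage note are hypothetical.

```python
from io import StringIO


def extract_text(path):
    """Return the text of the PDF at `path` as one string (sketch only)."""
    # Imported lazily: the pdfminer package is only needed when a PDF is
    # actually processed. Module names follow the classic PDFMiner layout,
    # which is an assumption for this fork.
    from pdfminer.converter import TextConverter
    from pdfminer.layout import LAParams
    from pdfminer.pdfinterp import PDFPageInterpreter, PDFResourceManager
    from pdfminer.pdfpage import PDFPage

    resource_manager = PDFResourceManager()
    output = StringIO()
    # TextConverter writes the recovered text of each page into `output`.
    device = TextConverter(resource_manager, output, laparams=LAParams())
    interpreter = PDFPageInterpreter(resource_manager, device)
    try:
        with open(path, "rb") as pdf_file:
            for page in PDFPage.get_pages(pdf_file):
                interpreter.process_page(page)
    finally:
        device.close()
    return output.getvalue()
```

Calling `extract_text("report.pdf")` should then return the document's text, provided the assumptions above hold for the installed version.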
Download files
Hashes for yapdfminer-1.2.2-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 2398504813aee4a5e3c645741726038912e4cb0f74a0a552a6ac4717d174076a
MD5 | cac2022a066afba55f91423fa9489e11
BLAKE2b-256 | 08c4bb77287fb10f4485d4c0c426fc9b63e3c26c4e6b6cde3f97a9d9022526c3
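
The digests listed above can be checked against a downloaded wheel before installing it. The following is a minimal sketch using only the standard library. The helper names `sha256_of_file` and `verify_wheel` are hypothetical, and the expected value is the SHA256 digest from the table.

```python
import hashlib

# SHA256 digest published for yapdfminer-1.2.2-py3-none-any.whl (from the table above).
EXPECTED_SHA256 = "2398504813aee4a5e3c645741726038912e4cb0f74a0a552a6ac4717d174076a"


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # iter() with a sentinel keeps calling fh.read until it returns b"".
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_wheel(path, expected=EXPECTED_SHA256):
    """Return True if the file's SHA256 digest matches `expected`."""
    return sha256_of_file(path) == expected.lower()
```

For example, `verify_wheel("yapdfminer-1.2.2-py3-none-any.whl")` returns `True` only if the local file matches the published digest.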