A tool to extract text from PDF files.
A Python package focused on extracting content from PDF files.
There seem to be many options out there, but no single solution that is easy to install, even on Windows, and focuses specifically on PDF files. So we created the extractpdf package.
To use this package, install it from PyPI:
pip install extractpdf
Then use it like so:
import extractpdf as epdf

# local file
content = epdf.process('my_file.pdf')

# url:
content = epdf.process('http://www.example.com/some_file.pdf')
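When several files or URLs need processing, the `process` call shown above can be wrapped in a small batch helper. The following is only a sketch: `extract_many` is a hypothetical name, not part of extractpdf, and since this README does not document what `process` returns or which exceptions it raises, the helper treats the result as opaque and catches any exception.

```python
from typing import Callable, Dict, Iterable, Optional


def extract_many(
    sources: Iterable[str],
    extract: Optional[Callable[[str], object]] = None,
) -> Dict[str, object]:
    """Run an extractor over several paths or URLs and map each source to its result.

    A source whose extraction raises is mapped to None, so one bad file
    does not stop the rest of the batch.
    """
    if extract is None:
        # Imported lazily so the helper can be tried out with a stand-in
        # extractor even when extractpdf is not installed.
        import extractpdf

        extract = extractpdf.process  # the call used in the usage example

    results: Dict[str, object] = {}
    for source in sources:
        try:
            results[source] = extract(source)
        except Exception:
            # Assumption: the exception types are not documented here,
            # so everything is caught and recorded as a failure.
            results[source] = None
    return results
```

Passing the extractor as a parameter keeps the helper independent of the package, which makes it easy to try with a stand-in function before pointing it at real files.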
For more control, use the PDFExtractor class directly:
from extractpdf import PDFExtractor

epdf = PDFExtractor()
content = epdf.get_content('http://www.example.com/some_file.pdf', keep_download=True)
f = epdf.filename  # f = some_file.pdf
epdf.delete_file()
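Because `keep_download=True` leaves the downloaded file on disk until `delete_file()` is called, it can help to tie the cleanup to a `with` block. This is a minimal sketch: `downloaded_pdf` is a hypothetical helper, not part of extractpdf, and it relies only on the `get_content`, `keep_download`, and `delete_file` names that appear in the example above.

```python
from contextlib import contextmanager
from typing import Any, Iterator


@contextmanager
def downloaded_pdf(extractor: Any, url: str) -> Iterator[Any]:
    """Yield the extracted content, then delete the kept download.

    `extractor` is expected to behave like the PDFExtractor instance in the
    example above: it offers get_content(url, keep_download=True) and
    delete_file().
    """
    content = extractor.get_content(url, keep_download=True)
    try:
        yield content
    finally:
        # Runs even if the body of the with-block raises.
        extractor.delete_file()


# Intended use (assumes extractpdf is installed):
#
#     from extractpdf import PDFExtractor
#     with downloaded_pdf(PDFExtractor(), 'http://www.example.com/some_file.pdf') as content:
#         ...  # work with content while the file is still on disk
```

If `get_content` itself fails, `delete_file()` is not called, on the assumption that nothing was kept in that case.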
We warmly welcome contributors!
To run this project locally, first install its dependencies. You can do this with pipenv:
Installation using pipenv (which combines virtualenv with pip)
# if you haven't installed pip
sudo easy_install pip
# install pipenv
pip install pipenv
On macOS, you can use Homebrew:
brew install pipenv
Set pipenv to keep its virtual environment inside the project, then install the packages:
set PIPENV_VENV_IN_PROJECT=true
# install all packages
pipenv install