
PDF data parser


PDF Data extractor

A simple wrapper package that retrieves both the year of publication and a summary of a PDF.

The package relies mainly on three other packages:

  • textract to convert PDF to plain text
  • pdfminer3 to extract the date from a PDF file
  • sumy to summarize text
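As an illustration of the date step, here is a minimal sketch of how a publication year could be read from a PDF date string. It assumes the date comes from a metadata field such as CreationDate, which the PDF specification formats as D:YYYYMMDDHHmmSS followed by an optional timezone. The function name year_from_pdf_date is hypothetical and is not part of this package or of pdfminer3; how the package itself obtains the date is not documented here.

```python
import re
from typing import Optional, Union

# PDF date strings look like D:YYYYMMDDHHmmSSOHH'mm'.
# Only the four-digit year is required; the "D:" prefix is sometimes missing.
_PDF_DATE = re.compile(r"^(?:D:)?(\d{4})")


def year_from_pdf_date(raw: Union[str, bytes, None]) -> Optional[int]:
    """Hypothetical helper: return the year from a PDF date value, or None."""
    if raw is None:
        return None
    if isinstance(raw, bytes):
        # Metadata values are often returned as raw bytes.
        raw = raw.decode("latin-1")
    match = _PDF_DATE.match(raw.strip())
    return int(match.group(1)) if match else None


print(year_from_pdf_date(b"D:20190315120000+01'00'"))
```

Returning None rather than raising keeps the caller free to decide what to do with PDFs that carry no usable date metadata.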

Usage

from pdf_extractor import pdf_extractor

# Create the extractor
extractor = pdf_extractor.PDFExtractor()

# Path to the PDF to process
pdf_path = "./test.pdf"
extractor.extract_data(pdf_path, 10)
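The meaning of the second argument (10) is not documented here. If it is the number of sentences to keep in the summary (an assumption), the idea behind extractive summarization can be sketched with the standard library alone. This is a naive frequency-based stand-in, not sumy's algorithm and not this package's implementation; the function name summarize and its parameters are hypothetical.

```python
import re
from collections import Counter
from typing import List


def summarize(text: str, sentence_count: int) -> List[str]:
    """Hypothetical sketch: keep the highest-scoring sentences, in document order."""
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

    # Word frequencies over the whole text, ignoring words of 3 letters or fewer.
    words = re.findall(r"[a-z']+", text.lower())
    freq = Counter(w for w in words if len(w) > 3)

    def score(sentence: str) -> float:
        # Average frequency of the words in the sentence.
        tokens = re.findall(r"[a-z']+", sentence.lower())
        return sum(freq[t] for t in tokens) / (len(tokens) or 1)

    # Rank sentence indices by score, keep the top ones, then restore order.
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:sentence_count])
    return [sentences[i] for i in keep]


sample = "PDF files store text. PDF files store metadata and text. Cats sleep."
print(summarize(sample, 2))
```

Asking for more sentences than the text contains simply returns every sentence, so a fixed count such as 10 is safe on short documents.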
