PDF data parser

Project description

PDF Data extractor

A simple wrapper package that retrieves both the year of publication and a summary of a PDF.

The package relies mainly on three other packages:

  • textract to convert PDF to plain text
  • pdfminer3 to extract the date from a PDF file
  • sumy to summarize text
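To illustrate the date step: PDF metadata dates (for example the CreationDate entry of the document information dictionary) are conventionally stored as strings such as D:20190412093000+02'00'. The sketch below shows one way a publication year could be derived from such a value once it has been read from the file. The helper name year_from_pdf_date is hypothetical and is not part of this package's API; it only assumes the raw value arrives as a str or bytes object, and it uses the standard library alone.

```python
import re
from typing import Optional, Union

# PDF date strings conventionally start with "D:" followed by YYYYMMDD...;
# the prefix is treated as optional here because some producers omit it.
_PDF_DATE = re.compile(r"^(?:D:)?(\d{4})")


def year_from_pdf_date(raw: Union[str, bytes, None]) -> Optional[int]:
    """Return the four-digit year from a PDF date string, or None if absent.

    Hypothetical helper for illustration; not taken from the package.
    """
    if raw is None:
        return None
    if isinstance(raw, bytes):
        # Metadata values are often raw bytes; latin-1 decodes any byte value.
        raw = raw.decode("latin-1")
    match = _PDF_DATE.match(raw.strip())
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    print(year_from_pdf_date("D:20190412093000+02'00'"))
```

Returning None instead of raising keeps the caller in control when a PDF carries no usable date, which is common for scanned or machine-generated files.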

Usage

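from pdf_extractor import pdf_extractor

# Create the extractor, then process a PDF.
extractor = pdf_extractor.PDFExtractor()
pdf_path = "./test.pdf"
# The second argument (10) is not documented here; it is presumably
# the number of sentences to keep in the summary.
extractor.extract_data(pdf_path, 10)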

File details

Source distribution: extracteur_de_fou_malade_pour_charles_le_charlo-0.0.1.tar.gz

  Algorithm    Hash digest
  SHA256       c2f3821173c57cf6d50dc9f35d373179dc0d8fdc9e05b0d0b979fa6cbf7813ad
  MD5          ca14fe655c378de2f414997344b0b8ed
  BLAKE2b-256  d80943050f9190cbdcfe5305982e387387bf8ec5a5bcb9ff374a8ef2099d48e6

Built distribution: extracteur_de_fou_malade_pour_charles_le_charlo-0.0.1-py3-none-any.whl

  Algorithm    Hash digest
  SHA256       b9454b11bd3e991cacc2f760ed04a8f0b7757a039b3c79b96838b061e14916dd
  MD5          69b39144764b624151036bdf0b50a8fc
  BLAKE2b-256  3dcaee1b8c85aa96e8ac145ba4c4d23dda8e0e0bc783af47c8fd2082640b5e50