PDF data parser
PDF Data extractor
A simple wrapper package that retrieves both the year of publication and a summary of a PDF file.
The package relies mainly on three other packages:
- textract, to convert PDFs to plain text
- pdfminer3, to extract the date from a PDF file
- sumy, to summarize the text
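The README does not say which PDF field the year is read from. A common source is the `CreationDate` entry of the document information dictionary, which pdfminer3's `PDFDocument` exposes through its `info` attribute; that this package uses that entry is an assumption. PDF dates follow the format `D:YYYYMMDDHHmmSSOHH'mm'`, so the year is the first four digits. A minimal stdlib-only sketch of that parsing step (the helper name `year_from_pdf_date` is hypothetical, not part of the package):

```python
import re
from typing import Optional, Union

# PDF dates look like "D:YYYYMMDDHHmmSSOHH'mm'"; everything after the year is optional.
_PDF_DATE = re.compile(r"^(?:D:)?(\d{4})")


def year_from_pdf_date(value: Union[bytes, str, None]) -> Optional[int]:
    """Return the four-digit year of a PDF date string, or None if absent or malformed."""
    if value is None:
        return None
    if isinstance(value, bytes):
        # Metadata values may come back as raw bytes; a UTF-16BE BOM marks a Unicode string.
        if value.startswith(b"\xfe\xff"):
            value = value[2:].decode("utf-16-be", errors="ignore")
        else:
            value = value.decode("latin-1")
    match = _PDF_DATE.match(value.strip())
    return int(match.group(1)) if match else None


print(year_from_pdf_date(b"D:20190312101500+01'00'"))  # 2019
```

Keep in mind that a file's creation date is not always the same as the year the document was published.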
Usage
from pdf_extractor import pdf_extractor

extractor = pdf_extractor.PDFExtractor()
pdf_path = "./test.pdf"
# The second argument (10) is not documented here; it presumably sets the length of the summary.
extractor.extract_data(pdf_path, 10)
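The summarization itself is delegated to sumy, whose summarizers are called with a parsed document and a number of sentences to keep. To illustrate what an extractive summarizer does, here is a minimal stdlib-only sketch that scores each sentence by the average frequency of its words and keeps the top N in their original order. It is a simplified stand-in, not sumy's algorithm; the function name `summarize` is hypothetical, and reading the `10` above as a sentence count is an assumption.

```python
import re
from collections import Counter
from typing import List


def summarize(text: str, sentence_count: int) -> List[str]:
    """Return up to `sentence_count` sentences of `text`, kept in document order."""
    # Naive sentence split: break after '.', '!' or '?' followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    # Word frequencies over the whole text (lower-cased, letters and apostrophes only).
    freq = Counter(re.findall(r"[a-z']+", text.lower()))

    def score(sentence: str) -> float:
        tokens = re.findall(r"[a-z']+", sentence.lower())
        return sum(freq[t] for t in tokens) / len(tokens) if tokens else 0.0

    # Rank sentence indices by score, keep the best ones, then restore original order.
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:sentence_count])
    return [sentences[i] for i in keep]


text = "PDF files store text. PDF files also store metadata. Cats sleep."
print(summarize(text, 2))  # ['PDF files store text.', 'PDF files also store metadata.']
```

Without stop-word removal, very common words dominate the scores; sumy's summarizers handle tokenization, stemming and stop words more carefully.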
Hashes for extracteur_de_fou_malade_pour_charles_le_charlo-0.0.1.tar.gz
Algorithm | Hash digest
---|---
SHA256 | c2f3821173c57cf6d50dc9f35d373179dc0d8fdc9e05b0d0b979fa6cbf7813ad
MD5 | ca14fe655c378de2f414997344b0b8ed
BLAKE2b-256 | d80943050f9190cbdcfe5305982e387387bf8ec5a5bcb9ff374a8ef2099d48e6
Hashes for extracteur_de_fou_malade_pour_charles_le_charlo-0.0.1-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | b9454b11bd3e991cacc2f760ed04a8f0b7757a039b3c79b96838b061e14916dd
MD5 | 69b39144764b624151036bdf0b50a8fc
BLAKE2b-256 | 3dcaee1b8c85aa96e8ac145ba4c4d23dda8e0e0bc783af47c8fd2082640b5e50