PDF data parser
PDF Data extractor
A simple wrapper package that retrieves both the publication year and a summary of a PDF.
The package relies mainly on three other packages:
- textract to convert PDF to plain text
- pdfminer3 to extract the date from a PDF file
- sumy to summarize text
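The package's own code is not shown here, so the following is only a sketch of how a publication year could be derived from a PDF's date metadata once a library such as pdfminer3 has read it. The function name `year_from_pdf_date` is hypothetical and not part of this package. The input is assumed to be a date string in the usual PDF form `D:YYYYMMDDHHmmSS`, given as either text or bytes.

```python
import re
from typing import Optional, Union

# PDF date strings usually look like D:YYYYMMDDHHmmSS plus an optional
# timezone suffix. The "D:" prefix is treated as optional here because
# some files omit it.
_PDF_DATE = re.compile(r"^(?:D:)?(\d{4})")


def year_from_pdf_date(raw: Union[str, bytes]) -> Optional[int]:
    """Return the four-digit year from a PDF date string, or None.

    Hypothetical helper for illustration; not taken from the package.
    """
    if isinstance(raw, bytes):
        # latin-1 maps every byte to a character, so decoding cannot fail.
        raw = raw.decode("latin-1")
    match = _PDF_DATE.match(raw.strip())
    return int(match.group(1)) if match else None


print(year_from_pdf_date(b"D:20200914101500+02'00'"))  # 2020
print(year_from_pdf_date("not a date"))                # None
```

Returning `None` rather than raising lets a caller decide what to do with PDFs whose metadata is missing or malformed.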
Usage
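from pdf_extractor import pdf_extractor

# Create the extractor, then pass it the path to a PDF and an integer
# (10 here; presumably the number of sentences to keep in the summary).
extractor = pdf_extractor.PDFExtractor()
pdf_path = "./test.pdf"
extractor.extract_data(pdf_path, 10)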