PDF data parser
Project description
PDF Data extractor
A simple wrapper package that retrieves both the publication year and a summary of a PDF.
The package relies mainly on three other packages:
- textract to convert PDF to plain text
- pdfminer3 to extract the date from a PDF file
- sumy to summarize text
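To illustrate the date step: PDF metadata usually stores the creation date as a string such as `D:20200115093000+01'00'`, and pdfminer-style parsers typically expose it in the document info dictionary under `CreationDate`. The sketch below assumes that raw value has already been read from the file. The helper name `year_from_pdf_date` is hypothetical and is not part of this package, whose actual implementation may differ.

```python
import re
from typing import Optional, Union

# PDF date strings look like D:YYYYMMDDHHmmSS+HH'mm'; the "D:" prefix is
# sometimes missing in practice, so it is treated as optional here.
_PDF_DATE_YEAR = re.compile(r"^(?:D:)?(\d{4})")


def year_from_pdf_date(raw: Union[str, bytes, None]) -> Optional[int]:
    """Return the four-digit year from a PDF date value, or None if absent.

    Hypothetical helper for illustration; not the package's actual code.
    """
    if raw is None:
        return None
    if isinstance(raw, bytes):
        # Metadata values often arrive as bytes; latin-1 decoding never fails.
        raw = raw.decode("latin-1")
    match = _PDF_DATE_YEAR.match(raw.strip())
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    print(year_from_pdf_date(b"D:20200115093000+01'00'"))
    print(year_from_pdf_date("not a date"))
```

Returning `None` instead of raising keeps the caller in control when a PDF has no usable date metadata, which is common for scanned documents.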
Usage
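from pdf_extractor import pdf_extractor
# Create the extractor and point it at a local PDF file.
extractor = pdf_extractor.PDFExtractor()
pdf_path = "./test.pdf"
# The second argument (10) is not documented here; it is presumably the summary length in sentences.
extractor.extract_data(pdf_path, 10)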
Download files
Source Distribution
Built Distribution
File details
Details for the file extracteur_de_fou_malade_pour_charles_le_charlo-0.0.1.tar.gz.
File metadata
- Download URL: extracteur_de_fou_malade_pour_charles_le_charlo-0.0.1.tar.gz
- Upload date:
- Size: 3.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/50.3.0 requests-toolbelt/0.9.1 tqdm/4.36.1 CPython/3.7.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c2f3821173c57cf6d50dc9f35d373179dc0d8fdc9e05b0d0b979fa6cbf7813ad |
| MD5 | ca14fe655c378de2f414997344b0b8ed |
| BLAKE2b-256 | d80943050f9190cbdcfe5305982e387387bf8ec5a5bcb9ff374a8ef2099d48e6 |
File details
Details for the file extracteur_de_fou_malade_pour_charles_le_charlo-0.0.1-py3-none-any.whl.
File metadata
- Download URL: extracteur_de_fou_malade_pour_charles_le_charlo-0.0.1-py3-none-any.whl
- Upload date:
- Size: 6.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/50.3.0 requests-toolbelt/0.9.1 tqdm/4.36.1 CPython/3.7.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b9454b11bd3e991cacc2f760ed04a8f0b7757a039b3c79b96838b061e14916dd |
| MD5 | 69b39144764b624151036bdf0b50a8fc |
| BLAKE2b-256 | 3dcaee1b8c85aa96e8ac145ba4c4d23dda8e0e0bc783af47c8fd2082640b5e50 |
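
Either file can be checked against its published digests after download. The following is a minimal sketch using only the Python standard library. The function names `file_digests` and `matches_published` are hypothetical, and it assumes that BLAKE2b-256 means BLAKE2b with a 32-byte digest.

```python
import hashlib
from pathlib import Path
from typing import Dict


def file_digests(path: str, chunk_size: int = 65536) -> Dict[str, str]:
    """Compute SHA256, MD5, and BLAKE2b-256 hex digests of a file."""
    hashers = {
        "SHA256": hashlib.sha256(),
        "MD5": hashlib.md5(),
        # Assumption: BLAKE2b-256 is BLAKE2b with a 32-byte (256-bit) digest.
        "BLAKE2b-256": hashlib.blake2b(digest_size=32),
    }
    with Path(path).open("rb") as handle:
        # Read in chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            for hasher in hashers.values():
                hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}


def matches_published(path: str, published: Dict[str, str]) -> bool:
    """Return True if every published digest equals the computed one."""
    computed = file_digests(path)
    return all(
        computed.get(name) == digest.lower()
        for name, digest in published.items()
    )
```

Calling `matches_published` with the local path of the downloaded archive and a dictionary such as `{"SHA256": "..."}` filled in from the table for that file returns `False` if the download was corrupted or altered.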