Applying pdfplumber + opencv + pytesseract to extract content and metadata from formal PDF files.
Project description
start-ocr
- pdfplumber's page.extract_text_lines() is experimental, so it may or may not work depending on the PDF file.
- See documentation.
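Because page.extract_text_lines() may fail on some files, one defensive pattern is to try it first and fall back to page.extract_text(). The sketch below is only an illustration of that idea: the helper names safe_text_lines and lines_from_pdf are hypothetical and are not part of start-ocr, and catching every exception is a simplification for the sake of the example.

```python
from __future__ import annotations

from typing import Any


def safe_text_lines(page: Any) -> list[str]:
    """Return the text lines of a pdfplumber page as plain strings.

    Hypothetical helper (not part of start-ocr): tries the experimental
    page.extract_text_lines() first and, if it raises or returns nothing,
    falls back to splitting page.extract_text() on newlines.
    """
    try:
        # Each item is a dict describing one line; "text" holds its content.
        lines = [line["text"] for line in page.extract_text_lines()]
    except Exception:  # assumption: any failure should trigger the fallback
        lines = []
    if lines:
        return lines
    # Fallback: plain text extraction, dropping blank lines.
    text = page.extract_text() or ""
    return [ln.strip() for ln in text.splitlines() if ln.strip()]


def lines_from_pdf(path: str, page_index: int = 0) -> list[str]:
    """Open a PDF and return the text lines of one page (hypothetical helper)."""
    import pdfplumber  # imported lazily so safe_text_lines has no hard dependency

    with pdfplumber.open(path) as pdf:
        return safe_text_lines(pdf.pages[page_index])
```

If both methods return nothing, the page is probably a scanned image, which is the case where an opencv + pytesseract OCR pass would be the next step.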
Installation
just start
Download files
Source Distribution
start-ocr-0.0.6.tar.gz (12.7 kB)
Built Distribution
start_ocr-0.0.6-py3-none-any.whl (14.6 kB)
Hashes for start_ocr-0.0.6-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 7ee579558ba194a9ff4b10641c52fae699fdaae02891a0a78dfa5e978d4e392f
MD5 | 0655ff8861d105c2591d3a2e96c2df11
BLAKE2b-256 | 5ae906440bf79e4d660c7aab13b0e5980054d2a7bdb86d1de8dbcd659e71d409