OCR single or multiple files
KiwanOCR
This package takes a single PDF file or a list of PDF files and returns their content as a text file.
- Requirements
- Methods
Requirements
Make sure these dependencies are installed, using pip or brew as shown:
brew install tesseract
brew install poppler
pip install pdf2image
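Before running the package, it can help to confirm that the two command-line tools are reachable. The following is a minimal sketch using only the standard library. It assumes the Tesseract executable is named `tesseract` and that Poppler provides `pdftoppm`, the tool pdf2image normally calls. The name `missing_tools` is hypothetical and not part of KiwanOCR.

```python
import shutil

# Executables the OCR pipeline is assumed to need on PATH.
# "pdftoppm" ships with Poppler; "tesseract" is the OCR engine itself.
REQUIRED_TOOLS = ("tesseract", "pdftoppm")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the names in `tools` that cannot be found on PATH."""
    return [name for name in tools if which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All required tools found.")
```

The `which` parameter exists only so the lookup can be swapped out in tests; in normal use the defaults are enough.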
import
Import the following:
from PIL import Image
import pytesseract  # Python interface for Tesseract
import os  # navigate and create directories
import shutil  # delete the image folders with their images
from pdf2image import convert_from_path  # turn a PDF into images
import glob  # glob files into a list
from pathlib import Path  # specify the path to your files
from natsort import natsorted, ns  # natural sorting
import re  # for regex
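To show how two of these imports typically fit together, here is a hedged sketch of a PDF-to-text step. It is not KiwanOCR's actual implementation. The function names `pdf_to_text` and `join_pages` are hypothetical, and separating pages with a blank line is an assumption. The calls `convert_from_path(..., dpi=...)` and `pytesseract.image_to_string(..., lang=...)` are the standard pdf2image and pytesseract entry points.

```python
def join_pages(page_texts):
    """Join per-page OCR results, separating pages with a blank line."""
    return "\n\n".join(text.strip() for text in page_texts)


def pdf_to_text(pdf_path, language="eng", resolution=300):
    """Render each PDF page to an image and OCR it with Tesseract."""
    # Imported here so this module still loads when the third-party
    # dependencies are not installed.
    from pdf2image import convert_from_path
    import pytesseract

    # convert_from_path returns one PIL image per page, in page order.
    pages = convert_from_path(pdf_path, dpi=resolution)
    return join_pages(
        pytesseract.image_to_string(page, lang=language) for page in pages
    )
```

Calling `pdf_to_text("report.pdf")` requires Tesseract, Poppler, pdf2image and pytesseract to be installed; `report.pdf` is a placeholder file name.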
Methods
Setup
pip install kiwanocr
from kiwano import ocr
OCR a single PDF
ocr.ocr_file(file_name, output_file_name, language, resolution)
Arguments
- file_name: the PDF file to read, as a string
- output_file_name: the name of the output text file, as a string
- language: default is English (use tesseract --list-langs to retrieve language codes)
- resolution: default is 300 dpi (use an integer value between 100 and 1200)
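The constraints listed above can be checked before any OCR work starts. The sketch below is an illustration only: `check_ocr_args` is a hypothetical helper, not part of KiwanOCR, and it assumes that file_name ends in `.pdf` and that `eng` (Tesseract's code for English) corresponds to the documented English default.

```python
from pathlib import Path


def check_ocr_args(file_name, output_file_name, language="eng", resolution=300):
    """Validate arguments against the constraints documented above."""
    # Assumption: the input is identified as a PDF by its file extension.
    if Path(file_name).suffix.lower() != ".pdf":
        raise ValueError(f"expected a PDF file, got {file_name!r}")
    if not output_file_name:
        raise ValueError("output_file_name must be a non-empty string")
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(resolution, bool) or not isinstance(resolution, int):
        raise TypeError("resolution must be an integer")
    if not 100 <= resolution <= 1200:
        raise ValueError("resolution must be between 100 and 1200 dpi")
    return file_name, output_file_name, language, resolution


# Usage (requires KiwanOCR to be installed and report.pdf to exist):
#   from kiwano import ocr
#   ocr.ocr_file(*check_ocr_args("report.pdf", "report", "eng", 300))
```

Whether output_file_name should include the `.txt` extension is not stated here, so `"report"` in the usage comment is only a placeholder.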
OCR a list of PDFs
ocr.ocr_files(list_name, output_file_name, language, resolution)
- list_name: a list of PDF file names; this is the only difference from ocr_file, and the remaining arguments are the same
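One way to build such a list is to gather every PDF in a folder and put the names in natural order, so that `scan_2.pdf` comes before `scan_10.pdf`. The sketch below uses only the standard library; `natural_key` is a simplified stand-in for what natsort provides, and both function names are hypothetical rather than part of KiwanOCR.

```python
import re
from pathlib import Path


def natural_key(name):
    """Split a name into text and integer chunks for natural ordering."""
    # re.split with a capturing group alternates text, digits, text, ...
    return [int(part) if part.isdecimal() else part.lower()
            for part in re.split(r"(\d+)", name)]


def collect_pdfs(folder):
    """Return the PDF paths in `folder` as strings, in natural order."""
    pdfs = [p for p in Path(folder).iterdir() if p.suffix.lower() == ".pdf"]
    return [str(p) for p in sorted(pdfs, key=lambda p: natural_key(p.name))]


# Usage (requires KiwanOCR to be installed; "scans" is a placeholder folder):
#   from kiwano import ocr
#   ocr.ocr_files(collect_pdfs("scans"), "combined", "eng", 300)
```

Sorting before the call keeps the order of the input files predictable, which matters if the combined text is expected to follow that order.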
Output
A .txt file is placed in an output folder.