OCR single or multiple files
KiwanOCR
This package takes a single PDF file or a list of PDF files and returns their content as a text file.
- Requirements
- Methods
Requirements
pip install or brew install
Make sure you have installed these dependencies:
brew install tesseract
brew install poppler
pip install pdf2image
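Before running any OCR, it can help to confirm that the dependencies above are actually reachable. The following is a minimal sketch, not part of KiwanOCR: the helper name `missing_dependencies` is hypothetical, and it assumes that Tesseract provides a `tesseract` executable and Poppler provides `pdftoppm` on your `PATH`.

```python
import importlib.util
import shutil


def missing_dependencies():
    """Return the names of required tools and modules that cannot be found."""
    missing = []
    # Command-line tools installed with brew: Tesseract and Poppler (pdftoppm).
    for executable in ("tesseract", "pdftoppm"):
        if shutil.which(executable) is None:
            missing.append(executable)
    # Python modules; only checks that they can be located, does not import them.
    for module in ("pdf2image", "pytesseract", "PIL", "natsort"):
        if importlib.util.find_spec(module) is None:
            missing.append(module)
    return missing


if __name__ == "__main__":
    problems = missing_dependencies()
    print("Missing:", ", ".join(problems) if problems else "nothing")
```

An empty list suggests the setup is complete; any name in the list points to the install command that still needs to be run.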
import
Import the following:
from PIL import Image
import pytesseract  ## Python interface for Tesseract
import os  ## navigate, create directories
import shutil  ## to delete the image folders with their images
from pdf2image import convert_from_path  ## to turn PDF pages into images
import glob  ## to glob files into a list
from pathlib import Path  ## to specify the path to your files
from natsort import natsorted, ns  ## natural sorting
import re  ## for regex
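To illustrate how two of these imports typically work together, here is a rough sketch of a PDF-to-text pipeline. It is an assumption about the general approach, not the package's actual source: the function name `ocr_pdf_to_text` is hypothetical, and it keeps page images in memory rather than writing them to a temporary folder.

```python
def ocr_pdf_to_text(pdf_path, language="eng", dpi=300):
    """Convert each PDF page to an image and return the recognized text.

    Hypothetical helper for illustration only; KiwanOCR may work differently.
    """
    # Imported here so that defining the function does not require the
    # third-party libraries to be installed.
    from pdf2image import convert_from_path
    import pytesseract

    # One PIL image per page, rendered at the requested resolution.
    pages = convert_from_path(pdf_path, dpi=dpi)
    # Run Tesseract on every page with the chosen language code.
    texts = [pytesseract.image_to_string(page, lang=language) for page in pages]
    return "\n".join(texts)
```

Calling `ocr_pdf_to_text("example.pdf")` with a placeholder file name would return the text of all pages joined by newlines, provided Tesseract and Poppler are installed.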
Methods
Setup
pip install kiwanocr
from kiwano import ocr
OCR a single PDF
ocr.ocr_file(file_name, output_file_name, language, resolution)
Arguments
- file_name: as a string
- output_file_name: as a string
- language: default is English (use tesseract --list-langs to retrieve language codes)
- resolution: default is 300 dpi (use an integer value between 100 and 1200)
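The argument rules above can be checked before calling `ocr.ocr_file`. This is a small sketch under stated assumptions: `check_ocr_arguments` is a hypothetical helper, the file names are placeholders, and `"eng"` is Tesseract's code for English (whether the package expects exactly this value is not confirmed here).

```python
def check_ocr_arguments(file_name, output_file_name, language="eng", resolution=300):
    """Validate arguments against the documented rules and return them as a tuple."""
    if not isinstance(file_name, str) or not isinstance(output_file_name, str):
        raise TypeError("file_name and output_file_name must be strings")
    if not isinstance(language, str):
        raise TypeError("language must be a Tesseract language code string")
    # bool is a subclass of int, so it is rejected explicitly.
    if isinstance(resolution, bool) or not isinstance(resolution, int):
        raise ValueError("resolution must be an integer")
    if not 100 <= resolution <= 1200:
        raise ValueError("resolution must be between 100 and 1200 dpi")
    return file_name, output_file_name, language, resolution


# Placeholder file names; pass the checked values on to the package:
# args = check_ocr_arguments("report.pdf", "report", "eng", 300)
# ocr.ocr_file(*args)
```

Validating up front gives a clear error message instead of a failure partway through a long OCR run.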
OCR a list of PDFs
ocr.ocr_files(list_name, output_file_name, language, resolution)
- list_name: a list of PDF file names instead of a single file name; the other arguments are the same as for ocr_file
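One way to build such a list is to collect every PDF in a folder and sort the names naturally, so that `scan2.pdf` comes before `scan10.pdf`. The sketch below uses only the standard library as a stand-in for `natsort`; the helper names `natural_key` and `collect_pdfs` and the folder name are hypothetical.

```python
import re
from pathlib import Path


def natural_key(name):
    """Split a name into text and number parts so numbers compare numerically."""
    parts = re.split(r"(\d+)", name)
    # With a capturing group, odd positions hold the digit runs.
    return [int(part) if index % 2 else part.lower() for index, part in enumerate(parts)]


def collect_pdfs(folder):
    """Return the paths of all .pdf files in folder, in natural order."""
    return sorted((str(path) for path in Path(folder).glob("*.pdf")), key=natural_key)


# Placeholder folder name; pass the resulting list on to the package:
# list_name = collect_pdfs("scans")
# ocr.ocr_files(list_name, "all_scans", "eng", 300)
```

Natural ordering matters when the files are numbered pages or chapters, since a plain alphabetical sort would place `scan10.pdf` ahead of `scan2.pdf`.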
Output
A .txt file is placed in an output folder.
File details
Details for the file kiwanocr-0.0.8.tar.gz.
File metadata
- Download URL: kiwanocr-0.0.8.tar.gz
- Upload date:
- Size: 4.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.7.1 importlib_metadata/4.9.0 pkginfo/1.8.2 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.8.8
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 6049a60278ae572874ee6d9f856ea1df7fcba4a2da5ea3d517ea19784194ba04 |
| MD5 | 491f1ae46556755b311a719f96001185 |
| BLAKE2b-256 | ea91398502259fdad21ed2f213617190bf369598cc20be8b5b86501647df8564 |
File details
Details for the file kiwanocr-0.0.8-py3-none-any.whl.
File metadata
- Download URL: kiwanocr-0.0.8-py3-none-any.whl
- Upload date:
- Size: 5.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.7.1 importlib_metadata/4.9.0 pkginfo/1.8.2 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.8.8
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 91bb93845ec98441c350c8a63415d99c0174687fff13fff700f369eb12a45c25 |
| MD5 | 64103f5094fa83354e49c4fd12a79795 |
| BLAKE2b-256 | ed65343f1a5f57290e34cca88e15b486d8ef6ebc59f43728687b69e054df531c |