OCR API: This OCR API is an application for extracting text from images and PDF files. It is built using Flask, a Python web framework. It utilizes the pytesseract OCR library, pymupdf and the PIL library for image processing.

Project description

OCR App with API

The OCR app is an application for extracting text from images and PDF files. It is built on Flask, a Python web framework, and utilizes the Tesseract OCR library and the PIL library for image processing.

Features

  • API endpoint for uploading images and PDF files for text extraction.
  • Supported input formats: JPG/JPEG and PNG images, and PDF documents.
  • PDF files are processed by converting their pages into images and extracting text from those images.
  • The extracted text is returned through the same API.
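The PDF handling described above (render each page to an image, then run OCR on it) can be sketched roughly as follows. This is a minimal illustration and not the package's actual code: the names `file_kind` and `extract_text`, the extension sets, and the 300 dpi rendering resolution are all assumptions. It relies on PyMuPDF (`fitz`), Pillow, and pytesseract being installed.

```python
import io
from pathlib import Path
from typing import Optional

# Extensions taken from the feature list; the grouping is an assumption.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}
PDF_EXTENSIONS = {".pdf"}


def file_kind(filename: str) -> Optional[str]:
    """Return "image", "pdf", or None for an unsupported file name."""
    suffix = Path(filename).suffix.lower()
    if suffix in IMAGE_EXTENSIONS:
        return "image"
    if suffix in PDF_EXTENSIONS:
        return "pdf"
    return None


def extract_text(data: bytes, filename: str) -> str:
    """Hypothetical helper: OCR an upload given its raw bytes and name."""
    kind = file_kind(filename)
    if kind is None:
        raise ValueError(f"unsupported file type: {filename!r}")

    # Imported lazily so file_kind() works without the OCR stack installed.
    import fitz  # PyMuPDF
    import pytesseract
    from PIL import Image

    if kind == "image":
        return pytesseract.image_to_string(Image.open(io.BytesIO(data)))

    texts = []
    with fitz.open(stream=data, filetype="pdf") as doc:
        for page in doc:
            # Render the page to a bitmap (300 dpi is an assumed value),
            # then pass the resulting PNG to Tesseract.
            pixmap = page.get_pixmap(dpi=300)
            image = Image.open(io.BytesIO(pixmap.tobytes("png")))
            texts.append(pytesseract.image_to_string(image))
    return "\n".join(texts)
```

Rejecting unsupported extensions before any heavy import keeps the failure cheap; a higher rendering resolution usually helps OCR accuracy at the cost of speed.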

Requirements

To run the app, the dependencies from requirements.txt must be installed:

  • Flask
  • pytesseract
  • Tesseract OCR
  • PIL (Python Imaging Library)
  • fitz (the import name of PyMuPDF)

You can install the dependencies with pip by running the following command:

pip install -r requirements.txt

Starting the Application

Run the app with the following command:

python app.py

The app starts in test mode at http://localhost:5000.

API Access Guide

To use the API, send a POST request with the image file attached. For example, in Python with the requests library, supplying the path to your image:

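    import requests

    url = 'http://localhost:5000/api_endpoint'
    image_path = '/image_path'

    # Upload the image as multipart form data under the 'image' field;
    # the with-block closes the file once the request has been sent.
    with open(image_path, 'rb') as image_file:
        response = requests.post(url, files={'image': image_file})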

Note: Make sure the app is running.
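One way to check that the app is running before sending a request is to test whether anything accepts TCP connections on the port. The sketch below uses only the standard library; the function name `server_is_up` is hypothetical, and the default host and port simply mirror the http://localhost:5000 address given above.

```python
import socket


def server_is_up(host: str = "localhost", port: int = 5000,
                 timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        # create_connection raises OSError (e.g. connection refused,
        # timeout) when nothing is listening at the address.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling `server_is_up()` before `requests.post(...)` lets a script print a clear "start the app first" message instead of a connection traceback. It only confirms that a server is listening, not that it is this particular app.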

Instructions

Make sure the app is running; you can check by opening it in your web browser. Because the homepage serves no content, you will see a server error there. To use the API, send a request as shown in the request.py file, supplying the path to your image.

Note

Make sure that Tesseract OCR is installed on your system and the 'TESSDATA_PREFIX' environment variable is correctly set to the directory with the Tesseract language data.
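The setup described in this note can be sanity-checked with a few lines of standard-library Python. This is a sketch under assumptions: the helper names are hypothetical, and it assumes `TESSDATA_PREFIX` points directly at the directory holding the `*.traineddata` files (depending on the Tesseract version, the variable may instead need to point at its parent directory).

```python
import os
import shutil
from pathlib import Path
from typing import List, Optional


def tesseract_on_path() -> bool:
    """Return True if a `tesseract` executable is found on PATH."""
    return shutil.which("tesseract") is not None


def installed_languages(tessdata_dir: Optional[str] = None) -> List[str]:
    """List language codes that have a .traineddata file in the directory.

    Falls back to the TESSDATA_PREFIX environment variable when no
    directory is passed; returns an empty list if neither is usable.
    """
    directory = tessdata_dir or os.environ.get("TESSDATA_PREFIX")
    if not directory or not Path(directory).is_dir():
        return []
    return sorted(p.stem for p in Path(directory).glob("*.traineddata"))
```

An empty result from `installed_languages()` suggests that the variable is unset or points at the wrong directory.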

Legal

Medical data is classified with MedCat. It is created using the machine-readable version provided by the Federal Institute for Drugs and Medical Devices (Bundesinstitut für Arzneimittel und Medizinprodukte, BfArM).

Max Hild // AG Lux // 2023

Download files

Source Distribution

agl_ocr_reader-1.1.1.tar.gz (1.1 MB)

Uploaded Source

Built Distribution

agl_ocr_reader-1.1.1-py3-none-any.whl (1.1 MB)

Uploaded Python 3

File details

Details for the file agl_ocr_reader-1.1.1.tar.gz.

File metadata

  • Download URL: agl_ocr_reader-1.1.1.tar.gz
  • Upload date:
  • Size: 1.1 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.5.1 CPython/3.10.8 Darwin/22.5.0

File hashes

Hashes for agl_ocr_reader-1.1.1.tar.gz
Algorithm Hash digest
SHA256 b75d978d9af63cc6f7f1c1ed09ae05bcd04012b85b26bf02a2b8320a53641d95
MD5 1a5bc113cb670e964aa4829afc46b25e
BLAKE2b-256 00d79c2978244aa63a766f9f93714f6c2b295509a99bb219b319987f79e256a5
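A downloaded file can be compared against the SHA256 digest listed here with the standard library. The following is a minimal sketch; the function names `sha256_of` and `matches` are illustrative and not part of the package.

```python
import hashlib
import hmac


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # iter() with a sentinel stops once read() returns b"" at EOF.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path: str, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with an expected hex string."""
    return hmac.compare_digest(sha256_of(path), expected_hex.strip().lower())
```

For example, `matches("agl_ocr_reader-1.1.1.tar.gz", "<SHA256 from the table>")` should return True for an intact download of the source distribution.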

File details

Details for the file agl_ocr_reader-1.1.1-py3-none-any.whl.

File metadata

  • Download URL: agl_ocr_reader-1.1.1-py3-none-any.whl
  • Upload date:
  • Size: 1.1 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.5.1 CPython/3.10.8 Darwin/22.5.0

File hashes

Hashes for agl_ocr_reader-1.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 72354bb5e3d2ffbc7548ba391e0efc9a9693f6ef6402d2a3a5154263c9013edb
MD5 b6b5e03fb9e786f9d18f0ccd05b2547d
BLAKE2b-256 cd59e05ce0b86c4606607f89383f77383844b815887f1bc57d4071d4951456de
