
OCR API: This OCR API is an application for extracting text from images and PDF files. It is built using Flask, a Python web framework. It utilizes the pytesseract OCR library, pymupdf and the PIL library for image processing.

Project description

OCR App with API

The OCR app is an application for extracting text from images and PDF files. It is built on Flask, a Python web framework, and utilizes the Tesseract OCR library and the PIL library for image processing.

Features

  • API for uploading images and PDF files for text extraction.
  • Support for JPG, JPEG, and PNG images as well as PDF files.
  • PDF files are processed by converting them into images and extracting the text from those images.
  • API access to the extracted text.
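The processing described in the feature list can be sketched roughly as follows. This is a minimal illustration, not the app's actual code: the names `IMAGE_EXTENSIONS`, `PDF_EXTENSIONS`, `classify_upload`, and `extract_text` are hypothetical, and the 300 dpi rendering resolution is an assumption. It uses PyMuPDF (`fitz`), Pillow, and pytesseract, which the project lists as dependencies.

```python
import io
from pathlib import Path

# Extensions named in the feature list; the constant names are hypothetical.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}
PDF_EXTENSIONS = {".pdf"}


def classify_upload(filename: str) -> str:
    """Return 'image', 'pdf', or 'unsupported' based on the file extension."""
    suffix = Path(filename).suffix.lower()
    if suffix in IMAGE_EXTENSIONS:
        return "image"
    if suffix in PDF_EXTENSIONS:
        return "pdf"
    return "unsupported"


def extract_text(path: str) -> str:
    """OCR an image directly, or render each PDF page to an image first."""
    # Third-party imports stay local so classify_upload works without them.
    import fitz  # PyMuPDF
    import pytesseract
    from PIL import Image

    kind = classify_upload(path)
    if kind == "image":
        with Image.open(path) as image:
            return pytesseract.image_to_string(image)
    if kind == "pdf":
        texts = []
        with fitz.open(path) as doc:
            for page in doc:
                # Render the page to a bitmap (300 dpi is an assumed value).
                pix = page.get_pixmap(dpi=300)
                image = Image.open(io.BytesIO(pix.tobytes("png")))
                texts.append(pytesseract.image_to_string(image))
        return "\n".join(texts)
    raise ValueError(f"Unsupported file type: {path}")
```

Checking the extension before doing any OCR lets unsupported uploads be rejected early, without loading Tesseract at all.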

Requirements

To run the app, the dependencies from requirements.txt must be installed:

  • Flask
  • pytesseract
  • Tesseract OCR
  • PIL (Python Imaging Library)
  • fitz

You can install the dependencies with pip by running the following command:

pip install -r requirements.txt
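After installing, it may help to confirm that the Python packages are importable before starting the app. The following is a small sketch using only the standard library; the function name `missing_modules` is hypothetical, and the list uses the import names of the packages mentioned above. It does not check the Tesseract OCR program itself, which is installed separately from pip.

```python
import importlib.util

# Import names corresponding to the Python packages listed above.
REQUIRED_MODULES = ["flask", "pytesseract", "PIL", "fitz"]


def missing_modules(modules=REQUIRED_MODULES):
    """Return the import names that cannot be found in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are importable.")
```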

Starting the Application

Run the app with the following command:

python app.py

The app starts in test mode at http://localhost:5000.

API Access Guide

To use the API, send a request that includes the image file. For example, in Python:

import requests

url = 'http://localhost:5000/api_endpoint'
image_path = '/image_path'
# Open the image in binary mode and upload it under the 'image' form field.
with open(image_path, 'rb') as f:
    files = {'image': f}
    response = requests.post(url, files=files)

Note: Make sure the app is running.
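One way to check that the app is running before sending a request is to test whether something accepts connections on the port. This is only a sketch: `is_server_up` is a hypothetical helper, and the defaults assume the app listens on localhost:5000 as described above. A successful connection shows that some server is listening there, not that it is this app.

```python
import socket


def is_server_up(host: str = "localhost", port: int = 5000, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    print("Server reachable:", is_server_up())
```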

Instructions

Check in your web browser that the app is running. Because the homepage serves no content, you will see a server error there. To use the API, send a request as shown in the request.py file, supplying the path to your image.

Note

Make sure that Tesseract OCR is installed on your system and the 'TESSDATA_PREFIX' environment variable is correctly set to the directory with the Tesseract language data.
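The two conditions in this note can be checked with a short script. This is a sketch under assumptions: `tesseract_problems` is a hypothetical helper, and it assumes the Tesseract executable is named `tesseract` and is expected on the PATH. It only verifies that `TESSDATA_PREFIX` points to an existing directory, not that the language data inside is complete.

```python
import os
import shutil


def tesseract_problems(env=None, which=shutil.which):
    """Return a list of human-readable problems with the Tesseract setup."""
    env = os.environ if env is None else env
    problems = []
    # Assumes the executable is called "tesseract" and should be on the PATH.
    if which("tesseract") is None:
        problems.append("tesseract executable not found on PATH")
    prefix = env.get("TESSDATA_PREFIX")
    if not prefix:
        problems.append("TESSDATA_PREFIX is not set")
    elif not os.path.isdir(prefix):
        problems.append(f"TESSDATA_PREFIX is not a directory: {prefix}")
    return problems


if __name__ == "__main__":
    for problem in tesseract_problems() or ["No problems found."]:
        print(problem)
```

The `env` and `which` parameters exist only so the check can be exercised without changing the real environment.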

Legal

Medical data is classified with MedCat. It is created using the machine-readable version provided by the Bundesinstitut für Arzneimittel und Medizinprodukte (BfArM), the German Federal Institute for Drugs and Medical Devices.

Max Hild // AG Lux // 2023


Source Distribution

agl_ocr_reader-1.1.0.tar.gz (1.1 MB view details)

Uploaded Source

Built Distribution

agl_ocr_reader-1.1.0-py3-none-any.whl (1.1 MB view details)

Uploaded Python 3

File details

Details for the file agl_ocr_reader-1.1.0.tar.gz.

File metadata

  • Download URL: agl_ocr_reader-1.1.0.tar.gz
  • Upload date:
  • Size: 1.1 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.5.1 CPython/3.10.8 Darwin/22.5.0

File hashes

Hashes for agl_ocr_reader-1.1.0.tar.gz
Algorithm Hash digest
SHA256 880b87a4ddfae287da9c23a51861ffd27eeaf65eef8fd7f1b6792ab430f06571
MD5 296c8c5e2340b5457d65088e36c04534
BLAKE2b-256 1a93100c1020b4e7ea4218aba329448abfbfb4963045f6252117886a2723a88d


File details

Details for the file agl_ocr_reader-1.1.0-py3-none-any.whl.

File metadata

  • Download URL: agl_ocr_reader-1.1.0-py3-none-any.whl
  • Upload date:
  • Size: 1.1 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.5.1 CPython/3.10.8 Darwin/22.5.0

File hashes

Hashes for agl_ocr_reader-1.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 3aa74f9c6e2c85fa518cf36a29af752a6f390420eab4b11b56c333567fc3bbae
MD5 a939b159e7a223279623f3f2e107db1a
BLAKE2b-256 8db6ad8a851bc6415ea2b097ccfdfb4f6ba6ce9c03bdf5b99374fa0a4d236704

