Extract text from scanned documents and images.

Project description

ocrlib

Basic usage

from ocr_stream import File, TextRecognized, RecognizeImage, RecognizePdf

Step 1:

Instantiate the text recognizer, passing it the path to the Tesseract executable.

tess: File = File('/usr/bin/tesseract')  # Replace with the tesseract path on your system
ocr = RecognizeImage.create(path_tesseract=tess)
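
Because the Tesseract path differs between systems, it can help to look it up rather than hard-code it. The following is a minimal sketch using only the Python standard library; `find_tesseract` and the fallback locations in `DEFAULT_FALLBACKS` are hypothetical names introduced here for illustration and are not part of ocr_stream.

```python
import os
import shutil
from typing import Iterable, Optional

# Hypothetical fallback locations (assumption); adjust them for your system.
DEFAULT_FALLBACKS = (
    "/usr/bin/tesseract",
    "/usr/local/bin/tesseract",
    r"C:\Program Files\Tesseract-OCR\tesseract.exe",
)


def find_tesseract(
    name: str = "tesseract",
    fallbacks: Iterable[str] = DEFAULT_FALLBACKS,
) -> Optional[str]:
    """Return the path of the Tesseract executable, or None if it is not found."""
    # First ask the operating system to resolve the name through PATH.
    found = shutil.which(name)
    if found:
        return found
    # Otherwise try each fallback location and keep the first executable file.
    for candidate in fallbacks:
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None
```

If the helper returns a path, that string can be used in place of the literal '/usr/bin/tesseract' when constructing the `File` in step 1; if it returns None, Tesseract probably needs to be installed or its path supplied by hand.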

Step 2:

Instantiate an image file from which to extract the text.

image: File = File('path/to/file.png')

Step 3:

Extract the text.

text: str = ocr.image_to_string(image)
print(text)

Step 4 (optional):

You can save a PDF file containing the extracted text.

output_file: File = File('path/to/save.pdf')
recognized: TextRecognized = ocr.image_recognize(image)
recognized.to_document().to_file_pdf(output_file)

Step 5 (optional):

You can save a spreadsheet containing the text from the image.

output_excel: File = File('path/to/file.xlsx')
recognized: TextRecognized = ocr.image_recognize(image)
recognized.to_document().to_excel(output_excel)
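
Steps 4 and 5 each need an output path. One way to keep the PDF and the spreadsheet next to each other, named after the source image, is sketched below with `pathlib`. The helper `output_paths` and the `out` directory are hypothetical and not part of ocr_stream; the sketch only builds paths and does not call the library.

```python
from pathlib import Path
from typing import Tuple


def output_paths(image_path: str, out_dir: str = ".") -> Tuple[Path, Path]:
    """Derive PDF and XLSX output paths that share the image's base name."""
    stem = Path(image_path).stem  # file name without directory or extension
    target = Path(out_dir)
    return target / f"{stem}.pdf", target / f"{stem}.xlsx"


# Example: derive both output paths for the image used in step 2.
pdf_path, xlsx_path = output_paths("path/to/file.png", out_dir="out")
print(pdf_path)
print(xlsx_path)
```

Assuming `File` accepts a plain string path, as it does in the steps above, `str(pdf_path)` and `str(xlsx_path)` could be passed to `File(...)` in steps 4 and 5. The output directory is not created by this helper.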

Download files

Source Distribution

ocr_stream-2.2.7.tar.gz (6.4 kB)

Uploaded Source

Built Distribution

ocr_stream-2.2.7-py3-none-any.whl (7.3 kB)

Uploaded Python 3

File details

Details for the file ocr_stream-2.2.7.tar.gz.

File metadata

  • Download URL: ocr_stream-2.2.7.tar.gz
  • Upload date:
  • Size: 6.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.11.2

File hashes

Hashes for ocr_stream-2.2.7.tar.gz
Algorithm Hash digest
SHA256 3ae01382c1c3d2b1ce020b999ae37e5b48224ec1f8758d7741f6501eac2406e8
MD5 84aeae4cada9583994f66159d129c0e6
BLAKE2b-256 22378189a8799800e8bfa2073247f3982dac350cba06c044c0f28d991865d88e

File details

Details for the file ocr_stream-2.2.7-py3-none-any.whl.

File metadata

  • Download URL: ocr_stream-2.2.7-py3-none-any.whl
  • Upload date:
  • Size: 7.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.11.2

File hashes

Hashes for ocr_stream-2.2.7-py3-none-any.whl
Algorithm Hash digest
SHA256 89450eeb67619488715dd4285ceaf3dd5426930d932c52519451420b8f5ca898
MD5 75a3a16f571cc7a3ae69885b71bf40f3
BLAKE2b-256 dc900c711eb1be927b563ef46b356edab4f1c8ed48fb2b404ecb4b04789f34d9
