Extract text from scanned documents and images.

Project description

ocrlib

Basic usage

from ocr_stream import File, TextRecognized, RecognizeImage, RecognizePdf

Step 1:

Instantiate the text recognizer, passing the path to the tesseract executable.

tess: File = File('/usr/bin/tesseract')  # Replace with the tesseract path on your system
ocr = RecognizeImage.create(path_tesseract=tess)
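The tesseract path differs between systems, so it can help to look it up instead of hard-coding it. The following is a minimal sketch that uses only the standard library's `shutil.which`; the helper name `find_tesseract` and its fallback default are assumptions made for illustration and are not part of ocr_stream.

```python
import shutil
from typing import Optional


def find_tesseract(default: str = "/usr/bin/tesseract") -> str:
    """Return the tesseract executable found on PATH, or `default` if none is found.

    Hypothetical helper for illustration; not part of ocr_stream.
    """
    found: Optional[str] = shutil.which("tesseract")
    return found if found is not None else default


# The returned string does not guarantee that the file exists when the
# fallback is used; it is only a best guess for typical Linux installs.
tesseract_path = find_tesseract()
print(tesseract_path)
```

The resulting string could then be passed to `File(...)` in place of the literal path shown in step 1.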

Step 2:

Create a File object for the image you want to extract text from.

image: File = File('path/to/file.png')

Step 3:

Extract the text.

text: str = ocr.image_to_string(image)
print(text)

Step 4 (optional):

You can save a PDF file containing the extracted text.

output_file: File = File('path/to/save.pdf')
recognized: TextRecognized = ocr.image_recognize(image)
recognized.to_document().to_file_pdf(output_file)

Step 5 (optional):

You can save a spreadsheet containing the text from the image.

output_excel: File = File('path/to/file.xlsx')
recognized: TextRecognized = ocr.image_recognize(image)
recognized.to_document().to_excel(output_excel)

Download files

Source Distribution

ocr_stream-2.3.2.tar.gz (8.1 kB)

Uploaded Source

Built Distribution

ocr_stream-2.3.2-py3-none-any.whl (9.3 kB)

Uploaded Python 3

File details

Details for the file ocr_stream-2.3.2.tar.gz.

File metadata

  • Download URL: ocr_stream-2.3.2.tar.gz
  • Upload date:
  • Size: 8.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.2

File hashes

Hashes for ocr_stream-2.3.2.tar.gz
Algorithm Hash digest
SHA256 e96bbc8a24c934be7d6066aa5bb12a07deb26bc4bc874d5e93eaeb2918d5e02f
MD5 4db319bbb993e91ad385f62351e9b0d7
BLAKE2b-256 1a8c4e4ecbe992ad706c0ccfe9a5fbfd916e598e1fdace17ea20aa259824b639

File details

Details for the file ocr_stream-2.3.2-py3-none-any.whl.

File metadata

  • Download URL: ocr_stream-2.3.2-py3-none-any.whl
  • Upload date:
  • Size: 9.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.2

File hashes

Hashes for ocr_stream-2.3.2-py3-none-any.whl
Algorithm Hash digest
SHA256 cbbbddae0dc957d4aa3fb6b55a703e0cf373af3ea5a005651144bbbb7d827560
MD5 0bb65d5683fe65c655998bad221bf24a
BLAKE2b-256 d2f3ef0535d6c3f62ee9fee9ef6eb5a93a8e689606c675b094e22773cdc96965
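
The SHA256 digests listed above can be used to check a downloaded file before installing it. The following is a minimal sketch using only the standard library's `hashlib`; the helper names `sha256_of` and `matches_sha256` are hypothetical and are not part of ocr_stream or of any packaging tool.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Return the hex-encoded SHA256 digest of the file at `path`."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in fixed-size chunks so large files are not loaded at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_sha256(path: Path, expected_hex: str) -> bool:
    """Compare the file's digest with an expected hex string, ignoring case."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, assuming the source distribution was saved in the current directory, `matches_sha256(Path('ocr_stream-2.3.2.tar.gz'), 'e96bbc8a24c934be7d6066aa5bb12a07deb26bc4bc874d5e93eaeb2918d5e02f')` should return True for an unmodified download.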
