Extract text from scanned documents and images.

Project description

ocrlib

Basic usage

from ocr_stream import File, TextRecognized, RecognizeImage, RecognizePdf

Step 1:

Instantiate the text recognizer, passing the path to the Tesseract executable.

tess: File = File('/usr/bin/tesseract')  # Replace with the Tesseract path on your system
ocr = RecognizeImage.create(path_tesseract=tess)
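The path `/usr/bin/tesseract` only applies to some Linux installations. As a minimal sketch, the executable could be looked up on `PATH` with the standard library instead of being hard-coded. The helper name `find_tesseract` is hypothetical and is not part of `ocr_stream`.

```python
import shutil
from typing import Optional, Sequence


def find_tesseract(names: Sequence[str] = ("tesseract", "tesseract.exe")) -> Optional[str]:
    """Return the path of the first matching executable found on PATH, or None."""
    for name in names:
        path = shutil.which(name)
        if path is not None:
            return path
    return None


if __name__ == "__main__":
    # Hypothetical usage: the result could be passed to File(...) as in step 1.
    print(find_tesseract() or "tesseract not found on PATH")
```

If the function returns a path, it can presumably be wrapped as `File(path)` in the same way as the literal string in step 1.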

Step 2:

Create a File object for the image you want to extract text from.

image: File = File('path/to/file.png')

Step 3:

Extract the text.

text: str = ocr.image_to_string(image)
print(text)
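When several images need processing, the same call can be applied in a loop. The sketch below is one possible arrangement, not something the library provides: `extract_texts` and `IMAGE_SUFFIXES` are hypothetical names, and the recognizer is passed in as a plain callable so the example runs without Tesseract installed.

```python
from pathlib import Path
from typing import Callable, Dict, Iterable

# Assumed set of image extensions; adjust to what your Tesseract build supports.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".tif", ".tiff", ".bmp"}


def extract_texts(paths: Iterable[Path], recognize: Callable[[Path], str]) -> Dict[str, str]:
    """Run `recognize` on every image path and map file name to extracted text."""
    results: Dict[str, str] = {}
    for path in sorted(paths):
        # Skip anything that does not look like an image file.
        if path.suffix.lower() in IMAGE_SUFFIXES:
            results[path.name] = recognize(path)
    return results


if __name__ == "__main__":
    # Stand-in recognizer for demonstration only.
    # With ocr_stream, the callable could instead be:
    #     lambda p: ocr.image_to_string(File(str(p)))
    fake = lambda p: f"text from {p.name}"
    print(extract_texts([Path("a.png"), Path("notes.txt")], fake))
```

Passing the recognizer as a callable keeps the loop independent of the OCR backend, which also makes it easy to test.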

Step 4 (optional):

You can save a PDF file containing the extracted text.

output_file: File = File('path/to/save.pdf')
recognized: TextRecognized = ocr.image_recognize(image)
recognized.to_document().to_file_pdf(output_file)

Step 5 (optional):

You can save a spreadsheet containing the text from the image.

output_excel: File = File('path/to/file.xlsx')
recognized: TextRecognized = ocr.image_recognize(image)
recognized.to_document().to_excel(output_excel)
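Steps 4 and 5 each call `ocr.image_recognize(image)`, so producing both outputs runs OCR twice. A possible way to avoid that is sketched below. It uses only the call chains shown above and assumes, without confirmation from the library, that a single `TextRecognized` result can be exported more than once. The function name `export_pdf_and_excel` is hypothetical.

```python
from typing import Any


def export_pdf_and_excel(recognized: Any, pdf_file: Any, excel_file: Any) -> None:
    """Write a PDF and a spreadsheet from one recognition result.

    `recognized` is expected to behave like the TextRecognized object
    returned by ocr.image_recognize(image); `pdf_file` and `excel_file`
    are expected to be File objects as in steps 4 and 5.
    """
    # Same call chains as steps 4 and 5; only the OCR pass is shared.
    recognized.to_document().to_file_pdf(pdf_file)
    recognized.to_document().to_excel(excel_file)


# Hypothetical usage with the objects from the steps above:
#     recognized = ocr.image_recognize(image)
#     export_pdf_and_excel(recognized, File('path/to/save.pdf'), File('path/to/file.xlsx'))
```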

Download files

Source Distribution

ocr_stream-2.5.2.tar.gz (9.7 kB)

Uploaded Source

Built Distribution

ocr_stream-2.5.2-py3-none-any.whl (10.9 kB)

Uploaded Python 3

File details

Details for the file ocr_stream-2.5.2.tar.gz.

File metadata

  • Download URL: ocr_stream-2.5.2.tar.gz
  • Size: 9.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.2

File hashes

Hashes for ocr_stream-2.5.2.tar.gz
Algorithm Hash digest
SHA256 17c9ec70ce22e1ba85459be58e086d0ca1e512ee1b703a3f382243d08f6ef562
MD5 c3820207232d51048b300473ed94f9d9
BLAKE2b-256 c564e8d99215f5cc1fb5345acdc90c414af078f5da6882f88212f466f4022196
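A downloaded archive can be checked against the SHA256 digest listed above. The following is a minimal sketch using only the standard library. The helper name `sha256_of` is hypothetical, and the archive is assumed to sit in the current working directory.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# SHA256 digest listed for ocr_stream-2.5.2.tar.gz.
EXPECTED = "17c9ec70ce22e1ba85459be58e086d0ca1e512ee1b703a3f382243d08f6ef562"

if __name__ == "__main__":
    archive = Path("ocr_stream-2.5.2.tar.gz")  # assumed download location
    if archive.exists():
        print("OK" if sha256_of(archive) == EXPECTED else "MISMATCH")
    else:
        print(f"{archive} not found")
```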

File details

Details for the file ocr_stream-2.5.2-py3-none-any.whl.

File metadata

  • Download URL: ocr_stream-2.5.2-py3-none-any.whl
  • Size: 10.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.2

File hashes

Hashes for ocr_stream-2.5.2-py3-none-any.whl
Algorithm Hash digest
SHA256 8915365ca103a5669bfbc060a432410984b4b678d525589047174fc4b4200534
MD5 a320ad42821a5c0f335105b3a643f6ee
BLAKE2b-256 786a1c0e822995a88af1b610282fac9c535b1a7d120211c706b72154dcf0b773
