
OCR extraction for court case files: PDF to Markdown

Project description

tecjustica-ocr

OCR extraction for court case files, converting PDF to Markdown.

Uses PaddleOCR 3.x (PP-OCRv5 / PP-StructureV3) with automatic GPU/CPU detection.
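How the package detects the device is not documented here. The sketch below shows one common way to resolve an "auto" setting with PaddlePaddle's public device helpers. The function name `pick_device` is hypothetical and is not part of tecjustica-ocr.

```python
def pick_device(requested: str = "auto") -> str:
    """Resolve 'auto' to 'gpu' or 'cpu'; pass explicit choices through."""
    if requested in ("gpu", "cpu"):
        return requested
    if requested != "auto":
        raise ValueError(f"unknown device: {requested!r}")
    try:
        import paddle  # only available when PaddlePaddle is installed
    except ImportError:
        return "cpu"
    # A CUDA build alone is not enough; at least one GPU must be visible.
    if paddle.device.is_compiled_with_cuda() and paddle.device.cuda.device_count() > 0:
        return "gpu"
    return "cpu"
```

Falling back to "cpu" when PaddlePaddle cannot be imported keeps the helper usable on machines where only the CPU extra is installed.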

Installation

# With GPU (CUDA 11.8); the quotes keep shells such as zsh from expanding the brackets
pip install "tecjustica-ocr[gpu]" --extra-index-url https://www.paddlepaddle.org.cn/packages/stable/cu118/

# CPU only
pip install "tecjustica-ocr[cpu]"

Usage

# Process a single PDF
tecjustica-ocr processo.pdf

# Process an entire folder
tecjustica-ocr pasta-processos/ -o resultado/

# Structure mode (tables, layout)
tecjustica-ocr processo.pdf --mode structure

# Server model (higher quality)
tecjustica-ocr processo.pdf -m server
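To drive the same commands from a script, the sketch below builds an argument list from the flags in the options table and hands it to `subprocess.run`. The helper names `build_command` and `run_all` are hypothetical, and the sketch assumes the `tecjustica-ocr` executable is on `PATH`.

```python
import subprocess


def build_command(source, output="./output", model="mobile", mode="text"):
    """Assemble an argv list using only flags from the options table."""
    return ["tecjustica-ocr", str(source), "-o", str(output), "-m", model, "--mode", mode]


def run_all(sources, **options):
    """Run the CLI once per input; check=True raises on a non-zero exit code."""
    for source in sources:
        subprocess.run(build_command(source, **options), check=True)
```

For example, `run_all(["processo.pdf"], mode="structure")` would run the structure-mode command shown above with the default output directory.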

Options

Option          Default    Description
-o, --output    ./output   Output directory
-m, --model     mobile     mobile (fast) or server (higher quality)
-d, --device    auto       auto, gpu, or cpu
-s, --scale     2          Render scale: 1, 2, or 3
-w, --workers   auto       Workers for parallel rendering
--mode          text       text (PP-OCRv5) or structure (PP-StructureV3)
--min-score     0.5        Minimum score
-v, --verbose   false      Verbose output
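The table does not say how the minimum score is applied. A threshold like this is typically used to drop low-confidence OCR lines, which the sketch below illustrates. The `Line` type, the `filter_by_score` helper, and the inclusive comparison are all assumptions made for illustration, not the package's actual behavior.

```python
from typing import NamedTuple


class Line(NamedTuple):
    text: str
    score: float  # recognition confidence in [0, 1]


def filter_by_score(lines, min_score=0.5):
    """Keep lines scoring at least min_score (inclusive bound is an assumption)."""
    return [line for line in lines if line.score >= min_score]


lines = [Line("EXCELENTÍSSIMO SENHOR", 0.98), Line("~~;;", 0.21), Line("Autos n.", 0.50)]
kept = filter_by_score(lines)
print([line.text for line in kept])
```

Raising the threshold trades recall for cleaner output, which can matter for scanned pages with stamps or handwriting.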

Python API

from tecjustica_ocr import extract_text, extract_structure

texto = extract_text("processo.pdf")
markdown = extract_structure("processo.pdf")
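A batch conversion can be layered on top of these two functions. The sketch below takes the extractor as a parameter so it works with either one. It assumes the extractor accepts a path and returns the document as a single string, which the variable names above suggest but the source does not confirm. `convert_folder` is a hypothetical helper.

```python
from pathlib import Path
from typing import Callable


def convert_folder(src: str, dst: str, extract: Callable[[str], str]) -> list[Path]:
    """Write one .md file per PDF found in src; return the paths written."""
    out_dir = Path(dst)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for pdf in sorted(Path(src).glob("*.pdf")):
        target = out_dir / f"{pdf.stem}.md"
        # Assumption: extract(path) returns the whole document as one string.
        target.write_text(extract(str(pdf)), encoding="utf-8")
        written.append(target)
    return written
```

With the package installed, a call such as `convert_folder("pasta-processos", "resultado", extract_text)` would mirror the folder example from the Usage section.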


Download files


Source Distribution

tecjustica_ocr-0.1.0.tar.gz (10.2 kB)

Uploaded Source

Built Distribution


tecjustica_ocr-0.1.0-py3-none-any.whl (13.7 kB)

Uploaded Python 3

File details

Details for the file tecjustica_ocr-0.1.0.tar.gz.

File metadata

  • Download URL: tecjustica_ocr-0.1.0.tar.gz
  • Upload date:
  • Size: 10.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.9.16

File hashes

Hashes for tecjustica_ocr-0.1.0.tar.gz
Algorithm Hash digest
SHA256 d49b6eb48ceb1b6c47445727c43e0b7a3f7c63be833da1d92732bc43a3dbbfbd
MD5 e90008a4cb6e6921a444aa8237beb999
BLAKE2b-256 a0e4ccc092dd8875c185babdb8bd1f4798c0a63287763952bba4f3c1ed956991
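A downloaded archive can be checked against the SHA256 digest listed above using only the standard library. This is a minimal sketch: the helper names `sha256_of` and `verify` are illustrative, and the local file path is whatever location the archive was saved to.

```python
import hashlib

# SHA256 digest listed above for tecjustica_ocr-0.1.0.tar.gz
EXPECTED = "d49b6eb48ceb1b6c47445727c43e0b7a3f7c63be833da1d92732bc43a3dbbfbd"


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, expected: str = EXPECTED) -> bool:
    """Return True when the file's SHA256 matches the expected hex digest."""
    return sha256_of(path) == expected.lower()
```

pip can enforce the same check at install time through `--require-hashes` with a requirements file that pins the digest.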


File details

Details for the file tecjustica_ocr-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: tecjustica_ocr-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 13.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.9.16

File hashes

Hashes for tecjustica_ocr-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 11d4da4fb3101fa6099fc91c7fa086aea6dcabb2b406486125061c86842b99a0
MD5 cd7dfb4ad0d65d6e25b38d228fad623f
BLAKE2b-256 9794bad9bc53c05419bd9d4cc03a98063a9300e563b9f5c2106aa4265d0834e9

