OCR extraction of court case files: PDF to Markdown
tecjustica-ocr
Uses PaddleOCR 3.x (PP-OCRv5 / PP-StructureV3) with automatic GPU/CPU detection.
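How the package performs its GPU/CPU detection is not documented here. As a rough illustration only, the sketch below shows one way such a check could be written against PaddlePaddle's device API. The function name `resolve_device` is hypothetical and not part of tecjustica-ocr; the accepted values mirror the `-d, --device` option.

```python
def resolve_device(requested: str = "auto") -> str:
    """Map "auto", "gpu" or "cpu" to a concrete device name.

    Hypothetical helper for illustration; not part of tecjustica-ocr.
    """
    if requested not in ("auto", "gpu", "cpu"):
        raise ValueError(f"unknown device: {requested!r}")
    if requested != "auto":
        # An explicit choice is returned unchanged.
        return requested
    try:
        import paddle  # only needed for the probe
    except ImportError:
        # Without PaddlePaddle installed there is nothing to probe.
        return "cpu"
    # Pick the GPU only if Paddle was built with CUDA and a device is visible.
    if paddle.device.is_compiled_with_cuda() and paddle.device.cuda.device_count() > 0:
        return "gpu"
    return "cpu"


print(resolve_device("auto"))
```

An explicit `gpu` or `cpu` bypasses the probe, which matches the usual reason for exposing such an option: forcing CPU on a machine whose GPU is busy or misconfigured.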
Installation
# With GPU (CUDA 11.8)
pip install "tecjustica-ocr[gpu]" --extra-index-url https://www.paddlepaddle.org.cn/packages/stable/cu118/
# CPU only
pip install "tecjustica-ocr[cpu]"
Usage
# Process a single PDF
tecjustica-ocr processo.pdf
# Process an entire folder
tecjustica-ocr pasta-processos/ -o resultado/
# Structure mode (tables, layout)
tecjustica-ocr processo.pdf --mode structure
# Server model (higher quality)
tecjustica-ocr processo.pdf -m server
Options
| Option | Default | Description |
|---|---|---|
| -o, --output | ./output | Output directory |
| -m, --model | mobile | mobile (fast) or server (higher quality) |
| -d, --device | auto | auto, gpu, or cpu |
| -s, --scale | 2 | Render scale: 1, 2, or 3 |
| -w, --workers | auto | Workers for parallel rendering |
| --mode | text | text (PP-OCRv5) or structure (PP-StructureV3) |
| --min-score | 0.5 | Minimum score |
| -v, --verbose | false | Verbose output |
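When the CLI is driven from a script, the options above can be assembled into an argument list and checked against the documented choices before anything is launched. The sketch below is only an illustration: `build_command` is a hypothetical helper, and it assumes that `-v` is a plain flag and that `-w` also accepts an integer, neither of which is confirmed by the table.

```python
from typing import List, Optional


def build_command(
    source: str,
    output: str = "./output",
    model: str = "mobile",
    device: str = "auto",
    scale: int = 2,
    workers: Optional[int] = None,  # None leaves the CLI default ("auto")
    mode: str = "text",
    min_score: float = 0.5,
    verbose: bool = False,
) -> List[str]:
    """Build an argv list for the tecjustica-ocr CLI (hypothetical helper)."""
    # Validate against the choices listed in the options table.
    if model not in ("mobile", "server"):
        raise ValueError(f"invalid model: {model!r}")
    if device not in ("auto", "gpu", "cpu"):
        raise ValueError(f"invalid device: {device!r}")
    if scale not in (1, 2, 3):
        raise ValueError(f"invalid scale: {scale!r}")
    if mode not in ("text", "structure"):
        raise ValueError(f"invalid mode: {mode!r}")

    cmd = [
        "tecjustica-ocr", source,
        "-o", output,
        "-m", model,
        "-d", device,
        "-s", str(scale),
        "--mode", mode,
        "--min-score", str(min_score),
    ]
    if workers is not None:
        cmd += ["-w", str(workers)]  # assumption: -w accepts an integer
    if verbose:
        cmd.append("-v")  # assumption: -v is a flag without a value
    return cmd


print(build_command("processo.pdf", model="server", mode="structure"))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`. Passing a list rather than a shell string avoids quoting problems with file names that contain spaces.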
Python API
from tecjustica_ocr import extract_text, extract_structure
texto = extract_text("processo.pdf")
markdown = extract_structure("processo.pdf")
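The calls above handle one file at a time. To convert a whole folder from Python, a small wrapper can loop over the PDFs and write one Markdown file per input. This is a minimal sketch, assuming that `extract_text` and `extract_structure` return the extracted content as a string; `convert_folder` is a hypothetical helper, not part of the package. The extraction function is passed in as an argument so the loop can be tried out without the OCR models installed.

```python
from pathlib import Path
from typing import Callable, List


def convert_folder(src_dir: str, out_dir: str, extract: Callable[[str], str]) -> List[Path]:
    """Run `extract` on every PDF in src_dir and write <name>.md into out_dir.

    Hypothetical helper; `extract` is assumed to take a PDF path and return a string.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    # Sort so files are processed in a stable, predictable order.
    for pdf in sorted(Path(src_dir).glob("*.pdf")):
        target = out / (pdf.stem + ".md")
        target.write_text(extract(str(pdf)), encoding="utf-8")
        written.append(target)
    return written


# With the package installed, usage would look like:
#   from tecjustica_ocr import extract_text
#   convert_folder("pasta-processos", "resultado", extract_text)
```

Only the top level of the folder is scanned; replacing `glob` with `rglob` would include subfolders, at the cost of possible name collisions in the output directory.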
File details
Details for the file tecjustica_ocr-0.1.0.tar.gz.
File metadata
- Download URL: tecjustica_ocr-0.1.0.tar.gz
- Size: 10.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.9.16 (Ubuntu 24.04)
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d49b6eb48ceb1b6c47445727c43e0b7a3f7c63be833da1d92732bc43a3dbbfbd |
| MD5 | e90008a4cb6e6921a444aa8237beb999 |
| BLAKE2b-256 | a0e4ccc092dd8875c185babdb8bd1f4798c0a63287763952bba4f3c1ed956991 |
File details
Details for the file tecjustica_ocr-0.1.0-py3-none-any.whl.
File metadata
- Download URL: tecjustica_ocr-0.1.0-py3-none-any.whl
- Size: 13.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.9.16 (Ubuntu 24.04)
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 11d4da4fb3101fa6099fc91c7fa086aea6dcabb2b406486125061c86842b99a0 |
| MD5 | cd7dfb4ad0d65d6e25b38d228fad623f |
| BLAKE2b-256 | 9794bad9bc53c05419bd9d4cc03a98063a9300e563b9f5c2106aa4265d0834e9 |