DeepRead

A Python library for intelligent, AI-powered extraction from PDF documents

Python 3.9+ | License: MIT


Features

  • Token Authentication - HMAC-SHA256 with timing-safe validation
  • Intelligent Extraction - Extracts information from PDFs using LLMs (OpenAI / Azure OpenAI)
  • Automatic OCR - Detects and processes scanned documents (Azure AI Vision)
  • Structured Output - Typed responses with Pydantic
  • Async + Sync - Synchronous and asynchronous APIs with batch processing
  • Resilience - Retry with exponential backoff and a circuit breaker
  • Cache - LRU cache with TTL to avoid reprocessing
  • Page Range - Filter specific pages by position (start/end)
  • Streaming - Lazy mode to save memory
  • Cost Tracking - Monitor tokens and cost per request
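The first feature names HMAC-SHA256 with timing-safe validation. The sketch below illustrates that general technique with the standard library only. It is not DeepRead's implementation: the token layout (`dr_<payload>.<signature>`), the secret, and the helper names `sign`, `make_token`, and `is_valid` are all assumptions made for illustration.

```python
import hashlib
import hmac


def sign(payload: str, secret: bytes) -> str:
    """Return the hex HMAC-SHA256 signature of the payload."""
    return hmac.new(secret, payload.encode("utf-8"), hashlib.sha256).hexdigest()


def make_token(payload: str, secret: bytes) -> str:
    # Hypothetical layout: "dr_<payload>.<signature>" (an assumption, not the real format)
    return f"dr_{payload}.{sign(payload, secret)}"


def is_valid(token: str, secret: bytes) -> bool:
    """Check the signature without leaking where a mismatch occurs."""
    if not token.startswith("dr_") or "." not in token:
        return False
    payload, _, signature = token[3:].rpartition(".")
    expected = sign(payload, secret)
    # compare_digest runs in time independent of the position of the first difference
    return hmac.compare_digest(signature.encode("utf-8"), expected.encode("utf-8"))


secret = b"example-secret"
token = make_token("client-42", secret)
print(is_valid(token, secret))
print(is_valid(token, b"wrong-secret"))
```

The point of `hmac.compare_digest` over `==` is that an ordinary string comparison can return early at the first differing character, which may let an attacker recover a valid signature one character at a time.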

Installation

pip install DeepRead.Monkai

With OCR (Azure AI Vision):

pip install "DeepRead.Monkai[ocr]"

Development:

pip install "DeepRead.Monkai[dev]"

Quick Start

1. Get an Access Token

The access token is provided by the Monkai team. To request one: contato@monkai.com.br

export DEEPREAD_API_TOKEN="dr_your_token_provided_by_monkai"
export OPENAI_API_KEY="sk-..."

2. Process a Document

import os
from deepread import DeepRead, Question, QuestionConfig
from pydantic import BaseModel, Field

# Typed schema for the structured answer
class ExtractionResponse(BaseModel):
    valor: str = Field(description="Extracted value")
    unidade: str = Field(default="", description="Unit of measure")
    confianca: float = Field(default=1.0, ge=0, le=1)

# {texto} is the placeholder for the document text; keywords filter the pages
question = Question(
    config=QuestionConfig(id="quantidade", name="Quantity Extraction"),
    system_prompt="You are an expert in extracting data from documents.",
    user_prompt="Analyze the text and extract the quantity mentioned.\n\nText:\n{texto}",
    keywords=["quantidade", "litros", "volume", "total"],
    response_model=ExtractionResponse
)

dr = DeepRead(
    api_token=os.getenv("DEEPREAD_API_TOKEN"),
    openai_api_key=os.getenv("OPENAI_API_KEY"),
    model="gpt-5.1",
    verbose=True
)

dr.add_question(question)
result = dr.process("documento.pdf")

# Answers are looked up by the question id
print(f"Answer: {result.get_answer('quantidade')}")
print(f"Tokens: {result.total_metrics.tokens}")
print(f"Cost: ${result.total_metrics.cost_usd:.4f}")

3. Multiple Questions with Page Range

from deepread import PageRange

dr.add_questions([
    Question(
        config=QuestionConfig(id="preco", name="Price"),
        user_prompt="Extract the price: {texto}",
        keywords=["preco", "valor", "R$"],
        # pages 1-5, counted from the start
        page_range=PageRange(start=1, end=5, from_position="start")
    ),
    Question(
        config=QuestionConfig(id="conclusao", name="Conclusion"),
        user_prompt="Extract the conclusion: {texto}",
        keywords=["conclusao", "resultado"],
        # pages 1-3, counted from the end
        page_range=PageRange(start=1, end=3, from_position="end")
    ),
])

result = dr.process("documento.pdf")
for r in result.results:
    print(f"{r.question_name}: {r.answer}")

4. Document Classification

from deepread import Classification
from typing import Literal

class ClassificacaoDoc(BaseModel):
    classificacao: Literal["APROVADO", "REPROVADO", "REVISAR"]
    justificativa: str
    confianca: float = Field(ge=0, le=1)

# {dados} is the placeholder for the extracted data
classification = Classification(
    system_prompt="You are a document classifier.",
    user_prompt="Based on the extracted data, classify the document:\n\n{dados}",
    response_model=ClassificacaoDoc
)

dr.set_classification(classification)
result = dr.process("documento.pdf", classify=True)
print(f"Classification: {result.classification}")

5. Batch Processing

from pathlib import Path

# Collect every PDF in the folder and process them with 4 worker threads
docs = list(Path("documentos/").glob("*.pdf"))
results = dr.process_batch(docs, classify=True, max_workers=4)

for r in results:
    print(f"{r.document.filename}: {r.get_answer('preco')}")
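The API reference describes `process_batch` as synchronous and backed by a ThreadPool. As a rough illustration of that pattern (not the library's code), the sketch below fans a list of items out to worker threads with `concurrent.futures`. The helper `run_batch` and the stand-in worker `fake_process` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def run_batch(items: Iterable[T], worker: Callable[[T], R], max_workers: int = 4) -> List[R]:
    """Run worker over items on a thread pool, returning results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the order of the inputs,
        # regardless of which thread finishes first
        return list(pool.map(worker, items))


def fake_process(filename: str) -> str:
    # Stand-in for real per-document work such as an LLM call
    return f"processed:{filename}"


batch = run_batch(["a.pdf", "b.pdf", "c.pdf"], fake_process, max_workers=2)
print(batch)
```

Threads suit this kind of workload because most of the time is spent waiting on network I/O rather than on CPU.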

6. Asynchronous API

import asyncio
import os

# `question` and `docs` are defined in the previous sections
async def main():
    dr = DeepRead(
        api_token=os.getenv("DEEPREAD_API_TOKEN"),
        openai_api_key=os.getenv("OPENAI_API_KEY"),
    )
    dr.add_question(question)

    # Single document
    result = await dr.process_async("documento.pdf")
    print(result.get_answer("quantidade"))

    # Batch, with at most 5 documents in flight at once
    results = await dr.process_batch_async(docs, max_concurrency=5)

asyncio.run(main())
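The API reference notes that `process_batch_async` limits concurrency with a Semaphore. A minimal sketch of that pattern with plain `asyncio` could look like the following. `run_bounded` and `fake_process` are hypothetical names, and the sleep stands in for a real network call.

```python
import asyncio
from typing import Awaitable, Callable, List, Sequence


async def run_bounded(
    items: Sequence[str],
    worker: Callable[[str], Awaitable[str]],
    max_concurrency: int = 5,
) -> List[str]:
    """Await worker(item) for every item, with at most max_concurrency running at once."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(item: str) -> str:
        async with semaphore:  # blocks while max_concurrency workers are active
            return await worker(item)

    # gather returns results in the order of the awaitables passed in
    return await asyncio.gather(*(guarded(item) for item in items))


async def demo():
    active = 0
    peak = 0

    async def fake_process(name: str) -> str:
        nonlocal active, peak
        active += 1
        peak = max(peak, active)  # record the highest number of simultaneous workers
        await asyncio.sleep(0.01)
        active -= 1
        return name.upper()

    names = [f"doc{i}.pdf" for i in range(10)]
    return await run_bounded(names, fake_process, max_concurrency=3), peak


results, peak = asyncio.run(demo())
print(len(results), peak)
```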

7. Cache and Resilience

dr = DeepRead(
    api_token=os.getenv("DEEPREAD_API_TOKEN"),
    openai_api_key=os.getenv("OPENAI_API_KEY"),
    enable_cache=True,
    cache_ttl=3600,               # cache TTL in seconds
    max_retries=3,                # retries for transient errors
    circuit_breaker=True,
    circuit_breaker_threshold=5,  # failures before the circuit opens
    circuit_breaker_timeout=60,   # seconds before recovery
    streaming=True,               # lazy mode to save memory
)

result = dr.process("documento.pdf")
print(f"Cache stats: {dr.cache_stats}")
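To make the `circuit_breaker_threshold` and `circuit_breaker_timeout` settings concrete, here is a minimal sketch of the general circuit breaker pattern. It is an illustration under stated assumptions, not DeepRead's implementation: the class names, the injectable `clock`, and the half-open behavior (one trial call after the timeout) are choices made for this example.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(self, threshold=5, timeout=60.0, clock=time.monotonic):
        self.threshold = threshold  # consecutive failures before opening
        self.timeout = timeout      # seconds to wait before a trial call
        self.clock = clock          # injectable so the demo needs no real waiting
        self.failures = 0
        self.opened_at = None

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None and self.clock() - self.opened_at < self.timeout:
            raise CircuitOpenError("circuit is open; call rejected")
        # Either the circuit is closed, or the timeout elapsed and this is a trial call
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.threshold:
                self.opened_at = self.clock()  # open (or re-open) the circuit
            raise
        self.failures = 0
        self.opened_at = None  # success closes the circuit
        return result


fake_now = [0.0]
breaker = CircuitBreaker(threshold=2, timeout=60, clock=lambda: fake_now[0])


def failing():
    raise ConnectionError("upstream unavailable")


outcomes = []
for _ in range(3):
    try:
        breaker.call(failing)
    except CircuitOpenError:
        outcomes.append("rejected")
    except ConnectionError:
        outcomes.append("failed")

fake_now[0] = 61.0  # pretend the timeout has elapsed
outcomes.append(breaker.call(lambda: "recovered"))
print(outcomes)  # ['failed', 'failed', 'rejected', 'recovered']
```

Rejecting calls while the circuit is open avoids spending time and tokens on an upstream service that is already known to be failing.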

8. Multiple Input Types

import io

# Local file path
result = dr.process("documento.pdf")

# URL
result = dr.process("https://exemplo.com/doc.pdf")

# Raw bytes
with open("doc.pdf", "rb") as f:
    pdf_bytes = f.read()
result = dr.process(pdf_bytes, filename="doc.pdf")

# File-like object
buffer = io.BytesIO(pdf_bytes)
result = dr.process(buffer, filename="doc.pdf")
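Accepting a path, a URL, bytes, or a file-like object usually comes down to a small dispatch on the argument's type. The sketch below shows one way such a dispatch might be written. It is an assumption about the general approach, not the library's code; `describe_source` and `looks_like_pdf` are hypothetical helpers.

```python
import io
from pathlib import Path


def describe_source(source) -> str:
    """Classify an input the way a loader might before reading it."""
    if isinstance(source, (bytes, bytearray)):
        return "bytes"
    if isinstance(source, Path):
        return "path"
    if isinstance(source, str):
        # Strings are treated as URLs only when they carry an http(s) scheme
        return "url" if source.startswith(("http://", "https://")) else "path"
    if hasattr(source, "read"):
        return "file-like"
    raise TypeError(f"unsupported input type: {type(source).__name__}")


def looks_like_pdf(data: bytes) -> bool:
    # PDF files begin with the "%PDF-" header
    return data.startswith(b"%PDF-")


print(describe_source("documento.pdf"))
print(describe_source("https://exemplo.com/doc.pdf"))
print(describe_source(b"%PDF-1.7 ..."))
print(describe_source(io.BytesIO(b"%PDF-1.7 ...")))
```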

Azure OpenAI

Environment variables:

export OPENAI_PROVIDER=azure
export AZURE_API_KEY="your-azure-key"
export AZURE_API_ENDPOINT="https://seu-recurso.openai.azure.com"
export AZURE_API_VERSION="2024-02-15-preview"
export AZURE_DEPLOYMENT_NAME="gpt-4o"

In code:

import os
from deepread import DeepRead

dr = DeepRead(
    api_token=os.getenv("DEEPREAD_API_TOKEN"),
    provider="azure",
    azure_api_key="your-azure-key",
    azure_endpoint="https://seu-recurso.openai.azure.com",
    azure_deployment="gpt-4o",
)

Parameter | OpenAI | Azure OpenAI
provider | "openai" (default) | "azure"
openai_api_key | Required | Not used
azure_api_key | Not used | Required
azure_endpoint | Not used | Required
azure_deployment | Not used | Required
model | Model name | Ignored (the deployment is used)
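The table above can be read as a per-provider list of required parameters. As a small illustration, the check below reports which required parameters are missing for a given provider. The `REQUIRED` mapping mirrors the table, while the function `missing_params` is a hypothetical helper and not part of the DeepRead API.

```python
from typing import Dict, List, Tuple

# Required constructor parameters per provider, as listed in the table
REQUIRED: Dict[str, Tuple[str, ...]] = {
    "openai": ("openai_api_key",),
    "azure": ("azure_api_key", "azure_endpoint", "azure_deployment"),
}


def missing_params(provider: str, **params: str) -> List[str]:
    """Return the required parameters that are absent or empty for this provider."""
    if provider not in REQUIRED:
        raise ValueError(f"unknown provider: {provider!r}")
    return [name for name in REQUIRED[provider] if not params.get(name)]


print(missing_params("azure", azure_api_key="your-azure-key"))
# ['azure_endpoint', 'azure_deployment']
```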

Available Models

print(DeepRead.available_models())
# {
#     "fast": "gpt-4.1",
#     "balanced": "gpt-5.1",
#     "complete": "gpt-5-2025-08-07",
#     "economic": "gpt-5-mini-2025-08-07"
# }

API Reference

DeepRead

Method | Description
add_question(question) | Adds a question
add_questions(questions) | Adds multiple questions
remove_question(id) | Removes a question
clear_questions() | Removes all questions
set_classification(config) | Configures classification
process(document) | Processes a document (sync)
process_async(document) | Processes a document (async)
process_batch(documents, max_workers) | Processes a batch (sync, using a ThreadPool)
process_batch_async(documents, max_concurrency) | Processes a batch (async, using a Semaphore)
clear_cache() | Clears the cache
cache_stats | Returns cache hits/misses/size
available_models() | Lists the available models
create_question(...) | Factory method for Question
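The cache is described as LRU with TTL, and `cache_stats` reports hits, misses, and size. The sketch below shows how those three pieces typically fit together. It is a generic illustration rather than the library's cache: the class `TTLCache`, its method names, and the injectable `clock` are assumptions.

```python
import time
from collections import OrderedDict


class TTLCache:
    """LRU cache whose entries also expire after ttl seconds."""

    def __init__(self, max_size=128, ttl=3600.0, clock=time.monotonic):
        self.max_size = max_size
        self.ttl = ttl
        self.clock = clock
        self._data = OrderedDict()  # key -> (expires_at, value), oldest first
        self.hits = 0
        self.misses = 0

    def get(self, key):
        entry = self._data.get(key)
        if entry is None or entry[0] <= self.clock():
            self._data.pop(key, None)  # drop the entry if it has expired
            self.misses += 1
            return None
        self._data.move_to_end(key)    # mark as most recently used
        self.hits += 1
        return entry[1]

    def put(self, key, value):
        self._data[key] = (self.clock() + self.ttl, value)
        self._data.move_to_end(key)
        while len(self._data) > self.max_size:
            self._data.popitem(last=False)  # evict the least recently used entry

    @property
    def stats(self):
        return {"hits": self.hits, "misses": self.misses, "size": len(self._data)}


fake_now = [0.0]
cache = TTLCache(max_size=2, ttl=10, clock=lambda: fake_now[0])
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")      # hit; "a" becomes the most recently used
cache.put("c", 3)   # over capacity: evicts "b"
cache.get("b")      # miss: evicted
fake_now[0] = 11.0
cache.get("a")      # miss: expired
print(cache.stats)  # {'hits': 1, 'misses': 2, 'size': 1}
```

In a document pipeline the key would normally be derived from the document content and the question set, for example a hash of the file bytes, so that the same file is not sent to the LLM twice.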

DeepRead Constructor

Parameter | Type | Default | Description
api_token | str | - | Authentication token (required)
openai_api_key | str | env | OpenAI API key
model | str | gpt-5.1 | LLM model
verbose | bool | False | Detailed logs
max_retries | int | 3 | Retries for transient errors
enable_cache | bool | False | Enables the LRU cache
cache_ttl | int | 3600 | Cache TTL in seconds
streaming | bool | False | Lazy mode (saves memory)
circuit_breaker | bool | False | Enables the circuit breaker
circuit_breaker_threshold | int | 5 | Failures before the circuit opens
circuit_breaker_timeout | int | 60 | Seconds before recovery
max_file_size_mb | float | 50 | File size limit in MB
max_pages | int | 500 | Page limit
provider | str | openai | Provider: openai or azure
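`max_retries` pairs with the exponential backoff mentioned in the feature list. A minimal sketch of retry with exponential backoff and a little jitter follows. The function `retry`, its delay parameters, and the choice of which exceptions count as transient are assumptions for illustration, not the library's actual settings.

```python
import random
import time


def retry(func, max_retries=3, base_delay=0.5, max_delay=8.0,
          retry_on=(ConnectionError, TimeoutError), sleep=time.sleep):
    """Call func, retrying transient errors with exponentially growing delays."""
    attempt = 0
    while True:
        try:
            return func()
        except retry_on:
            if attempt >= max_retries:
                raise  # retries exhausted: surface the last error
            delay = min(max_delay, base_delay * 2 ** attempt)  # 0.5, 1.0, 2.0, ...
            sleep(delay + random.uniform(0, delay * 0.1))      # up to 10% jitter
            attempt += 1


calls = {"count": 0}


def flaky():
    # Fails twice, then succeeds
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"


delays = []
outcome = retry(flaky, max_retries=3, sleep=delays.append)  # record delays instead of sleeping
print(outcome, calls["count"], len(delays))
```

With `max_retries=3` the function is attempted at most four times in total: the initial call plus three retries.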

Question

Field | Type | Description
config | QuestionConfig | Basic configuration (id, name)
system_prompt | str | System prompt
user_prompt | str | Prompt template (use {texto})
keywords | list[str] | Keywords used to filter pages
page_range | PageRange | Page range (optional)
response_model | BaseModel | Pydantic model (optional)
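The `keywords` and `user_prompt` fields suggest a two-step flow: keep only the pages that mention a keyword, then substitute their text into the `{texto}` placeholder. The sketch below illustrates that idea. The case-insensitive substring match and the helper names `filter_pages` and `build_prompt` are assumptions; the library's actual matching rules may differ.

```python
from typing import List, Sequence, Tuple


def filter_pages(pages: Sequence[str], keywords: Sequence[str]) -> List[Tuple[int, str]]:
    """Return (page_number, text) for pages containing any keyword, 1-indexed."""
    lowered = [k.lower() for k in keywords]
    return [
        (number, text)
        for number, text in enumerate(pages, start=1)
        if any(k in text.lower() for k in lowered)
    ]


def build_prompt(template: str, pages: Sequence[Tuple[int, str]]) -> str:
    """Substitute the joined page text for the {texto} placeholder."""
    texto = "\n\n".join(text for _, text in pages)
    # str.replace leaves any other braces in the template untouched
    return template.replace("{texto}", texto)


pages = ["Cover page", "Total volume: 500 litros", "Signatures"]
matched = filter_pages(pages, ["litros", "volume"])
print(matched)  # [(2, 'Total volume: 500 litros')]
print(build_prompt("Extract the quantity:\n{texto}", matched))
```

Filtering before prompting keeps the prompt short, which is one plausible way to reduce token usage on long documents.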

PageRange

Field | Type | Description
start | int | First page (1-indexed)
end | int | Last page (None = through the end)
from_position | str | "start" or "end"
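One plausible reading of these fields is that `start` and `end` are offsets counted from whichever side `from_position` names, so `start=1, end=3, from_position="end"` would select the last three pages. The sketch below implements that reading. This interpretation, and the helper `resolve_range`, are assumptions and are not confirmed by the source.

```python
from typing import List, Optional


def resolve_range(total_pages: int, start: int = 1, end: Optional[int] = None,
                  from_position: str = "start") -> List[int]:
    """Return the 1-indexed page numbers selected by the range (assumed semantics)."""
    if from_position not in ("start", "end"):
        raise ValueError('from_position must be "start" or "end"')
    if start < 1:
        raise ValueError("start is 1-indexed and must be >= 1")
    # None means "through the end"; ranges past the document are clamped
    last = total_pages if end is None else min(end, total_pages)
    offsets = range(start, last + 1)
    if from_position == "start":
        return list(offsets)
    # Offset 1 from the end is the last page, offset 2 the one before it, and so on
    return sorted(total_pages - offset + 1 for offset in offsets)


print(resolve_range(10, start=1, end=5, from_position="start"))  # [1, 2, 3, 4, 5]
print(resolve_range(10, start=1, end=3, from_position="end"))    # [8, 9, 10]
print(resolve_range(4, start=2))                                 # [2, 3, 4]
```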

ProcessingResult

Field | Type | Description
document | DocumentMetadata | Document metadata
results | list[Result] | Results per question
classification | dict | Classification (if applicable)
total_metrics | ProcessingMetrics | Total metrics

ProcessingMetrics

Field | Type | Description
time_seconds | float | Processing time in seconds
tokens | int | Total tokens
prompt_tokens | int | Prompt tokens
completion_tokens | int | Completion tokens
cost_usd | float | Cost in USD
model | str | Model used
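A cost figure like `cost_usd` is typically derived from the prompt and completion token counts and a per-model price. The sketch below shows that arithmetic. The model name and the prices in `PRICES` are made-up placeholders, and the `Metrics` dataclass is a simplified stand-in for illustration, not the library's `ProcessingMetrics`.

```python
from dataclasses import dataclass
from typing import Iterable

# Hypothetical prices in USD per 1M tokens: (prompt, completion). NOT real pricing.
PRICES = {"example-model": (1.00, 4.00)}


@dataclass
class Metrics:
    prompt_tokens: int
    completion_tokens: int
    model: str

    @property
    def tokens(self) -> int:
        return self.prompt_tokens + self.completion_tokens

    @property
    def cost_usd(self) -> float:
        price_in, price_out = PRICES[self.model]
        return (self.prompt_tokens * price_in + self.completion_tokens * price_out) / 1_000_000


def total_cost(metrics: Iterable[Metrics]) -> float:
    """Sum the cost over several requests."""
    return sum(m.cost_usd for m in metrics)


first = Metrics(prompt_tokens=1200, completion_tokens=300, model="example-model")
second = Metrics(prompt_tokens=800, completion_tokens=200, model="example-model")
print(first.tokens)                             # 1500
print(f"${first.cost_usd:.4f}")                 # $0.0024
print(f"${total_cost([first, second]):.4f}")    # $0.0040
```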

Project Structure

deepread/
├── __init__.py          # Main exports
├── reader.py            # DeepRead class (sync + async)
├── config.py            # Models, pricing, settings
├── utils.py             # PDF loading, filtering, metadata
├── ocr.py               # Azure AI Vision OCR
├── cache.py             # LRU cache with TTL
├── resilience.py        # Retry + circuit breaker
├── exceptions.py        # Custom exceptions
├── auth/
│   ├── __init__.py
│   ├── token.py         # HMAC-SHA256 token validation
│   └── exceptions.py    # Authentication exceptions
└── models/
    ├── __init__.py
    ├── question.py      # Question, QuestionConfig, PageRange
    ├── result.py        # Result, ProcessingResult, Metrics
    ├── classification.py # Classification
    └── schemas.py       # Example schemas (DadosContrato, etc.)

Documentation

Document | Description
Installation | Installation and configuration guide
Quick Start | Get started in 5 minutes
Authentication | Token system
Questions | Question configuration and extraction
Classification | Document classification
OCR | Optical character recognition
Schemas | Data models and structures
API Reference | Complete API reference
Examples | Practical examples (01-07)
Certification | Quality certificate

Quality Certification

This project was audited and certified by the Claude AI Quality Seal.

Dimension | Score
Security | 8.7/10
Usability | 8.2/10
Scalability | 7.8/10
Code Quality | 8.0/10
Overall | 8.18/10

Classification: PROFISSIONAL
Serial: DR-CQA-DE8364E7-116B6022-D40E375D-42BB4E3B

View full certificate | View HTML certificate


Support

Developed by Monkai
