Hybrid OCR with Gemini and Document AI

Gemini OCR

gemini-ocr

Traceable Generative Markdown for PDFs

gemini-ocr provides anchorite provider plugins that convert PDFs to traceable Markdown using Google Cloud APIs.

  • GeminiMarkdownProvider — generates Markdown via the Gemini API
  • DocAIMarkdownProvider — generates Markdown via Document AI Layout
  • DocAIAnchorProvider — extracts bounding boxes via Document AI OCR
  • DoclingMarkdownProvider — generates Markdown via Docling (stub)

Quick Start

import asyncio
from pathlib import Path

import anchorite
from gemini_ocr import DocAIAnchorProvider, GeminiMarkdownProvider

async def main():
    markdown_provider = GeminiMarkdownProvider(
        project_id="my-gcp-project",
        location="us-central1",
        model_name="gemini-2.5-flash",
    )
    anchor_provider = DocAIAnchorProvider(
        project_id="my-gcp-project",
        location="us-central1",
        processor_id="projects/.../processors/...",
    )

    chunks = anchorite.document.chunks(Path("document.pdf"))
    result = await anchorite.process_document(
        chunks, markdown_provider, anchor_provider, renumber=True
    )

    print(result.markdown_content)
    print(result.annotate())   # Markdown with inline <span data-bbox="..."> tags

asyncio.run(main())
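
The annotated output embeds bounding boxes as inline <span data-bbox="..."> tags. As a rough sketch of how those spans could be pulled back out with only the standard library, the helper below collects each span's data-bbox value and enclosed text. The names BBoxSpanCollector and extract_bbox_spans are hypothetical, the sample string is made up, and the format of the data-bbox value is not specified here, so it is treated as an opaque string.

```python
from html.parser import HTMLParser


class BBoxSpanCollector(HTMLParser):
    """Collect (data-bbox value, enclosed text) pairs from annotated Markdown."""

    def __init__(self):
        super().__init__()
        self.spans = []    # finished (bbox, text) pairs, in closing order
        self._stack = []   # currently open spans: [bbox or None, [text parts]]

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            self._stack.append([dict(attrs).get("data-bbox"), []])

    def handle_data(self, data):
        # Text belongs to every span that is currently open.
        for entry in self._stack:
            entry[1].append(data)

    def handle_endtag(self, tag):
        if tag == "span" and self._stack:
            bbox, parts = self._stack.pop()
            if bbox is not None:  # ignore spans without a data-bbox attribute
                self.spans.append((bbox, "".join(parts)))


def extract_bbox_spans(annotated: str):
    """Return a list of (data-bbox value, text) tuples found in the input."""
    collector = BBoxSpanCollector()
    collector.feed(annotated)
    collector.close()
    return collector.spans


# Hypothetical sample; real input would come from result.annotate().
sample = '# Title\n\n<span data-bbox="PLACEHOLDER-1">First paragraph.</span> Trailing text.'
spans = extract_bbox_spans(sample)
```

Text outside any span is ignored, so the result can be joined against page images or used to highlight source regions without re-parsing the Markdown itself.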

Configuration via Environment Variables

from_env() builds both providers from environment variables, which suits twelve-factor deployments:

import asyncio
from pathlib import Path

import anchorite
from gemini_ocr import from_env

async def main():
    markdown_provider, anchor_provider = from_env()  # reads the GEMINI_OCR_* variables
    chunks = anchorite.document.chunks(Path("document.pdf"))
    result = await anchorite.process_document(chunks, markdown_provider, anchor_provider)

asyncio.run(main())

| Variable | Description |
| --- | --- |
| GEMINI_OCR_PROJECT_ID | GCP project ID (required) |
| GEMINI_OCR_LOCATION | GCP location (default: us-central1) |
| GEMINI_OCR_MODE | gemini (default), documentai, or docling |
| GEMINI_OCR_GEMINI_MODEL_NAME | Gemini model name (required in gemini mode) |
| GEMINI_OCR_LAYOUT_PROCESSOR_ID | Document AI processor ID (required in documentai mode) |
| GEMINI_OCR_OCR_PROCESSOR_ID | Document AI OCR processor ID (enables bounding box extraction) |
| GEMINI_OCR_DOCUMENTAI_LOCATION | Document AI endpoint location override |
| GEMINI_OCR_QUOTA_PROJECT_ID | Quota project override for Gemini API calls |
| GEMINI_OCR_GEMINI_PROMPT | Additional prompt appended to the default Gemini prompt |
| GEMINI_OCR_CACHE_DIR | Directory for caching API responses |
| GEMINI_OCR_INCLUDE_BBOXES | Set to false to skip bounding box extraction (default: true) |
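
Because which variables are required depends on the mode, it can help to check the environment before building providers. The sketch below encodes only the requirements listed in the table; missing_variables is a hypothetical helper, not part of gemini-ocr, and whether from_env() performs the same checks itself is not stated here.

```python
from typing import Mapping

# Requirements transcribed from the variable table; nothing else is assumed.
ALWAYS_REQUIRED = ("GEMINI_OCR_PROJECT_ID",)
REQUIRED_BY_MODE = {
    "gemini": ("GEMINI_OCR_GEMINI_MODEL_NAME",),
    "documentai": ("GEMINI_OCR_LAYOUT_PROCESSOR_ID",),
    "docling": (),
}


def missing_variables(env: Mapping[str, str]) -> list[str]:
    """Return required variables that are unset or empty for the selected mode."""
    mode = env.get("GEMINI_OCR_MODE", "gemini")  # gemini is the documented default
    if mode not in REQUIRED_BY_MODE:
        raise ValueError(f"unknown GEMINI_OCR_MODE: {mode!r}")
    required = ALWAYS_REQUIRED + REQUIRED_BY_MODE[mode]
    return [name for name in required if not env.get(name)]


# Example with a hypothetical environment: documentai mode without a processor ID.
missing = missing_variables(
    {"GEMINI_OCR_MODE": "documentai", "GEMINI_OCR_PROJECT_ID": "my-gcp-project"}
)
```

In a real deployment the same function could be called with os.environ at startup, failing fast with the list of missing names before any Google Cloud call is made.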

License

This project is licensed under the MIT License - see the LICENSE file for details.
