A package for extracting structured data from receipts.

Receipt OCR Engine

An efficient OCR engine for receipt image processing.

This repository provides a comprehensive solution for Optical Character Recognition (OCR) on receipt images, featuring both a dedicated Tesseract OCR module and a general receipt processing package using LLMs.

Project Structure

The project is organized into two main modules:

  • src/receipt_ocr/: The general receipt processing package. It includes a CLI, a programmatic API, and a production FastAPI web service for LLM-powered structured data extraction from receipts.
  • src/tesseract_ocr/: Contains the Tesseract OCR FastAPI application, CLI, utility functions, and Docker setup for performing raw OCR text extraction from images.

Prerequisites

  • Python 3.x
  • Docker and Docker Compose (for running as a service)
  • Tesseract OCR (for local Tesseract CLI usage) - Installation Guide

Usage Examples

Receipt OCR Module

This module provides a higher-level abstraction for processing receipts, leveraging LLMs for parsing and extraction.

To use the receipt-ocr CLI, first install it:

pip install receipt-ocr
  1. Configure Environment Variables: Create a .env file in the project root or set environment variables directly. This module supports multiple LLM providers.

    Example .env for OpenAI:

    Get it from here: https://platform.openai.com/api-keys

    OPENAI_API_KEY="your_openai_api_key_here"
    OPENAI_MODEL="gpt-4.1"
    

    Example .env for Gemini:

    Get it from here: https://aistudio.google.com/app/apikey

    OPENAI_API_KEY="your_gemini_api_key_here"
    OPENAI_BASE_URL="https://generativelanguage.googleapis.com/v1beta/openai/"
    OPENAI_MODEL="gemini-2.5-pro"
    

    Example .env for Groq:

    Get it from here: https://console.groq.com/keys

    OPENAI_API_KEY="your_groq_api_key_here"
    OPENAI_BASE_URL="https://api.groq.com/openai/v1"
    OPENAI_MODEL="llama3-8b-8192"
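
    If you want to reuse the same three variables in your own scripts, the following is a minimal sketch of reading them with the standard library. The helper name load_llm_settings and the fallback model are assumptions made for illustration; they are not part of the receipt-ocr package, and the package may resolve its settings differently.

```python
import os

# Assumed fallback, taken from the OpenAI example above.
DEFAULT_MODEL = "gpt-4.1"


def load_llm_settings(env=None):
    """Collect LLM settings from an environment mapping.

    Hypothetical helper for illustration; not part of receipt-ocr.
    """
    env = os.environ if env is None else env
    api_key = env.get("OPENAI_API_KEY")
    if not api_key:
        raise ValueError("OPENAI_API_KEY is not set")
    return {
        "api_key": api_key,
        # None is used here to mean "no custom endpoint configured".
        "base_url": env.get("OPENAI_BASE_URL") or None,
        "model": env.get("OPENAI_MODEL", DEFAULT_MODEL),
    }


# Example with the Gemini values shown above, passed as a plain dict.
settings = load_llm_settings({
    "OPENAI_API_KEY": "your_gemini_api_key_here",
    "OPENAI_BASE_URL": "https://generativelanguage.googleapis.com/v1beta/openai/",
    "OPENAI_MODEL": "gemini-2.5-pro",
})
print(settings["model"])
```

    Passing a dict instead of os.environ keeps the helper easy to test without touching your real environment.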
    
  2. Process a receipt using the receipt-ocr CLI:

    receipt-ocr images/receipt.jpg
    

    This command will use the configured LLM provider to extract structured data from the receipt image.

    Sample output:

    {
      "merchant_name": "Saathimart.com",
      "merchant_address": "Narephat, Kathmandu",
      "transaction_date": "2024-05-07",
      "transaction_time": "09:09:00",
      "total_amount": 185.0,
      "line_items": [
        {
          "item_name": "COLGATE DENTAL",
          "item_quantity": 1,
          "item_price": 95.0,
          "item_total": 95.0
        },
        {
          "item_name": "PATANJALI ANTI",
          "item_quantity": 1,
          "item_price": 70.0,
          "item_total": 70.0
        },
        {
          "item_name": "GODREJ NO 1 SOAP",
          "item_quantity": 1,
          "item_price": 20.0,
          "item_total": 20.0
        }
      ]
    }
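
    Because the output is LLM-generated, a quick arithmetic check can catch misread digits. The sketch below is an illustration only (check_totals is a hypothetical helper, not part of receipt-ocr). It verifies that each line total equals quantity times price and that the line totals add up to total_amount. Receipts with tax, discounts, or service charges will legitimately fail the second check, so treat a mismatch as a prompt to review rather than as an error.

```python
import json
import math

# Trimmed copy of the sample output above.
SAMPLE = """
{
  "total_amount": 185.0,
  "line_items": [
    {"item_name": "COLGATE DENTAL", "item_quantity": 1, "item_price": 95.0, "item_total": 95.0},
    {"item_name": "PATANJALI ANTI", "item_quantity": 1, "item_price": 70.0, "item_total": 70.0},
    {"item_name": "GODREJ NO 1 SOAP", "item_quantity": 1, "item_price": 20.0, "item_total": 20.0}
  ]
}
"""


def check_totals(receipt, tol=0.01):
    """Return a list of human-readable inconsistencies (empty if none)."""
    problems = []
    items = receipt.get("line_items", [])
    for item in items:
        expected = item["item_quantity"] * item["item_price"]
        if not math.isclose(expected, item["item_total"], abs_tol=tol):
            problems.append(
                f"{item['item_name']}: expected {expected}, got {item['item_total']}"
            )
    items_sum = sum(item["item_total"] for item in items)
    if not math.isclose(items_sum, receipt["total_amount"], abs_tol=tol):
        problems.append(
            f"line items sum to {items_sum}, total_amount is {receipt['total_amount']}"
        )
    return problems


receipt = json.loads(SAMPLE)
print(check_totals(receipt))  # → []
```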
    
  3. Using Receipt OCR Programmatically in Python:

    You can also use the receipt-ocr library directly in your Python code:

    from receipt_ocr.processors import ReceiptProcessor
    from receipt_ocr.providers import OpenAIProvider
    
    # Initialize the provider
    provider = OpenAIProvider(api_key="your_api_key", base_url="your_base_url")
    
    # Initialize the processor
    processor = ReceiptProcessor(provider)
    
    # Define the JSON schema for extraction
    json_schema = {
        "merchant_name": "string",
        "merchant_address": "string",
        "transaction_date": "string",
        "transaction_time": "string",
        "total_amount": "number",
        "line_items": [
            {
                "item_name": "string",
                "item_quantity": "number",
                "item_price": "number",
            }
        ],
    }
    
    # Process the receipt
    result = processor.process_receipt("path/to/receipt.jpg", json_schema, "gpt-4.1")
    
    print(result)
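
    The schema above uses plain type names ("string", "number") rather than full JSON Schema. If you want to confirm that a result matches that shape, a small recursive check is enough. This is a sketch under the assumption that the result is a Python dict; find_mismatches is a hypothetical helper and not part of the receipt-ocr API.

```python
# Maps the type names used in the schema to Python checks.
TYPE_CHECKS = {
    "string": lambda v: isinstance(v, str),
    # bool is a subclass of int in Python, so exclude it explicitly.
    "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
}


def find_mismatches(value, schema, path="$"):
    """Return a list of paths where `value` does not match `schema`."""
    if isinstance(schema, dict):
        if not isinstance(value, dict):
            return [f"{path}: expected object"]
        problems = []
        for key, sub_schema in schema.items():
            if key not in value:
                problems.append(f"{path}.{key}: missing")
            else:
                problems.extend(find_mismatches(value[key], sub_schema, f"{path}.{key}"))
        return problems
    if isinstance(schema, list):
        if not isinstance(value, list):
            return [f"{path}: expected array"]
        if not schema:
            return []  # No element schema given, so nothing to check.
        problems = []
        for index, element in enumerate(value):
            problems.extend(find_mismatches(element, schema[0], f"{path}[{index}]"))
        return problems
    check = TYPE_CHECKS.get(schema)
    if check is None:
        return [f"{path}: unknown schema type {schema!r}"]
    return [] if check(value) else [f"{path}: expected {schema}"]


schema = {
    "merchant_name": "string",
    "total_amount": "number",
    "line_items": [{"item_name": "string", "item_price": "number"}],
}
good = {
    "merchant_name": "Saathimart.com",
    "total_amount": 185.0,
    "line_items": [{"item_name": "COLGATE DENTAL", "item_price": 95.0}],
}
bad = dict(good, total_amount="185")

print(find_mismatches(good, schema))  # → []
print(find_mismatches(bad, schema))   # → ['$.total_amount: expected number']
```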
    

    Advanced Usage with Response Format Types:

    For compatibility with different LLM providers, you can specify the response format type:

    result = processor.process_receipt(
        "path/to/receipt.jpg", 
        json_schema, 
        "gpt-4.1", 
        response_format_type="json_object"  # or "json_schema", "text"
    )
    

    Supported response_format_type values:

    • "json_object" (default) - Standard JSON object format
    • "json_schema" - Structured JSON schema format (for newer OpenAI APIs)
    • "text" - Plain text responses

    In each case, the programmatic call is intended to produce the same structured JSON as the CLI.
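
    The practical difference between the formats is how much cleanup the model output needs. With "json_object" or "json_schema" the reply should already be valid JSON, while a "text" reply may wrap the JSON in extra prose. The sketch below illustrates that difference for the case where you are handling a raw model reply as a string yourself. parse_llm_response is a hypothetical helper; how receipt-ocr handles this internally is not shown here.

```python
import json


def parse_llm_response(raw, response_format_type="json_object"):
    """Turn a raw model reply (a string) into a Python dict."""
    if response_format_type in ("json_object", "json_schema"):
        # The reply is expected to be a bare JSON document.
        return json.loads(raw)
    if response_format_type == "text":
        # Free-form text: keep only the span from the first "{" to the last "}".
        start, end = raw.find("{"), raw.rfind("}")
        if start == -1 or end < start:
            raise ValueError("no JSON object found in text response")
        return json.loads(raw[start:end + 1])
    raise ValueError(f"unsupported response_format_type: {response_format_type!r}")


print(parse_llm_response('{"total_amount": 185.0}'))
print(parse_llm_response('Here is the data: {"total_amount": 185.0} Done.', "text"))
```

    The brace-based extraction is deliberately simple and will fail on replies that contain several separate JSON objects.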

  4. Run Receipt OCR as a Docker web service:

    For a production-ready REST API, use the FastAPI web service:

    docker compose -f app/docker-compose.yml up
    

    The service provides REST endpoints for receipt processing:

    • GET /health - Health check
    • POST /ocr/ - Process receipt images with optional custom JSON schemas

    Example API usage:

    # Health check
    curl http://localhost:8000/health
    
    # Process receipt with default schema
    curl -X POST "http://localhost:8000/ocr/" \
      -F "file=@images/receipt.jpg"
    
    # Process with custom schema
    curl -X POST "http://localhost:8000/ocr/" \
      -F "file=@images/receipt.jpg" \
      -F 'json_schema={"merchant": "string", "total": "number"}'
    

    For detailed API documentation, visit http://localhost:8000/docs when the service is running.
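
    The curl calls above can also be made from Python without third-party packages. The following sketch builds the multipart/form-data body by hand, using the same form field names as the curl examples (file and json_schema). The helper names build_multipart and post_receipt are hypothetical, and the sketch assumes the endpoint replies with a JSON body.

```python
import json
import mimetypes
import os
import urllib.request
import uuid


def build_multipart(file_name, file_bytes, json_schema=None):
    """Return (body, content_type_header) for a multipart/form-data upload."""
    boundary = uuid.uuid4().hex
    file_type = mimetypes.guess_type(file_name)[0] or "application/octet-stream"
    parts = []
    if json_schema is not None:
        # Optional text field carrying the custom schema as a JSON string.
        parts.append((
            f"--{boundary}\r\n"
            'Content-Disposition: form-data; name="json_schema"\r\n\r\n'
            f"{json.dumps(json_schema)}\r\n"
        ).encode("utf-8"))
    # File field with the raw image bytes.
    parts.append((
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{file_name}"\r\n'
        f"Content-Type: {file_type}\r\n\r\n"
    ).encode("utf-8") + file_bytes + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def post_receipt(image_path, json_schema=None, url="http://localhost:8000/ocr/"):
    """Upload an image to the running service and return the decoded reply."""
    with open(image_path, "rb") as handle:
        image_bytes = handle.read()
    body, content_type = build_multipart(
        os.path.basename(image_path), image_bytes, json_schema
    )
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

    With the service running, post_receipt("images/receipt.jpg", {"merchant": "string", "total": "number"}) mirrors the custom-schema curl call.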

Tesseract OCR Module

This module provides direct OCR capabilities using Tesseract. For more detailed local setup and usage, refer to src/tesseract_ocr/README.md.

  1. Run Tesseract OCR locally via CLI:

    python src/tesseract_ocr/main.py -i images/receipt.jpg
    

    Replace images/receipt.jpg with the path to your receipt image.

    Ensure that the image is well lit and that the edges of the receipt are clearly visible within the frame.

  2. Run Tesseract OCR as a Docker service:

    docker compose -f src/tesseract_ocr/docker-compose.yml up
    

    Once the service is up and running, you can perform OCR on receipt images by sending a POST request to http://localhost:8000/ocr/ with the image file.

    API Endpoint:

    • POST /ocr/: Upload a receipt image file to perform OCR. The response will contain the extracted text from the receipt.

    Note: The Tesseract OCR API returns raw extracted text from the receipt image. For structured JSON output with parsed fields such as merchant name, line items, and totals, use the receipt-ocr module instead.

    Example usage with cURL:

    curl -X 'POST' \
      'http://localhost:8000/ocr/' \
      -H 'accept: application/json' \
      -H 'Content-Type: multipart/form-data' \
      -F 'file=@images/paper-cash-sell-receipt-vector-23876532.jpg;type=image/jpeg'
    

Contributing

We welcome contributions to the Receipt OCR Engine! To contribute, please follow these steps:

  1. Fork the repository and clone it to your local machine.

  2. Create a new branch for your feature or bug fix.

  3. Set up your development environment:

    # Navigate to the project root
    cd receipt-ocr
    
    # Install uv
    curl -LsSf https://astral.sh/uv/install.sh | sh # OR pip install uv
    
    # Create and activate a virtual environment
    uv venv --python=3.12
    source .venv/bin/activate  # For Windows, use .venv\Scripts\activate
    
    # Install development and test dependencies
    uv sync --all-extras --dev
    uv pip install -r src/tesseract_ocr/requirements.txt
    uv pip install -e .
    
  4. Make your changes and ensure they adhere to the project's coding style.

  5. Run tests to ensure your changes haven't introduced any regressions:

    uv run pytest
    
  6. Run linting and formatting checks:

    uvx ruff check .
    uvx ruff format .
    
  7. Commit your changes with a clear and concise commit message.

  8. Push your branch to your forked repository.

  9. Open a Pull Request to the main branch of the upstream repository, describing your changes in detail.

License

This project is licensed under the terms of the MIT license.
