MLX Nougat

MLX Nougat is a CLI tool for OCR using the Nougat model.

Installation

  1. Install ImageMagick (the command below uses Homebrew on macOS):

    brew install imagemagick
    
  2. Configure environment variables for ImageMagick:

    Add the following lines to your shell configuration file (e.g., ~/.bashrc, ~/.zshrc):

    export MAGICK_HOME=$(brew --prefix imagemagick)
    export PATH=$MAGICK_HOME/bin:$PATH
    export DYLD_LIBRARY_PATH=$MAGICK_HOME/lib:$DYLD_LIBRARY_PATH
    

    After adding these lines, reload your shell configuration or restart your terminal.

  3. Install MLX Nougat:

    git clone https://github.com/mzbac/mlx-nougat.git
    cd mlx-nougat
    pip install .
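
The environment setup from step 2 can be sanity-checked before running the tool. The following is a minimal sketch, not part of MLX Nougat: the function name `check_imagemagick_env` is hypothetical, and it only verifies that `MAGICK_HOME` is set and that its `bin` directory appears on `PATH`. It does not inspect `DYLD_LIBRARY_PATH` or confirm that ImageMagick itself works.

```python
import os
from typing import Mapping, Optional


def check_imagemagick_env(env: Optional[Mapping[str, str]] = None) -> list[str]:
    """Return a list of problems found; an empty list means both checks passed.

    Hypothetical helper for illustration only; it is not shipped with MLX Nougat.
    """
    env = os.environ if env is None else env

    magick_home = env.get("MAGICK_HOME", "")
    if not magick_home:
        return ["MAGICK_HOME is not set"]

    problems = []
    # Step 2 prepends $MAGICK_HOME/bin to PATH, so look for that entry.
    bin_dir = os.path.normpath(os.path.join(magick_home, "bin"))
    path_entries = [
        os.path.normpath(entry)
        for entry in env.get("PATH", "").split(os.pathsep)
        if entry
    ]
    if bin_dir not in path_entries:
        problems.append(f"{bin_dir} is not on PATH")
    return problems


if __name__ == "__main__":
    issues = check_imagemagick_env()
    print("\n".join(issues) if issues else "ImageMagick environment looks fine")
```

Run it in a fresh terminal after reloading your shell configuration, so that it sees the same environment the `mlx_nougat` command will see.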
    

Usage

After installation, you can use MLX Nougat from the command line:

mlx_nougat --input <path_to_image_or_pdf_or_url> [--output <output_file>] [--model <model_name_or_path>]

Arguments

  • --input: (Required) Path to the input image or PDF file, or a URL to an image or PDF.
  • --output: (Optional) Path to save the output text file. If not provided, the output will be printed to the console.
  • --model: (Optional) Name or path of the Nougat model to use. Default is "facebook/nougat-small".
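
If you want to call the tool from a Python script, the arguments above map naturally onto an argument list for `subprocess`. The sketch below is an illustration under that assumption: `build_command` is a hypothetical helper, not an API provided by MLX Nougat, and it uses only the three flags documented here.

```python
import subprocess
from typing import Optional


def build_command(
    input_path: str,
    output: Optional[str] = None,
    model: Optional[str] = None,
) -> list[str]:
    """Build the argument list for one mlx_nougat invocation.

    Hypothetical helper: it only assembles the documented flags.
    """
    if not input_path:
        raise ValueError("--input is required")

    cmd = ["mlx_nougat", "--input", input_path]
    if output is not None:
        # Without --output, the tool prints the result to the console.
        cmd += ["--output", output]
    if model is not None:
        # Without --model, the tool falls back to its documented default.
        cmd += ["--model", model]
    return cmd


def run_ocr(input_path: str, output: Optional[str] = None, model: Optional[str] = None) -> None:
    """Run the CLI and raise CalledProcessError if it exits with a non-zero status."""
    subprocess.run(build_command(input_path, output, model), check=True)
```

Passing the command as a list, rather than as a single shell string, avoids quoting problems when paths or URLs contain spaces or special characters.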

Examples

  1. Process a local image:

    mlx_nougat --input path/to/your/image.png --output results.txt
    
  2. Process a local PDF:

    mlx_nougat --input path/to/your/document.pdf --output results.txt
    
  3. Process a remote image:

    mlx_nougat --input https://example.com/image.jpg --output results.txt
    
  4. Process a remote PDF:

    mlx_nougat --input https://example.com/document.pdf --output results.txt
    
  5. Use a different model:

    mlx_nougat --input path/to/your/image.png --model facebook/nougat-base --output results.txt
    
  6. Use a quantized model:

    mlx_nougat --input path/to/your/document.pdf --model mzbac/nougat-small-8bit-mlx
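
The examples above each process a single input. To process a whole folder, one option is a small wrapper script that calls the CLI once per file. The following is a rough sketch, not a feature of MLX Nougat: `batch_ocr`, the `SUFFIXES` set, and the choice to name each output `<stem>.txt` are all assumptions made for illustration.

```python
import subprocess
from pathlib import Path
from typing import Callable, Optional, Sequence, Union

# Assumption: file types to pick up, based on the image and PDF examples above.
SUFFIXES = {".pdf", ".png", ".jpg", ".jpeg"}


def batch_ocr(
    input_dir: Union[str, Path],
    output_dir: Union[str, Path],
    model: str = "facebook/nougat-small",
    run: Optional[Callable[[Sequence[str]], object]] = None,
) -> list[Path]:
    """Run mlx_nougat once per matching file and return the output paths.

    `run` receives the argument list; it defaults to subprocess.run with
    check=True and can be swapped out for a dry run.
    """
    in_dir, out_dir = Path(input_dir), Path(output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    if run is None:
        run = lambda cmd: subprocess.run(cmd, check=True)

    written = []
    for src in sorted(in_dir.iterdir()):
        if src.suffix.lower() not in SUFFIXES:
            continue  # skip files the sketch does not recognise
        dst = out_dir / (src.stem + ".txt")
        run(["mlx_nougat", "--input", str(src), "--output", str(dst), "--model", model])
        written.append(dst)
    return written


if __name__ == "__main__":
    # Dry run: print each command instead of executing it.
    batch_ocr("scans", "ocr_output", run=print)
```

Note that two inputs sharing a stem (for example `report.pdf` and `report.png`) would write to the same output file in this sketch, so rename one of them or adjust the naming scheme if that can happen in your folder.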
    

TODOs

  • Support quantized models to improve performance.

Acknowledgements

This project is built upon several open-source projects and research works:

  • Nougat: The original Nougat model developed by Facebook AI Research.
  • faster-nougat: An optimized implementation of Nougat, which inspired this MLX-based version.
  • MLX: The machine learning framework developed by Apple, used for efficient model inference in this project.
  • Transformers: Hugging Face's state-of-the-art natural language processing library.
