MLX Nougat
MLX Nougat is a CLI tool for OCR using the Nougat model.
Installation
- Install ImageMagick:

    brew install imagemagick

- Configure environment variables for ImageMagick:

  Add the following lines to your shell configuration file (e.g., ~/.bashrc, ~/.zshrc):

    export MAGICK_HOME=$(brew --prefix imagemagick)
    export PATH=$MAGICK_HOME/bin:$PATH
    export DYLD_LIBRARY_PATH=$MAGICK_HOME/lib:$DYLD_LIBRARY_PATH

  After adding these lines, reload your shell configuration or restart your terminal.

- Install MLX Nougat:

    git clone git@github.com:mzbac/mlx-nougat.git
    cd mlx-nougat
    pip install .
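The following is a minimal sketch of a post-install sanity check, not part of MLX Nougat itself. It assumes the three environment variables from the steps above and the `mlx_nougat` command name shown in this README; the function name `install_problems` is hypothetical. macOS may strip `DYLD_*` variables from some protected processes, so a `DYLD_LIBRARY_PATH` warning is a hint rather than proof of a misconfiguration.

```python
import os
import shutil


def install_problems(env=None, cli_name="mlx_nougat"):
    """Return a list of problems found; an empty list means every check passed."""
    env = os.environ if env is None else env
    problems = []

    magick_home = env.get("MAGICK_HOME", "")
    if not magick_home:
        problems.append("MAGICK_HOME is not set")
    else:
        # PATH should contain $MAGICK_HOME/bin, as in the export lines.
        bin_dir = os.path.join(magick_home, "bin")
        if bin_dir not in env.get("PATH", "").split(os.pathsep):
            problems.append(f"{bin_dir} is not on PATH")
        # DYLD_LIBRARY_PATH should contain $MAGICK_HOME/lib.
        lib_dir = os.path.join(magick_home, "lib")
        if lib_dir not in env.get("DYLD_LIBRARY_PATH", "").split(os.pathsep):
            problems.append(f"{lib_dir} is not on DYLD_LIBRARY_PATH")

    # The console command should be discoverable after `pip install .`.
    if shutil.which(cli_name, path=env.get("PATH")) is None:
        problems.append(f"{cli_name} was not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in install_problems():
        print("warning:", problem)
```

Running the script in a fresh terminal prints one warning per failed check and nothing when all checks pass.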
Usage
After installation, you can use MLX Nougat from the command line:
mlx_nougat --input <path_to_image_or_pdf_or_url> [--output <output_file>] [--model <model_name_or_path>]
Arguments
- --input: (Required) Path to the input image or PDF file, or a URL to an image or PDF.
- --output: (Optional) Path to save the output text file. If not provided, the output is printed to the console.
- --model: (Optional) Name or path of the Nougat model to use. Default is "facebook/nougat-small".
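To call the tool from a Python script, the arguments above can be assembled into a command list. This is a sketch that uses only the three documented flags; the helper names `build_command` and `run_ocr` are hypothetical and not part of the package. It assumes that "printed to the console" means standard output.

```python
import subprocess


def build_command(input_path, output_path=None, model=None):
    """Build the argv list for mlx_nougat from the documented arguments."""
    cmd = ["mlx_nougat", "--input", str(input_path)]
    if output_path is not None:
        cmd += ["--output", str(output_path)]
    if model is not None:
        # When omitted, the CLI falls back to its documented default model.
        cmd += ["--model", str(model)]
    return cmd


def run_ocr(input_path, output_path=None, model=None):
    """Run the CLI and return whatever it wrote to standard output."""
    result = subprocess.run(
        build_command(input_path, output_path, model),
        check=True,           # raise CalledProcessError on a non-zero exit code
        capture_output=True,
        text=True,
    )
    return result.stdout
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.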
Examples
- Process a local image:

    mlx_nougat --input path/to/your/image.png --output results.txt

- Process a local PDF:

    mlx_nougat --input path/to/your/document.pdf --output results.txt

- Process a remote image:

    mlx_nougat --input https://example.com/image.jpg --output results.txt

- Process a remote PDF:

    mlx_nougat --input https://example.com/document.pdf --output results.txt

- Use a different model:

    mlx_nougat --input path/to/your/image.png --model facebook/nougat-base --output results.txt

- Use a quantized model:

    mlx_nougat --input path/to/your/document.pdf --model mzbac/nougat-small-8bit-mlx
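The examples above each process a single input. The sketch below shows one way to process a whole directory by invoking the CLI once per file. The helper names, the `.txt` output naming, and the suffix list (taken from the file types used in the examples) are assumptions, not features of MLX Nougat.

```python
import subprocess
from pathlib import Path

# Assumption: suffixes taken from the file types shown in the examples.
SUPPORTED_SUFFIXES = {".png", ".jpg", ".jpeg", ".pdf"}


def plan_batch(input_paths, output_dir, model=None):
    """Return one mlx_nougat command per supported input, in sorted order."""
    commands = []
    for path in sorted(Path(p) for p in input_paths):
        if path.suffix.lower() not in SUPPORTED_SUFFIXES:
            continue
        # Inputs sharing a stem (a.png, a.pdf) would map to the same output file.
        out_file = Path(output_dir) / f"{path.stem}.txt"
        cmd = ["mlx_nougat", "--input", str(path), "--output", str(out_file)]
        if model is not None:
            cmd += ["--model", model]
        commands.append(cmd)
    return commands


def run_batch(input_dir, output_dir, model=None):
    """Run the planned commands one after another, stopping at the first failure."""
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    for cmd in plan_batch(Path(input_dir).iterdir(), output_dir, model):
        subprocess.run(cmd, check=True)
```

For example, `run_batch("scans", "ocr_out", model="mzbac/nougat-small-8bit-mlx")` would write one text file per supported input into `ocr_out`.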
TODOs
- Support quantized models to improve performance.
Acknowledgements
This project is built upon several open-source projects and research works:
- Nougat: The original Nougat model developed by Facebook AI Research.
- faster-nougat: An optimized implementation of Nougat, which inspired this MLX-based version.
- MLX: The machine learning framework developed by Apple, used for efficient model inference in this project.
- Transformers: Hugging Face's state-of-the-art natural language processing library.
File details
Details for the file mlx_nougat-0.1.2.tar.gz.
File metadata
- Download URL: mlx_nougat-0.1.2.tar.gz
- Upload date:
- Size: 14.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.10.9
File hashes
Algorithm | Hash digest
---|---
SHA256 | 4bdb02f05c8569725c2953ff659164a0bfc39685ca66951676d25927441ac33e
MD5 | 61204a69ad338679a7a41a8658f1d87b
BLAKE2b-256 | b075ed31c1e8d958dc4e729db3fd4bce80df3af1c9c740318292b678a4eb4f7d
File details
Details for the file mlx_nougat-0.1.2-py3-none-any.whl.
File metadata
- Download URL: mlx_nougat-0.1.2-py3-none-any.whl
- Upload date:
- Size: 15.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.10.9
File hashes
Algorithm | Hash digest
---|---
SHA256 | 8ba15588cc98919ef35ab23945741a9824d870a7a54c76c3696c21ed0952acfb
MD5 | 7795d91e0c7a6661710bb068c921ad72
BLAKE2b-256 | 5d1c64877ad294120426975f0ad4722271df53db7490b99179df088e5c4e6c1d
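A downloaded file can be checked against the SHA256 digests listed above. The sketch below uses Python's standard `hashlib`. The helper names are hypothetical, and the constant is the source distribution's digest copied from the table; swap in the wheel's digest to check the wheel instead.

```python
import hashlib

# SHA256 of mlx_nougat-0.1.2.tar.gz, copied from the table above.
SDIST_SHA256 = "4bdb02f05c8569725c2953ff659164a0bfc39685ca66951676d25927441ac33e"


def sha256_of_chunks(chunks):
    """Hash an iterable of byte chunks and return the hex digest."""
    digest = hashlib.sha256()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def file_sha256(path, chunk_size=1 << 20):
    """Hash a file in fixed-size chunks so large files are not loaded at once."""
    with open(path, "rb") as fh:
        # iter() with a sentinel stops when read() returns empty bytes (EOF).
        return sha256_of_chunks(iter(lambda: fh.read(chunk_size), b""))


def matches_digest(path, expected_hex):
    """Return True when the file's SHA256 equals the expected hex digest."""
    return file_sha256(path) == expected_hex.lower()
```

For example, `matches_digest("mlx_nougat-0.1.2.tar.gz", SDIST_SHA256)` should return True for an intact download of the source distribution.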