MLX Nougat
MLX Nougat is a CLI tool for OCR using the Nougat model.
Installation
- Install ImageMagick:
brew install imagemagick
- Configure environment variables for ImageMagick:
Add the following lines to your shell configuration file (e.g., ~/.bashrc, ~/.zshrc):
export MAGICK_HOME=$(brew --prefix imagemagick)
export PATH=$MAGICK_HOME/bin:$PATH
export DYLD_LIBRARY_PATH=$MAGICK_HOME/lib:$DYLD_LIBRARY_PATH
After adding these lines, reload your shell configuration or restart your terminal.
- Install MLX Nougat:
git clone git@github.com:mzbac/mlx-nougat.git
cd mlx-nougat
pip install .
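A quick way to confirm that the environment variables from the configuration step took effect is to inspect them from Python. The following is a minimal sketch, not part of MLX Nougat: the function name `check_imagemagick_env` is hypothetical, and it only checks that the three variables are consistent with each other, not that ImageMagick itself works.

```python
import os


def check_imagemagick_env(env=None):
    """Return a list of problems; an empty list means the variables look consistent.

    Hypothetical helper for illustration. `env` is any mapping of variable
    names to values and defaults to the current process environment.
    """
    env = os.environ if env is None else env
    problems = []

    magick_home = env.get("MAGICK_HOME", "")
    if not magick_home:
        # Without MAGICK_HOME the other two checks have nothing to compare against.
        problems.append("MAGICK_HOME is not set")
        return problems

    # PATH and DYLD_LIBRARY_PATH are colon-separated directory lists on macOS.
    path_dirs = env.get("PATH", "").split(":")
    if f"{magick_home}/bin" not in path_dirs:
        problems.append("MAGICK_HOME/bin is not on PATH")

    lib_dirs = env.get("DYLD_LIBRARY_PATH", "").split(":")
    if f"{magick_home}/lib" not in lib_dirs:
        problems.append("MAGICK_HOME/lib is not on DYLD_LIBRARY_PATH")

    return problems


if __name__ == "__main__":
    for problem in check_imagemagick_env():
        print(problem)
```

Run it in a freshly opened terminal; no output means the three variables agree. Treat a missing `DYLD_LIBRARY_PATH` as a hint rather than proof, since macOS may not pass `DYLD_` variables through to every process.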
Usage
After installation, you can use MLX Nougat from the command line:
mlx_nougat --input <path_to_image_or_pdf_or_url> [--output <output_file>] [--model <model_name_or_path>]
Arguments
--input: (Required) Path to the input image or PDF file, or a URL to an image or PDF.
--output: (Optional) Path to save the output text file. If not provided, the output will be printed to the console.
--model: (Optional) Name or path of the Nougat model to use. Default is "facebook/nougat-small".
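When calling the tool from a script, the three arguments above map directly onto an argument list. The sketch below assumes only the documented flags; `build_command` and `run_ocr` are hypothetical helper names, and reading the result from standard output is an assumption based on the note that output is printed to the console when `--output` is omitted.

```python
import subprocess


def build_command(input_path, output_path=None, model=None):
    """Assemble an mlx_nougat argument list from the documented flags."""
    cmd = ["mlx_nougat", "--input", input_path]  # --input is required
    if output_path is not None:
        cmd += ["--output", output_path]  # optional: write to a file
    if model is not None:
        cmd += ["--model", model]  # optional: override the default model
    return cmd


def run_ocr(input_path, output_path=None, model=None):
    """Run mlx_nougat and return its captured standard output.

    Assumption: without --output, the recognized text arrives on stdout.
    """
    result = subprocess.run(
        build_command(input_path, output_path, model),
        check=True,           # raise CalledProcessError on a non-zero exit code
        capture_output=True,
        text=True,
    )
    return result.stdout
```

Passing the arguments as a list rather than a single string avoids shell quoting problems with paths that contain spaces.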
Examples
- Process a local image:
mlx_nougat --input path/to/your/image.png --output results.txt
- Process a local PDF:
mlx_nougat --input path/to/your/document.pdf --output results.txt
- Process a remote image:
mlx_nougat --input https://example.com/image.jpg --output results.txt
- Process a remote PDF:
mlx_nougat --input https://example.com/document.pdf --output results.txt
- Use a different model:
mlx_nougat --input path/to/your/image.png --model facebook/nougat-base --output results.txt
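The examples above each handle a single input. To process several local files, one option is to invoke the command once per file. This is a rough sketch under stated assumptions: `plan_batch`, `run_batch`, and the `SUPPORTED_SUFFIXES` set are hypothetical and not part of MLX Nougat, and the suffix list is a guess at common image and PDF extensions rather than a documented list of supported formats.

```python
import subprocess
from pathlib import Path

# Assumption: a guess at typical inputs, not a list taken from the tool itself.
SUPPORTED_SUFFIXES = {".png", ".jpg", ".jpeg", ".pdf"}


def plan_batch(input_paths, output_dir, model=None):
    """Build one mlx_nougat command per supported input file."""
    jobs = []
    for raw in input_paths:
        path = Path(raw)
        if path.suffix.lower() not in SUPPORTED_SUFFIXES:
            continue  # skip files that are neither images nor PDFs
        # Keep the full file name so a.png and a.pdf do not overwrite each other.
        out_file = Path(output_dir) / (path.name + ".txt")
        cmd = ["mlx_nougat", "--input", str(raw), "--output", str(out_file)]
        if model is not None:
            cmd += ["--model", model]
        jobs.append(cmd)
    return jobs


def run_batch(input_paths, output_dir, model=None):
    """Run the planned commands one after another, stopping at the first failure."""
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    for cmd in plan_batch(input_paths, output_dir, model):
        subprocess.run(cmd, check=True)
```

For example, `run_batch(sorted(Path("scans").iterdir()), "ocr_output")` would write one text file per image or PDF found in a `scans` directory. Each invocation starts a new process, so the model is presumably loaded again for every file.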
TODOs
- Support quantized models to improve performance.
Acknowledgements
This project is built upon several open-source projects and research works:
- Nougat: The original Nougat model developed by Facebook AI Research.
- faster-nougat: An optimized implementation of Nougat, which inspired this MLX-based version.
- MLX: The machine learning framework developed by Apple, used for efficient model inference in this project.
- Transformers: Hugging Face's state-of-the-art natural language processing library.