CLI tool for generating text from images using the Gemma 3 model.
atai-gemma3-tool
atai-gemma3-tool is a command-line interface (CLI) tool that uses Google's Gemma 3 model to generate descriptive text from a local image file or an image URL. It passes the image and a prompt to the multimodal model and streams the generated text to the console in real time.
Features
- Multimodal Processing: Accepts image input and produces text output.
- Real-Time Streaming: Generates and streams tokens as they are produced.
- Customizable Prompt: Allows users to define a custom prompt.
- Easy Installation: Installable via pip with all dependencies handled.
- Asynchronous Generation: Streams tokens asynchronously, so output starts appearing before generation has finished.
Installation
Install the Gemma 3 build of Transformers, then install the package from PyPI:
pip install git+https://github.com/huggingface/transformers@v4.49.0-Gemma-3
pip install atai-gemma3-tool
Usage
Run the tool from your terminal with the path to a local image file or an image URL, plus an optional custom prompt:
atai-gemma3-tool "path/to/your/local_image.jpg" --prompt "Describe this image in detail."
atai-gemma3-tool https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG
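The two invocations above pass either a local path or a URL as the same argument. How the tool tells them apart is not documented here. One plausible approach, sketched below, is to check the URL scheme. The function name `classify_image_source` is hypothetical and not part of the package.

```python
from urllib.parse import urlparse


def classify_image_source(image_path: str) -> str:
    """Return "url" for http(s) addresses and "file" for everything else.

    Hypothetical helper for illustration; the tool's own logic may differ.
    """
    scheme = urlparse(image_path).scheme.lower()
    return "url" if scheme in ("http", "https") else "file"


if __name__ == "__main__":
    for source in (
        "path/to/your/local_image.jpg",
        "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG",
    ):
        print(classify_image_source(source), source)
```

Checking only for `http` and `https` keeps Windows drive letters such as `C:\images\cat.jpg` from being mistaken for a URL scheme.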
Command Line Arguments
- image_path: The path to a local image file, or an image URL.
- --prompt: (Optional) Custom prompt for text generation.
  Default: "Describe this image in detail."
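The interface described above, one positional argument and one optional flag with a default, could be declared with `argparse` roughly as follows. This is an illustrative sketch and not the package's actual source. The names `build_parser` and `DEFAULT_PROMPT` are assumptions.

```python
import argparse

# Default prompt as documented in the argument list above.
DEFAULT_PROMPT = "Describe this image in detail."


def build_parser() -> argparse.ArgumentParser:
    """Build a parser matching the documented arguments (illustrative only)."""
    parser = argparse.ArgumentParser(
        prog="atai-gemma3-tool",
        description="Generate text from an image.",
    )
    parser.add_argument(
        "image_path",
        help="Path to a local image file, or an image URL.",
    )
    parser.add_argument(
        "--prompt",
        default=DEFAULT_PROMPT,
        help="Custom prompt for text generation.",
    )
    return parser


if __name__ == "__main__":
    # Parse an explicit argument list so the example does not depend on sys.argv.
    args = build_parser().parse_args(["photo.jpg", "--prompt", "What is shown here?"])
    print(args.image_path, "|", args.prompt)
```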
The tool will load the image, process it using the Gemma 3 model, and output the generated text to your console in real time.
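Real-time output of this kind is commonly built on a producer/consumer pattern: generation runs in a background thread and pushes text pieces onto a queue, while the main thread prints them as they arrive. Transformers ships a `TextIteratorStreamer` class that works this way, though this page does not say whether the tool uses it. The sketch below shows the pattern with the standard library only. `fake_generate` is a hypothetical stand-in for the model, and no real inference happens.

```python
import queue
import threading
from typing import Iterator

_DONE = object()  # sentinel that marks the end of generation


def fake_generate(prompt: str) -> Iterator[str]:
    """Hypothetical stand-in for the model: yields a fixed reply piece by piece."""
    for piece in ["A ", "bowl ", "of ", "candy."]:
        yield piece


def stream_tokens(prompt: str) -> Iterator[str]:
    """Run generation in a background thread and yield pieces as they arrive."""
    pieces: queue.Queue = queue.Queue()

    def worker() -> None:
        try:
            for piece in fake_generate(prompt):
                pieces.put(piece)
        finally:
            # Always signal completion so the consumer cannot block forever.
            pieces.put(_DONE)

    threading.Thread(target=worker, daemon=True).start()
    while True:
        item = pieces.get()
        if item is _DONE:
            break
        yield item


if __name__ == "__main__":
    for piece in stream_tokens("Describe this image in detail."):
        print(piece, end="", flush=True)  # flush so each piece shows immediately
    print()
```

With a real model, the worker thread would call the model's generation routine and the queue would be fed by the streamer. The consumer loop stays the same.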
License
This project is licensed under the MIT License. See the LICENSE file for details.
Acknowledgements
- Google DeepMind: For the Gemma 3 model.
- Hugging Face: For the Transformers library and supporting tools.
File details
Details for the file atai_gemma3_tool-0.0.2.tar.gz.
File metadata
- Download URL: atai_gemma3_tool-0.0.2.tar.gz
- Upload date:
- Size: 5.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 5ea3dc0d4d7ddf0322c82b59d17c11807f0f56c47e16bec987bbfea2a1b8b426 |
| MD5 | 96c40583f3f2a999035767990b491498 |
| BLAKE2b-256 | 87559a3b854ccf88596012b68926c35f4b8254b03655dfa29486cfacd6b3776b |
File details
Details for the file atai_gemma3_tool-0.0.2-py3-none-any.whl.
File metadata
- Download URL: atai_gemma3_tool-0.0.2-py3-none-any.whl
- Upload date:
- Size: 5.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 2020ae70559fa28b030197b213505c3414211a82de5f5e704f8673a4427bf285 |
| MD5 | c09b9c5334c650cc9d443802458184d2 |
| BLAKE2b-256 | f868fc63262f11362aea1144da3523cca4436d5435b2898c6e21fbba4923648d |