
Moondream Python Client Library

Official Python client library for Moondream, a tiny vision language model that can analyze images and answer questions about them. This library supports both local inference and cloud-based API access.

Features

  • Local Inference: Run the model directly on your machine (CPU only for now)
  • Cloud API: Access Moondream's hosted service for faster inference
  • Streaming: Stream responses token by token for real-time output
  • Multiple Model Sizes: Choose between 0.5B and 2B parameter models
  • Multiple Tasks: Caption images, answer questions, detect objects, and locate points

Installation

Install the package from PyPI:

pip install moondream==0.0.5

Quick Start

Using Cloud API

To use Moondream's cloud API, you first need an API key: sign up for a free account at console.moondream.ai. Pass the key when initializing the client, as shown below.

import moondream as md
from PIL import Image

# Initialize with API key
model = md.vl(api_key="your-api-key")

# Load an image
image = Image.open("path/to/image.jpg")

# Generate a caption
caption = model.caption(image)["caption"]
print("Caption:", caption)

# Ask a question
answer = model.query(image, "What's in this image?")["answer"]
print("Answer:", answer)

# Stream the response
for chunk in model.caption(image, stream=True)["caption"]:
    print(chunk, end="", flush=True)

Using Local Inference

First, download the model weights. We recommend the int8 weights for most applications:

| Model          | Precision | Download Size | Memory Usage | Download Link |
| -------------- | --------- | ------------- | ------------ | ------------- |
| Moondream 2B   | int8      | 1,733 MiB     | 2,624 MiB    | Download      |
| Moondream 2B   | int4      | 1,167 MiB     | 2,002 MiB    | Download      |
| Moondream 0.5B | int8      | 593 MiB       | 996 MiB      | Download      |
| Moondream 0.5B | int4      | 422 MiB       | 816 MiB      | Download      |

Then use the model locally:

import moondream as md
from PIL import Image

# Initialize with local model path
model = md.vl(model="path/to/moondream-2b-int8.bin")

# Load and encode image
image = Image.open("path/to/image.jpg")

# Since encoding an image is computationally expensive, you can encode it once
# and reuse the encoded version for multiple queries/captions/etc. This avoids
# having to re-encode the same image multiple times.
encoded_image = model.encode_image(image)

# Generate caption
caption = model.caption(encoded_image)["caption"]
print("Caption:", caption)

# Ask questions
answer = model.query(encoded_image, "What's in this image?")["answer"]
print("Answer:", answer)

API Reference

Constructor

model = md.vl(
    model="path/to/model.bin",  # For local inference
    api_key="your-api-key"      # For cloud API access
)

Pass either model (for local inference) or api_key (for cloud access); you do not need both.

Methods

caption(image, length="normal", stream=False, settings=None)

Generate a caption for an image.

result = model.caption(image)
# or with streaming
for chunk in model.caption(image, stream=True)["caption"]:
    print(chunk, end="")

query(image, question, stream=False, settings=None)

Ask a question about an image.

result = model.query(image, "What's in this image?")
# or with streaming
for chunk in model.query(image, "What's in this image?", stream=True)["answer"]:
    print(chunk, end="")

detect(image, object)

Detect and locate specific objects in an image.

result = model.detect(image, "car")
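The README documents the return shape only as `{"objects": List[Region]}`. As an illustration, here is a sketch of working with a detect() result, using a simulated response; the Region field names (x_min, y_min, x_max, y_max, normalized to [0, 1]) are an assumption, not confirmed by this document:

```python
# Simulated detect() result; the Region field names and the [0, 1]
# normalization are assumptions for illustration only.
result = {"objects": [{"x_min": 0.1, "y_min": 0.2, "x_max": 0.5, "y_max": 0.8}]}

width, height = 640, 480  # size of the original image, e.g. image.size

for region in result["objects"]:
    # Scale normalized coordinates up to pixel values
    box = (
        int(region["x_min"] * width),
        int(region["y_min"] * height),
        int(region["x_max"] * width),
        int(region["y_max"] * height),
    )
    print("car at", box)  # (64, 96, 320, 384)
```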

point(image, object)

Get coordinates of specific objects in an image.

result = model.point(image, "person")
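As with detect(), the README only states that point() returns `{"points": List[Point]}`. A sketch of converting a result to pixel coordinates, using a simulated response; the Point field names (x, y, normalized to [0, 1]) are an assumption:

```python
# Simulated point() result; field names and normalization are assumptions.
result = {"points": [{"x": 0.25, "y": 0.5}]}

width, height = 800, 600  # size of the original image, e.g. image.size

for point in result["points"]:
    # Scale normalized coordinates up to pixel values
    px, py = int(point["x"] * width), int(point["y"] * height)
    print("person at", (px, py))  # (200, 300)
```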

Input Types

  • Images can be provided as:
    • PIL.Image.Image objects
    • Encoded image objects (from model.encode_image())

Response Types

All methods return typed dictionaries; the Generator variants apply when stream=True:

  • CaptionOutput: {"caption": str | Generator}
  • QueryOutput: {"answer": str | Generator}
  • DetectOutput: {"objects": List[Region]}
  • PointOutput: {"points": List[Point]}
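For readers who want static typing, the shapes above can be written out as TypedDicts. This is a sketch, not the library's actual type definitions; the Region and Point aliases are placeholders since their fields are not documented here:

```python
from typing import Generator, List, TypedDict, Union

# Placeholders: the actual Region/Point field layouts are not
# documented in this README.
Region = dict
Point = dict

class CaptionOutput(TypedDict):
    caption: Union[str, Generator]  # Generator when stream=True

class QueryOutput(TypedDict):
    answer: Union[str, Generator]   # Generator when stream=True

class DetectOutput(TypedDict):
    objects: List[Region]

class PointOutput(TypedDict):
    points: List[Point]

# A non-streaming caption result conforms to CaptionOutput:
out: CaptionOutput = {"caption": "a red car parked on a street"}
print(out["caption"])
```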

Performance Notes

  • Local inference currently only supports CPU execution
  • CUDA (GPU) and MPS (Apple Silicon) support coming soon
  • For optimal performance with GPU/MPS, use the PyTorch implementation for now

