mogemma

Python/Mojo interface for Google Gemma 3 with MAX Engine.

Features

  • Embeddings — Generate dense vector embeddings through the Mojo backend
  • Text generation — Synchronous and async streaming text generation with configurable sampling (temperature, top-k, top-p)
  • HuggingFace Hub — Automatically resolves local paths and downloads missing HF IDs into the cache
  • OpenTelemetry — Optional tracing instrumentation
  • Lazy imports — Only loads what you use; optional extras keep the install slim

Installation

pip install mogemma

huggingface-hub is included in the base installation, so Hugging Face model IDs are supported by default.

Optional extras

Extra     | What it adds                             | Install
----------|------------------------------------------|---------------------------------
embed     | numpy (for embeddings)                   | pip install 'mogemma[embed]'
text      | numpy + tokenizers (for text generation) | pip install 'mogemma[text]'
telemetry | opentelemetry-api (for tracing)          | pip install 'mogemma[telemetry]'
all       | everything above                         | pip install 'mogemma[all]'
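
Because the extras are optional, it can help to check which of them are importable before choosing a code path. The following is a minimal sketch using only the standard library. The `EXTRA_MODULES` mapping is an assumption inferred from the table above (top-level module names `numpy`, `tokenizers`, `opentelemetry`), and the helpers `is_importable` and `available_extras` are hypothetical, not part of mogemma.

```python
from importlib.util import find_spec

# Assumed mapping from each extra to the top-level modules it installs,
# inferred from the table above; it is not read from mogemma's metadata.
EXTRA_MODULES = {
    "embed": ["numpy"],
    "text": ["numpy", "tokenizers"],
    "telemetry": ["opentelemetry"],
}


def is_importable(module_name: str) -> bool:
    """Return True if the top-level module can be located without importing it."""
    try:
        return find_spec(module_name) is not None
    except (ImportError, ValueError):
        return False


def available_extras(extra_modules=None) -> dict:
    """Map each extra name to whether all of its modules are importable."""
    if extra_modules is None:
        extra_modules = EXTRA_MODULES
    return {
        extra: all(is_importable(name) for name in modules)
        for extra, modules in extra_modules.items()
    }


if __name__ == "__main__":
    for extra, ok in available_extras().items():
        print(f"{extra}: {'installed' if ok else 'missing'}")
```

A check like this only reports whether the modules can be found. It does not verify versions or that mogemma itself accepts them.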

Model resolution behavior

  • Model identifiers are accepted in namespace/model format (for example, google/gemma-3-1b).
  • Missing IDs are downloaded into the local cache on first use unless offline mode is enabled (for example, HF_HUB_OFFLINE=1).
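
The resolution rules above can be sketched as a small classifier that decides whether a reference is a local directory or a hub ID. This is an illustration under stated assumptions, not mogemma's actual resolver: the function name `classify_model_ref` and the `HF_ID_PATTERN` regular expression are hypothetical, and the real library may accept or reject different strings.

```python
import re
from pathlib import Path

# Hypothetical pattern for "namespace/model" identifiers (an assumption,
# not the validation rule used by mogemma or the Hugging Face Hub).
HF_ID_PATTERN = re.compile(r"^[A-Za-z0-9][\w.\-]*/[A-Za-z0-9][\w.\-]*$")


def classify_model_ref(model_ref: str) -> str:
    """Return 'local' for an existing directory, 'hub' for a namespace/model ID.

    Raises ValueError for a path that exists but is not a directory, and for
    strings that match neither form.
    """
    path = Path(model_ref).expanduser()
    if path.is_dir():
        return "local"
    if path.exists():
        raise ValueError(f"Model path {model_ref} exists but is not a directory")
    if HF_ID_PATTERN.match(model_ref):
        return "hub"
    raise ValueError(
        f"{model_ref!r} is neither a local directory nor a namespace/model ID"
    )


if __name__ == "__main__":
    print(classify_model_ref("."))                  # current directory
    print(classify_model_ref("google/gemma-3-1b"))  # hub-style ID
```

Checking the filesystem first means a local directory always wins over a hub ID with the same spelling, which is one plausible reading of "resolves local paths and downloads missing HF IDs".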

Quick start

Embeddings

from mogemma import EmbeddingConfig, EmbeddingModel

config = EmbeddingConfig(model_path="google/gemma-3-1b")
model = EmbeddingModel(config)

embeddings = model.embed(["Hello, world!", "Model outputs are computed by MAX Engine."])
print(embeddings.shape)  # (2, hidden_dim)
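
A common next step with embeddings is comparing two rows by cosine similarity. The sketch below uses hand-written stand-in vectors rather than real model output, so it runs without mogemma or numpy. The helper `cosine_similarity` is hypothetical and only assumes that each embedding row is a sequence of floats.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Stand-in rows with hidden_dim = 4; real rows would come from model.embed(...).
    rows = [
        [0.1, 0.3, -0.2, 0.9],
        [0.2, 0.1, -0.1, 0.8],
    ]
    print(round(cosine_similarity(rows[0], rows[1]), 3))
```

With numpy installed (the embed extra), the same computation is usually written with vectorized operations instead of Python loops.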

Text generation

from mogemma import GenerationConfig, SyncGemmaModel

config = GenerationConfig(
    model_path="google/gemma-3-1b",
    max_new_tokens=64,
    temperature=0.7,
)
model = SyncGemmaModel(config)

# Full generation
print(model.generate("Explain quantum computing in one sentence:"))

# Streaming
for token in model.generate_stream("Once upon a time"):
    print(token, end="", flush=True)
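
To make the sampling options concrete, here is a minimal pure-Python sketch of how temperature, top-k, and top-p are commonly combined when picking the next token. It is not mogemma's implementation, which runs in the Mojo backend. The function `sample_token` and its `top_k`, `top_p`, and `rng` parameters are hypothetical, and implementations differ on details such as whether probabilities are renormalized between the top-k and top-p steps.

```python
import math
import random


def sample_token(logits, temperature=1.0, top_k=0, top_p=1.0, rng=None):
    """Sample one token index from raw logits.

    top_k=0 disables top-k filtering; top_p=1.0 disables nucleus filtering.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    if rng is None:
        rng = random.Random()

    # Temperature scaling followed by a numerically stable softmax.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    weights = [math.exp(x - peak) for x in scaled]
    total = sum(weights)
    probs = [w / total for w in weights]

    # Rank token indices from most to least likely.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)

    # Top-k: keep only the k most likely tokens.
    if top_k > 0:
        ranked = ranked[:top_k]

    # Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    # Sample among the survivors; choices() normalizes the weights itself.
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]


if __name__ == "__main__":
    logits = [2.0, 1.0, 0.5, -1.0]
    rng = random.Random(0)
    print([sample_token(logits, temperature=0.7, top_k=3, top_p=0.9, rng=rng)
           for _ in range(10)])
```

Lower temperatures sharpen the distribution toward the most likely token, while top-k and top-p both trim the unlikely tail before sampling.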

Async streaming

import asyncio
from mogemma import GenerationConfig
from mogemma.model import AsyncGemmaModel

config = GenerationConfig(model_path="google/gemma-3-1b", max_new_tokens=64)
model = AsyncGemmaModel(config)

async def main():
    async for token in model.generate_stream("The future of AI is"):
        print(token, end="", flush=True)

asyncio.run(main())
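
The async streaming pattern can be tried without a model by swapping in a stand-in async generator. In the sketch below, `fake_stream` and `collect` are hypothetical helpers written for illustration; `fake_stream` only mimics the shape of an async token stream and says nothing about what AsyncGemmaModel yields or whether it supports concurrent streams.

```python
import asyncio


async def fake_stream(prompt):
    """Stand-in for a token stream: yields the prompt, then a few fixed tokens."""
    for token in [prompt, " bright", ",", " probably", "."]:
        await asyncio.sleep(0)  # hand control back to the event loop
        yield token


async def collect(stream, max_tokens=None):
    """Join tokens from an async iterator, stopping early after max_tokens."""
    parts = []
    async for token in stream:
        parts.append(token)
        if max_tokens is not None and len(parts) >= max_tokens:
            break
    return "".join(parts)


async def main():
    # Both stand-in streams are consumed concurrently on one event loop.
    return await asyncio.gather(
        collect(fake_stream("The future of AI is")),
        collect(fake_stream("Mojo is"), max_tokens=2),
    )


if __name__ == "__main__":
    for text in asyncio.run(main()):
        print(text)
```

Collecting into a string is useful when the caller needs the full text, while printing each token as it arrives, as in the example above, suits interactive output.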

Development

# Clone and install everything
git clone https://github.com/cofin/mogemma.git
cd mogemma
make install

# Run tests
make test

# Lint and type-check
make lint

# Run release preflight checks
make check-release

# Build the Mojo shared library
make build

Troubleshooting

  • "Model path ... exists but is not a directory": point model_path at a model directory, or use a valid Hugging Face model ID.
  • Downloads fail in offline mode: set HF_HUB_OFFLINE=0, make sure you have network access, then retry.
  • Private or restricted repos (HF_TOKEN): configure a token, for example with huggingface-cli login.
  • Cached downloads are stored in ~/.cache/mogemma.
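
The checks above can be gathered into a quick diagnostic. This is a rough sketch using only the standard library. The `diagnose` helper is hypothetical, the set of strings treated as "offline" is an assumption about how HF_HUB_OFFLINE is interpreted, and a token saved by huggingface-cli login would not show up in the HF_TOKEN check.

```python
import os
from pathlib import Path


def diagnose(env=None, cache_dir="~/.cache/mogemma"):
    """Return human-readable hints about the model download environment."""
    env = os.environ if env is None else env
    hints = []

    # Assumption: these strings mean "offline"; anything else means online.
    offline = env.get("HF_HUB_OFFLINE", "").strip().upper()
    if offline in {"1", "ON", "YES", "TRUE"}:
        hints.append("HF_HUB_OFFLINE is enabled: missing models will not be downloaded")

    # Only the environment variable is checked, not a token stored on disk.
    if not env.get("HF_TOKEN"):
        hints.append("HF_TOKEN is not set: private or restricted repos may fail")

    cache = Path(cache_dir).expanduser()
    if not cache.is_dir():
        hints.append(f"cache directory {cache} does not exist yet")

    return hints


if __name__ == "__main__":
    for hint in diagnose() or ["no obvious problems found"]:
        print(hint)
```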

Implementation evidence

  • Core output contracts and error paths:
    • src/mo/tests/test_core_contract.py
    • src/py/tests/test_contracts_integration.py
  • Runtime contract behavior (sync/async generation and embeddings):
    • src/py/tests/test_gemma_model.py
    • src/py/tests/test_embeddings.py
    • src/py/tests/test_async.py
  • Model delivery and hub error handling:
    • src/py/tests/test_hub.py

License

MIT
