Skip to main content

The Oshepherd guiding the Ollama(s) inference orchestration.

Project description

oshepherd

The Oshepherd guiding the Ollama(s) inference orchestration.

oshepherd logo

PyPI Version MIT License DeepWiki

A centralized FastAPI service, using Celery and Redis to orchestrate multiple Ollama servers as workers.

Install

pip install oshepherd

Usage

  1. Setup Redis:

    Celery uses Redis as message broker and backend. You'll need a Redis instance, which you can provision for free in redislabs.com.

  2. Setup FastAPI Server:

    # define configuration env file
    # use credentials for redis as broker and backend
    cp .api.env.template .api.env
    
    # start api
    oshepherd start-api --env-file .api.env
    
  3. Setup Celery/Ollama Worker(s):

    # install ollama https://ollama.com/download
    # optionally pull the model
    ollama pull mistral
    
    # define configuration env file
    # use credentials for redis as broker and backend
    cp .worker.env.template .worker.env
    
    # start worker
    oshepherd start-worker --env-file .worker.env
    
  4. Now you're ready to execute Ollama completions remotely. You can point your Ollama client to your oshepherd api server by setting the host, and it will return your requested completions from any of the workers:

    import ollama
    
    client = ollama.Client(host="http://127.0.0.1:5001")
    
    # Standard request
    response = client.generate(model="mistral", prompt="Why is the sky blue?")
    
    # Streaming request
    for chunk in client.generate(model="mistral", prompt="Why is the sky blue?", stream=True):
        print(chunk['response'], end='', flush=True)
    

    For a complete Python example with streaming support, see examples/pretty_streaming.py.

    import { Ollama } from "ollama/browser";
    
    const ollama = new Ollama({ host: "http://127.0.0.1:5001" });
    
    // Standard request
    const response = await ollama.generate({
        model: "mistral",
        prompt: "Why is the sky blue?",
    });
    
    // Streaming request
    const streamResponse = await ollama.generate({
        model: "mistral",
        prompt: "Why is the sky blue?",
        stream: true
    });
    
    for await (const chunk of streamResponse) {
        process.stdout.write(chunk.response);
    }
    

    For a complete TypeScript/JavaScript example with streaming support, see examples/ts-scripts/README.md.

    • Raw http request:
    curl -X POST -H "Content-Type: application/json" -L http://127.0.0.1:5001/api/generate/ \
    -d '{"model":"mistral","prompt":"Why is the sky blue?","stream":true}' \
    --no-buffer
    

Disclaimers 🚨

This package is in alpha, its architecture and api might change in the near future. Currently this is getting tested in a controlled environment by real users, but haven't been audited, nor tested thorugly. Use it at your own risk.

As this is an alpha version, support and responses might be limited. We'll do our best to address questions and issues as quickly as possible.

API server parity

  • Generate a completion: POST /api/generate
  • Generate a chat completion: POST /api/chat
  • Generate Embeddings: POST /api/embeddings
  • List Local Models: GET /api/tags
  • Version: GET /api/version
  • Show Model Information: POST /api/show (pending)
  • List Running Models: GET /api/ps (pending)

Oshepherd API server currently supports the endpoints listed above, enabling full compatibility with official Ollama clients (i.e.: ollama-python, ollama-js). These endpoints provide comprehensive functionality for the most common use cases. Additional endpoints from the official Ollama API are not planned for the near future. For more details on the full Ollama API specifications, refer to the Ollama API documentation.

Contribution guidelines

We welcome contributions! If you find a bug or have suggestions for improvements, please open an issue or submit a pull request pointing to development branch. Before creating a new issue/pull request, take a moment to search through the existing issues/pull requests to avoid duplicates.

Conda Support

To run and build locally you can use conda:

conda create -n oshepherd python=3.12
conda activate oshepherd
pip install -r requirements.txt

# install oshepherd
pip install -e .
Tests

Follow usage instructions to start api server and celery worker using a local ollama, and then run the tests:

pytest -s tests/

Author

This is a project developed and maintained by mnemonica.ai.

License

MIT

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

oshepherd-0.0.21.tar.gz (22.8 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

oshepherd-0.0.21-py3-none-any.whl (28.7 kB view details)

Uploaded Python 3

File details

Details for the file oshepherd-0.0.21.tar.gz.

File metadata

  • Download URL: oshepherd-0.0.21.tar.gz
  • Upload date:
  • Size: 22.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.11

File hashes

Hashes for oshepherd-0.0.21.tar.gz
Algorithm Hash digest
SHA256 0f188021be9efb3121ec9166b7758966b2bd6cab4e5a4a8b43adfa88fbb0c353
MD5 23bfffbfe1c3f79c91473865fbd0eaab
BLAKE2b-256 d24d225da5e72c859ffc846b3ad1032dd0304329e5ebcc6629b774b69714d4bc

See more details on using hashes here.

File details

Details for the file oshepherd-0.0.21-py3-none-any.whl.

File metadata

  • Download URL: oshepherd-0.0.21-py3-none-any.whl
  • Upload date:
  • Size: 28.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.11

File hashes

Hashes for oshepherd-0.0.21-py3-none-any.whl
Algorithm Hash digest
SHA256 cefd7ad950d4f1d9cf6936e2e0d4f6e54cfaba02b9ea5d552e9988029c01aea7
MD5 41ca6e02a038e8bafd7cbc8d88f44af3
BLAKE2b-256 d06ffd709a9183cbab81be82eb3e05ce428372055c15ddd811d3c1c00334fa1b

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page