
dspy-lm-auth

Pi-style LM authentication helpers for DSPy.

dspy-lm-auth lets DSPy reuse Pi credentials from ~/.pi/agent/auth.json, including ChatGPT Codex subscription auth.

The nicest way to use it is not as an isolated auth helper, but as the missing piece in a very practical DSPy workflow:

  • run a small model locally for the bulk of your cheap inference
  • use your existing ChatGPT subscription as the stronger GEPA reflection model

If you already pay for ChatGPT Plus or Pro, this gives you a pleasant way to explore DSPy without setting up a separate metered OpenAI API workflow just to optimize prompts.

Local compute is not literally free — your machine still does work — but it is a very good no-extra-API-bill workflow for experimentation.

Current support

  • OpenAI Codex / ChatGPT Plus or Pro subscription

What this guide will show

We will build a tiny French→English translator in DSPy.

The pattern is simple:

  1. run qwen3.5:0.8b locally with Ollama
  2. use that local model as the student model
  3. use codex/gpt-5.4 through dspy-lm-auth as the reflection model
  4. let GEPA improve the student program

This README intentionally sticks to JSONAdapter().

That is not because other adapters are uninteresting — quite the opposite. It is because a good tutorial should hold one thing steady at a time. If you want to compare JSONAdapter, XMLAdapter, and custom templated adapters, that is best treated as a separate benchmark project.

Install

uv pip install dspy-lm-auth

Or with pip:

pip install dspy-lm-auth

One-time login

If you already use Pi and your credentials are present in ~/.pi/agent/auth.json, you can skip this step.

Otherwise:

import dspy_lm_auth

dspy_lm_auth.login("codex")

That starts the OAuth flow and stores the resulting credentials in Pi's auth file.
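
If you want to confirm what was stored, you can inspect the auth file directly. A minimal sketch, assuming the file is a JSON object keyed by provider name (the shape shown in the Credential resolution section below):

import json
from pathlib import Path

# Hypothetical check: list which providers have stored credentials.
auth_path = Path.home() / ".pi" / "agent" / "auth.json"
if auth_path.exists():
    print("Stored providers:", list(json.loads(auth_path.read_text())))
else:
    print("No Pi auth file found; run dspy_lm_auth.login('codex') first.")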

Tutorial: local DSPy + subscription-powered GEPA

Step 1: run a small local model with Ollama

On Linux, install Ollama with:

curl -fsSL https://ollama.com/install.sh | sh

If the server is not already running, start it:

ollama serve

Now pull the model:

ollama pull qwen3.5:0.8b

Sanity check:

ollama run qwen3.5:0.8b --think=false "Translate French to English and return only the translation: merci beaucoup"

Why ollama_chat/... and think=False?

For this model family, the cleanest DSPy setup is the native Ollama LiteLLM route:

  • use ollama_chat/qwen3.5:0.8b
  • set think=False

That gives a cleaner programming experience than relying on the OpenAI-compatible Ollama endpoint for this particular model.
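
To double-check that the server is reachable before wiring it into DSPy, you can hit Ollama's model-list endpoint. A stdlib-only sketch, assuming the default local port:

import json
import urllib.request

# Ask the local Ollama server which models it has pulled.
with urllib.request.urlopen("http://127.0.0.1:11434/api/tags") as resp:
    models = json.load(resp)["models"]

print([m["name"] for m in models])  # should include "qwen3.5:0.8b"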

Step 2: configure the two models in DSPy

import dspy
import dspy_lm_auth

# Patch dspy.LM so `codex/...` works.
dspy_lm_auth.install()

# Cheap local student model.
student_lm = dspy.LM(
    "ollama_chat/qwen3.5:0.8b",
    api_base="http://127.0.0.1:11434",
    api_key="ollama",  # dummy value; LiteLLM expects one
    model_type="chat",
    think=False,
    temperature=0,
    max_tokens=200,
)

# Stronger reflection model used by GEPA to improve the prompt.
reflection_lm = dspy.LM("codex/gpt-5.4")

# All program inference goes through the local student model.
dspy.configure(lm=student_lm, adapter=dspy.JSONAdapter())

At this point you have the whole idea in place:

  • student model = local, cheap, yours
  • reflection model = stronger, subscription-backed, already paid for
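
Before writing any program, a quick smoke test of both endpoints is worthwhile. This sketch assumes both servers are reachable; each call should return a list containing one completion string:

# Both LMs are directly callable; each returns a list of completions.
print(student_lm("Reply with one word: hi")[0])
print(reflection_lm("Reply with one word: hi")[0])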

Step 3: write a tiny DSPy program

import dspy


class TranslateFrenchToEnglish(dspy.Signature):
    """Translate the French input into short, natural English."""

    french: str = dspy.InputField(desc="French sentence")
    english: str = dspy.OutputField(desc="Natural English translation")


translator = dspy.Predict(TranslateFrenchToEnglish)

print(translator(french="merci beaucoup").english)
print(translator(french="où est la gare ?").english)

A tiny local model is often useful, but not always reliably right in the way you want.

That is where GEPA comes in.

Step 4: create a tiny training set

pairs = [
    ("bonjour", "hello"),
    ("merci beaucoup", "thank you very much"),
    ("où est la gare ?", "where is the train station?"),
    ("je suis fatigué", "I am tired"),
    ("il fait très chaud aujourd'hui", "it is very hot today"),
    ("je ne comprends pas", "I do not understand"),
    ("pouvez-vous m'aider ?", "can you help me?"),
    ("j'aime apprendre le français", "I like learning French"),
    ("nous arrivons demain matin", "we are arriving tomorrow morning"),
    ("combien ça coûte ?", "how much does it cost?"),
]

examples = [
    dspy.Example(french=fr, english=en).with_inputs("french")
    for fr, en in pairs
]

trainset = examples[:8]
valset = examples[8:]

This is intentionally tiny. The point of the tutorial is the workflow, not leaderboard chasing.

Step 5: define what “good” means

def metric(gold, pred, trace=None, pred_name=None, pred_trace=None):
    guess = pred.english.strip()
    target = gold.english.strip()

    exact = guess.lower() == target.lower()
    score = 1.0 if exact else 0.0

    if exact:
        feedback = (
            "Exact match. Keep translations short, natural, and direct. "
            "Do not add explanations."
        )
    else:
        feedback = (
            f"Expected {target!r} but got {guess!r}. "
            "Prefer direct, idiomatic English. Preserve tense, pronouns, and politeness. "
            "Do not explain the translation or add extra words."
        )

    return dspy.Prediction(score=score, feedback=feedback)

The metric is deliberately simple:

  • score exact matches as 1.0
  • score everything else as 0.0
  • give GEPA useful textual feedback so it can rewrite the prompt
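
You can exercise the metric directly to see exactly what GEPA will receive. A small sketch using the names defined above:

gold = dspy.Example(french="bonjour", english="hello").with_inputs("french")

# An exact match scores 1.0; anything else scores 0.0 plus corrective feedback.
print(metric(gold, dspy.Prediction(english="hello")).score)
print(metric(gold, dspy.Prediction(english="Hello there.")).feedback)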

Step 6: run GEPA

gepa = dspy.GEPA(
    metric=metric,
    reflection_lm=reflection_lm,
    auto="light",
)

optimized = gepa.compile(translator, trainset=trainset, valset=valset)

This is the moment the package earns its keep.

The student model stays local. GEPA uses the stronger subscription model to think about failures and improve the program. That is the whole value proposition in one place.

Step 7: inspect the optimized program

print("Optimized instruction:\n")
print(optimized.signature.instructions)
print()

print(optimized(french="je ne comprends pas").english)
print(optimized(french="combien ça coûte ?").english)

A good way to read the result is:

  • the local model is still the one doing inference
  • the stronger subscription model helped shape a better instruction
  • you did not need a separate metered API setup for the optimizer model
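
Once you are happy with the result, you can persist the optimized program and reload it later without re-running GEPA. A minimal sketch using DSPy's state save/load:

# Save the optimized instructions and demos as JSON, then restore them.
optimized.save("translator_gepa.json")

restored = dspy.Predict(TranslateFrenchToEnglish)
restored.load("translator_gepa.json")
print(restored(french="bonjour").english)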

A complete copy-paste script

If you prefer one coherent script rather than step-by-step fragments, here is the full version:

import dspy
import dspy_lm_auth


dspy_lm_auth.install()

student_lm = dspy.LM(
    "ollama_chat/qwen3.5:0.8b",
    api_base="http://127.0.0.1:11434",
    api_key="ollama",
    model_type="chat",
    think=False,
    temperature=0,
    max_tokens=200,
)

reflection_lm = dspy.LM("codex/gpt-5.4")

dspy.configure(lm=student_lm, adapter=dspy.JSONAdapter())


class TranslateFrenchToEnglish(dspy.Signature):
    """Translate the French input into short, natural English."""

    french: str = dspy.InputField(desc="French sentence")
    english: str = dspy.OutputField(desc="Natural English translation")


translator = dspy.Predict(TranslateFrenchToEnglish)

pairs = [
    ("bonjour", "hello"),
    ("merci beaucoup", "thank you very much"),
    ("où est la gare ?", "where is the train station?"),
    ("je suis fatigué", "I am tired"),
    ("il fait très chaud aujourd'hui", "it is very hot today"),
    ("je ne comprends pas", "I do not understand"),
    ("pouvez-vous m'aider ?", "can you help me?"),
    ("j'aime apprendre le français", "I like learning French"),
    ("nous arrivons demain matin", "we are arriving tomorrow morning"),
    ("combien ça coûte ?", "how much does it cost?"),
]

examples = [
    dspy.Example(french=fr, english=en).with_inputs("french")
    for fr, en in pairs
]

trainset = examples[:8]
valset = examples[8:]

print("Before optimization:")
print(translator(french="où est la gare ?").english)
print(translator(french="je ne comprends pas").english)
print()


def metric(gold, pred, trace=None, pred_name=None, pred_trace=None):
    guess = pred.english.strip()
    target = gold.english.strip()

    exact = guess.lower() == target.lower()
    score = 1.0 if exact else 0.0

    if exact:
        feedback = (
            "Exact match. Keep translations short, natural, and direct. "
            "Do not add explanations."
        )
    else:
        feedback = (
            f"Expected {target!r} but got {guess!r}. "
            "Prefer direct, idiomatic English. Preserve tense, pronouns, and politeness. "
            "Do not explain the translation or add extra words."
        )

    return dspy.Prediction(score=score, feedback=feedback)


gepa = dspy.GEPA(
    metric=metric,
    reflection_lm=reflection_lm,
    auto="light",
)

optimized = gepa.compile(translator, trainset=trainset, valset=valset)

print("Optimized instruction:\n")
print(optimized.signature.instructions)
print()

print("After optimization:")
print(optimized(french="où est la gare ?").english)
print(optimized(french="je ne comprends pas").english)
print(optimized(french="combien ça coûte ?").english)

When you outgrow the laptop: the same idea on a GPU box

The laptop workflow is the easiest place to start.

When you want more speed or more context, keep the exact same mental model and swap only the student model:

  • laptop: Ollama + qwen3.5:0.8b
  • GPU box: vLLM + Qwen/Qwen3.5-0.8B

Minimal GPU setup

SSH into the GPU box:

ssh YOUR_GPU_BOX

Install uv and vllm:

curl -LsSf https://astral.sh/uv/install.sh | sh
export PATH="$HOME/.local/bin:$PATH"

uv python install 3.12
uv venv ~/.venvs/vllm-qwen35-08b --python 3.12
uv pip install --python ~/.venvs/vllm-qwen35-08b/bin/python vllm

Launch the model:

CUDA_VISIBLE_DEVICES=0 ~/.venvs/vllm-qwen35-08b/bin/vllm serve Qwen/Qwen3.5-0.8B \
  --host 0.0.0.0 \
  --port 8000 \
  --served-model-name local-model \
  --dtype float16 \
  --gpu-memory-utilization 0.25 \
  --max-model-len 2048

Then swap the student model definition in DSPy to:

student_lm = dspy.LM(
    "openai/local-model",
    api_base="http://YOUR_GPU_BOX:8000/v1",
    api_key="",  # vLLM ignores the key unless it was started with --api-key
    model_type="chat",
)

Everything else in the GEPA workflow stays the same.
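
As with Ollama, you can sanity-check the endpoint before pointing DSPy at it. vLLM serves the standard OpenAI model-list route; replace YOUR_GPU_BOX as above:

import json
import urllib.request

# The id returned here should match the --served-model-name ("local-model").
with urllib.request.urlopen("http://YOUR_GPU_BOX:8000/v1/models") as resp:
    print([m["id"] for m in json.load(resp)["data"]])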

If you only want the auth piece

You can also use dspy-lm-auth without the local-model tutorial.

import dspy
import dspy_lm_auth


dspy_lm_auth.install()

lm = dspy.LM("codex/gpt-5.4")
dspy.configure(lm=lm)

print(lm("hello")[0]["text"])

Or keep the original provider and select the auth route explicitly:

import dspy_lm_auth

lm = dspy_lm_auth.LM("openai/gpt-5.4", auth_provider="codex")
print(lm("hello")[0]["text"])

Credential resolution

API key credentials can be stored as:

  • a literal value
  • an environment variable name
  • a shell lookup prefixed with !

Examples, first an environment variable name, then a shell lookup:

{
  "some-provider": {
    "type": "api_key",
    "key": "OPENAI_API_KEY"
  }
}

{
  "some-provider": {
    "type": "api_key",
    "key": "!op read op://Private/openai/api_key --no-newline"
  }
}
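
For completeness, a literal key would presumably be stored the same way (the value below is a placeholder, not a real key):

{
  "some-provider": {
    "type": "api_key",
    "key": "sk-example-placeholder"
  }
}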

Development

uv sync --extra dev
uv run pytest
uv run ruff check src tests

Roadmap

The package is structured so more Pi-like providers can be added later, for example:

  • Anthropic subscription auth
  • GitHub Copilot
  • Gemini CLI
  • Antigravity

License

MIT
