
dspy-lm-auth

Pi-style LM authentication helpers for DSPy.

dspy-lm-auth lets DSPy reuse Pi credentials from ~/.pi/agent/auth.json, including ChatGPT Codex subscription auth.

The nicest way to use it is not as an isolated auth helper, but as the missing piece in a very practical DSPy workflow:

  • run a small model locally for the bulk of your cheap inference
  • use your existing ChatGPT subscription as the stronger GEPA reflection model

If you already pay for ChatGPT Plus or Pro, this gives you a pleasant way to explore DSPy without setting up a separate metered OpenAI API workflow just to optimize prompts.

Local compute is not literally free — your machine still does work — but it is a very good no-extra-API-bill workflow for experimentation.

Current support

  • OpenAI Codex / ChatGPT Plus or Pro subscription

What this guide will show

We will build a tiny French→English translator in DSPy.

The pattern is simple:

  1. run qwen3.5:0.8b locally with Ollama
  2. use that local model as the student model
  3. use codex/gpt-5.4 through dspy-lm-auth as the reflection model
  4. let GEPA improve the student program

This README intentionally sticks to JSONAdapter().

That is not because other adapters are uninteresting — quite the opposite. It is because a good tutorial should hold one thing steady at a time. If you want to compare JSONAdapter, XMLAdapter, and custom templated adapters, that is best treated as a separate benchmark project.

Install

uv pip install dspy-lm-auth

Or with pip:

pip install dspy-lm-auth

One-time login

If you already use Pi and your credentials are present in ~/.pi/agent/auth.json, you can skip this step.

Otherwise:

import dspy_lm_auth

dspy_lm_auth.login("codex")

That starts the OAuth flow and stores the resulting credentials in Pi's auth file.
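
If you want to confirm what was stored, you can inspect the auth file directly. A minimal sketch, assuming the file is a JSON object keyed by provider name (the shape shown in the Credential resolution section below):

import json
from pathlib import Path

# Hypothetical check: list which providers have stored credentials.
auth_path = Path.home() / ".pi" / "agent" / "auth.json"
if auth_path.exists():
    print("Stored providers:", list(json.loads(auth_path.read_text())))
else:
    print("No Pi auth file found; run dspy_lm_auth.login('codex') first.")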

Tutorial: local DSPy + subscription-powered GEPA

Step 1: run a small local model with Ollama

On Linux, install Ollama with:

curl -fsSL https://ollama.com/install.sh | sh

If the server is not already running, start it:

ollama serve

Now pull the model:

ollama pull qwen3.5:0.8b

Sanity check:

ollama run qwen3.5:0.8b --think=false "Translate French to English and return only the translation: merci beaucoup"

Why ollama_chat/... and think=False?

For this model family, the cleanest DSPy setup is the native Ollama LiteLLM route:

  • use ollama_chat/qwen3.5:0.8b
  • set think=False

That gives a cleaner programming experience than relying on the OpenAI-compatible Ollama endpoint for this particular model.
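
To double-check that the server is reachable before wiring it into DSPy, you can hit Ollama's model-list endpoint. A stdlib-only sketch, assuming the default local port:

import json
import urllib.request

# Ask the local Ollama server which models it has pulled.
with urllib.request.urlopen("http://127.0.0.1:11434/api/tags") as resp:
    models = json.load(resp)["models"]

print([m["name"] for m in models])  # should include "qwen3.5:0.8b"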

Step 2: configure the two models in DSPy

import dspy
import dspy_lm_auth

# Patch dspy.LM so `codex/...` works.
dspy_lm_auth.install()

# Cheap local student model.
student_lm = dspy.LM(
    "ollama_chat/qwen3.5:0.8b",
    api_base="http://127.0.0.1:11434",
    api_key="ollama",  # dummy value; LiteLLM expects one
    model_type="chat",
    think=False,
    temperature=0,
    max_tokens=200,
)

# Stronger reflection model used by GEPA to improve the prompt.
reflection_lm = dspy.LM("codex/gpt-5.4")

# All program inference goes through the local student model.
dspy.configure(lm=student_lm, adapter=dspy.JSONAdapter())

At this point you have the whole idea in place:

  • student model = local, cheap, yours
  • reflection model = stronger, subscription-backed, already paid for
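
Before writing any program, a quick smoke test of both endpoints is worthwhile. This sketch assumes both servers are reachable; each call should return a list containing one completion string:

# Both LMs are directly callable; each returns a list of completions.
print(student_lm("Reply with one word: hi")[0])
print(reflection_lm("Reply with one word: hi")[0])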

Step 3: write a tiny DSPy program

import dspy


class TranslateFrenchToEnglish(dspy.Signature):
    """Translate the French input into short, natural English."""

    french: str = dspy.InputField(desc="French sentence")
    english: str = dspy.OutputField(desc="Natural English translation")


translator = dspy.Predict(TranslateFrenchToEnglish)

print(translator(french="merci beaucoup").english)
print(translator(french="où est la gare ?").english)

A tiny local model is often useful, but not always reliably right in the way you want.

That is where GEPA comes in.

Step 4: create a tiny training set

pairs = [
    ("bonjour", "hello"),
    ("merci beaucoup", "thank you very much"),
    ("où est la gare ?", "where is the train station?"),
    ("je suis fatigué", "I am tired"),
    ("il fait très chaud aujourd'hui", "it is very hot today"),
    ("je ne comprends pas", "I do not understand"),
    ("pouvez-vous m'aider ?", "can you help me?"),
    ("j'aime apprendre le français", "I like learning French"),
    ("nous arrivons demain matin", "we are arriving tomorrow morning"),
    ("combien ça coûte ?", "how much does it cost?"),
]

examples = [
    dspy.Example(french=fr, english=en).with_inputs("french")
    for fr, en in pairs
]

trainset = examples[:8]
valset = examples[8:]

This is intentionally tiny. The point of the tutorial is the workflow, not leaderboard chasing.

Step 5: define what “good” means

def metric(gold, pred, trace=None, pred_name=None, pred_trace=None):
    guess = pred.english.strip()
    target = gold.english.strip()

    exact = guess.lower() == target.lower()
    score = 1.0 if exact else 0.0

    if exact:
        feedback = (
            "Exact match. Keep translations short, natural, and direct. "
            "Do not add explanations."
        )
    else:
        feedback = (
            f"Expected {target!r} but got {guess!r}. "
            "Prefer direct, idiomatic English. Preserve tense, pronouns, and politeness. "
            "Do not explain the translation or add extra words."
        )

    return dspy.Prediction(score=score, feedback=feedback)

The metric is deliberately simple:

  • score exact matches as 1.0
  • score everything else as 0.0
  • give GEPA useful textual feedback so it can rewrite the prompt
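
You can exercise the metric directly to see exactly what GEPA will receive. A small sketch using the names defined above:

gold = dspy.Example(french="bonjour", english="hello").with_inputs("french")

# An exact match scores 1.0; anything else scores 0.0 plus corrective feedback.
print(metric(gold, dspy.Prediction(english="hello")).score)
print(metric(gold, dspy.Prediction(english="Hello there.")).feedback)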

Step 6: run GEPA

gepa = dspy.GEPA(
    metric=metric,
    reflection_lm=reflection_lm,
    auto="light",
)

optimized = gepa.compile(translator, trainset=trainset, valset=valset)

This is the moment the package earns its keep.

The student model stays local. GEPA uses the stronger subscription model to think about failures and improve the program. That is the whole value proposition in one place.

Step 7: inspect the optimized program

print("Optimized instruction:\n")
print(optimized.signature.instructions)
print()

print(optimized(french="je ne comprends pas").english)
print(optimized(french="combien ça coûte ?").english)

A good way to read the result is:

  • the local model is still the one doing inference
  • the stronger subscription model helped shape a better instruction
  • you did not need a separate metered API setup for the optimizer model
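
Once you are happy with the result, you can persist the optimized program and reload it later without re-running GEPA. A minimal sketch using DSPy's state save/load:

# Save the optimized instructions and demos as JSON, then restore them.
optimized.save("translator_gepa.json")

restored = dspy.Predict(TranslateFrenchToEnglish)
restored.load("translator_gepa.json")
print(restored(french="bonjour").english)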

A complete copy-paste script

If you prefer one coherent script rather than step-by-step fragments, here is the full version:

import dspy
import dspy_lm_auth


dspy_lm_auth.install()

student_lm = dspy.LM(
    "ollama_chat/qwen3.5:0.8b",
    api_base="http://127.0.0.1:11434",
    api_key="ollama",
    model_type="chat",
    think=False,
    temperature=0,
    max_tokens=200,
)

reflection_lm = dspy.LM("codex/gpt-5.4")

dspy.configure(lm=student_lm, adapter=dspy.JSONAdapter())


class TranslateFrenchToEnglish(dspy.Signature):
    """Translate the French input into short, natural English."""

    french: str = dspy.InputField(desc="French sentence")
    english: str = dspy.OutputField(desc="Natural English translation")


translator = dspy.Predict(TranslateFrenchToEnglish)

pairs = [
    ("bonjour", "hello"),
    ("merci beaucoup", "thank you very much"),
    ("où est la gare ?", "where is the train station?"),
    ("je suis fatigué", "I am tired"),
    ("il fait très chaud aujourd'hui", "it is very hot today"),
    ("je ne comprends pas", "I do not understand"),
    ("pouvez-vous m'aider ?", "can you help me?"),
    ("j'aime apprendre le français", "I like learning French"),
    ("nous arrivons demain matin", "we are arriving tomorrow morning"),
    ("combien ça coûte ?", "how much does it cost?"),
]

examples = [
    dspy.Example(french=fr, english=en).with_inputs("french")
    for fr, en in pairs
]

trainset = examples[:8]
valset = examples[8:]

print("Before optimization:")
print(translator(french="où est la gare ?").english)
print(translator(french="je ne comprends pas").english)
print()


def metric(gold, pred, trace=None, pred_name=None, pred_trace=None):
    guess = pred.english.strip()
    target = gold.english.strip()

    exact = guess.lower() == target.lower()
    score = 1.0 if exact else 0.0

    if exact:
        feedback = (
            "Exact match. Keep translations short, natural, and direct. "
            "Do not add explanations."
        )
    else:
        feedback = (
            f"Expected {target!r} but got {guess!r}. "
            "Prefer direct, idiomatic English. Preserve tense, pronouns, and politeness. "
            "Do not explain the translation or add extra words."
        )

    return dspy.Prediction(score=score, feedback=feedback)


gepa = dspy.GEPA(
    metric=metric,
    reflection_lm=reflection_lm,
    auto="light",
)

optimized = gepa.compile(translator, trainset=trainset, valset=valset)

print("Optimized instruction:\n")
print(optimized.signature.instructions)
print()

print("After optimization:")
print(optimized(french="où est la gare ?").english)
print(optimized(french="je ne comprends pas").english)
print(optimized(french="combien ça coûte ?").english)

When you outgrow the laptop: the same idea on a GPU box

The laptop workflow is the easiest place to start.

When you want more speed or more context, keep the exact same mental model and swap only the student model:

  • laptop: Ollama + qwen3.5:0.8b
  • GPU box: vLLM + Qwen/Qwen3.5-0.8B

Minimal GPU setup

SSH into the GPU box:

ssh YOUR_GPU_BOX

Install uv and vllm:

curl -LsSf https://astral.sh/uv/install.sh | sh
export PATH="$HOME/.local/bin:$PATH"

uv python install 3.12
uv venv ~/.venvs/vllm-qwen35-08b --python 3.12
uv pip install --python ~/.venvs/vllm-qwen35-08b/bin/python vllm

Launch the model:

CUDA_VISIBLE_DEVICES=0 ~/.venvs/vllm-qwen35-08b/bin/vllm serve Qwen/Qwen3.5-0.8B \
  --host 0.0.0.0 \
  --port 8000 \
  --served-model-name local-model \
  --dtype float16 \
  --gpu-memory-utilization 0.25 \
  --max-model-len 2048

Then swap the student model definition in DSPy to:

student_lm = dspy.LM(
    "openai/local-model",
    api_base="http://YOUR_GPU_BOX:8000/v1",
    api_key="",  # vLLM ignores the key unless it was started with --api-key
    model_type="chat",
)

Everything else in the GEPA workflow stays the same.
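
As with Ollama, you can sanity-check the endpoint before pointing DSPy at it. vLLM serves the standard OpenAI model-list route; replace YOUR_GPU_BOX as above:

import json
import urllib.request

# The id returned here should match the --served-model-name ("local-model").
with urllib.request.urlopen("http://YOUR_GPU_BOX:8000/v1/models") as resp:
    print([m["id"] for m in json.load(resp)["data"]])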

If you only want the auth piece

You can also use dspy-lm-auth without the local-model tutorial.

import dspy
import dspy_lm_auth


dspy_lm_auth.install()

lm = dspy.LM("codex/gpt-5.4")
dspy.configure(lm=lm)

print(lm("hello")[0]["text"])

Or keep the original provider and select the auth route explicitly:

import dspy_lm_auth

lm = dspy_lm_auth.LM("openai/gpt-5.4", auth_provider="codex")
print(lm("hello")[0]["text"])

Credential resolution

API key credentials can be stored as:

  • a literal value
  • an environment variable name
  • a shell lookup prefixed with !

Examples, first an environment variable name, then a shell lookup:

{
  "some-provider": {
    "type": "api_key",
    "key": "OPENAI_API_KEY"
  }
}

{
  "some-provider": {
    "type": "api_key",
    "key": "!op read op://Private/openai/api_key --no-newline"
  }
}
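
For completeness, a literal key would presumably be stored the same way (the value below is a placeholder, not a real key):

{
  "some-provider": {
    "type": "api_key",
    "key": "sk-example-placeholder"
  }
}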

Development

uv sync --extra dev
uv run pytest
uv run ruff check src tests

Roadmap

The package is structured so more Pi-like providers can be added later, for example:

  • Anthropic subscription auth
  • GitHub Copilot
  • Gemini CLI
  • Antigravity

License

MIT
