Skip to main content

Pluggable agent backends for RAGAS dataset generation (default Llama Stack Responses API or LangGraph)

Project description

cohorte

Pluggable agent backends for RAGAS dataset generation. Use the default backend (Llama Stack Responses API) or a lang-graph backend (stub for future implementation).

Installation

From source with uv (recommended):

cd cohorte
uv sync --extra dev

With optional LangGraph extra (when that backend is implemented):

uv sync --extra dev --extra langgraph

With pip (from source):

pip install -e .

For PyPI (once published):

pip install cohorte

Usage

Agent with type

from cohorte import Agent

# Default: Llama Stack Responses API (same as ragas_pipeline / ragas_dataset_generator)
agent = Agent(type="default")
ragas_dataset = agent.generate_ragas_dataset(
    base_dataset=base_dataset,
    client=llama_stack_client,
    model_id="my-model",
    vector_store_id=vs_id,
    mcp_tools=mcp_tools,
    instructions="Optional system prompt",
)

Single query: invoke (normal agent call)

Same parameter shape as generate_ragas_dataset, but for one input. The name follows LangChain/LangGraph (agent.invoke(input)):

from cohorte import Agent

agent = Agent(type="default")
result = agent.invoke(
    "What is the capital of France?",
    client=llama_stack_client,
    model_id="my-model",
    vector_store_id=vs_id,
    mcp_tools=[],  # or list of MCP tool configs
    instructions="You are a helpful assistant.",
)
# result["answer"] -> str
# result["contexts"] -> list[str]  (retrieved chunks + non–file_search tool responses)
# result["tool_calls"] -> list[dict]

Chat with history (e.g. chatbots)

Pass messages so the model sees previous turns. Each message is {"role": "user"|"assistant"|"system", "content": str}:

history = [
    {"role": "user", "content": "My name is Alice."},
    {"role": "assistant", "content": "Nice to meet you, Alice!"},
]
result = agent.invoke(
    "What's my name?",
    client=client,
    model_id=model_id,
    vector_store_id=vs_id,
    mcp_tools=[],
    messages=history,
)
# result["answer"] can refer to the conversation (e.g. "Your name is Alice.")

Streaming: invoke_stream

For real-time display (e.g. Streamlit), use invoke_stream. It yields events: content_delta (chunk of text) then done (final answer + contexts + tool_calls). If the backend does not support token-level streaming, the full answer is sent as one delta then done.

for event in agent.invoke_stream(
    "Explain RAG in one sentence.",
    client=client,
    model_id=model_id,
    vector_store_id=vs_id,
    mcp_tools=[],
    messages=st.session_state.messages,  # optional history
):
    if event["type"] == "content_delta":
        print(event["delta"], end="", flush=True)
    elif event["type"] == "done":
        answer, contexts, tool_calls = event["answer"], event["contexts"], event["tool_calls"]

Streamlit chat example (with history + streaming)

import streamlit as st
from cohorte import Agent

# Init session state
if "messages" not in st.session_state:
    st.session_state.messages = []

agent = Agent(type="default")
# client, model_id, vector_store_id from your config (e.g. sidebar)

# Display history
for msg in st.session_state.messages:
    with st.chat_message(msg["role"]):
        st.markdown(msg["content"])

if prompt := st.chat_input("Your message"):
    st.session_state.messages.append({"role": "user", "content": prompt})
    with st.chat_message("user"):
        st.markdown(prompt)

    with st.chat_message("assistant"):
        placeholder = st.empty()
        full = ""
        for event in agent.invoke_stream(
            prompt,
            client=client,
            model_id=model_id,
            vector_store_id=vector_store_id,
            mcp_tools=[],
            messages=st.session_state.messages[:-1],  # history (exclude current)
        ):
            if event["type"] == "content_delta":
                full += event["delta"]
                placeholder.markdown(full + "▌")
        placeholder.markdown(full)
    st.session_state.messages.append({"role": "assistant", "content": full})
# LangGraph backend (not yet implemented; will raise NotImplementedError)
agent = Agent(type="lang-graph")
# agent.invoke(...)  # NotImplementedError

Utilities

The library also exposes helpers used by the default backend, useful for custom pipelines:

from cohorte import strip_think_blocks, serialize_for_json, extract_tool_calls
  • strip_think_blocks(text) – remove <think>...</think> blocks from model output.
  • serialize_for_json(val) – convert objects to JSON-serializable form.
  • extract_tool_calls(response) – extract tool calls from a Llama Stack response.

Testing a simple invoke

Option 1: Unit tests (no Llama Stack server)
Runs invoke against a mock client so you can confirm the API shape:

cd cohorte
uv run pytest tests/test_agent.py -v -k invoke

Option 2: Real invoke against Llama Stack
Use the example script (requires a running Llama Stack and a vector store):

cd cohorte
export LLAMA_STACK_HOST=localhost
export LLAMA_STACK_PORT=8080
# optional: AGENT_VECTOR_STORE_NAME=rag-store, AGENT_MODEL_ID="your/model"

uv run python examples/simple_invoke.py "What is 2+2?"

You can pass any question as arguments; default is "What is 2+2?".

Option 3: OpenShift (oc)
If Llama Stack is exposed on OpenShift, use the helper script to get APPS_DOMAIN and run the example:

cd cohorte
./scripts/run_invoke_oc.sh "What is 2+2?"

The script sources .env (for PROJECT, etc.), runs oc get ingresses.config.openshift.io cluster for the apps domain, sets LLAMA_STACK_HOST to llama-stack-demo-route-${PROJECT}.${APPS_DOMAIN}, then runs the example with any arguments you pass.

Development

Uses uv for the venv and running tools. From the cohorte directory:

  • Create venv and install deps: make install-dev (or uv sync --extra dev)
  • Lock dependencies: make lock (or uv lock)
  • Lint: make lint (ruff via uv run)
  • Format: make format
  • Tests: make test (or uv run pytest tests)
  • Coverage: make coverage
  • Build: make build (or uv run python -m build)

License

Apache-2.0. See LICENSE.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

cohorte_ragas-0.1.1.tar.gz (16.4 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

cohorte_ragas-0.1.1-py3-none-any.whl (15.5 kB view details)

Uploaded Python 3

File details

Details for the file cohorte_ragas-0.1.1.tar.gz.

File metadata

  • Download URL: cohorte_ragas-0.1.1.tar.gz
  • Upload date:
  • Size: 16.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.5.24

File hashes

Hashes for cohorte_ragas-0.1.1.tar.gz
Algorithm Hash digest
SHA256 b16baa18db5dbf0bcfecf9d50719a20ededfdfdb3b14a52fbdb205e42f2eab50
MD5 c6aa303ea02717eff0cad99d83ef6e32
BLAKE2b-256 dab6bb9f7ba8d967307a24ab1f4792cc847c8ce7a23d435eef9b2c9c4587da1f

See more details on using hashes here.

File details

Details for the file cohorte_ragas-0.1.1-py3-none-any.whl.

File metadata

File hashes

Hashes for cohorte_ragas-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 a68ab9fc8951cd3c32bfe429d0f5be95a026b8f7ab9d0d9e9ad3ff58a5378900
MD5 a8d6860166aaea412913859f5ecd9561
BLAKE2b-256 9fb0b99c698eef5df97799fa067d899644b0989ce0677a51617563574b886c62

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page