lightweight abstractions for building agent scaffolds

These details have not been verified by PyPI

Project description

agentlens

This library contains a set of lightweight abstractions for building agent scaffolds that are easy to evaluate and maintain.

Features

Decorator-driven logic—define arbitrarily complex scaffolds and evaluations by composing functions
Expressive evaluation framework—run evals with hooks for full control over your agent's computation graph
ORM for datasets—quickly bootstrap type-safe, validated datasets with zero boilerplate
Built-in observability—easy integration with Langfuse
Clean inference API—call models using a syntax inspired by Vercel's very elegant AI SDK

Configuration

Initialize an AI object to manage your project's AI logic. Some notes:

Use of Langfuse is optional
Global concurrency limits are set on a per-model basis
You can use OpenAI models, Anthropic models, or both

# File: /your_project/ai.py

from pathlib import Path

from langfuse import Langfuse
from agentlens import AI, AnthropicProvider, OpenAIProvider

PROJECT_ROOT = Path(__file__).parent

ai = AI(
    dataset_dir=PROJECT_ROOT / "datasets",  # where your datasets will be stored
    cache_dir=PROJECT_ROOT,  # where to store cached responses
    observability=Langfuse(  # optional
        secret_key="...",
        public_key="...",
        host="...",
    ),
    providers=[
        OpenAIProvider(
            api_key="...",  # your OpenAI API key
            max_connections={  # maximum number of concurrent requests per model
                "DEFAULT": 10,
                "gpt-4o-mini": 30,
            },
        ),
        AnthropicProvider(
            api_key="...",  # your Anthropic API key
            max_connections={
                "DEFAULT": 10,
                "claude-3-5-sonnet": 5,
            },
        ),
    ],
)

By default API keys will be read from environment variables, but you can also pass them in directly.

Tasks

The basic building block of the library is a task. A task is a function that makes one or more calls to an AI model.

Declaring a function as a task enters it into a unified observability and evaluation ecosystem. Do so using the @ai.task() decorator:

from your_project.ai import ai


@ai.task()
def some_task(some_input: str) -> str:
    pass  # insert some AI logic here

The @ai.task() decorator takes the following optional arguments:

cache: bool = False--cache the input/output of the task for use in evaluations
max_retries: int = 0--number of retries on failure, defaults to 0

And if Langfuse is enabled, you can also use:

capture_input: bool = True--log input data to Langfuse
capture_output: bool = True--log output data to Langfuse

Important note: caching and logging only work on serializable values.

The library will automatically serialize the following data types for you:

Primitives (e.g. str, int, float, bool)
Pydantic models
A subclass of the library-provided Serializable, which must implement model_dump() and model_validate() methods that serialize and deserialize the object, respectively
Collections of the above types (lists, dictionaries, sets, tuples)

Serializable will check at evaluation-time that any serialization/deserialization methods you've implemented are lossless, and will raise an exception if they are not.

Here is how you might render a PDF as a JSON-serializable object:

import base64
from io import BytesIO
from typing import Any

from PIL import Image
from agentlens import Serializable


class PDF(Serializable):
    """A serializable PDF document."""

    pages: list[Image.Image]

    def model_dump(self) -> dict[str, Any]:
        """Convert to a dictionary of serializable Python objects."""
        pages_b64 = []
        for page in self.pages:
            with BytesIO() as buffer:
                page.save(buffer, format="PNG")
                pages_b64.append(base64.b64encode(buffer.getvalue()).decode())
        return {"pages": pages_b64}

    @classmethod
    def model_validate(cls, data: dict[str, Any]) -> "PDF":
        """Create instance from a dictionary of serializable Python objects."""
        pages = []
        for page_b64 in data["pages"]:
            image_bytes = base64.b64decode(page_b64)
            pages.append(Image.open(BytesIO(image_bytes)))
        return cls(pages=pages)


@ai.task()
def transcribe_pdf(pdf: PDF) -> str:
    pass  # insert some AI logic here

Note:

Inference

The library exposes a boilerplate-free wrapper around the OpenAI and Anthropic APIs. Its syntax is inspired by Vercel's very elegant AI SDK.

In the simplest case, you might just want to feed some model a user prompt and (optionally) a system prompt, and have it return a string using generate_text:

@ai.task()
async def summarize(text: str) -> str:
    return await ai.generate_text(
        model="gpt-4o-mini",
        system="You are a helpful assistant.",
        prompt=f"""
            Please summarize the following text:

            {text}
            """,
        dedent=True,  # defaults to True, eliminating indents from all prompts using textwrap.dedent
        max_attempts=3,  # number of retries on failure, defaults to 3
    )

To phrase more complex requests, you may opt to pass the model a list of messages. The following uses the PDF model we defined earlier:

@ai.task()
async def transcribe_pdf(pdf: PDF) -> str:
    return await ai.generate_text(
        model="gpt-4o-mini",
        messages=[
            ai.message.system("You are a helpful assistant."),
            ai.message.user(
                "Please transcribe the following PDF to Markdown:",
                ai.message.image(pdf.pages[0]),
            ),
        ],
    )

If you pass a messages argument, an exception will be raised if you also pass a system or prompt argument.

To request a structured output from the model, you can use generate_object and pass a Pydantic model as the type argument.

class PDFMetadata(BaseModel):
    title: str | None
    author: str | None


@ai.task()
async def extract_pdf_metadata(pdf: PDF) -> PDFMetadata:
    return await ai.generate_object(
        model="gpt-4o",
        type=PDFMetadata,
        messages=[
            ai.message.system("You are a helpful assistant."),
            ai.message.user(
                "Extract metadata from the following article:",
                *[ai.message.image(page) for page in pdf.pages],
            ),
        ],
    )

Datasets

The library exposes an ORM-like API for developing evaluation datasets.

Simply subclass Dataset and provide the following:

A NAME attribute -- this will be used to namespace different versions of the dataset
Two type arguments, which will be used internally for typing and validating your dataset -- the first one is used for your data, the second is for your targets

Here's an example:

from pydantic import BaseModel
from agentlens import Dataset


class Invoice(BaseModel):
    text: str


class Targets(BaseModel):
    is_corrupted: bool  # True if the invoice data is corrupted, False otherwise
    total_cost: float | None  # The total cost of the invoice, or None if it's corrupted


class InvoiceDataset(Dataset[Invoice, Targets]):
    NAME = "invoices"


# save a dataset split -- targets can be attached now or later (you'll see how in the next section)
InvoiceDataset.save(
    split="train",
    data=[Invoice(...), Invoice(...)],
)

# load a dataset split
dataset = InvoiceDataset.load("train")

for data, targets in dataset:
    print(data, targets)

Evaluation

The evaluation API uses hooks to give you precise control over your agent's computation graph.

You can run evaluations either from a Jupyter cell or from the CLI.

First let's define a simple set of tasks, riffing off of the invoice data structure we defined in the Dataset section:

@ai.task()
async def process_invoice(invoice: Invoice) -> float | str:
    looks_fine = await check_integrity(invoice)

    if not looks_fine:
        return await generate_error_report(invoice)

    return await extract_total_cost(invoice)


@ai.task()
async def check_integrity(invoice: Invoice, model: str = "gpt-4o-mini") -> bool:
    return await ai.generate_object(
        model=model,
        type=bool,
        prompt=f"Return True if the invoice looks uncorrupted: {invoice.text}",
    )


@ai.task()
async def generate_error_report(invoice: Invoice) -> str:
    return await ai.generate_text(
        model="gpt-4o",
        prompt=f"Write an error report for this corrupted invoice: {invoice.text}",
    )


@ai.task()
async def extract_total_cost(invoice: Invoice, model: str = "gpt-4o") -> float:
    return await ai.generate_object(
        model=model,
        type=float,
        prompt=f"Extract the total cost from this invoice: {invoice.text}",
    )

The first thing we'll want to do is bootstrap targets for our InvoiceDataset. This is easy to do using the hooks system.

We will use hooks to:

Modify the check_integrity and extract_total_cost tasks to use the o1-preview model, which is the most expensive and capable model available
Tap into the execution of these functions to write the results to the dataset as target labels

dataset = InvoiceDataset()

@ai.hook(check_integrity, model="o1-preview")
def hook_check_integrity(input, output, row):
    row.builder["is_corrupted"] = output

@ai.hook(extract_total_cost, model="o1-preview")
def hook_extract_total_cost(input, output, row):
    row.builder["total_cost"] = output

ai.run(
    main=process_invoice,
    dataset=dataset,
    hooks=[hook_check_integrity, hook_extract_total_cost],
    strict=False,  # disable strict mode to allow for partial runs
)

Now that we have labels, we can evaluate the check_integrity and extract_total_cost tasks as they were originally defined.

We can also add a report function to the ai.eval() call to generate a Markdown report summarizing the results, which will be written to evals/runs/<run_id>/report.md. Here we will only write a simple report, but one common pattern is to deploy language models to inspect your agent's behavior and summarize their findings for you.

Again, hooks provide an expressive way to write scoring logic, by reading state from parent scope:

check_integrity_scores = []
extract_total_cost_scores = []

@ai.hook(check_integrity, model="o1-preview")
def hook_check_integrity(input, output, row):
    score = output == row.targets.is_corrupted
    check_integrity_scores.append(score)

@ai.hook(extract_total_cost, model="o1-preview")
def hook_extract_total_cost(input, output, row):
    score = output - row.targets.total_cost
    extract_total_cost_scores.append(score)

def report():
    return f"""
    check_integrity (% correct): {sum(check_integrity_scores) / len(check_integrity_scores)}
    extract_total_cost (avg. error): {sum(extract_total_cost_scores) / len(extract_total_cost_scores)}
    """

ai.run(
    dataset=dataset,
    main=process_invoice,
    hooks=[hook_check_integrity, hook_extract_total_cost],
    report=report,
)
```# agentlens

Project details

These details have not been verified by PyPI

Release history Release notifications | RSS feed

0.1.25

Nov 20, 2024

0.1.24

Nov 19, 2024

0.1.23

Nov 19, 2024

0.1.22

Nov 19, 2024

0.1.21

Nov 18, 2024

0.1.20

Nov 18, 2024

0.1.19

Nov 18, 2024

0.1.18

Nov 18, 2024

0.1.17

Nov 18, 2024

0.1.16

Nov 18, 2024

0.1.15

Nov 18, 2024

0.1.14

Nov 17, 2024

0.1.13

Nov 16, 2024

0.1.12

Nov 14, 2024

0.1.11

Nov 14, 2024

0.1.10

Nov 14, 2024

0.1.9

Nov 14, 2024

0.1.8

Nov 14, 2024

0.1.7

Nov 2, 2024

0.1.6

Oct 30, 2024

0.1.5

Oct 30, 2024

0.1.4

Oct 30, 2024

This version

0.1.3

Oct 29, 2024

0.1.2

Oct 29, 2024

0.1.1

Oct 29, 2024

0.1.0

Oct 29, 2024

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

agentlens-0.1.3.tar.gz (14.9 kB view details)

Uploaded Oct 29, 2024 Source

Built Distribution

agentlens-0.1.3-py3-none-any.whl (18.4 kB view details)

Uploaded Oct 29, 2024 Python 3

File details

Details for the file agentlens-0.1.3.tar.gz.

File metadata

Download URL: agentlens-0.1.3.tar.gz
Upload date: Oct 29, 2024
Size: 14.9 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: poetry/1.8.3 CPython/3.12.0 Windows/11

File hashes

Hashes for agentlens-0.1.3.tar.gz
Algorithm	Hash digest
SHA256	`b994d6d867c21d30a96ae127558fdb3559931ef2ac8f84fb94740178eec112bb`
MD5	`64d736dc428ab8d0d423520147a52739`
BLAKE2b-256	`805f46f787dff0cc5e2322b870b4dae7789ab16eb9d66cd109705128da447ee4`

See more details on using hashes here.

File details

Details for the file agentlens-0.1.3-py3-none-any.whl.

File metadata

Download URL: agentlens-0.1.3-py3-none-any.whl
Upload date: Oct 29, 2024
Size: 18.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: poetry/1.8.3 CPython/3.12.0 Windows/11

File hashes

Hashes for agentlens-0.1.3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`e137bc74cf242434f8a53e4b553a09fb18b74324e686d6cfb0451d0d4d59bfd5`
MD5	`a9002ac2fd633aa75869859ae6533e71`
BLAKE2b-256	`271e84b362ecb4ac0e247d334c609b028c4c5a6af4623c5f50c8091915980c83`