AI Agent SDK — observability, memory, and analytics for LLM applications. Provider-agnostic. Tracks token usage, tool calls, conversations, and enables shared team memory.

Pentatonic

AI Agent SDK

Observability, memory, and analytics for LLM applications.
Run locally or use hosted TES. JavaScript & Python.

Overview

Two ways to use the SDK:

Local Memory -- Run a fully private memory system on your own machine. PostgreSQL + pgvector + Ollama in Docker. No API keys, no cloud. Your agent gets persistent, searchable memory backed by multi-signal retrieval and HyDE query expansion.

Hosted TES -- Connect to Pentatonic's Thing Event System for production-grade observability, higher-dimensional embeddings, conversation analytics, and team-wide shared memory.

Both paths use the same Claude Code plugin. The hooks auto-search on every prompt and auto-store every conversation turn.

Local Memory (self-hosted)

Run the full memory stack locally. Requires Docker and ~4GB disk for models.

1. Set up

npx @pentatonic-ai/ai-agent-sdk memory

This starts PostgreSQL + pgvector, Ollama, and the memory server. It pulls embedding and chat models, and writes the local config.

2. Install the Claude Code plugin

/plugin marketplace add Pentatonic-Ltd/ai-agent-sdk
/plugin install tes-memory@pentatonic-ai

That's it. The plugin hooks automatically search memories on every prompt and store every conversation turn. Fully local, fully private.

What you get

  • Automatic memory -- every conversation turn is stored with embeddings and HyDE query expansion
  • Semantic search -- multi-signal retrieval combining vector similarity, BM25 full-text, recency decay, and access frequency
  • Memory layers -- episodic (recent), semantic (consolidated), procedural (how-to), working (temporary)
  • Decay and consolidation -- memories fade over time; frequently accessed ones get promoted
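
The multi-signal ranking and recency decay listed above could be sketched roughly as follows. This is an illustration only, not the SDK's actual scoring: the weights, the 30-day half-life, the saturation point, and the function names (`recency_score`, `frequency_score`, `combined_score`) are all assumptions, and the vector and BM25 scores are assumed to be normalized to the range 0 to 1 already.

```python
import math

# Hypothetical weights; the SDK's real weighting is not documented here.
WEIGHTS = {"vector": 0.5, "bm25": 0.3, "recency": 0.1, "frequency": 0.1}
HALF_LIFE_DAYS = 30.0  # assumed half-life for recency decay


def recency_score(age_days: float, half_life_days: float = HALF_LIFE_DAYS) -> float:
    """Exponential decay: 1.0 for a brand-new memory, 0.5 after one half-life."""
    return 0.5 ** (age_days / half_life_days)


def frequency_score(access_count: int, saturation: int = 20) -> float:
    """Log-scaled access frequency, capped at 1.0 once `saturation` is reached."""
    return min(1.0, math.log1p(access_count) / math.log1p(saturation))


def combined_score(vector_sim: float, bm25: float, age_days: float, access_count: int) -> float:
    """Weighted sum of the four signals; inputs are assumed to lie in [0, 1]."""
    signals = {
        "vector": vector_sim,
        "bm25": bm25,
        "recency": recency_score(age_days),
        "frequency": frequency_score(access_count),
    }
    return sum(WEIGHTS[name] * signals[name] for name in WEIGHTS)


# Two memories with identical text relevance: the fresher one ranks higher.
fresh = combined_score(vector_sim=0.8, bm25=0.2, age_days=1.0, access_count=3)
stale = combined_score(vector_sim=0.8, bm25=0.2, age_days=90.0, access_count=3)
print(fresh > stale)
```

A weighted sum keeps each signal's contribution easy to inspect and tune. Promotion of frequently accessed memories could reuse `frequency_score` with a threshold, though how the SDK decides this is not stated here.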

Change models

EMBEDDING_MODEL=mxbai-embed-large LLM_MODEL=qwen2.5:7b npx @pentatonic-ai/ai-agent-sdk memory

Raspberry Pi

A Raspberry Pi 5 with 8GB RAM runs the full stack. Together, nomic-embed-text (~300MB) and llama3.2:3b (~2GB) leave plenty of headroom.

Use as a library

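import pg from 'pg';
import { createMemorySystem } from '@pentatonic-ai/ai-agent-sdk/memory';

// Assumption: `db` accepts a node-postgres Pool. DATABASE_URL is a placeholder
// for your own PostgreSQL connection string.
const pgPool = new pg.Pool({ connectionString: process.env.DATABASE_URL });

const memory = createMemorySystem({
  db: pgPool,
  embedding: { url: 'http://localhost:11434/v1', model: 'nomic-embed-text' },
  llm: { url: 'http://localhost:11434/v1', model: 'llama3.2:3b' },
});

await memory.migrate();                // apply database migrations
await memory.ensureLayers('my-app');   // make sure the memory layers exist for this client
await memory.ingest('User prefers dark mode', { clientId: 'my-app' });      // store a memory
const results = await memory.search('preferences', { clientId: 'my-app' }); // retrieve matches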

Hosted TES

Connect to Pentatonic's hosted infrastructure for production use.

1. Create an account

npx @pentatonic-ai/ai-agent-sdk init

This walks you through account creation, email verification, and API key generation. You'll get:

TES_ENDPOINT=https://your-company.api.pentatonic.com
TES_CLIENT_ID=your-company
TES_API_KEY=tes_your-company_xxxxx
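
Before constructing a client, it can help to fail fast when one of these variables is missing. The following is a minimal sketch, not part of the SDK: `load_tes_config` and `REQUIRED_VARS` are hypothetical names, and the only check performed is that each variable is set and non-empty.

```python
import os

# The three variables printed by `init`, as listed above.
REQUIRED_VARS = ("TES_ENDPOINT", "TES_CLIENT_ID", "TES_API_KEY")


def load_tes_config(env=None):
    """Return the TES settings as a dict, or raise if any are missing.

    `env` defaults to os.environ; passing a plain dict makes this easy to test.
    """
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing TES settings: " + ", ".join(missing))
    return {
        "endpoint": env["TES_ENDPOINT"],
        "client_id": env["TES_CLIENT_ID"],
        "api_key": env["TES_API_KEY"],
    }


# Example with placeholder values in the same shape as the output above.
config = load_tes_config({
    "TES_ENDPOINT": "https://your-company.api.pentatonic.com",
    "TES_CLIENT_ID": "your-company",
    "TES_API_KEY": "tes_your-company_xxxxx",
})
print(config["client_id"])
```

The returned keys match the keyword arguments used in the Python `TESClient` example further down, so the dict could be unpacked into the constructor.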

2. Install

npm install @pentatonic-ai/ai-agent-sdk
pip install pentatonic-ai-agent-sdk

What you get (in addition to local features)

  • Higher-dimensional embeddings -- NV-Embed-v2 (4096d) for better retrieval accuracy
  • Conversation analytics -- session metrics, search attribution, dead-end detection
  • Team-wide shared memory -- semantic search across your team's AI interactions
  • Admin dashboard -- visualize conversations, token usage, and memory explorer
  • Multi-tenancy -- isolated databases per client

Claude Code Plugin

Works with both local and hosted setups. Install once, switch modes via config.

Install via marketplace

/plugin marketplace add Pentatonic-Ltd/ai-agent-sdk
/plugin install tes-memory@pentatonic-ai

Set up

For hosted TES:

/tes-memory:tes-setup

For local memory:

npx @pentatonic-ai/ai-agent-sdk memory

What it tracks

  • Every conversation turn -- user messages, assistant responses, tool calls, duration
  • Automatic memory search -- relevant memories injected as context on every prompt
  • Automatic memory storage -- every turn stored with embeddings and HyDE queries
  • Token usage -- input, output, cache read, cache creation tokens per turn

SDK: Wrap Your LLM Client

JavaScript

import OpenAI from "openai";
import { TESClient } from "@pentatonic-ai/ai-agent-sdk";

const tes = new TESClient({
  clientId: process.env.TES_CLIENT_ID,
  apiKey: process.env.TES_API_KEY,
  endpoint: process.env.TES_ENDPOINT,
});

const ai = tes.wrap(new OpenAI(), { sessionId: "conv-123" });
const result = await ai.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "Hello!" }],
});

Python

import os

from openai import OpenAI
from pentatonic_agent_events import TESClient

tes = TESClient(
    client_id=os.environ["TES_CLIENT_ID"],
    api_key=os.environ["TES_API_KEY"],
    endpoint=os.environ["TES_ENDPOINT"],
)

ai = tes.wrap(OpenAI(), session_id="conv-123")
result = ai.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
)

Supported Providers

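| Provider   | Detection                      | Intercepted Method        |
|------------|--------------------------------|---------------------------|
| OpenAI     | client.chat.completions.create | chat.completions.create() |
| Anthropic  | client.messages.create         | messages.create()         |
| Workers AI | client.run (JS only)           | run()                     |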

All other methods pass through unchanged.
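
To make the "intercept one method, pass everything else through" behaviour concrete, here is a minimal sketch of an instrumenting proxy. It is not the SDK's implementation: `RecordingProxy`, `FakeClient`, and the callback signature are hypothetical, and the fake client stands in for a real provider client so the example runs without network access.

```python
class RecordingProxy:
    """Wraps an object and reports calls to methods whose dotted path is intercepted."""

    def __init__(self, target, intercept, on_call, path=""):
        self._target = target
        self._intercept = intercept  # set of dotted paths, e.g. {"chat.completions.create"}
        self._on_call = on_call      # callback(path, kwargs, result)
        self._path = path

    def __getattr__(self, name):
        # Only reached for names not defined on the proxy itself.
        attr = getattr(self._target, name)
        path = f"{self._path}.{name}" if self._path else name
        if path in self._intercept and callable(attr):
            def instrumented(*args, **kwargs):
                result = attr(*args, **kwargs)
                self._on_call(path, kwargs, result)
                return result
            return instrumented
        # Keep proxying namespaces that lead to an intercepted method.
        if any(p.startswith(path + ".") for p in self._intercept):
            return RecordingProxy(attr, self._intercept, self._on_call, path)
        # Everything else passes through unchanged.
        return attr


class _Completions:
    def create(self, **kwargs):
        return {"model": kwargs["model"], "content": "hi"}


class _Chat:
    completions = _Completions()


class FakeClient:
    chat = _Chat()

    def ping(self):
        return "pong"


events = []
client = RecordingProxy(
    FakeClient(),
    {"chat.completions.create"},
    lambda path, kwargs, result: events.append((path, kwargs["model"])),
)
client.chat.completions.create(model="demo-model", messages=[])
pong = client.ping()  # not intercepted, so no event is recorded
print(events, pong)
```

Proxying by attribute path means the wrapped client keeps its original call shape, which is why existing application code does not need to change.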

API Reference

TESClient(config)

| Param            | Type    | Default  | Description                         |
|------------------|---------|----------|-------------------------------------|
| clientId         | string  | required | Your tenant identifier              |
| apiKey           | string  | required | TES API key                         |
| endpoint         | string  | required | TES instance URL                    |
| userId           | string  | null     | User identifier for attribution     |
| captureContent   | boolean | true     | Include message content in events   |
| maxContentLength | number  | 4096     | Truncate content beyond this length |
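
The interaction between `captureContent` and `maxContentLength` can be pictured with the sketch below. It is an assumption-laden illustration rather than the SDK's code: `prepare_content` is a hypothetical name, length is measured in characters (the source does not say whether the SDK counts characters or bytes), and no truncation marker is appended.

```python
def prepare_content(text, capture_content=True, max_content_length=4096):
    """Return the content to attach to an event, or None when capture is disabled."""
    if not capture_content:
        return None  # content is omitted from the event entirely
    if len(text) > max_content_length:
        return text[:max_content_length]  # keep only the leading characters
    return text


short = prepare_content("Hello!")
clipped = prepare_content("a" * 5000)
hidden = prepare_content("secret", capture_content=False)
print(len(short), len(clipped), hidden)
```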

tes.wrap(client, opts?)

Returns an instrumented proxy. Every intercepted call emits a CHAT_TURN event.

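| Option    | Type   | Default             | Description                             |
|-----------|--------|---------------------|-----------------------------------------|
| sessionId | string | auto-generated UUID | Links events from the same conversation |
| metadata  | object | {}                  | Custom fields on every event            |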

tes.session(opts?)

Returns a Session for manual event emission.

session.emitChatTurn({ userMessage, assistantResponse, turnNumber? })

Emits a CHAT_TURN event with the data accumulated so far, then resets the accumulated data.
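
The accumulate, emit, then reset pattern can be sketched as below. This is a standalone illustration, not the SDK's `Session`: the class name `SketchSession`, the `record` helper, and the event field names are all hypothetical, and the event is handed to a plain callback instead of being sent to TES.

```python
import uuid


class SketchSession:
    """Accumulates per-turn data, emits one CHAT_TURN event, then starts fresh."""

    def __init__(self, emit, session_id=None):
        self._emit = emit  # callable that receives the finished event dict
        self.session_id = session_id or str(uuid.uuid4())
        self._reset()

    def _reset(self):
        self._tool_calls = []
        self._usage = {"input_tokens": 0, "output_tokens": 0}

    def record(self, input_tokens=0, output_tokens=0, tool_calls=()):
        """Hypothetical helper: add one LLM call's usage and tool calls."""
        self._usage["input_tokens"] += input_tokens
        self._usage["output_tokens"] += output_tokens
        self._tool_calls.extend(tool_calls)

    def emit_chat_turn(self, user_message, assistant_response, turn_number=None):
        event = {
            "type": "CHAT_TURN",
            "session_id": self.session_id,
            "turn_number": turn_number,
            "user_message": user_message,
            "assistant_response": assistant_response,
            "usage": dict(self._usage),
            "tool_calls": list(self._tool_calls),
        }
        self._emit(event)
        self._reset()  # the next turn starts with empty accumulators
        return event


sent = []
session = SketchSession(sent.append, session_id="conv-123")
session.record(input_tokens=120, output_tokens=30, tool_calls=["search"])
session.record(input_tokens=80, output_tokens=45)
first = session.emit_chat_turn("What changed?", "Two files.", turn_number=1)
second = session.emit_chat_turn("Thanks", "You're welcome.", turn_number=2)
print(first["usage"], second["usage"])
```

Resetting after each emit keeps one turn's token counts and tool calls from leaking into the next event.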

normalizeResponse(raw)

Standalone utility to normalize any LLM response:

import { normalizeResponse } from "@pentatonic-ai/ai-agent-sdk";

const { content, model, usage, toolCalls } = normalizeResponse(openaiResponse);
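
As a rough picture of what normalization involves, the sketch below maps OpenAI-style and Anthropic-style response dicts onto one shape with the same four top-level fields. It is not the SDK's `normalizeResponse`: the function works on plain dicts, the inner layout of `usage` and `tool_calls` is an assumption, and only the two response shapes shown are handled.

```python
import json


def normalize_response(raw: dict) -> dict:
    """Map a provider response dict onto content / model / usage / tool_calls."""
    usage = raw.get("usage") or {}
    if "choices" in raw:
        # OpenAI chat completions shape: text and tool calls live on the message.
        message = raw["choices"][0]["message"]
        return {
            "content": message.get("content") or "",
            "model": raw.get("model"),
            "usage": {
                "input_tokens": usage.get("prompt_tokens", 0),
                "output_tokens": usage.get("completion_tokens", 0),
            },
            "tool_calls": [
                # OpenAI sends arguments as a JSON string; decode for consistency.
                {"name": c["function"]["name"], "arguments": json.loads(c["function"]["arguments"])}
                for c in message.get("tool_calls") or []
            ],
        }
    # Anthropic messages shape: a list of typed content blocks.
    blocks = raw.get("content") or []
    return {
        "content": "".join(b["text"] for b in blocks if b.get("type") == "text"),
        "model": raw.get("model"),
        "usage": {
            "input_tokens": usage.get("input_tokens", 0),
            "output_tokens": usage.get("output_tokens", 0),
        },
        "tool_calls": [
            {"name": b["name"], "arguments": b["input"]}
            for b in blocks
            if b.get("type") == "tool_use"
        ],
    }


openai_style = {
    "model": "gpt-4o",
    "choices": [{"message": {"content": "Hello!", "tool_calls": None}}],
    "usage": {"prompt_tokens": 9, "completion_tokens": 3},
}
anthropic_style = {
    "model": "claude-example",  # placeholder model name
    "content": [
        {"type": "text", "text": "Checking."},
        {"type": "tool_use", "name": "get_weather", "input": {"city": "Oslo"}},
    ],
    "usage": {"input_tokens": 12, "output_tokens": 7},
}
a = normalize_response(openai_style)
b = normalize_response(anthropic_style)
print(a["content"], b["tool_calls"])
```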

Architecture

                    +-----------------------+
                    |   Claude Code Plugin  |
                    |   (hooks: auto-search |
                    |    + auto-store)      |
                    +-----------+-----------+
                                |
                    +-----------+-----------+
                    |                       |
              Local Memory            Hosted TES
              (Docker)                (Cloud)
                    |                       |
         +----+----+----+          +---+----+---+
         |    |    |    |          |   |    |   |
        PG  Ollama MCP HTTP      PG  R2  Queue Workers
        pgvector        API     pgvector       Modules

License

MIT
