Core package for Open Source CAI Pipecat - foundation voice
Project description
Foundation Voice SDK
1. Introduction
Welcome to the Foundation Voice SDK! This SDK provides the core functionalities for building voice-based conversational AI applications using the Pipecat framework. It allows you to configure and run sophisticated AI agents capable of:
- Voice Activity Detection (VAD)
- Speech-to-Text (STT) conversion
- Language Model Processing (LLM)
- Text-to-Speech (TTS) synthesis
This guide will walk you through setting up the SDK, configuring your agent, creating custom server endpoints, and utilizing callbacks to build powerful conversational AI applications.
2. Installation
To install the SDK, ensure you have Python 3.8 or higher. You can install the package directly from the GitHub repository using pip:
# Always use a virtual environment for Python projects
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install the SDK
pip install "git+ssh://git@github.com/think41/foundation-voice.git#egg=foundation_voice"
This command will install the foundation_voice package and all its dependencies as listed in pyproject.toml.
2.1 Environment Variables
Many services used by the SDK (like STT, LLM, TTS providers) require API keys. These should be set as environment variables. Create a .env file in your project's root directory (where you run your main application) with the necessary keys. Example:
# For OpenAI (LLM and TTS)
OPENAI_API_KEY="your_openai_api_key"
# For Deepgram (STT and TTS)
DEEPGRAM_API_KEY="your_deepgram_api_key"
# For Cartesia (TTS)
CARTESIA_API_KEY="your_cartesia_api_key"
# For Daily.co (Transport)
DAILY_API_KEY="your_daily_api_key"
DAILY_SAMPLE_ROOM_URL="your_daily_room_url_if_testing_daily"
#config path
CONFIG_PATH="path_to_your_config_file"
Refer to the specific provider's documentation for how to obtain API keys.
3. Basic Implementation Example
Here's a minimal example of implementing agent callbacks and tool configuration with a /connect endpoint in main.py:
import uvicorn
from fastapi import FastAPI, BackgroundTasks
from fastapi.middleware.cors import CORSMiddleware
from foundation_voice.utils.transport.connection_manager import WebRTCOffer
from foundation_voice.lib import CaiSDK
from agent_configure.utils.context import contexts
from agent_configure.utils.tool import tool_config
from agent_configure.utils.callbacks import custom_callbacks
from foundation_voice.utils.config_loader import ConfigLoader
from dotenv import load_dotenv
import os
load_dotenv()
# Initialize the CAI SDK
cai_sdk = CaiSDK()
config_path = os.getenv("CONFIG_PATH")
agent_config = ConfigLoader.load_config(config_path)
defined_agent = {
"agent": {
"config": agent_config,
"contexts": contexts,
"tool_dict": tool_config,
"callbacks": custom_callbacks,
},
}
# Create FastAPI app
app = FastAPI(
title="WebRTC Endpoint",
description="Basic WebRTC endpoint implementation",
version="1.0.0"
)
# Add CORS middleware
app.add_middleware(
CORSMiddleware,
allow_origins=["*"],
allow_credentials=True,
allow_methods=["*"],
allow_headers=["*"],
)
@app.post("/connect") # rename this with your actual endpoint name
async def webrtc_endpoint(offer: WebRTCOffer, background_tasks: BackgroundTasks):
"""
Handle WebRTC offer and return answer
"""
agent = defined_agent.get("agent")
# Process the WebRTC offer and get response
response = await cai_sdk.webrtc_endpoint(offer, agent)
# Handle background tasks if any
if "background_task_args" in response:
task_args = response.pop("background_task_args")
func = task_args.pop("func")
background_tasks.add_task(func, **task_args)
return response["answer"]
if __name__ == "__main__":
uvicorn.run(app, host="0.0.0.0", port=8000)
3.1 Example Explanation
-
You can use the cai sdk which is a class which provides multiple methods to run the agent with the choice of transport.
-
Before running the agent, we need to create a json file with all the configartaions for the agent, then either put the path manual or load it from the environment variable.
-
The ConfigLoader class is a utility class that loads the agent configuration from a JSON file. It can be used to load the configuration from a file.
-
to use an agent we need to define it as a dictionary that contains the agent configuration, contexts, tools, and callbacks, below you will find examples and explanation of each.
-
create the FastAPI app:- the FastAPI app is a web server that handles the HTTP requests.
-
create a endpoint:- for connecting to the agent, the example shows the webrtc endpoint, you can choose your any transport of your choice.
Note: we currently support webrtc, websocket and daily transports, make sure your client is configured to use the same transport, as the client is responsible for handling the connection.
3.2 Core Concepts
The Foundation Voice SDK is built around several key concepts that work together to create a flexible and powerful conversational AI system:
1. Agent Configuration
Agent configuration defines the behavior, capabilities, and components of your AI agent. This includes:
- Voice Processing: VAD, STT, and TTS settings
- Language Model: Which LLM to use and how to configure it
- Prompts: Instructions that guide the agent's behavior
You can configure these through a JSON file. Jump to Configuration File
Note: agent configuration is optional and can be defined in the agent you define in your main application.
2. Tool Configuration
Tools are functions that extend your LLM capabilities, allowing it to perform actions beyond conversation. Tools can:
- Access external APIs or databases
- Modify application state
- Perform calculations or data processing
- Interact with other systems
Tools are registered in the tool_config Jump to Custom Tools
Note: tools are optional and can be defined in the agent you define in your main application.
3. Callbacks
Callbacks provide hooks into the agent's lifecycle, allowing your application to respond to events such as:
- Agent starting/ending
- Tool calls
- Transcript updates
- Client connections/disconnections
Callbacks enable integration with your application logic Jump to Callbacks
Note: callbacks are optional and can be defined in the agent you define in your main application. See Callbacks Guide for more details.
4. Context
Context is the mechanism that allows your LLM to maintain state, remember information across conversational turns, and personalize interactions. It's key for building more sophisticated and stateful conversational experiences. Context is typically defined as a data structure (often a Pydantic model) and configured in your agent's JSON settings. This allows the LLM to store and retrieve relevant information throughout a session.
For a detailed guide on how context is configured, defined in Python, and utilized during agent operations, see Context In-Depth section below.
Note: context is optional and can be defined in the agent you define in your main application.
4. Creating a Configuration File
The SDK uses a JSON file to configure the agent and the processing pipeline. You can create your own configuration file (e.g., my_agent_config.json).
Here's a breakdown of the main sections and options:
{
"agent": {
"title": "My Custom Bot",
"initial_greeting": "Hello! How can I help you today?",
"prompt": "You are a helpful assistant.",
"vad": {
"provider": "silero" // Supported: "silero", "none"
},
"stt": {
"provider": "deepgram",
"model": "nova-2"
},
"llm": {
"provider": "openai", // Supported: "openai", "openai_agents"
"model": "gpt-4o-mini", // For "openai" provider
"tools": ['get_current_weather'],
},
"tts": {
"provider": "openai", // Supported: "openai", "deepgram", "cartesia"
"voice": "alloy" // For "openai" provider
}
}
}
4.1 Configuration Options Explained
-
Agent Settings:
title: The name of your agentinitial_greeting: First message sent by the agentprompt: System prompt that defines the agent's behavior
-
Transport Configuration:
agent.transport.type: Choose how the client will connect (webrtc,websocket,daily)
-
Voice Activity Detection:
agent.vad.provider:silerois recommended. Usenoneto disable VAD.
-
Speech-to-Text:
agent.stt.provider: Exampledeepgram. Requires the corresponding API key in your.envfile.agent.stt.model: Model to use (provider-specific)
-
Language Model:
agent.llm.provider: Exampleopenai. RequiresOPENAI_API_KEYin your.envfile.agent.llm.model: Model name (e.g.,gpt-4o-mini)- For multi-agent setups, use
openai_agentsprovider with additional configuration
-
Text-to-Speech:
agent.tts.provider: Exampleopenai. Requires the corresponding API key.agent.tts.voice: Voice ID to use (provider-specific)
-
Pipeline Configuration:
- Defines the sequence of processing stages for your agent
- The
use_tools: truesetting in the LLM stage enables tool usage
4.2 Supported Configurations
Agent Settings
title: String (any text)initial_greeting: String (any text)prompt: String (system prompt text)
Transport Configuration
agent.transport.type:webrtc: For WebRTC connectionswebsocket: For WebSocket connectionsdaily: For Daily.co integration
Voice Activity Detection (VAD)
agent.vad.provider:silero: Recommended VAD providernone: Disable VAD
Speech-to-Text (STT)
agent.stt.provider:deepgram: Deepgram speech recognition- (Additional providers as supported)
agent.stt.model: Provider-specific model name (e.g.,nova-2for Deepgram)
Language Model (LLM)
agent.llm.provider:openai: Standard OpenAI modelsopenai_agents: For multi-agent setups
agent.llm.model: Model name (e.g.,gpt-4o-minifor OpenAI)agent.llm.temperature: Float (0.0 to 2.0)agent.llm.max_tokens: Integer
Text-to-Speech (TTS)
agent.tts.provider:openai: OpenAI's text-to-speechdeepgram: Deepgram's text-to-speechcartesia: Cartesia's text-to-speech
agent.tts.voice: Provider-specific voice ID (e.g.,alloy,novafor OpenAI)
4.3 Example Configurations:-
1. Basic Example with OpenAI
This is a simple configuration using OpenAI for both LLM and TTS:
{
"agent": {
"title": "Customer Support Bot",
"initial_greeting": "Hello! I'm your customer support assistant. How can I help you today?",
"prompt": "You are a helpful customer support assistant for a software company. Answer questions about our products, pricing, and provide technical support. Be friendly, concise, and professional.",
"transport": {
"type": "webrtc"
},
"vad": {
"provider": "silero"
},
"stt": {
"provider": "deepgram",
"model": "nova-2"
},
"llm": {
"provider": "openai",
"model": "gpt-4o-mini",
"temperature": 0.7,
"max_tokens": 150,
},
"tts": {
"provider": "openai",
"voice": "nova"
}
}
}
Explanation:
-
Agent Configuration:
title: Names the bot "Customer Support Bot" - this will be displayed to usersinitial_greeting: The first message users will hear when connectingprompt: System instructions that define the bot's personality and capabilities
-
Transport:
- Uses WebRTC for real-time audio communication (requires browser support)
-
Voice Processing:
vad: Uses Silero for voice activity detection to determine when the user is speakingstt: Configures Deepgram with the nova-2 model for speech-to-text conversiontts: Uses OpenAI's nova voice for text-to-speech synthesis
-
Language Model:
provider: Uses OpenAI's APImodel: Specifies gpt-4o-mini as the modeltemperature: Sets creativity level to 0.7 (moderate creativity)max_tokens: Limits responses to 150 tokens for concise answers
-
Pipeline:
- Defines a linear processing flow from input to output
use_tools: truein the LLM stage enables the bot to use custom tool functions
2. Complex Example with OpenAI Agents
This example demonstrates a multi-agent system using openai_agents provider, similar to the MagicalNest Bot configuration:
{
"agent": {
"title": "E-Commerce Assistant",
"initial_greeting": "Welcome to our online store! I'm here to help you find the perfect products.",
"prompt": "You are a helpful and informative assistant for an e-commerce platform. Please provide accurate and concise responses, and you have access to tools.",
"transport": {
"type": "daily-webrtc"
},
"vad": {
"provider": "silero"
},
"stt": {
"provider": "deepgram"
},
"llm": {
"provider": "openai_agents",
"triage": false,
"agent_config": {
"start_agent": "welcome_agent",
"context": "ShoppingContext",
"logfire_trace": true,
"agents": {
"welcome_agent": {
"name": "welcome_agent",
"instructions": "You are the Welcome Agent for an e-commerce platform.\nIMPORTANT: Keep all responses SHORT, CRISP and CONVERSATIONAL - suitable for a voice interface.\nLimit responses to 1-2 short sentences whenever possible.\nYour responsibilities:\n- Warmly welcome customers to the store\n- Ask about what type of products they're looking for\n- Collect basic information about their preferences\n\nBe friendly and professional.\nAsk only ONE question at a time.\n\nWhen you've collected basic information, hand off to the Product Agent.",
"handoff_description": "When the conversation starts or needs to return to introduction",
"handoffs": ["product_agent"],
"tools": ["update_customer_info"],
"context": "ShoppingContext"
},
"product_agent": {
"name": "product_agent",
"instructions": "You are the Product Agent for an e-commerce platform.\nIMPORTANT: Keep all responses SHORT, CRISP and CONVERSATIONAL - suitable for a voice interface.\nLimit responses to 1-2 short sentences whenever possible.\n\nYour responsibilities:\n- Recommend products based on customer preferences\n- Provide details about products (features, pricing, availability)\n- Help customers compare different options\n- Add items to the shopping cart\n\nBe knowledgeable and helpful. Focus on suggesting products that match the customer's needs.\nAsk only ONE question at a time.\n\nIf the customer wants to check out, hand off to the Checkout Agent.\nIf the customer wants to start over, hand off to the Welcome Agent.",
"handoff_description": "When product information is needed or items need to be added to cart",
"handoffs": ["welcome_agent", "checkout_agent"],
"tools": ["search_products", "add_to_cart", "get_product_details"],
"context": "ShoppingContext"
},
"checkout_agent": {
"name": "checkout_agent",
"instructions": "You are the Checkout Agent for an e-commerce platform.\nIMPORTANT: Keep all responses SHORT, CRISP and CONVERSATIONAL - suitable for a voice interface.\nLimit responses to 1-2 short sentences whenever possible.\n\nYour responsibilities:\n- Guide customers through the checkout process\n- Handle payment method selection\n- Process shipping information\n- Confirm orders and provide order summaries\n\nBe efficient and reassuring. Make the checkout process as smooth as possible.\nAsk only ONE question at a time.\n\nIf the customer wants to continue shopping, hand off to the Product Agent.\nIf the customer wants to start over, hand off to the Welcome Agent.",
"handoff_description": "When the customer is ready to checkout",
"handoffs": ["welcome_agent", "product_agent"],
"tools": ["process_payment", "update_shipping", "complete_order"],
"context": "ShoppingContext"
}
}
}
},
"tts": {
"provider": "openai",
"voice": "shimmer"
},
"mcp": {
"type": "llm_orchestrator",
"config": {
"max_turns": 5,
"response_timeout": 10,
"history_length": 5,
"enable_tool_selection": true,
"tool_selection_strategy": "highest_confidence"
}
}
}
}
Explanation:
-
Multi-Agent Architecture:
- This configuration uses the
openai_agentsprovider to create a system with three specialized agents - Each agent has a specific role in the customer journey
- The system maintains a shared context (
ShoppingContext) across all agents
- This configuration uses the
-
Agent Configuration:
start_agent: Specifies that conversations begin with the welcome_agent- Each agent has:
- name: The unique identifier for the agent
- Instructions: Detailed prompts that define behavior and responsibilities
- Handoffs: List of other agents this agent can transfer control to
- handoff_description: Description of when the handoff should occur
- Tools: Specific functions this agent can access
- Context: Shared data structure for maintaining state
-
Agent Specialization:
- welcome_agent: Handles initial greetings and collects basic preferences
- product_agent: Manages product recommendations and cart operations
- checkout_agent: Processes payments and finalizes orders
-
Transport and Voice Processing:
- Uses Daily's WebRTC implementation for high-quality audio
- Silero for VAD and Deepgram for STT (without specifying model, will use default)
- OpenAI's shimmer voice for TTS
-
MCP (Model Context Protocol) Configuration:
- Type:
llm_orchestratormanages the flow between agents - Settings control conversation flow:
max_turns: Limits conversation lengthresponse_timeout: Sets maximum wait time for responseshistory_length: Controls how much conversation history is maintainedtool_selection: Enables and configures how tools are selected
- Type:
Note: After creating the
agent_config.jsonfile, make sure to load it in your app by setting theAGENT_CONFIG_PATHenvironment variable or by loading it manually.
5. Using Custom Tools With Openai Agents
Custom tools allow your OpenAI agent to perform specific actions and maintain state during conversations. Here's how to define and use custom tools with OpenAI agents in your application:
Creating Custom Tools
To create a new tool, use the @function_tool decorator:
from agents import function_tool, RunContextWrapper
@function_tool(
description_override='Your tool description here.'
)
def your_tool_name(ctx: RunContextWrapper, param1: type = None, param2: type = None):
# Your tool logic here
return "Result message"
# Don't forget to add it to tool_config
tool_config["your_tool_name"] = your_tool_name
Example Usage in Agent Configuration
In your agent_config.json, you can reference these tools in the tools section:
{
"agent": {
"tools": [
"update_basic_info",
"update_room_data",
"update_products",
"search_tool"
]
}
}
Note: Ensure you import your defined tool in the main.py file and pass it to your agent definition like shown in the example here
6.Callbacks Guide
The SDK supports multiple transport methods for communication between clients and your agent. Each transport type supports specific callbacks.
6.1 Available Callbacks and Their Parameters
All callbacks are asynchronous and should be defined with async def. The following callbacks are available in the AgentCallbacks class:
-
on_client_connected(client: Dict[str, Any]) -> None
- Supported Transports: WebSocket, WebRTC, Daily.co
- Triggered when a client connects to the agent
client: Dictionary containing client connection details (format varies by transport)
-
on_client_disconnected(data: Dict[str, Any]) -> None
- Supported Transports: WebSocket, WebRTC, Daily.co
- Triggered when a client disconnects
data: Dictionary containing:transcript: List[Dict] - Complete conversation transcriptmetrics: Dict - Call metrics and statistics including duration, latency, etc.
-
on_first_participant_joined(participant: Dict[str, Any]) -> None
- Supported Transport: Daily.co only
- Triggered when the first participant joins a Daily.co room
participant: Dictionary containing participant details (Daily.co participant object)
-
on_participant_left(participant: Dict[str, Any], reason: str) -> None
- Supported Transport: Daily.co only
- Triggered when a participant leaves a Daily.co room
participant: Dictionary containing participant detailsreason: String describing why the participant left
-
on_transcript_update(frame: Any) -> None
- Supported Transports: WebSocket, WebRTC, Daily.co
- Triggered when there's an update to the conversation transcript
frame: Object containing:messages: List[Dict] - Message objects with:role: str - 'user' or 'assistant'content: str - Message contenttimestamp: str - ISO 8601 timestamp
-
on_session_timeout() -> None
- Status: Defined but not implemented
- Note: This event is defined in the
AgentEventenum but is not currently used in the agent implementation
6.2 Notes
- All callbacks are optional.
- Callbacks are called asynchronously, so they should be defined with
async def. - The
dataparameter inon_client_disconnectedcontains both the transcript and call metrics. - For Daily.co transport,
on_first_participant_joinedis called only for the first participant. - The
on_participant_leftcallback is only available for Daily.co transport.
6.3 Implementing Custom Callbacks
Create a custom callbacks class by extending AgentCallbacks:
from foundation_voice.agent_configure.utils.callbacks import AgentCallbacks
class MyCallbacks(AgentCallbacks):
async def on_client_connected(self, client):
print(f"Client connected: {client}")
async def on_transcript_update(self, frame):
# Process new transcript data
for message in frame.messages:
print(f"Transcript: {message.role}: {message.content}")
async def on_client_disconnected(self, data):
transcript = data.get("transcript", [])
metrics = data.get("metrics", {})
print(f"Client disconnected. Transcript length: {len(transcript)}")
async def on_first_participant_joined(self, participant):
print(f"First participant joined: {participant}")
async def on_participant_left(self, participant, reason):
print(f"Participant left: {participant}, reason: {reason}")
print(f"Participant left: {participant}, reason: {reason}")
7. Context In-Depth
Context allows your agent to maintain state, remember information across conversational turns, and tailor its responses. It's a crucial component for building more sophisticated and personalized interactions. This section provides a detailed look into how context is managed within the Foundation Voice SDK.
7.1. Configuration (in JSON files like agent_config.json):
- Context is typically specified within your agent's configuration file (e.g.,
my_agent_config.jsonor the defaultagent_config.json). - It's defined using a string name that refers to a predefined context structure. For example:
// In your agent_config.json "agent_config": { "start_agent": "your_start_agent_name", "context": "MyCustomContextName", // Global context for the agent setup // ... other agent_config settings }
- If you're using a multi-agent setup, context can also be specified for individual sub-agents.
7.2. Explanation :
- The string names used in the JSON configuration (e.g.,
"MyCustomContextName") should be mapped to a Python class that define the actual structure of the context. - These classes are usually Pydantic
BaseModels. You can define your own custom context structures by creating new classes and importing it in your server endpoint# Example in foundation-voice/agent_configure/utils/context.py from pydantic import BaseModel from typing import Optional, List, Dict, Any # Ensure necessary imports class MyCustomContextName(BaseModel): user_id: Optional[str] = None session_data: Dict[str, Any] = {} conversation_history: List[Dict] = [] # ... other fields relevant to your application # ... ensure it's added to the 'contexts' dictionary # This dictionary is typically located in the same file (context.py) contexts = { "MyCustomContextName": MyCustomContextName, # ... other predefined contexts like MagicalNestContext }
Note ensure you import your defined context in the main.py file and pass it to your agent definition like shown in the example here
8. SIP Integration (via Webhook)
This SDK supports SIP integration through a webhook-based model, similar to how services like Twilio work. Instead of handling the SIP protocol directly, the application exposes endpoints that a SIP provider can call. The provider manages the SIP trunk and bridges the call to a WebSocket stream that the agent connects to.
This approach is robust, scalable, and avoids the complexity of a native SIP implementation.
8.1 Configuration for SIP
To enable SIP integration (using Twilio as the provider in this example), add the following to your .env file:
# Twilio credentials for SIP integration
TWILIO_ACCOUNT_SID="your_twilio_account_sid"
TWILIO_AUTH_TOKEN="your_twilio_auth_token"
TWILIO_PHONE_NUMBER="your_twilio_phone_number"
You will also need a publicly accessible URL for your running application (e.g., using a service like ngrok).
8.2 Handling Inbound Calls
- Configure Your SIP Provider: In your Twilio phone number's configuration, set the "A CALL COMES IN" webhook to point to your server's
/api/sipendpoint (e.g.,https://your-public-url.com/api/sip). - How It Works: When a call comes in, Twilio sends a POST request to
/api/sip. The application responds with TwiML (Twilio Markup Language) that instructs Twilio to open a WebSocket connection to the/wsendpoint of your agent. The agent then communicates over this stream.
8.3 Initiating Outbound Calls
You can trigger an outbound call by sending a POST request to the /api/sip/create-call endpoint.
Example using cURL:
curl -X POST "http://localhost:8000/api/sip/create-call?to_number=+1234567890"
to_number: The E.164 formatted phone number to call.- You can also optionally provide
from_numberandagent_nameas query parameters. Iffrom_numberis not provided, it will use theTWILIO_PHONE_NUMBERfrom your.envfile.
9. Advanced Topics
9.1 Advanced Tool Definitions
Tools can perform complex operations and integrate with external systems:
from foundation_voice.agent_configure.utils.tool import function_tool
import requests
@function_tool
def search_knowledge_base(query: str, max_results: int = 5) -> Dict[str, any]:
"""Search the knowledge base for information related to the query."""
# Call an external API
response = requests.get(
"https://your-api.example.com/search",
params={"q": query, "limit": max_results}
)
# Process and return the results
if response.status_code == 200:
return response.json()
else:
return {"error": f"Search failed with status {response.status_code}"}
9.2 Creating Server Endpoints
Integrate CaiSDK into your FastAPI application with custom endpoints:
from fastapi import FastAPI, WebSocket, BackgroundTasks
from foundation_voice.lib import CaiSDK, TransportType
app = FastAPI()
cai_sdk = CaiSDK()
# WebSocket endpoint
@app.websocket("/ws")
async def websocket_endpoint(websocket: WebSocket):
await cai_sdk.websocket_endpoint(websocket)
# WebRTC endpoint
@app.post("/webrtc")
async def webrtc_endpoint(request_data: dict, background_tasks: BackgroundTasks):
return await cai_sdk.webrtc_endpoint(
background_tasks=background_tasks,
request_data=request_data
)
# Custom connect endpoint with additional logic
@app.post("/connect")
async def connect_endpoint(request_data: dict, background_tasks: BackgroundTasks):
# Add custom pre-processing here
user_id = request_data.get("user_id")
# Initialize context with user data
context = MyCustomContext(user_id=user_id)
# Set up custom callbacks
callbacks = MyCallbacks()
# Register tools specific to this user
tool_config["get_user_data"] = lambda: get_user_data(user_id)
return await cai_sdk.connect_handler(
background_tasks=background_tasks,
request_data=request_data,
tool_config=tool_config,
app_callbacks=callbacks,
context=context
)
10. Project Structure
A typical project using the Foundation Voice SDK follows this structure, as shown in the examples directory:
my-app/
├── main.py # Main FastAPI application with WebSocket and HTTP endpoints
├── main.py # Your application's entry point
├── agent_config.json # Example agent configuration for your app
├── app_callbacks.py # Your application's custom callbacks
├── app_context.py # Your application's custom context definitions
├── app_tools.py # Your application's custom tools
├── .env # Environment variables
└── requirements.txt # Python dependencies (will include foundation_voice)
This structure provides a clean separation of concerns, making it easy to maintain and extend your AI agent implementation. The configuration files allow for different agent behaviors, and the utils directory contains reusable components for callbacks, context management, and tools.
🛠️ Configuration
- Update the
.envfile with your API keys and configuration:
OPENAI_API_KEY=your_openai_api_key
DEEPGRAM_API_KEY=your_deepgram_api_key
DAILY_API_KEY=your_daily_api_key
DAILY_SAMPLE_ROOM_URL=your_daily_room_url
🚀 Running the Example
- Navigate to the examples directory:
cd examples
- Run the example server:
uvicorn main:app --reload
- Open your browser and navigate to
http://localhost:8000
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file foundation_voice-0.1.1.tar.gz.
File metadata
- Download URL: foundation_voice-0.1.1.tar.gz
- Upload date:
- Size: 66.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.10.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
d6419b2a842211b0d8d7c18da5b2e5f195c0b5f5b1e67ba2cf789493a94d5cf4
|
|
| MD5 |
0eb03d3351f3eefeef80e877e36f31c6
|
|
| BLAKE2b-256 |
32ca2dafed19d8d5b6eaf1c553798b3b25adc92984e21289ff8d7ff87953dcfe
|
File details
Details for the file foundation_voice-0.1.1-py3-none-any.whl.
File metadata
- Download URL: foundation_voice-0.1.1-py3-none-any.whl
- Upload date:
- Size: 61.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.10.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
8913f68dfd64d070f9b9629ae5336ebad68cbede5e270064d8e990042ae598b2
|
|
| MD5 |
acae36646d5e75bce2eff9d0c7b0bdfa
|
|
| BLAKE2b-256 |
22ec666fde48e3a315c2c590cfb2508afe0c09e3380931769cbbf34f02725487
|