# aieng-agents

Helper modules for Vector Institute AI Engineering Agents Bootcamp implementations.

A utility library for building AI agent applications with support for knowledge bases, a code interpreter, web search, and observability. Built for the Vector Institute Agents Bootcamp by the AI Engineering team.

## Features
### 🤖 Agent Tools

- **Code Interpreter** - Execute Python code in isolated E2B sandboxes, with file upload support
- **Gemini Grounding with Google Search** - Web search capabilities with citation tracking
- **Weaviate Knowledge Base** - Vector database integration for RAG applications
- **News Events** - Fetch structured current events from Wikipedia
### 📊 Data Processing

- **PDF to Dataset** - Convert PDF documents to HuggingFace datasets using multimodal OCR
- **Dataset Chunking** - Token-aware text chunking for embedding models
- **Dataset Loading** - Unified interface for loading datasets from multiple sources
### 🔧 Utilities

- **Async Client Manager** - Lifecycle management for async clients (OpenAI, Weaviate)
- **Progress Tracking** - Rich progress bars for async operations with rate limiting
- **Gradio Integration** - Message format conversion between Gradio and the OpenAI SDK
- **Langfuse Integration** - OpenTelemetry-based observability and tracing
- **Environment Configuration** - Type-safe environment variable management with Pydantic
- **Session Management** - Persistent conversation sessions with a SQLite backend
## Installation

### Using uv (recommended)

```bash
uv pip install aieng-agents
```

### Using pip

```bash
pip install aieng-agents
```
## Quick Start

### Environment Setup

Create a `.env` file with your API keys:

```bash
# Required for most features
OPENAI_API_KEY=your_openai_key
# or
GEMINI_API_KEY=your_gemini_key

# For Weaviate knowledge base
WEAVIATE_API_KEY=your_weaviate_key
WEAVIATE_HTTP_HOST=your_instance.weaviate.cloud
WEAVIATE_GRPC_HOST=grpc-your_instance.weaviate.cloud

# For code interpreter (optional)
E2B_API_KEY=your_e2b_key

# For Langfuse observability (optional)
LANGFUSE_PUBLIC_KEY=pk-lf-xxx
LANGFUSE_SECRET_KEY=sk-lf-xxx

# For embedding models (optional)
EMBEDDING_API_KEY=your_embedding_key
EMBEDDING_BASE_URL=https://your-embedding-service
```
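To catch configuration problems early, you can sanity-check the environment before initializing any clients. The helper below is not part of aieng-agents; it is a minimal stand-alone sketch using only the standard library:

```python
import os

# Hypothetical helper (not part of aieng-agents): report which of the
# required environment variables are unset or empty.
def missing_env_vars(required: list[str]) -> list[str]:
    """Return the names of required variables that are unset or empty."""
    return [name for name in required if not os.environ.get(name)]

# Example: verify the Weaviate-related settings before connecting.
weaviate_vars = ["WEAVIATE_API_KEY", "WEAVIATE_HTTP_HOST", "WEAVIATE_GRPC_HOST"]
missing = missing_env_vars(weaviate_vars)
if missing:
    print(f"Missing environment variables: {', '.join(missing)}")
```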
### Basic Usage Examples

#### Using Tools with OpenAI Agents SDK

```python
import agents

from aieng.agents import AsyncClientManager
from aieng.agents.tools import (
    CodeInterpreter,
    AsyncWeaviateKnowledgeBase,
    get_weaviate_async_client,
)

# Initialize client manager
manager = AsyncClientManager()

# Create an agent with tools
agent = agents.Agent(
    name="Research Assistant",
    instructions="Help users with code and research questions.",
    tools=[
        agents.function_tool(manager.knowledgebase.search_knowledgebase),
        agents.function_tool(CodeInterpreter().run_code),
    ],
    model=agents.OpenAIChatCompletionsModel(
        model="gpt-4o",
        openai_client=manager.openai_client,
    ),
)

# Run the agent
response = await agents.Runner.run(
    agent,
    input="Search for information about transformers and create a visualization.",
)

# Clean up
await manager.close()
```
#### Using the Code Interpreter

```python
from aieng.agents.tools import CodeInterpreter

interpreter = CodeInterpreter(
    template="<your-template-id>",
    timeout=300,
)

result = await interpreter.run_code(
    code="""
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)
y = np.sin(x)
plt.plot(x, y)
plt.title("Sine Wave")
plt.savefig("sine_wave.png")
""",
    files=[],
)

print(result.stdout)
print(result.results)  # Contains base64 PNG data
```
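Since the results carry base64-encoded PNG data, you can decode an entry back into an image file with the standard library alone. This is an illustrative sketch; the stand-in payload below replaces what a real run would read from the interpreter results:

```python
import base64

def save_base64_png(b64_data: str, path: str) -> int:
    """Decode a base64 string and write it to `path`; returns bytes written."""
    raw = base64.b64decode(b64_data)
    with open(path, "wb") as f:
        f.write(raw)
    return len(raw)

# Stand-in payload for illustration; a real run would pass an entry
# from the interpreter's results instead.
fake_png = base64.b64encode(b"\x89PNG\r\n\x1a\nfake-image-bytes").decode()
n = save_base64_png(fake_png, "sine_wave.png")
print(f"wrote {n} bytes")
```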
#### Fetching News Events

```python
from aieng.agents.tools import get_news_events

news_events = await get_news_events()

# Access events by category
for category, events in news_events.root.items():
    print(f"\n{category}:")
    for event in events:
        print(f"  - {event.description}")
```
#### Using Gemini Grounding with Google Search

```python
from aieng.agents.tools import GeminiGroundingWithGoogleSearch

search_tool = GeminiGroundingWithGoogleSearch(
    base_url="https://your-search-proxy",
    api_key="your_api_key",
)

response = await search_tool.search(
    query="Latest developments in transformer architecture"
)

print(response.text_with_citations)
print(f"Citations: {response.citations}")
```
#### Knowledge Base Search

```python
from aieng.agents import AsyncClientManager

manager = AsyncClientManager()

results = await manager.knowledgebase.search_knowledgebase(
    keyword="machine learning"
)

for result in results:
    print(f"Title: {result.source.title}")
    print(f"Section: {result.source.section}")
    print(f"Snippet: {result.highlight.text[0][:200]}...")
    print()

await manager.close()
```
#### Langfuse Tracing

```python
from dotenv import load_dotenv

from aieng.agents import setup_langfuse_tracer, set_up_logging

load_dotenv()
set_up_logging()

# Set up tracing
tracer = setup_langfuse_tracer(service_name="my_agent_app")

# Your agent code here - traces will automatically be sent to Langfuse
```
#### Async Operations with Progress

```python
import asyncio

from aieng.agents import gather_with_progress, rate_limited

async def fetch_data(url):
    # Your async operation
    await asyncio.sleep(1)
    return f"Data from {url}"

# Run with progress bar
urls = ["url1", "url2", "url3"]
semaphore = asyncio.Semaphore(2)  # Max 2 concurrent

tasks = [
    rate_limited(lambda u=url: fetch_data(u), semaphore=semaphore)
    for url in urls
]

results = await gather_with_progress(
    tasks,
    description="Fetching data...",
)
```
## Command-Line Tools

The package includes console scripts for data processing:

### Convert PDFs to HuggingFace Dataset

```bash
pdf_to_hf_dataset \
    --input-path ./documents \
    --output-dir ./dataset \
    --recursive \
    --model gemini-2.5-flash \
    --chunk-size 512
```

Key options:

- `--input-path`: PDF file or directory
- `--output-dir`: Where to save the dataset
- `--recursive`: Scan directories recursively
- `--model`: OCR model to use
- `--chunk-size`: Max tokens per chunk
- `--structured-ocr`: Use structured JSON output
- `--skip-toc-detection`: Skip table of contents pages
### Chunk Existing Dataset

```bash
chunk_hf_dataset \
    --hf_dataset_path_or_name my-org/my-dataset \
    --chunk_size 512 \
    --chunk_overlap 128 \
    --save_to_hub \
    --hub_repo_id my-org/chunked-dataset
```
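The `--chunk_size`/`--chunk_overlap` semantics can be illustrated with a simple sliding window over a token list. This is a sketch of the general technique, not the package's implementation; a real run would count tokens with the embedding model's tokenizer:

```python
def chunk_tokens(tokens: list[str], chunk_size: int, chunk_overlap: int) -> list[list[str]]:
    """Split `tokens` into windows of up to `chunk_size`, each overlapping
    the previous window by `chunk_overlap` tokens."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break
    return chunks

# 10 tokens, windows of 4 overlapping by 2 -> windows start at 0, 2, 4, 6
chunks = chunk_tokens([f"t{i}" for i in range(10)], chunk_size=4, chunk_overlap=2)
print([len(c) for c in chunks])  # [4, 4, 4, 4]
```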
## Advanced Usage

### Custom Client Configuration

```python
from aieng.agents import Configs, AsyncClientManager

# Load custom configuration
configs = Configs(
    default_planner_model="gpt-4o",
    default_worker_model="gpt-4o-mini",
    weaviate_collection_name="my_collection",
)

manager = AsyncClientManager(configs=configs)
```
### Gradio Integration

```python
import agents
import gradio as gr

from aieng.agents import (
    gradio_messages_to_oai_chat,
    oai_agent_stream_to_gradio_messages,
)

async def chat_fn(message, history):
    # Convert Gradio messages to OpenAI format
    oai_messages = gradio_messages_to_oai_chat(history)

    # Run agent and stream response
    # (`agent` and `session` are created elsewhere, as in the examples above)
    async for gradio_msg in oai_agent_stream_to_gradio_messages(
        agent, message, session
    ):
        yield gradio_msg

demo = gr.ChatInterface(chat_fn, type="messages")
```
### Session Persistence

```python
from aieng.agents import get_or_create_agent_session

# In your Gradio handler
session = get_or_create_agent_session(history, session_state)

# Use the session with agents
response = await agents.Runner.run(agent, input=message, session=session)
```
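The session backend persists conversation turns to SQLite. The general idea can be sketched with the stdlib `sqlite3` module; this is illustrative only, and the actual schema used by aieng-agents may differ:

```python
import json
import sqlite3

# Illustrative only: a minimal SQLite-backed conversation store.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE IF NOT EXISTS messages ("
    "  session_id TEXT, turn INTEGER, payload TEXT)"
)

def append_message(session_id: str, turn: int, message: dict) -> None:
    conn.execute(
        "INSERT INTO messages VALUES (?, ?, ?)",
        (session_id, turn, json.dumps(message)),
    )

def load_history(session_id: str) -> list[dict]:
    rows = conn.execute(
        "SELECT payload FROM messages WHERE session_id = ? ORDER BY turn",
        (session_id,),
    ).fetchall()
    return [json.loads(r[0]) for r in rows]

append_message("s1", 0, {"role": "user", "content": "hello"})
append_message("s1", 1, {"role": "assistant", "content": "hi!"})
print(load_history("s1"))
```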
## Development

### Running Tests

```bash
# Install with dev dependencies
uv pip install -e ".[dev]"

# Run tests
uv run --env-file .env pytest tests/
```
## Project Layout

This package is part of the Vector Institute AI Engineering Agents Bootcamp. It provides reusable utilities for the bootcamp's reference implementations.
## Contributing

Contributions are welcome! Please see CONTRIBUTING.md for guidelines.

## License

MIT License - see the LICENSE file for details.

## Support

- **Documentation**: See the Agents Bootcamp docs
- **Issues**: Report bugs on GitHub Issues
- **Questions**: Open a discussion on GitHub