Helper modules for Vector Institute AI Engineering Agents Bootcamp implementations


aieng-agents

A utility library for building AI agent applications, with support for knowledge bases, a code interpreter, web search, and observability. Built for the Vector Institute Agents Bootcamp by the AI Engineering team.

Features

🤖 Agent Tools

  • Code Interpreter - Execute Python code in isolated E2B sandboxes with file upload support
  • Gemini Grounding with Google Search - Web search capabilities with citation tracking
  • Weaviate Knowledge Base - Vector database integration for RAG applications
  • News Events - Fetch structured current events from Wikipedia

📊 Data Processing

  • PDF to Dataset - Convert PDF documents to HuggingFace datasets using multimodal OCR
  • Dataset Chunking - Token-aware text chunking for embedding models
  • Dataset Loading - Unified interface for loading datasets from multiple sources

🔧 Utilities

  • Async Client Manager - Lifecycle management for async clients (OpenAI, Weaviate)
  • Progress Tracking - Rich progress bars for async operations with rate limiting
  • Gradio Integration - Message format conversion between Gradio and OpenAI SDK
  • Langfuse Integration - OpenTelemetry-based observability and tracing
  • Environment Configuration - Type-safe environment variable management with Pydantic
  • Session Management - Persistent conversation sessions with SQLite backend

Installation

Using uv (recommended)

uv pip install aieng-agents

Using pip

pip install aieng-agents

Quick Start

Environment Setup

Create a .env file with your API keys:

# Required for most features
OPENAI_API_KEY=your_openai_key
# or
GEMINI_API_KEY=your_gemini_key

# For Weaviate knowledge base
WEAVIATE_API_KEY=your_weaviate_key
WEAVIATE_HTTP_HOST=your_instance.weaviate.cloud
WEAVIATE_GRPC_HOST=grpc-your_instance.weaviate.cloud

# For code interpreter (optional)
E2B_API_KEY=your_e2b_key

# For Langfuse observability (optional)
LANGFUSE_PUBLIC_KEY=pk-lf-xxx
LANGFUSE_SECRET_KEY=sk-lf-xxx

# For embedding models (optional)
EMBEDDING_API_KEY=your_embedding_key
EMBEDDING_BASE_URL=https://your-embedding-service
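Before initializing any clients, it can help to fail fast on unset variables. A minimal stdlib sketch (the variable names match the .env above; the `missing_keys` helper is illustrative, not part of the library):

```python
import os

def missing_keys(required):
    """Return the names of required environment variables that are unset."""
    return [name for name in required if not os.getenv(name)]

# Most features need at least one model-provider key.
providers = ["OPENAI_API_KEY", "GEMINI_API_KEY"]
if len(missing_keys(providers)) == len(providers):
    print("Warning: set OPENAI_API_KEY or GEMINI_API_KEY in your .env file")
```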

Basic Usage Examples

Using Tools with OpenAI Agents SDK

from aieng.agents.tools import (
    CodeInterpreter,
    AsyncWeaviateKnowledgeBase,
    get_weaviate_async_client,
)
from aieng.agents import AsyncClientManager
import agents

# Initialize client manager
manager = AsyncClientManager()

# Create an agent with tools
agent = agents.Agent(
    name="Research Assistant",
    instructions="Help users with code and research questions.",
    tools=[
        agents.function_tool(manager.knowledgebase.search_knowledgebase),
        agents.function_tool(CodeInterpreter().run_code),
    ],
    model=agents.OpenAIChatCompletionsModel(
        model="gpt-4o",
        openai_client=manager.openai_client,
    ),
)

# Run the agent
response = await agents.Runner.run(
    agent,
    input="Search for information about transformers and create a visualization."
)

# Clean up
await manager.close()

Using the Code Interpreter

from aieng.agents.tools import CodeInterpreter

interpreter = CodeInterpreter(
    template="<your template ID>",
    timeout=300,
)

result = await interpreter.run_code(
    code="""
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)
y = np.sin(x)

plt.plot(x, y)
plt.title("Sine Wave")
plt.savefig("sine_wave.png")
""",
    files=[]
)

print(result.stdout)
print(result.results)  # Contains base64 PNG data

Fetching News Events

from aieng.agents.tools import get_news_events

news_events = await get_news_events()

# Access events by category
for category, events in news_events.root.items():
    print(f"\n{category}:")
    for event in events:
        print(f"  - {event.description}")

Using Gemini Grounding with Google Search

from aieng.agents.tools import GeminiGroundingWithGoogleSearch

search_tool = GeminiGroundingWithGoogleSearch(
    base_url="https://your-search-proxy",
    api_key="your_api_key"
)

response = await search_tool.search(
    query="Latest developments in transformer architecture"
)

print(response.text_with_citations)
print(f"Citations: {response.citations}")

Knowledge Base Search

from aieng.agents import AsyncClientManager

manager = AsyncClientManager()

results = await manager.knowledgebase.search_knowledgebase(
    keyword="machine learning"
)

for result in results:
    print(f"Title: {result.source.title}")
    print(f"Section: {result.source.section}")
    print(f"Snippet: {result.highlight.text[0][:200]}...")
    print()

await manager.close()

Langfuse Tracing

from aieng.agents import setup_langfuse_tracer, set_up_logging
from dotenv import load_dotenv

load_dotenv()
set_up_logging()

# Setup tracing
tracer = setup_langfuse_tracer(service_name="my_agent_app")

# Your agent code here - traces will automatically be sent to Langfuse

Async Operations with Progress

from aieng.agents import gather_with_progress, rate_limited
import asyncio

async def fetch_data(url):
    # Your async operation
    await asyncio.sleep(1)
    return f"Data from {url}"

# Run with progress bar
urls = ["url1", "url2", "url3"]
semaphore = asyncio.Semaphore(2)  # Max 2 concurrent

tasks = [
    rate_limited(lambda u=url: fetch_data(u), semaphore=semaphore)
    for url in urls
]

results = await gather_with_progress(
    tasks,
    description="Fetching data..."
)

Command-Line Tools

The package includes console scripts for data processing:

Convert PDFs to HuggingFace Dataset

pdf_to_hf_dataset \
    --input-path ./documents \
    --output-dir ./dataset \
    --recursive \
    --model gemini-2.5-flash \
    --chunk-size 512

Key options:

  • --input-path: PDF file or directory
  • --output-dir: Where to save the dataset
  • --recursive: Scan directories recursively
  • --model: OCR model to use
  • --chunk-size: Max tokens per chunk
  • --structured-ocr: Use structured JSON output
  • --skip-toc-detection: Skip table of contents pages

Chunk Existing Dataset

chunk_hf_dataset \
    --hf_dataset_path_or_name my-org/my-dataset \
    --chunk_size 512 \
    --chunk_overlap 128 \
    --save_to_hub \
    --hub_repo_id my-org/chunked-dataset
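Conceptually, chunking with overlap slides a fixed-size window over the token stream so that consecutive chunks share context. A simplified stdlib-only sketch of the idea (whitespace-split words stand in for real tokens; the actual tool counts tokens with the embedding model's tokenizer):

```python
def chunk_tokens(tokens, chunk_size=512, chunk_overlap=128):
    """Split a token list into overlapping chunks.

    Each chunk holds at most `chunk_size` tokens, and consecutive
    chunks share `chunk_overlap` tokens of context.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk size")
    step = chunk_size - chunk_overlap
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), step)]

tokens = "the quick brown fox jumps over the lazy dog".split()
chunks = chunk_tokens(tokens, chunk_size=4, chunk_overlap=2)
# window starts advance by step = chunk_size - chunk_overlap = 2 tokens
```

A larger overlap preserves more cross-chunk context for retrieval at the cost of a larger index.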

Advanced Usage

Custom Client Configuration

from aieng.agents import Configs, AsyncClientManager

# Load custom configuration
configs = Configs(
    default_planner_model="gpt-4o",
    default_worker_model="gpt-4o-mini",
    weaviate_collection_name="my_collection",
)

manager = AsyncClientManager(configs=configs)

Gradio Integration

from aieng.agents import (
    gradio_messages_to_oai_chat,
    oai_agent_stream_to_gradio_messages,
)
import gradio as gr
import agents

async def chat_fn(message, history):
    # Convert Gradio message history to OpenAI chat format
    # (e.g. to pass as prior context to the agent)
    oai_messages = gradio_messages_to_oai_chat(history)

    # Run the agent (created elsewhere) and stream the response
    async for gradio_msg in oai_agent_stream_to_gradio_messages(
        agent, message, session
    ):
        yield gradio_msg

demo = gr.ChatInterface(chat_fn, type="messages")
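Under the hood, the conversion is between two similar dict shapes. A minimal stdlib sketch of the idea (the library's real helpers also handle tool calls, metadata, and streaming events; `gradio_to_oai` here is illustrative):

```python
def gradio_to_oai(history):
    """Map Gradio 'messages'-format history to OpenAI chat messages.

    Both formats are lists of {'role', 'content'} dicts; this sketch
    simply drops Gradio-specific keys such as 'metadata'.
    """
    return [{"role": m["role"], "content": m["content"]} for m in history]

history = [
    {"role": "user", "content": "Hi", "metadata": {"title": None}},
    {"role": "assistant", "content": "Hello!"},
]
oai = gradio_to_oai(history)
```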

Session Persistence

from aieng.agents import get_or_create_agent_session

# In your Gradio handler
session = get_or_create_agent_session(history, session_state)

# Use the session with agents
response = await agents.Runner.run(agent, input=message, session=session)

Development

Running Tests

# Install with dev dependencies
uv pip install -e ".[dev]"

# Run tests
uv run --env-file .env pytest tests/

Project Layout

This package is part of the Vector Institute AI Engineering Agents Bootcamp. It provides reusable utilities for the bootcamp's reference implementations.

Contributing

Contributions are welcome! Please see CONTRIBUTING.md for guidelines.

License

MIT License - see LICENSE file for details.

