AUGR - AI Dataset Augmentation Tool

An AI-powered dataset augmentation tool that uses the Braintrust proxy and structured outputs.

Features

  • 🤖 Structured AI Outputs: Uses OpenAI's beta.chat.completions.parse with Pydantic schemas
  • 🧠 Braintrust Integration: Works through the Braintrust proxy to reach multiple AI providers
  • 🔄 Interactive Workflows: Guided dataset augmentation with iterative refinement
  • 📊 Schema-aware Generation: Automatically infers and respects dataset schemas
  • Modern Tooling: Built with uv for fast dependency management
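
To make "schema-aware generation" concrete, here is a minimal sketch of the idea using only the standard library. It is not AUGR's implementation: the helper names `infer_schema` and `conforms` and the sample field names are hypothetical, chosen for illustration. The sketch infers a field-to-type mapping from existing rows and then checks a generated sample against it.

```python
from typing import Any


def infer_schema(rows: list[dict[str, Any]]) -> dict[str, type]:
    """Infer a field -> type mapping from example rows (hypothetical helper).

    Only fields that appear in every row with one consistent type are kept.
    """
    if not rows:
        return {}
    schema: dict[str, type] = {}
    for key in rows[0]:
        in_all = all(key in row for row in rows)
        types = {type(row[key]) for row in rows if key in row}
        if in_all and len(types) == 1:
            schema[key] = types.pop()
    return schema


def conforms(sample: dict[str, Any], schema: dict[str, type]) -> bool:
    """Return True if the sample has exactly the schema's fields and types."""
    if sample.keys() != schema.keys():
        return False
    return all(type(sample[name]) is expected for name, expected in schema.items())


# Illustrative rows; the field names are assumptions, not an AUGR format.
rows = [
    {"question": "What is 2 + 2?", "answer": "4", "difficulty": 1},
    {"question": "Capital of France?", "answer": "Paris", "difficulty": 2},
]
schema = infer_schema(rows)
print(schema)
print(conforms({"question": "q", "answer": "a", "difficulty": 3}, schema))
```

In the tool itself, the inferred structure would presumably be expressed as a Pydantic model so it can be passed to the structured-output call; the plain-dict version above only illustrates the inference step.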

Installation

Option 1: Install from PyPI

# Install globally
pip install augr

# Or with pipx (recommended for CLI tools)
pipx install augr

# Or with uv
uv tool install augr

# Then use anywhere
augr

Option 2: Install from GitHub

# Install latest version
pip install git+https://github.com/yourusername/augr.git

# Or with uv
uv tool install git+https://github.com/yourusername/augr.git

# Then use anywhere
augr

Option 3: Development Setup

For development or local installation:

git clone https://github.com/yourusername/augr.git
cd augr
uv pip install -e .

# Test the installation
python test_installation.py

# Use anywhere
augr

Usage

Environment Variables

Create a .env file with:

BRAINTRUST_API_KEY=your_braintrust_api_key_here
# Optional: BRAINTRUST_BASE_URL=https://api.braintrust.dev/v1/proxy
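
The sketch below shows one way these two variables could be resolved into client settings. It is an illustration, not AUGR's code: `parse_dotenv` and `resolve_settings` are hypothetical helpers, and it assumes the commented-out URL above is the default used when `BRAINTRUST_BASE_URL` is not set.

```python
from typing import Mapping

# Assumption: the URL shown in the optional comment is the default proxy endpoint.
DEFAULT_BASE_URL = "https://api.braintrust.dev/v1/proxy"


def parse_dotenv(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def resolve_settings(env: Mapping[str, str]) -> dict[str, str]:
    """Return the API key and base URL, failing early if the key is missing."""
    api_key = env.get("BRAINTRUST_API_KEY", "")
    if not api_key:
        raise RuntimeError("BRAINTRUST_API_KEY is not set")
    base_url = env.get("BRAINTRUST_BASE_URL") or DEFAULT_BASE_URL
    return {"api_key": api_key, "base_url": base_url}


example = """
BRAINTRUST_API_KEY=your_braintrust_api_key_here
# Optional: BRAINTRUST_BASE_URL=https://api.braintrust.dev/v1/proxy
"""
settings = resolve_settings(parse_dotenv(example))
print(settings["base_url"])
```

Failing early on a missing key gives a clearer error than a rejected request later. An OpenAI-compatible client would typically take the resulting `api_key` and `base_url` as constructor arguments.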

Running

The tool provides an interactive CLI with two main modes:

  1. Guided Dataset Augmentation: Interactive workflow with iterative refinement
  2. Direct JSON Upload: Upload pre-generated samples directly

Start the CLI with:

uv run python run_augr.py
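
Before using the direct upload mode, it can help to sanity-check the JSON file. The exact format AUGR expects is not documented here, so the sketch below assumes a top-level JSON array of objects; the `load_samples` helper and the `input`/`expected` field names are illustrative assumptions.

```python
import json
from typing import Any


def load_samples(text: str) -> list[dict[str, Any]]:
    """Parse JSON text and require a non-empty array of objects (assumed format)."""
    data = json.loads(text)
    if not isinstance(data, list) or not data:
        raise ValueError("expected a non-empty JSON array of samples")
    for index, item in enumerate(data):
        if not isinstance(item, dict):
            raise ValueError(f"sample {index} is not a JSON object")
    return data


# Illustrative payload; the field names are assumptions, not a documented schema.
samples = load_samples('[{"input": "Hello", "expected": "Hi there"}]')
print(len(samples))
```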

Development

Install with development dependencies:

uv pip install -e ".[dev]"

Run linting and formatting:

uv run black .
uv run ruff check .

Architecture

  • ai_client.py: Core AI interface with structured outputs
  • augmentation_service.py: Main service for dataset augmentation
  • cli.py: Interactive command-line interface
  • models.py: Pydantic models for data structures
  • braintrust_client.py: Braintrust API integration

API Example

import asyncio

from pydantic import BaseModel

from augr.ai_client import create_ai


class Response(BaseModel):
    message: str
    confidence: float


async def main() -> None:
    # Create AI client (reads BRAINTRUST_API_KEY from env)
    ai = create_ai(model="gpt-4o", temperature=0.0)

    # Generate structured output; gen_obj is awaited, so it must run inside a coroutine
    result = await ai.gen_obj(
        schema=Response,
        messages=[{"role": "user", "content": "Hello!"}],
        thinking_enabled=True,  # For reasoning models
    )

    print(result.message)  # Structured output


# Top-level await is not valid in a script, so drive the coroutine with asyncio
asyncio.run(main())

License

[Your License Here]
