AUGR - AI Dataset Augmentation Tool
An AI-powered dataset augmentation tool that uses the Braintrust proxy with structured outputs.
Features
- 🤖 Structured AI Outputs: Uses OpenAI's beta.chat.completions.parse with Pydantic schemas
- 🧠 Braintrust Integration: Works with the Braintrust proxy for multiple AI providers
- 🔄 Interactive Workflows: Guided dataset augmentation with iterative refinement
- 📊 Schema-aware Generation: Automatically infers and respects dataset schemas
- ⚡ Modern Tooling: Built with uv for fast dependency management
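How augr infers dataset schemas is not documented here (the features list only says it relies on Pydantic schemas). As a rough, stdlib-only illustration of the idea behind schema-aware generation, the sketch below records which JSON types each field takes in existing rows, then rejects generated rows that add, drop, or retype fields. It assumes flat JSON-object rows, and infer_schema and conforms are hypothetical names, not part of augr.

```python
import json
from typing import Any


def _type_name(value: Any) -> str:
    """Map a value produced by json.loads to a JSON type name."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "object"
    return type(value).__name__


def infer_schema(rows: list[dict[str, Any]]) -> dict[str, set[str]]:
    """Hypothetical helper: collect, per field, every type name seen in the rows."""
    schema: dict[str, set[str]] = {}
    for row in rows:
        for key, value in row.items():
            schema.setdefault(key, set()).add(_type_name(value))
    return schema


def conforms(row: dict[str, Any], schema: dict[str, set[str]]) -> bool:
    """Hypothetical helper: same field names, and each value has a type already seen."""
    if set(row) != set(schema):
        return False
    return all(_type_name(v) in schema[k] for k, v in row.items())


existing = json.loads('[{"input": "hi", "expected": "hello", "score": 1}]')
schema = infer_schema(existing)
print(conforms({"input": "bye", "expected": "goodbye", "score": 0}, schema))  # True
print(conforms({"input": "bye", "score": "high"}, schema))  # False: missing field, retyped score
```

A real implementation would more likely build a Pydantic model from the inferred fields so that the same schema can be passed to the structured-output call.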
Installation
Option 1: Install from PyPI (Coming Soon)
# Install globally
pip install augr
# Or with pipx (recommended for CLI tools)
pipx install augr
# Or with uv
uv tool install augr
# Then use anywhere
augr
Option 2: Install from GitHub
# Install latest version
pip install git+https://github.com/yourusername/augr.git
# Or with uv
uv tool install git+https://github.com/yourusername/augr.git
# Then use anywhere
augr
Option 3: Development Setup
For development or local installation:
git clone https://github.com/yourusername/augr.git
cd augr
uv pip install -e .
# Test the installation
python test_installation.py
# Use anywhere
augr
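The contents of test_installation.py are not shown here. A comparable check, sketched under the assumption that the package installs a console script named augr (as the "use anywhere" lines imply), can be written with the standard library alone. check_install is a hypothetical name, not the script's actual code.

```python
from __future__ import annotations

import shutil
from importlib.metadata import PackageNotFoundError, version


def check_install(dist: str = "augr", command: str = "augr") -> tuple[str | None, str | None]:
    """Hypothetical check: (installed version, path of console script), None where missing."""
    try:
        installed = version(dist)  # reads the installed distribution's metadata
    except PackageNotFoundError:
        installed = None
    return installed, shutil.which(command)  # None if the command is not on PATH


if __name__ == "__main__":
    ver, exe = check_install()
    print(f"augr version: {ver or 'not installed'}")
    print(f"augr command: {exe or 'not on PATH'}")
```

With an editable install (uv pip install -e .), both values should be present as long as the environment's bin directory is on PATH.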
Usage
Environment Variables
Create a .env file with:
BRAINTRUST_API_KEY=your_braintrust_api_key_here
# Optional: BRAINTRUST_BASE_URL=https://api.braintrust.dev/v1/proxy
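How augr reads the .env file is not specified; it may use python-dotenv's load_dotenv, which is common. The stdlib-only sketch below shows one plausible resolution order, assuming that real environment variables take precedence over .env entries and that the optional base URL defaults to the value in the commented line above. load_env_file and braintrust_settings are hypothetical helpers.

```python
from __future__ import annotations

import os
from pathlib import Path

# Assumed default, taken from the commented-out line in the .env example.
DEFAULT_BASE_URL = "https://api.braintrust.dev/v1/proxy"


def load_env_file(path: str | Path = ".env") -> dict[str, str]:
    """Hypothetical helper: parse KEY=VALUE lines, skipping blanks and # comments."""
    values: dict[str, str] = {}
    file = Path(path)
    if not file.exists():
        return values
    for raw in file.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def braintrust_settings(path: str | Path = ".env") -> tuple[str, str]:
    """Hypothetical helper: return (api_key, base_url); real env vars win over .env."""
    file_values = load_env_file(path)

    def lookup(name: str) -> str | None:
        return os.environ.get(name) or file_values.get(name)

    api_key = lookup("BRAINTRUST_API_KEY")
    if not api_key:
        raise RuntimeError("BRAINTRUST_API_KEY is not set")
    return api_key, lookup("BRAINTRUST_BASE_URL") or DEFAULT_BASE_URL
```

Failing fast on a missing key gives a clearer error than a rejected request from the proxy later on.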
Running
The tool provides an interactive CLI with two main modes:
- Guided Dataset Augmentation: Interactive workflow with iterative refinement
- Direct JSON Upload: Upload pre-generated samples directly
uv run python run_augr.py
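The file format expected by Direct JSON Upload is not documented here. Assuming it is a JSON array of records shaped like Braintrust dataset rows (an input field, with optional expected and metadata fields), a pre-flight check before uploading could look like the sketch below. ALLOWED_KEYS and load_samples are hypothetical; adjust them to whatever format the CLI actually accepts.

```python
import json
from typing import Any

# Assumed record shape; not confirmed for augr's upload mode.
ALLOWED_KEYS = {"input", "expected", "metadata"}


def load_samples(text: str) -> list[dict[str, Any]]:
    """Hypothetical check: parse a JSON array of sample records and validate each one."""
    data = json.loads(text)
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of samples")
    for i, record in enumerate(data):
        if not isinstance(record, dict):
            raise ValueError(f"sample {i} is not a JSON object")
        if "input" not in record:
            raise ValueError(f"sample {i} has no 'input' field")
        unknown = set(record) - ALLOWED_KEYS
        if unknown:
            raise ValueError(f"sample {i} has unexpected keys: {sorted(unknown)}")
    return data
```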
Development
Install with development dependencies:
uv pip install -e ".[dev]"
Run linting and formatting:
uv run black .
uv run ruff check .
Architecture
- ai_client.py: Core AI interface with structured outputs
- augmentation_service.py: Main service for dataset augmentation
- cli.py: Interactive command-line interface
- models.py: Pydantic models for data structures
- braintrust_client.py: Braintrust API integration
API Example
import asyncio

from pydantic import BaseModel

from augr.ai_client import create_ai


class Response(BaseModel):
    message: str
    confidence: float


async def main() -> None:
    # Create AI client (reads BRAINTRUST_API_KEY from env)
    ai = create_ai(model="gpt-4o", temperature=0.0)
    # Generate structured output; gen_obj is awaited, so it must run inside a coroutine
    result = await ai.gen_obj(
        schema=Response,
        messages=[{"role": "user", "content": "Hello!"}],
        thinking_enabled=True,  # For reasoning models
    )
    print(result.message)  # Structured output


asyncio.run(main())
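The example above needs network access and an API key. For offline tests of code that drives the client, a stub exposing the same gen_obj(schema=..., messages=..., thinking_enabled=...) call shape can stand in for it. Beyond that call shape, the real client's signature and return type are not documented here, so FakeAI, Samples, and refine are hypothetical. A stdlib dataclass replaces the Pydantic model only to keep the sketch dependency-free. refine shows one way an iterative-refinement loop could feed reviewer notes back as new chat turns.

```python
import asyncio
from dataclasses import dataclass
from typing import Any


@dataclass
class Samples:
    # Stand-in for a Pydantic schema such as Response above.
    items: list[str]


class FakeAI:
    """Hypothetical offline stub mirroring the gen_obj call shape from the example."""

    def __init__(self) -> None:
        self.calls: list[list[dict[str, str]]] = []

    async def gen_obj(self, schema: type, messages: list[dict[str, str]],
                      thinking_enabled: bool = False) -> Any:
        self.calls.append(list(messages))  # snapshot of each request
        # Echo the latest user turn so a test can see which instruction was applied.
        return schema(items=[messages[-1]["content"]])


async def refine(ai: Any, task: str, feedback: list[str]) -> Samples:
    """Generate once, then regenerate after each round of reviewer feedback."""
    messages = [{"role": "user", "content": task}]
    result = await ai.gen_obj(schema=Samples, messages=messages)
    for note in feedback:
        messages.append({"role": "assistant", "content": repr(result.items)})
        messages.append({"role": "user", "content": note})
        result = await ai.gen_obj(schema=Samples, messages=messages)
    return result


if __name__ == "__main__":
    fake = FakeAI()
    final = asyncio.run(refine(fake, "Write 3 greetings", ["Make them shorter"]))
    print(final.items, len(fake.calls))  # ['Make them shorter'] 2
```

Keeping the full conversation in messages lets the model see its previous output alongside each correction, at the cost of longer prompts per round.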
License
[Your License Here]
Download files
Source Distribution: augr-0.1.0.tar.gz
Built Distribution: augr-0.1.0-py3-none-any.whl
File details
Details for the file augr-0.1.0.tar.gz.
File metadata
- Download URL: augr-0.1.0.tar.gz
- Upload date:
- Size: 98.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 0d7287adb81095b8f7ac777fd1e9a3276833bf9d9343970180888214c8fd4128 |
| MD5 | 3e0c10063568683a5809b1cb6a212678 |
| BLAKE2b-256 | a0af7ecb7c56b9c1036b70a07c23a546318c17a29e42ceb122980732a3eeca23 |
File details
Details for the file augr-0.1.0-py3-none-any.whl.
File metadata
- Download URL: augr-0.1.0-py3-none-any.whl
- Upload date:
- Size: 19.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 948ceb3f2e9b9f353de3fae0e0b4a2b150554d1eafc64ccb1c52b833bb7d06ee |
| MD5 | 6c52a6c51914c6a19e3e7ec8f117d856 |
| BLAKE2b-256 | fc989aba4c28786c576807c2c33013eb43190d1260056f3777c6215d1e6208ed |
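To check a downloaded file against the published SHA256 digests, a small standard-library script is enough. pip can also enforce this itself through hash-checking mode: add a requirements line such as augr==0.1.0 --hash=sha256:<digest> and install with pip install --require-hashes -r requirements.txt. The helpers below (sha256_of, verify) are illustrative names. The digests are copied from the tables for augr 0.1.0, and the file name must match exactly.

```python
import hashlib
from pathlib import Path

# SHA256 digests copied from the augr 0.1.0 "File hashes" tables.
EXPECTED_SHA256 = {
    "augr-0.1.0.tar.gz": "0d7287adb81095b8f7ac777fd1e9a3276833bf9d9343970180888214c8fd4128",
    "augr-0.1.0-py3-none-any.whl": "948ceb3f2e9b9f353de3fae0e0b4a2b150554d1eafc64ccb1c52b833bb7d06ee",
}


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash the file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected: dict[str, str] = EXPECTED_SHA256) -> bool:
    """True only if the file name is known and its SHA256 matches the published digest."""
    want = expected.get(path.name)
    return want is not None and sha256_of(path) == want
```

For example, verify(Path("augr-0.1.0-py3-none-any.whl")) returns True only for an untampered copy of that wheel.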