CUDAG - Computer Use Deterministic Augmented Generator framework for building VLM training data generators

CUDAG - Computer Use Deterministic Augmented Generator

A Rails-like framework for building VLM (Vision-Language Model) training data generators.

Overview

CUDAG provides a convention-over-configuration approach to generating training data for computer use models. It uses a domain-specific MVC-like pattern:

Screen - Declarative UI definition (like Model in Rails)
State - Dynamic data for rendering
Renderer - Image generation (like View in Rails)
Task - Interaction logic (like Controller in Rails)
Model - Domain data types with generators (Patient, Provider, etc.)

Installation

# Create virtual environment
python3 -m venv .venv
source .venv/bin/activate

# Install CUDAG and dev dependencies
make install
make dev

Quality Checks

Always run quality checks during development:

make check      # Run all checks (lint, typecheck, complexity)
make lint       # Ruff linting and format checking
make typecheck  # Mypy strict type checking
make complexity # Radon cyclomatic complexity analysis
make format     # Auto-format code

Development Workflow

Building a CUDAG generator follows this process:

Step 1: Generate New App

# Install CUDAG (uvx runs pip in a throwaway environment, so use pip directly)
pip install cudag

# Create a new generator project
cudag new claim-window-generator

# Navigate into the project
cd claim-window-generator

This creates:

claim-window-generator/
├── assets/               # Base images, fonts
├── config/
│   └── dataset.yaml
├── models/               # Domain model definitions
├── tasks/                # Task implementations
├── screen.py             # Screen definition
├── state.py              # State dataclass
├── renderer.py           # Image renderer
└── datasets/             # Output (gitignored)

Step 2: Add Base Images

Copy your blank screen images and fonts:

Full screen blank: assets/base.png - The base UI template
Region blanks: assets/grid_blank.png - Headers, overlays, etc.
Fonts: assets/fonts/font.ttf - Font for rendering text

Step 3: Generate Data Models

Use Claude to generate domain models for your data:

from cudag import Model, FirstName, LastName, DOB, NPI, Phone, Email
from cudag import string, date_field, money, choice, computed, years_since

class Patient(Model):
    first_name = FirstName()
    last_name = LastName()
    dob = DOB()
    member_id = string(pattern=r"[A-Z]{3}[0-9]{6}")
    phone = Phone()
    email = Email()

    # Computed fields
    full_name = computed("first_name", "last_name")
    age = years_since("dob")

class Procedure(Model):
    code = string(pattern=r"D[0-9]{4}")
    description = choice("Exam", "Cleaning", "X-Ray", "Crown")
    fee = money(min_value=50.0, max_value=2500.0)

class Provider(Model):
    first_name = string(faker="first_name")
    last_name = string(faker="last_name")
    npi = string(faker="npi")
    specialty = choice("General", "Orthodontics", "Oral Surgery")

Field Types:

string(faker=..., pattern=..., choices=...) - Text
integer(min_value, max_value) - Numbers
decimal(min_value, max_value, precision) - Floats
money(min_value, max_value) - Currency ($X.XX)
date_field(min_year, max_year, format) - Dates
time_field(min_hour, max_hour, format) - Times
boolean(probability) - True/False
choice(*options, weights) - Pick from list
computed(*sources) - Derived from other fields
years_since(field) - Age calculation
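
To make the pattern argument more concrete, here is a minimal stdlib-only sketch of pattern-driven generation with a seeded RNG. generate_from_pattern is a hypothetical helper written for this illustration, not CUDAG's implementation, and it only understands character ranges such as [A-Z]{3} plus literal characters.

```python
import re
from random import Random

# Hypothetical helper (not part of CUDAG). Supported tokens:
#   "[a-b]{n}"  -> n random characters from the inclusive range a..b
#   any other single character -> copied literally
_TOKEN = re.compile(r"\[([A-Za-z0-9])-([A-Za-z0-9])\]\{(\d+)\}|(.)")


def generate_from_pattern(rng: Random, pattern: str) -> str:
    out = []
    for match in _TOKEN.finditer(pattern):
        start, end, count, literal = match.groups()
        if literal is not None:
            out.append(literal)
            continue
        alphabet = [chr(code) for code in range(ord(start), ord(end) + 1)]
        out.extend(rng.choice(alphabet) for _ in range(int(count)))
    return "".join(out)


rng = Random(42)
member_id = generate_from_pattern(rng, "[A-Z]{3}[0-9]{6}")  # e.g. three letters, six digits
procedure_code = generate_from_pattern(rng, "D[0-9]{4}")    # "D" followed by four digits
print(member_id, procedure_code)
```

Because the generator draws only from the Random instance it is given, the same seed reproduces the same values.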

Step 4: Define Screen Layout

Declare your screen structure with regions:

from cudag import Screen, grid, button, scrollable, dropdown

class ClaimWindowScreen(Screen):
    name = "claim-window"
    base_image = "images/screen_blank.png"
    size = (1155, 853)

    # Grid region - bounds are (x, y, width, height)
    procedure_grid = grid(
        (0, 217, 1155, 167),
        rows=8,
        cols=17,
    )

    # Scrollable area
    scroll_area = scrollable(
        (0, 217, 1155, 167),
        step=300,
        direction="vertical",
    )

    # Buttons
    billing_provider = button((85, 95, 200, 20), label="Billing Provider")
    save_button = button((100, 800, 80, 30), label="Save")

Region Types:

region(bounds) - Simple clickable area
button(bounds, label, description) - Clickable button
grid(bounds, rows, cols) - Grid of cells
scrollable(bounds, step, direction) - Scrollable area
dropdown(bounds, items) - Dropdown menu
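
Since every region is described by (x, y, width, height) bounds, click and scroll targets reduce to simple arithmetic. The sketch below shows one way to compute a region center and the bounds of a single grid cell; center_of and cell_bounds are hypothetical helpers for illustration, and CUDAG's own region classes may compute these differently.

```python
from typing import Tuple

Bounds = Tuple[int, int, int, int]  # (x, y, width, height)


def center_of(bounds: Bounds) -> Tuple[int, int]:
    """Center point of a region, using integer pixel coordinates."""
    x, y, width, height = bounds
    return x + width // 2, y + height // 2


def cell_bounds(bounds: Bounds, rows: int, cols: int, row: int, col: int) -> Bounds:
    """Bounds of one cell when a region is divided into rows x cols cells."""
    x, y, width, height = bounds
    left = x + col * width // cols
    right = x + (col + 1) * width // cols
    top = y + row * height // rows
    bottom = y + (row + 1) * height // rows
    return left, top, right - left, bottom - top


grid_bounds = (0, 217, 1155, 167)
print(center_of(grid_bounds))                 # (577, 300)
print(cell_bounds(grid_bounds, 8, 17, 0, 0))  # (0, 217, 67, 20)
```

Computing each cell edge from the full width or height keeps the last column and row flush with the region edge even when the size does not divide evenly.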

Step 5: Build Screen Renderer

Render your screen with PIL, drawing data onto the base image:

from PIL import Image, ImageDraw, ImageFont
from cudag import BaseRenderer
from .screen import ClaimWindowScreen
from .state import GridState

class ClaimWindowRenderer(BaseRenderer[GridState]):
    screen_class = ClaimWindowScreen

    def load_assets(self) -> None:
        self.font = ImageFont.truetype(
            str(self.asset_path("fonts", "font.ttf")), 9
        )

    def render(self, state: GridState) -> tuple[Image.Image, dict]:
        image = self.load_base_image()
        draw = ImageDraw.Draw(image)

        # Render grid rows
        self._render_grid(image, draw, state)

        # Render scrollbar
        self._render_scrollbar(image, state)

        metadata = self.build_metadata(state)
        return image, metadata

Step 6: Build Region Renderers

For complex regions (grids, tables), create dedicated rendering methods:

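# Methods of ClaimWindowRenderer. GRID_Y_START, ROW_HEIGHT, COLUMNS, TRACK_X,
# THUMB_WIDTH, THUMB_HEIGHT and calculate_thumb_position() are placeholders
# that you define for your own screen layout.

def _render_grid(self, image, draw, state):
    # Draw each visible row, one text cell per configured column
    for idx, row in enumerate(state.visible_rows):
        y = GRID_Y_START + idx * ROW_HEIGHT
        for col in COLUMNS:
            value = getattr(row, col["id"], "")
            x = col["x"]
            draw.text((x, y), str(value), font=self.font, fill=(0, 0, 0))

def _render_scrollbar(self, image, state):
    # render() does not pass its draw context here, so create one
    draw = ImageDraw.Draw(image)
    # Calculate thumb position based on scroll state
    thumb_y = calculate_thumb_position(state)
    draw.rectangle(
        [TRACK_X, thumb_y, TRACK_X + THUMB_WIDTH, thumb_y + THUMB_HEIGHT],
        fill=(100, 100, 100),
    )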

Step 7: Test and Align Data

This is critical - manually verify that:

Grid columns align with data
Text fits within column widths
Row wrapping works correctly
Scroll positions show correct content
All UI elements render properly

# Generate a small test batch
python -m my_generator.generator --config config/dataset.yaml

# View generated images (open is macOS-specific; use xdg-open on Linux)
open datasets/my-dataset/images/

Step 8: Create Tasks

Define tasks that generate training samples:

from cudag import BaseTask, TaskSample, TaskContext, ToolCall
from .state import GridState  # adjust to wherever your state dataclass lives

class ScrollGridTask(BaseTask):
    task_type = "scroll-grid"

    def generate_sample(self, ctx: TaskContext) -> TaskSample:
        # Generate state
        state = GridState.generate(ctx.rng, min_rows=15, max_rows=28)

        # Render image
        image, metadata = self.renderer.render(state)
        image_path = self.save_image(image, ctx)

        # Get scroll coordinates
        grid_center = self.renderer.get_grid_center()

        return TaskSample(
            id=self.build_id(ctx),
            image_path=image_path,
            human_prompt="Scroll down in the grid.",
            tool_call=ToolCall.scroll(grid_center, pixels=300),
            pixel_coords=grid_center,
            image_size=self.renderer.screen_class.meta().size,
            metadata={"task_type": self.task_type, **metadata},
        )

Step 9: Create Dataset Generator

Use run_generator() to handle boilerplate (argument parsing, config loading, dataset naming):

from pathlib import Path
from cudag import run_generator
from .renderer import ClaimWindowRenderer
from .tasks import ScrollGridTask

def main():
    renderer = ClaimWindowRenderer(assets_dir=Path("assets"))
    tasks = [ScrollGridTask(config={}, renderer=renderer)]
    run_generator(renderer, tasks)

if __name__ == "__main__":
    main()

The run_generator() helper handles:

Script invocation check
Argument parsing (--config, --seed)
Config loading from YAML
Dataset naming ({prefix}-{researcher}-{timestamp})
Building dataset and tests

For custom behavior, use optional parameters:

run_generator(
    renderer,
    tasks,
    extra_args=[("--debug", {"action": "store_true"})],
    config_modifier=lambda config, args: setattr(config, 'seed', 999) if args.debug else None,
    post_build=lambda output_dir, renderer: generate_debug_images(output_dir),
)

Step 10: Generate Production Dataset

# Generate full dataset
PYTHONPATH=src python -m my_generator.generator

# Verify output
ls datasets/my-dataset/
# images/  data.jsonl  train.jsonl  test.jsonl  config.json

# Check JSONL format
head -1 datasets/my-dataset/data.jsonl | python -m json.tool
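
Beyond eyeballing the first record, a short script can sanity-check every line. The sketch below is a hypothetical checker, not something CUDAG ships; the rules it enforces (required keys, unique ids, a final gpt turn) are assumptions based on the record shown under Output Format.

```python
import json
from typing import Iterable, List

REQUIRED_KEYS = ("id", "image", "conversations")


def find_problems(lines: Iterable[str]) -> List[str]:
    """Return human-readable problems found in JSONL lines."""
    problems: List[str] = []
    seen_ids = set()
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # tolerate blank lines
        record = json.loads(line)
        for key in REQUIRED_KEYS:
            if key not in record:
                problems.append(f"line {number}: missing '{key}'")
        record_id = record.get("id")
        if record_id in seen_ids:
            problems.append(f"line {number}: duplicate id {record_id!r}")
        seen_ids.add(record_id)
        turns = record.get("conversations", [])
        if not turns or turns[-1].get("from") != "gpt":
            problems.append(f"line {number}: last turn is not from 'gpt'")
    return problems


good = json.dumps({
    "id": "my-dataset_00000",
    "image": "images/my-dataset_00000.jpg",
    "conversations": [
        {"from": "human", "value": "<image>\nScroll down in the grid."},
        {"from": "gpt", "value": "<tool_call>{}</tool_call>"},
    ],
})
bad = json.dumps({"id": "my-dataset_00000", "conversations": []})

print(find_problems([good]))  # []
for problem in find_problems([good, bad]):
    print(problem)
```

To check a real dataset, pass an open file instead of the in-memory list, for example find_problems(open("datasets/my-dataset/data.jsonl")).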

Output Format

Generated JSONL structure:

{
  "id": "my-dataset_00000",
  "image": "images/my-dataset_00000.jpg",
  "conversations": [
    {"from": "system", "value": "...tool definitions..."},
    {"from": "human", "value": "<image>\nScroll down in the grid."},
    {"from": "gpt", "value": "<tool_call>{\"name\": \"computer_use\", \"arguments\": {\"action\": \"scroll\", \"coordinate\": [500, 352], \"pixels\": 300}}</tool_call>"}
  ],
  "metadata": {
    "task_type": "scroll-grid",
    "real_coords": [577, 300]
  }
}
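
When consuming this format, the tool call has to be pulled back out of the gpt turn. A minimal sketch, assuming the assistant turn wraps a single JSON object in <tool_call> tags as in the record above; extract_tool_call is a hypothetical helper, not a CUDAG function.

```python
import json
import re

TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)


def extract_tool_call(record: dict) -> dict:
    """Return the parsed tool call from the first gpt turn that contains one."""
    for turn in record["conversations"]:
        if turn["from"] == "gpt":
            match = TOOL_CALL_RE.search(turn["value"])
            if match:
                return json.loads(match.group(1))
    raise ValueError("no tool call found")


record = {
    "id": "my-dataset_00000",
    "image": "images/my-dataset_00000.jpg",
    "conversations": [
        {"from": "human", "value": "<image>\nScroll down in the grid."},
        {
            "from": "gpt",
            "value": '<tool_call>{"name": "computer_use", "arguments": '
                     '{"action": "scroll", "coordinate": [500, 352], "pixels": 300}}'
                     "</tool_call>",
        },
    ],
}

call = extract_tool_call(record)
print(call["arguments"]["action"], call["arguments"]["coordinate"])  # scroll [500, 352]
```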

Utility Functions

Researcher Name

Use get_researcher_name() to automatically include researcher identity in dataset names:

from cudag import get_researcher_name

# Reads from .researcher file (supports "Name: mike" or plain "mike")
# Falls back to USER environment variable
researcher = get_researcher_name()  # Returns "mike" or None

# Disable environment fallback
researcher = get_researcher_name(fallback_to_env=False)
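
For reference, the two accepted .researcher layouts can be parsed with a few lines of standard Python. This is only a sketch of the parsing rule described in the comments above; parse_researcher is a hypothetical name, and the real function may normalize the value differently.

```python
from typing import Optional


def parse_researcher(text: str) -> Optional[str]:
    """Parse 'Name: mike' or plain 'mike' from the first non-empty line."""
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        key, separator, value = line.partition(":")
        if separator and key.strip().lower() == "name":
            line = value.strip()
        return line or None
    return None


print(parse_researcher("Name: mike"))  # mike
print(parse_researcher("mike\n"))      # mike
print(parse_researcher(""))            # None
```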

Font Loading

Use load_font() for platform-aware font loading with automatic fallbacks:

from cudag import load_font, load_font_family

# Load with automatic system font fallback
font = load_font("assets/fonts/Inter.ttf", size=14)

# Load with explicit fallbacks
font = load_font(
    "assets/fonts/Inter.ttf",
    size=14,
    fallbacks=["/System/Library/Fonts/Helvetica.ttc"]
)

# Load font family with variants
fonts = load_font_family(
    "fonts/Inter-Regular.ttf",
    size=14,
    bold="fonts/Inter-Bold.ttf",
)
# fonts["regular"], fonts["bold"], fonts["italic"], fonts["bold_italic"]

Random Data Generation

Use choose(), date_in_range(), amount(), and weighted_choice() with a seeded Random instance for reproducible random data:

from random import Random
from cudag import choose, date_in_range, amount, weighted_choice

rng = Random(42)

# Choose random item from sequence
provider = choose(rng, ["Dr. Smith", "Dr. Jones", "Dr. Brown"])

# Generate random date in range
visit_date = date_in_range(rng, "2024-01-01", "2024-12-31", fmt="%m/%d/%Y")

# Generate random monetary amount
fee = amount(rng, 50.0, 500.0)
# With optional zero values (20% chance)
payment = amount(rng, 0.0, 100.0, allow_zero=True)

# Weighted random choice
status = weighted_choice(rng, {"pending": 0.7, "approved": 0.2, "denied": 0.1})
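
Every helper above takes the rng explicitly, which is what makes generation deterministic: the same seed yields the same sequence. The stdlib-only sketch below demonstrates the idea with random.Random.choices; sample_statuses is a hypothetical function for this illustration and does not use CUDAG's own helpers.

```python
from random import Random
from typing import List

STATUS_WEIGHTS = {"pending": 0.7, "approved": 0.2, "denied": 0.1}


def sample_statuses(seed: int, count: int) -> List[str]:
    """Draw `count` weighted statuses from a freshly seeded RNG."""
    rng = Random(seed)
    options = list(STATUS_WEIGHTS)
    weights = list(STATUS_WEIGHTS.values())
    return [rng.choices(options, weights=weights, k=1)[0] for _ in range(count)]


first = sample_statuses(seed=42, count=10)
second = sample_statuses(seed=42, count=10)
print(first == second)  # True
```

Sharing the module-level random functions instead of a passed-in Random would make results depend on call order elsewhere in the program.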

Text Utilities

Use text utilities for measurement and rendering:

from cudag import measure_text, center_text_position, draw_centered_text, wrap_text, truncate_text
from PIL import Image, ImageDraw, ImageFont

font = ImageFont.load_default()

# Measure text dimensions
width, height = measure_text("Hello World", font)

# Calculate centered position
tx, ty = center_text_position("Label", font, x=0, y=0, width=200, height=50)

# Draw centered text directly
img = Image.new("RGB", (200, 100), "white")
draw = ImageDraw.Draw(img)
draw_centered_text(draw, "Centered", font, x=0, y=0, width=200, height=100)

# Wrap text to fit width
lines = wrap_text("This is a long sentence that needs wrapping", max_width=100, font=font)

# Truncate text with ellipsis
short = truncate_text("This is a very long label", max_width=80, font=font)
# Returns "This is..." or similar
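
The truncation step can be understood as "drop characters until the text plus an ellipsis fits". Here is a minimal sketch of that idea; truncate_to_width and fake_measure are hypothetical, and the fixed 8 pixels per character stands in for a real font measurement so the example runs without Pillow. It is not CUDAG's truncate_text.

```python
from typing import Callable


def truncate_to_width(
    text: str,
    max_width: int,
    measure: Callable[[str], int],
    ellipsis: str = "...",
) -> str:
    """Shorten text until text + ellipsis fits within max_width."""
    if measure(text) <= max_width:
        return text
    for end in range(len(text), 0, -1):
        candidate = text[:end].rstrip() + ellipsis
        if measure(candidate) <= max_width:
            return candidate
    return ellipsis


def fake_measure(text: str) -> int:
    # Assumption for the demo: every character is 8 pixels wide
    return len(text) * 8


print(truncate_to_width("This is a very long label", 80, fake_measure))  # This is...
print(truncate_to_width("Short", 80, fake_measure))                      # Short
```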

Config Utilities

Load YAML configuration files:

from cudag import load_yaml_config, get_config_path

# Get config path relative to your module
config_path = get_config_path(__file__, "canvas.yaml")

# Load YAML config
config = load_yaml_config(config_path)
# Returns dict with parsed YAML content

Drawing Utilities

Use render_scrollbar() for scrollbar rendering:

from cudag import render_scrollbar

scrollbar = render_scrollbar(
    content_height=1000,     # Total content height
    visible_height=400,      # Visible viewport
    scroll_offset=200,       # Current scroll position
    width=12,                # Scrollbar width
    min_thumb=30,            # Minimum thumb height
)
# Returns PIL Image of scrollbar
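
The parameters above map onto a common scrollbar calculation: thumb size is proportional to the visible fraction of the content, and thumb position is proportional to the scroll progress. The sketch below shows that calculation under the assumption that the track is as tall as the viewport; thumb_geometry is a hypothetical helper, and render_scrollbar() may use a different formula.

```python
from typing import Tuple


def thumb_geometry(
    content_height: int,
    visible_height: int,
    scroll_offset: int,
    min_thumb: int = 30,
) -> Tuple[int, int]:
    """Return (thumb_height, thumb_y) for a track as tall as the viewport."""
    if content_height <= visible_height:
        return visible_height, 0  # nothing to scroll: thumb fills the track
    thumb_height = max(min_thumb, round(visible_height * visible_height / content_height))
    max_offset = content_height - visible_height
    offset = min(max(scroll_offset, 0), max_offset)  # clamp to the valid range
    thumb_y = round(offset / max_offset * (visible_height - thumb_height))
    return thumb_height, thumb_y


print(thumb_geometry(1000, 400, 200))  # (160, 80)
print(thumb_geometry(1000, 400, 600))  # (160, 240)
```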

Coordinate System

All coordinates use RU (Resolution Units) normalized to [0, 1000]:

Conversion: normalized = (pixel / image_dimension) * 1000
Real pixel coords stored in metadata.real_coords
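
The conversion can be sketched in a few lines. Rounding to the nearest integer is an assumption here (the formula above does not say how fractions are handled), but it reproduces the sample record in Output Format, where real_coords [577, 300] on a 1155x853 screen becomes coordinate [500, 352]. to_ru and to_pixels are hypothetical helper names.

```python
from typing import Tuple

Point = Tuple[int, int]


def to_ru(pixel: Point, image_size: Point) -> Point:
    """Convert pixel coordinates to RU in [0, 1000]."""
    (px, py), (width, height) = pixel, image_size
    return round(px / width * 1000), round(py / height * 1000)


def to_pixels(ru: Point, image_size: Point) -> Point:
    """Convert RU back to approximate pixel coordinates."""
    (rx, ry), (width, height) = ru, image_size
    return round(rx / 1000 * width), round(ry / 1000 * height)


size = (1155, 853)
print(to_ru((577, 300), size))  # (500, 352)
```

The round trip is lossy: converting RU back to pixels can land a pixel away from the original point, which is one reason to keep the real pixel coordinates in the metadata.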

Tool Call Actions

left_click - Click at coordinate
scroll - Scroll at coordinate with pixels
type - Type text
key - Press key combination
wait - Wait for duration
terminate - End interaction
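
In the dataset, each action is serialized as a JSON object wrapped in <tool_call> tags (see the gpt turn under Output Format). A minimal sketch of that serialization follows; format_tool_call is a hypothetical helper, and only the scroll arguments are taken from the sample record. Argument names for the other actions are not documented here and would need to be checked against CUDAG.

```python
import json


def format_tool_call(action: str, **arguments) -> str:
    """Serialize an action in the <tool_call> wrapper used by the sample record."""
    payload = {"name": "computer_use", "arguments": {"action": action, **arguments}}
    return f"<tool_call>{json.dumps(payload)}</tool_call>"


scroll_call = format_tool_call("scroll", coordinate=[500, 352], pixels=300)
print(scroll_call)
```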

Example Projects

See test-claim-window/ for a complete example implementing:

Procedure grid with scrolling
Provider names and procedure codes
Multi-column data rendering
Scroll task generation

Configuration Reference

# config/dataset.yaml
name_prefix: "my-dataset"
seed: 1337

tasks:
  scroll-grid: 100
  click-button: 50

task_config:
  min_rows: 15
  max_rows: 28
  tolerance: 50

train_split: 0.8
system_prompt: "compact"
output_dir: "datasets/my-dataset"
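
To show how seed, tasks, and train_split interact, the sketch below builds 150 sample ids from the task counts and splits them 80/20 with a seeded shuffle. This is an illustration only: the id format and the splitting strategy are assumptions, and CUDAG may split differently (for example per task).

```python
from random import Random
from typing import List, Tuple

# Plain-dict mirror of the relevant keys from config/dataset.yaml
config = {
    "seed": 1337,
    "tasks": {"scroll-grid": 100, "click-button": 50},
    "train_split": 0.8,
}


def split_ids(cfg: dict) -> Tuple[List[str], List[str]]:
    """Shuffle all sample ids with the configured seed, then cut at train_split."""
    ids = [
        f"{task}_{index:05d}"
        for task, count in cfg["tasks"].items()
        for index in range(count)
    ]
    Random(cfg["seed"]).shuffle(ids)
    cut = int(len(ids) * cfg["train_split"])
    return ids[:cut], ids[cut:]


train, test = split_ids(config)
print(len(train), len(test))  # 120 30
```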

Contributing

Fork the repository
Create a feature branch
Make your changes:
- Generalize hardcoded values rather than replacing them with your own
- Add tests for new functionality
- Ensure all quality checks pass
Submit a pull request

Code quality requirements:

Lexical complexity checks
Syntax linting
Code formatting
Copyright headers

AI-assisted code is welcome provided it includes tests and passes all checks.

License

This software is source-available for research and educational purposes only. Commercial use requires a separate license agreement with Tylt LLC (1% of annual gross revenue attributable to use of this software).

See LICENSE for full terms.

For commercial licensing inquiries: hello@claimhawk.app

🚀 We're Hiring

ClaimHawk builds computer-use agents that automate real work using vision-language models.

If you have a passion for machine learning (and some real background) and want to see the path to 100x developer — we have open intern positions.

No resumes. Just shoot an email with your qualifications and passions to:

📧 hello@claimhawk.app

Release history

0.4.1 - Dec 29, 2025 (this version)
0.3.12 - Dec 21, 2025
0.3.11 - Dec 18, 2025
0.3.10 - Dec 16, 2025
0.3.9 - Dec 16, 2025
0.3.8 - Dec 16, 2025
0.3.2 - Dec 16, 2025
0.3.1 - Dec 15, 2025
0.2.1 - Dec 5, 2025
0.2.0 - Dec 5, 2025
0.1.7 - Dec 5, 2025
0.1.1 - Dec 2, 2025
0.1.0 - Dec 2, 2025
