CUDAG - Computer Use Deterministic Augmented Generator framework for building VLM training data generators
CUDAG - Computer Use Deterministic Augmented Generator
A Rails-like framework for building VLM (Vision-Language Model) training data generators.
Overview
CUDAG provides a convention-over-configuration approach to generating training data for computer use models. It uses a domain-specific MVC-like pattern:
- Screen - Declarative UI definition (like Model in Rails)
- State - Dynamic data for rendering
- Renderer - Image generation (like View in Rails)
- Task - Interaction logic (like Controller in Rails)
- Model - Domain data types with generators (Patient, Provider, etc.)
Installation
# Create virtual environment
python3 -m venv .venv
source .venv/bin/activate
# Install CUDAG and dev dependencies
make install
make dev
Quality Checks
Always run quality checks during development:
make check # Run all checks (lint, typecheck, complexity)
make lint # Ruff linting and format checking
make typecheck # Mypy strict type checking
make complexity # Radon cyclomatic complexity analysis
make format # Auto-format code
Development Workflow
Building a CUDAG generator follows this process:
Step 1: Generate New App
# Install CUDAG globally
uvx pip install cudag
# Create a new generator project
cudag new claim-window-generator
# Navigate into the project
cd claim-window-generator
This creates:
claim-window-generator/
├── assets/ # Base images, fonts
├── config/
│ └── dataset.yaml
├── models/ # Domain model definitions
├── tasks/ # Task implementations
├── screen.py # Screen definition
├── state.py # State dataclass
├── renderer.py # Image renderer
└── datasets/ # Output (gitignored)
Step 2: Add Base Images
Copy your blank screen images and fonts:
- Full screen blank: assets/base.png - The base UI template
- Region blanks: assets/grid_blank.png - Headers, overlays, etc.
- Fonts: assets/fonts/font.ttf - Font for rendering text
Step 3: Generate Data Models
Use Claude to generate domain models for your data:
from cudag import Model, FirstName, LastName, DOB, NPI, Phone, Email
from cudag import string, date_field, money, choice, computed, years_since

class Patient(Model):
    first_name = FirstName()
    last_name = LastName()
    dob = DOB()
    member_id = string(pattern=r"[A-Z]{3}[0-9]{6}")
    phone = Phone()
    email = Email()
    # Computed fields
    full_name = computed("first_name", "last_name")
    age = years_since("dob")

class Procedure(Model):
    code = string(pattern=r"D[0-9]{4}")
    description = choice("Exam", "Cleaning", "X-Ray", "Crown")
    fee = money(min_value=50.0, max_value=2500.0)

class Provider(Model):
    first_name = string(faker="first_name")
    last_name = string(faker="last_name")
    npi = string(faker="npi")
    specialty = choice("General", "Orthodontics", "Oral Surgery")
Field Types:
- string(faker=..., pattern=..., choices=...) - Text
- integer(min_value, max_value) - Numbers
- decimal(min_value, max_value, precision) - Floats
- money(min_value, max_value) - Currency ($X.XX)
- date_field(min_year, max_year, format) - Dates
- time_field(min_hour, max_hour, format) - Times
- boolean(probability) - True/False
- choice(*options, weights) - Pick from list
- computed(*sources) - Derived from other fields
- years_since(field) - Age calculation
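To make the pattern-based string field concrete, here is a minimal standalone sketch of how a value such as member_id could be drawn from a seeded random generator. It is not CUDAG's implementation: generate_from_pattern is a hypothetical helper, and it only understands bracketed ranges with a repeat count (such as [A-Z]{3}) plus literal characters, which is enough for the two patterns used above.

```python
import re
from random import Random

# Hypothetical helper, not part of cudag. Supports only "[a-b]{n}" groups
# and single literal characters.
_TOKEN = re.compile(r"\[([A-Z0-9])-([A-Z0-9])\]\{(\d+)\}|(.)")


def generate_from_pattern(rng: Random, pattern: str) -> str:
    """Expand a restricted pattern into a random string using rng."""
    out = []
    for match in _TOKEN.finditer(pattern):
        lo, hi, count, literal = match.groups()
        if literal is not None:
            # Literal character, copied through unchanged
            out.append(literal)
        else:
            # Character range repeated `count` times
            alphabet = [chr(c) for c in range(ord(lo), ord(hi) + 1)]
            out.extend(rng.choice(alphabet) for _ in range(int(count)))
    return "".join(out)


member_id = generate_from_pattern(Random(42), r"[A-Z]{3}[0-9]{6}")
procedure_code = generate_from_pattern(Random(42), r"D[0-9]{4}")
```

Because the generator takes an explicit Random instance, the same seed always yields the same value, which is the property a deterministic dataset generator relies on.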
Step 4: Define Screen Layout
Declare your screen structure with regions:
from cudag import Screen, grid, button, scrollable, dropdown

class ClaimWindowScreen(Screen):
    name = "claim-window"
    base_image = "images/screen_blank.png"
    size = (1155, 853)

    # Grid region - bounds are (x, y, width, height)
    procedure_grid = grid(
        (0, 217, 1155, 167),
        rows=8,
        cols=17,
    )

    # Scrollable area
    scroll_area = scrollable(
        (0, 217, 1155, 167),
        step=300,
        direction="vertical",
    )

    # Buttons
    billing_provider = button((85, 95, 200, 20), label="Billing Provider")
    save_button = button((100, 800, 80, 30), label="Save")
Region Types:
- region(bounds) - Simple clickable area
- button(bounds, label, description) - Clickable button
- grid(bounds, rows, cols) - Grid of cells
- scrollable(bounds, step, direction) - Scrollable area
- dropdown(bounds, items) - Dropdown menu
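Since every region is described by (x, y, width, height) bounds, click and scroll targets reduce to simple arithmetic. The sketch below is a standalone illustration, not CUDAG's API: region_center and cell_center are hypothetical names, and the grid is assumed to be divided into equally sized cells.

```python
from typing import Tuple

Bounds = Tuple[int, int, int, int]  # (x, y, width, height)


def region_center(bounds: Bounds) -> Tuple[int, int]:
    """Pixel at the center of a region."""
    x, y, width, height = bounds
    return (x + width // 2, y + height // 2)


def cell_center(bounds: Bounds, rows: int, cols: int, row: int, col: int) -> Tuple[int, int]:
    """Center of one grid cell, assuming uniform cell sizes (an assumption)."""
    x, y, width, height = bounds
    cell_w = width / cols
    cell_h = height / rows
    return (round(x + (col + 0.5) * cell_w), round(y + (row + 0.5) * cell_h))


grid_bounds = (0, 217, 1155, 167)
center = region_center(grid_bounds)               # (577, 300)
first_cell = cell_center(grid_bounds, 8, 17, 0, 0)  # (34, 227)
```

For the procedure grid above, the region center works out to (577, 300), which matches the real_coords value shown in the Output Format example.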
Step 5: Build Screen Renderer
Render your screen with PIL, drawing data onto the base image:
from PIL import Image, ImageDraw, ImageFont
from cudag import BaseRenderer
from .screen import ClaimWindowScreen  # screen.py in the generated project
from .state import GridState

class ClaimWindowRenderer(BaseRenderer[GridState]):
    screen_class = ClaimWindowScreen

    def load_assets(self) -> None:
        self.font = ImageFont.truetype(
            str(self.asset_path("fonts", "font.ttf")), 9
        )

    def render(self, state: GridState) -> tuple[Image.Image, dict]:
        image = self.load_base_image()
        draw = ImageDraw.Draw(image)
        # Render grid rows
        self._render_grid(image, draw, state)
        # Render scrollbar
        self._render_scrollbar(image, state)
        metadata = self.build_metadata(state)
        return image, metadata
Step 6: Build Region Renderers
For complex regions (grids, tables), create dedicated rendering methods:
def _render_grid(self, image, draw, state):
    # GRID_Y_START, ROW_HEIGHT, and COLUMNS are layout constants you define
    for idx, row in enumerate(state.visible_rows):
        y = GRID_Y_START + idx * ROW_HEIGHT
        for col in COLUMNS:
            value = getattr(row, col["id"], "")
            x = col["x"]
            draw.text((x, y), str(value), font=self.font, fill=(0, 0, 0))

def _render_scrollbar(self, image, state):
    # No draw handle is passed in, so create one for the image
    draw = ImageDraw.Draw(image)
    # Calculate thumb position based on scroll state
    thumb_y = calculate_thumb_position(state)
    # track_x, width, and height stand for your scrollbar layout values
    draw.rectangle([track_x, thumb_y, track_x + width, thumb_y + height],
                   fill=(100, 100, 100))
Step 7: Test and Align Data
This is critical - manually verify that:
- Grid columns align with data
- Text fits within column widths
- Row wrapping works correctly
- Scroll positions show correct content
- All UI elements render properly
# Generate a small test batch
python -m my_generator.generator --config config/dataset.yaml
# View generated images
open datasets/my-dataset/images/
Step 8: Create Tasks
Define tasks that generate training samples:
from cudag import BaseTask, TaskSample, TaskContext, ToolCall

class ScrollGridTask(BaseTask):
    task_type = "scroll-grid"

    def generate_sample(self, ctx: TaskContext) -> TaskSample:
        # Generate state
        state = GridState.generate(ctx.rng, min_rows=15, max_rows=28)
        # Render image
        image, metadata = self.renderer.render(state)
        image_path = self.save_image(image, ctx)
        # Get scroll coordinates
        grid_center = self.renderer.get_grid_center()
        return TaskSample(
            id=self.build_id(ctx),
            image_path=image_path,
            human_prompt="Scroll down in the grid.",
            tool_call=ToolCall.scroll(grid_center, pixels=300),
            pixel_coords=grid_center,
            image_size=self.renderer.screen_class.meta().size,
            metadata={"task_type": self.task_type, **metadata},
        )
Step 9: Create Dataset Generator
Use run_generator() to handle boilerplate (argument parsing, config loading, dataset naming):
from pathlib import Path
from cudag import run_generator
from .renderer import ClaimWindowRenderer
from .tasks import ScrollGridTask

def main():
    renderer = ClaimWindowRenderer(assets_dir=Path("assets"))
    tasks = [ScrollGridTask(config={}, renderer=renderer)]
    run_generator(renderer, tasks)

if __name__ == "__main__":
    main()
The run_generator() helper handles:
- Script invocation check
- Argument parsing (--config, --seed)
- Config loading from YAML
- Dataset naming ({prefix}-{researcher}-{timestamp})
- Building dataset and tests
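To illustrate the kind of boilerplate the list above describes, here is a rough standalone sketch of argument parsing and dataset naming using only the standard library. It is not run_generator()'s actual code: parse_args and dataset_name are hypothetical helpers, and the default config path, the timestamp format, and the choice to skip a missing researcher name are all assumptions.

```python
import argparse
from datetime import datetime
from typing import Optional, Sequence


def parse_args(argv: Sequence[str]) -> argparse.Namespace:
    """Parse the two flags named above; the defaults are assumptions."""
    parser = argparse.ArgumentParser(description="Hypothetical generator CLI")
    parser.add_argument("--config", default="config/dataset.yaml")
    parser.add_argument("--seed", type=int, default=None)
    return parser.parse_args(argv)


def dataset_name(prefix: str, researcher: Optional[str], now: datetime) -> str:
    """Build a {prefix}-{researcher}-{timestamp} style name."""
    parts = [prefix]
    if researcher:
        parts.append(researcher)
    # Timestamp format is an assumption, not taken from cudag
    parts.append(now.strftime("%Y%m%d-%H%M%S"))
    return "-".join(parts)


args = parse_args(["--config", "config/dataset.yaml", "--seed", "1337"])
name = dataset_name("my-dataset", "mike", datetime(2025, 1, 2, 3, 4, 5))
```

Passing the timestamp in as a parameter keeps the naming function deterministic and easy to test.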
For custom behavior, use optional parameters:
run_generator(
    renderer,
    tasks,
    extra_args=[("--debug", {"action": "store_true"})],
    config_modifier=lambda config, args: setattr(config, 'seed', 999) if args.debug else None,
    post_build=lambda output_dir, renderer: generate_debug_images(output_dir),
)
Step 10: Generate Production Dataset
# Generate full dataset
PYTHONPATH=src python -m my_generator.generator
# Verify output
ls datasets/my-dataset/
# images/ data.jsonl train.jsonl test.jsonl config.json
# Check JSONL format
head -1 datasets/my-dataset/data.jsonl | python -m json.tool
Output Format
Generated JSONL structure:
{
  "id": "my-dataset_00000",
  "image": "images/my-dataset_00000.jpg",
  "conversations": [
    {"from": "system", "value": "...tool definitions..."},
    {"from": "human", "value": "<image>\nScroll down in the grid."},
    {"from": "gpt", "value": "<tool_call>{\"name\": \"computer_use\", \"arguments\": {\"action\": \"scroll\", \"coordinate\": [500, 352], \"pixels\": 300}}</tool_call>"}
  ],
  "metadata": {
    "task_type": "scroll-grid",
    "real_coords": [577, 300]
  }
}
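When inspecting generated data, it can help to pull the tool call back out of a record. The following sketch reads one record shaped like the example above and decodes the JSON inside the tool_call tags. extract_tool_call is a hypothetical helper written for this illustration, not part of CUDAG.

```python
import json
import re

TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)


def extract_tool_call(record: dict) -> dict:
    """Return the decoded tool call from the first 'gpt' turn that has one."""
    for turn in record["conversations"]:
        if turn["from"] == "gpt":
            match = TOOL_CALL_RE.search(turn["value"])
            if match:
                return json.loads(match.group(1))
    raise ValueError("no tool call found in record")


# One JSONL line, built here so the example is self-contained
line = json.dumps({
    "id": "my-dataset_00000",
    "image": "images/my-dataset_00000.jpg",
    "conversations": [
        {"from": "human", "value": "<image>\nScroll down in the grid."},
        {"from": "gpt", "value": '<tool_call>{"name": "computer_use", "arguments": '
                                 '{"action": "scroll", "coordinate": [500, 352], "pixels": 300}}'
                                 "</tool_call>"},
    ],
    "metadata": {"task_type": "scroll-grid", "real_coords": [577, 300]},
})

record = json.loads(line)
call = extract_tool_call(record)
```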
Utility Functions
Researcher Name
Use get_researcher_name() to automatically include researcher identity in dataset names:
from cudag import get_researcher_name
# Reads from .researcher file (supports "Name: mike" or plain "mike")
# Falls back to USER environment variable
researcher = get_researcher_name() # Returns "mike" or None
# Disable environment fallback
researcher = get_researcher_name(fallback_to_env=False)
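The lookup order described in the comments above can be sketched in plain Python. This is only an approximation of the documented behavior, not the library's code: parse_researcher and researcher_name are hypothetical names, and details such as whitespace handling are assumptions.

```python
import os
from pathlib import Path
from typing import Optional


def parse_researcher(text: str) -> Optional[str]:
    """Accept either 'Name: mike' or a plain 'mike' on the first non-empty line."""
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "name":
            line = value.strip()
        return line or None
    return None


def researcher_name(path: Path, fallback_to_env: bool = True) -> Optional[str]:
    """Read the file if present, else optionally fall back to $USER."""
    if path.is_file():
        name = parse_researcher(path.read_text(encoding="utf-8"))
        if name:
            return name
    return os.environ.get("USER") if fallback_to_env else None
```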
Font Loading
Use load_font() for platform-aware font loading with automatic fallbacks:
from cudag import load_font, load_font_family
# Load with automatic system font fallback
font = load_font("assets/fonts/Inter.ttf", size=14)
# Load with explicit fallbacks
font = load_font(
    "assets/fonts/Inter.ttf",
    size=14,
    fallbacks=["/System/Library/Fonts/Helvetica.ttc"],
)
# Load font family with variants
fonts = load_font_family(
    "fonts/Inter-Regular.ttf",
    size=14,
    bold="fonts/Inter-Bold.ttf",
)
# fonts["regular"], fonts["bold"], fonts["italic"], fonts["bold_italic"]
Random Data Generation
Use choose(), date_in_range(), and amount() for consistent random data:
from random import Random
from cudag import choose, date_in_range, amount, weighted_choice
rng = Random(42)
# Choose random item from sequence
provider = choose(rng, ["Dr. Smith", "Dr. Jones", "Dr. Brown"])
# Generate random date in range
visit_date = date_in_range(rng, "2024-01-01", "2024-12-31", fmt="%m/%d/%Y")
# Generate random monetary amount
fee = amount(rng, 50.0, 500.0)
# With optional zero values (20% chance)
payment = amount(rng, 0.0, 100.0, allow_zero=True)
# Weighted random choice
status = weighted_choice(rng, {"pending": 0.7, "approved": 0.2, "denied": 0.1})
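The reason each helper takes an explicit rng is reproducibility: the same seed gives the same data. The sketch below demonstrates that property with the standard library alone. weighted_pick is a hypothetical stand-in built on random.Random.choices, not CUDAG's weighted_choice.

```python
from random import Random
from typing import Dict, List


def weighted_pick(rng: Random, weights: Dict[str, float]) -> str:
    """Pick one key with probability proportional to its weight."""
    options = list(weights)
    return rng.choices(options, weights=[weights[o] for o in options], k=1)[0]


def sample_statuses(seed: int, n: int) -> List[str]:
    """Draw n statuses from a freshly seeded generator."""
    rng = Random(seed)
    weights = {"pending": 0.7, "approved": 0.2, "denied": 0.1}
    return [weighted_pick(rng, weights) for _ in range(n)]


first = sample_statuses(42, 5)
second = sample_statuses(42, 5)
# first and second are identical because both runs start from the same seed
```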
Text Utilities
Use text utilities for measurement and rendering:
from cudag import measure_text, center_text_position, draw_centered_text, wrap_text, truncate_text
from PIL import Image, ImageDraw, ImageFont
font = ImageFont.load_default()
# Measure text dimensions
width, height = measure_text("Hello World", font)
# Calculate centered position
tx, ty = center_text_position("Label", font, x=0, y=0, width=200, height=50)
# Draw centered text directly
img = Image.new("RGB", (200, 100), "white")
draw = ImageDraw.Draw(img)
draw_centered_text(draw, "Centered", font, x=0, y=0, width=200, height=100)
# Wrap text to fit width
lines = wrap_text("This is a long sentence that needs wrapping", max_width=100, font=font)
# Truncate text with ellipsis
short = truncate_text("This is a very long label", max_width=80, font=font)
# Returns "This is..." or similar
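For intuition about what wrapping and truncation involve, here is a standalone greedy sketch. It does not reproduce CUDAG's wrap_text or truncate_text: wrap_words, truncate, and fixed_width are hypothetical, and the fixed 6-pixels-per-character metric is a stand-in for real font measurement.

```python
from typing import Callable, List


def fixed_width(text: str) -> int:
    """Stand-in metric: every character is 6 pixels wide (an assumption)."""
    return 6 * len(text)


def wrap_words(text: str, max_width: int,
               measure: Callable[[str], int] = fixed_width) -> List[str]:
    """Greedy word wrap: start a new line when the next word would overflow."""
    lines: List[str] = []
    current = ""
    for word in text.split():
        candidate = word if not current else current + " " + word
        if current and measure(candidate) > max_width:
            lines.append(current)
            current = word
        else:
            current = candidate
    if current:
        lines.append(current)
    return lines


def truncate(text: str, max_width: int,
             measure: Callable[[str], int] = fixed_width,
             ellipsis: str = "...") -> str:
    """Drop trailing characters until text plus ellipsis fits."""
    if measure(text) <= max_width:
        return text
    while text and measure(text + ellipsis) > max_width:
        text = text[:-1]
    return text.rstrip() + ellipsis


lines = wrap_words("This is a long sentence that needs wrapping", max_width=100)
short = truncate("This is a very long label", max_width=80)
```

With a real font, the measure argument would be replaced by a function that returns the rendered pixel width of the string.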
Config Utilities
Load YAML configuration files:
from cudag import load_yaml_config, get_config_path
# Get config path relative to your module
config_path = get_config_path(__file__, "canvas.yaml")
# Load YAML config
config = load_yaml_config(config_path)
# Returns dict with parsed YAML content
Drawing Utilities
Use render_scrollbar() for scrollbar rendering:
from cudag import render_scrollbar
scrollbar = render_scrollbar(
    content_height=1000,  # Total content height
    visible_height=400,   # Visible viewport
    scroll_offset=200,    # Current scroll position
    width=12,             # Scrollbar width
    min_thumb=30,         # Minimum thumb height
)
# Returns PIL Image of scrollbar
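The parameters above map onto a common proportional-thumb calculation, sketched here without any imaging library. This shows the usual geometry, not necessarily what render_scrollbar() does internally: thumb_geometry is a hypothetical helper, and the track is assumed to be as tall as the viewport.

```python
from typing import Tuple


def thumb_geometry(content_height: int, visible_height: int,
                   scroll_offset: int, min_thumb: int = 30) -> Tuple[int, int]:
    """Return (thumb_y, thumb_height) for a track as tall as the viewport."""
    if content_height <= visible_height:
        # Everything fits, so the thumb fills the whole track
        return 0, visible_height
    # Thumb height is proportional to the visible fraction of the content
    thumb_height = max(min_thumb, round(visible_height * visible_height / content_height))
    max_offset = content_height - visible_height
    clamped = min(max(scroll_offset, 0), max_offset)
    # Map the scroll offset onto the free travel of the track
    thumb_y = round(clamped * (visible_height - thumb_height) / max_offset)
    return thumb_y, thumb_height


thumb_y, thumb_height = thumb_geometry(1000, 400, 200)  # (80, 160)
```

For the values in the call above, 40% of the content is visible, so the thumb is 160 pixels tall and sits 80 pixels down the track.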
Coordinate System
All coordinates use RU (Resolution Units) normalized to [0, 1000]:
- Conversion: normalized = (pixel / image_dimension) * 1000
- Real pixel coords stored in metadata.real_coords
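The conversion formula can be checked against the numbers in the Output Format example. The sketch below is a standalone illustration: to_ru and to_pixels are hypothetical names, and rounding to the nearest integer is an assumption about how fractional values are handled.

```python
from typing import Tuple

RU_MAX = 1000  # upper bound of the normalized range


def to_ru(pixel: Tuple[int, int], image_size: Tuple[int, int]) -> Tuple[int, int]:
    """Pixel coordinates to normalized [0, 1000] units."""
    (px, py), (width, height) = pixel, image_size
    return (round(px / width * RU_MAX), round(py / height * RU_MAX))


def to_pixels(ru: Tuple[int, int], image_size: Tuple[int, int]) -> Tuple[int, int]:
    """Normalized units back to approximate pixel coordinates."""
    (rx, ry), (width, height) = ru, image_size
    return (round(rx / RU_MAX * width), round(ry / RU_MAX * height))


size = (1155, 853)
normalized = to_ru((577, 300), size)   # (500, 352)
approx = to_pixels(normalized, size)   # close to, but not always equal to, (577, 300)
```

The round trip can be off by a pixel because of rounding, which is one practical reason to keep the exact values in metadata.real_coords.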
Tool Call Actions
- left_click - Click at coordinate
- scroll - Scroll at coordinate with pixels
- type - Type text
- key - Press key combination
- wait - Wait for duration
- terminate - End interaction
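As a rough illustration of how an action becomes the string stored in the gpt turn, the sketch below serializes a scroll call into the tool_call format shown in the Output Format example. format_tool_call is a hypothetical helper, not CUDAG's ToolCall class, and only the scroll arguments are taken from this document; the argument names for the other actions are not shown here.

```python
import json
from typing import Any


def format_tool_call(action: str, **arguments: Any) -> str:
    """Wrap an action and its arguments in <tool_call> tags."""
    payload = {"name": "computer_use", "arguments": {"action": action, **arguments}}
    return "<tool_call>" + json.dumps(payload) + "</tool_call>"


scroll = format_tool_call("scroll", coordinate=[500, 352], pixels=300)
```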
Example Projects
See test-claim-window/ for a complete example implementing:
- Procedure grid with scrolling
- Provider names and procedure codes
- Multi-column data rendering
- Scroll task generation
Configuration Reference
# config/dataset.yaml
name_prefix: "my-dataset"
seed: 1337
tasks:
  scroll-grid: 100
  click-button: 50
task_config:
  min_rows: 15
  max_rows: 28
  tolerance: 50
train_split: 0.8
system_prompt: "compact"
output_dir: "datasets/my-dataset"
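To show how seed and train_split can interact, here is a standalone sketch of a seeded train/test split. It is an assumption about the general technique, not CUDAG's actual splitting code: split_ids is hypothetical, and the 150 sample IDs assume the values under tasks are per-task sample counts (100 + 50).

```python
from random import Random
from typing import List, Tuple


def split_ids(ids: List[str], train_split: float, seed: int) -> Tuple[List[str], List[str]]:
    """Shuffle with a fixed seed, then cut at the train fraction."""
    shuffled = list(ids)
    Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_split)
    return shuffled[:cut], shuffled[cut:]


ids = [f"my-dataset_{i:05d}" for i in range(150)]
train, test = split_ids(ids, train_split=0.8, seed=1337)
# 120 training samples and 30 test samples, identical on every run
```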
Contributing
- Fork the repository
- Create a feature branch
- Make your changes:
  - Generalize hardcoded values rather than replacing them with your own
  - Add tests for new functionality
  - Ensure all quality checks pass
- Submit a pull request
Code quality requirements:
- Lexical complexity checks
- Syntax linting
- Code formatting
- Copyright headers
AI-assisted code is welcome provided it includes tests and passes all checks.
License
Copyright (c) 2025 Tylt LLC. All rights reserved.
This software is source-available for research and educational purposes only. Commercial use requires a separate license agreement with Tylt LLC (1% of annual gross revenue attributable to use of this software).
See LICENSE for full terms.
For commercial licensing inquiries: hello@claimhawk.app
🚀 We're Hiring
ClaimHawk builds computer-use agents that automate real work using vision-language models.
If you have a passion for machine learning (and some real background) and want to see the path to becoming a 100x developer, we have open intern positions.
No resumes. Just shoot an email with your qualifications and passions to: