A standalone library for dynamic BoundaryML schema generation and LLM response parsing

Dynamic BAML 🚀

Dynamic BAML is a Python library for extracting structured data from text using Large Language Models (LLMs) with dynamically generated schemas. Built on top of BoundaryML, it provides a high-level Python interface for BAML (BoundaryML's "Basically, A Made-up Language") with automatic schema generation.

Define your desired output structure as a simple Python dictionary, and Dynamic BAML handles the rest!

✨ Features

  • 🎯 Schema-First Approach: Define output structure with Python dictionaries
  • 🔄 Dynamic BAML Generation: Automatically converts schemas to BAML code
  • 🌐 Multi-Provider Support: Works with OpenAI, Anthropic, Ollama, and OpenRouter
  • 🛡️ Type Safety: Ensures structured, validated outputs
  • 🔧 Easy Integration: Simple API with comprehensive error handling
  • 📊 Complex Types: Support for nested objects, enums, arrays, and optional fields
  • ⚡ Performance: Efficient temporary project management and cleanup

🚀 Quick Start

Installation

pip install dynamic-baml

Basic Usage

from dynamic_baml import call_with_schema

# Define your desired output structure
schema = {
    "name": "string",
    "age": "int", 
    "email": "string",
    "is_active": "bool"
}

# Extract structured data from text
text = "John Doe is 30 years old, email: john@example.com, currently active user"

result = call_with_schema(
    prompt_text=f"Extract user information from: {text}",
    schema_dict=schema,
    options={"provider": "openai", "model": "gpt-4"}
)

print(result)
# Output: {"name": "John Doe", "age": 30, "email": "john@example.com", "is_active": True}

๐Ÿ› ๏ธ Installation & Setup

Requirements

  • Python 3.8+
  • BAML CLI from BoundaryML: npm install -g @boundaryml/baml

Provider Setup

OpenAI

export OPENAI_API_KEY="your-openai-api-key"

Anthropic

export ANTHROPIC_API_KEY="your-anthropic-api-key"

Ollama (Local)

# Install and start Ollama
curl -fsSL https://ollama.ai/install.sh | sh
ollama serve
ollama pull gemma3:1b

OpenRouter

export OPENROUTER_API_KEY="your-openrouter-api-key"

🧠 Core Concepts

Schema Dictionary

Define your desired output structure using Python dictionaries:

schema = {
    "field_name": "field_type",
    "nested_object": {
        "sub_field": "string"
    },
    "optional_field": {"type": "string", "optional": True}
}

BAML Generation

Dynamic BAML automatically converts your schema to BAML code:

# Your schema:
{"name": "string", "age": "int"}

# Generated BAML:
class UserInfo {
  name string
  age int
}
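To make the mapping concrete, here is a minimal sketch of how a flat schema of basic types could be rendered as a BAML class. The function `to_baml_class` is hypothetical and written only for illustration; it is not the library's generator and ignores arrays, enums, nested objects, and optional fields.

```python
# Hypothetical, simplified converter for illustration only.
BASIC_TYPES = {"string", "int", "float", "bool"}

def to_baml_class(schema: dict, class_name: str) -> str:
    """Render a flat schema of basic types as a BAML class definition."""
    lines = [f"class {class_name} {{"]
    for field, field_type in schema.items():
        # Only plain type names are handled in this sketch.
        if not isinstance(field_type, str) or field_type not in BASIC_TYPES:
            raise ValueError(f"unsupported type for {field!r}: {field_type!r}")
        lines.append(f"  {field} {field_type}")
    lines.append("}")
    return "\n".join(lines)

baml_code = to_baml_class({"name": "string", "age": "int"}, "UserInfo")
print(baml_code)
```

The printed text matches the generated class shown above for the same two fields.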

📊 Schema Types

Basic Types

schema = {
    "text": "string",        # Text data
    "number": "int",        # Integer
    "price": "float",       # Decimal number
    "active": "bool"        # True/False
}

Arrays

schema = {
    "tags": ["string"],     # Array of strings
    "scores": ["int"],      # Array of integers
    "ratings": ["float"]    # Array of floats
}

Enums

schema = {
    "status": {
        "type": "enum",
        "values": ["draft", "published", "archived"]
    },
    "priority": {
        "type": "enum", 
        "values": ["low", "medium", "high", "urgent"]
    }
}

Nested Objects

schema = {
    "user": {
        "name": "string",
        "email": "string",
        "profile": {
            "bio": "string",
            "avatar_url": "string"
        }
    },
    "metadata": {
        "created_at": "string",
        "updated_at": "string"
    }
}

Optional Fields

schema = {
    "name": "string",                              # Required
    "email": {"type": "string", "optional": True}, # Optional
    "phone": {"type": "string", "optional": True}  # Optional
}
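The schema forms in this section can be read as a small recursive grammar. The sketch below checks a result dictionary against such a schema in plain Python. The `validate` helper is hypothetical and not part of Dynamic BAML; it assumes that a dictionary containing a "type" key is always an enum or optional wrapper, and that an optional field may be absent or None.

```python
BASIC_TYPES = {"string": str, "int": int, "float": (int, float), "bool": bool}

def validate(value, spec, path="result"):
    """Return a list of mismatches between value and a schema spec."""
    if isinstance(spec, str):  # basic type name
        expected = BASIC_TYPES.get(spec)
        if expected is None:
            return [f"{path}: unknown type {spec!r}"]
        # bool is a subclass of int, so reject it for non-bool fields
        wrong_bool = isinstance(value, bool) and spec != "bool"
        if wrong_bool or not isinstance(value, expected):
            return [f"{path}: expected {spec}, got {type(value).__name__}"]
        return []
    if isinstance(spec, list):  # array: the single element is the item spec
        if not isinstance(value, list):
            return [f"{path}: expected array, got {type(value).__name__}"]
        return [e for i, item in enumerate(value)
                for e in validate(item, spec[0], f"{path}[{i}]")]
    if "type" in spec:  # enum or optional wrapper (assumption, see above)
        if value is None and spec.get("optional"):
            return []
        if spec["type"] == "enum":
            ok = value in spec["values"]
            return [] if ok else [f"{path}: {value!r} is not an allowed value"]
        return validate(value, spec["type"], path)
    if not isinstance(value, dict):  # nested object
        return [f"{path}: expected object, got {type(value).__name__}"]
    errors = []
    for key, sub in spec.items():
        if key in value:
            errors += validate(value[key], sub, f"{path}.{key}")
        elif not (isinstance(sub, dict) and sub.get("optional")):
            errors.append(f"{path}.{key}: missing required field")
    return errors

schema = {
    "name": "string",
    "age": "int",
    "tags": ["string"],
    "status": {"type": "enum", "values": ["draft", "published"]},
    "email": {"type": "string", "optional": True},
}
errors = validate(
    {"name": "Ada", "age": "30", "tags": ["a", 1], "status": "deleted"}, schema
)
for error in errors:
    print(error)
```

A check like this can be useful in tests, or as a second line of defense when results are passed on to code that relies on exact types.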

โš™๏ธ Provider Configuration

OpenAI

options = {
    "provider": "openai",
    "model": "gpt-4",
    "temperature": 0.1,
    "max_tokens": 2000,
    "timeout": 60
}

Anthropic

options = {
    "provider": "anthropic", 
    "model": "claude-3-5-sonnet-20241022",
    "temperature": 0.1,
    "max_tokens": 2000,
    "timeout": 60
}

Ollama (Local)

options = {
    "provider": "ollama",
    "model": "gemma3:1b",
    "base_url": "http://localhost:11434",  # Optional
    "temperature": 0.1,
    "timeout": 120
}

OpenRouter

options = {
    "provider": "openrouter",
    "model": "google/gemini-2.0-flash-exp",
    "temperature": 0.1,
    "max_tokens": 2000,
    "timeout": 60
}
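Since each provider is selected purely through the options dictionary, one possible pattern is to build that dictionary from whichever API key is present in the environment. The sketch below is an assumption about how an application might do this, not a feature of Dynamic BAML; `pick_options` is a hypothetical helper, and the environment variable names and models are the ones used in this document.

```python
import os

# (environment variable, base options), checked in order of preference.
PROVIDER_DEFAULTS = [
    ("OPENAI_API_KEY", {"provider": "openai", "model": "gpt-4"}),
    ("ANTHROPIC_API_KEY", {"provider": "anthropic",
                           "model": "claude-3-5-sonnet-20241022"}),
    ("OPENROUTER_API_KEY", {"provider": "openrouter",
                            "model": "google/gemini-2.0-flash-exp"}),
]
# Local fallback that needs no API key.
OLLAMA_FALLBACK = {"provider": "ollama", "model": "gemma3:1b"}

def pick_options(env=None, **overrides):
    """Return options for the first provider whose API key is set."""
    env = os.environ if env is None else env
    for variable, base in PROVIDER_DEFAULTS:
        if env.get(variable):
            return {**base, "temperature": 0.1, **overrides}
    return {**OLLAMA_FALLBACK, "temperature": 0.1, **overrides}

# Passing env explicitly keeps the example independent of the real environment.
options = pick_options(env={"ANTHROPIC_API_KEY": "your-anthropic-api-key"}, timeout=30)
print(options)
```

Keyword overrides are applied last, so a caller can still adjust values such as timeout or max_tokens for a single call.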

🔄 Advanced Usage

Safe Calling (No Exceptions)

from dynamic_baml import call_with_schema_safe

result = call_with_schema_safe(
    prompt_text="Extract data from this text...",
    schema_dict=schema,
    options=options
)

if result["success"]:
    data = result["data"]
    print(f"Extracted: {data}")
else:
    print(f"Error: {result['error']}")
    print(f"Error type: {result['error_type']}")
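The result shape above can be reproduced for any function with a small wrapper. The following is a minimal sketch of the pattern, not the library's implementation: `make_safe` and `flaky_extract` are hypothetical names, and using the exception class name as error_type is an assumption.

```python
import functools

def make_safe(func):
    """Wrap func so it returns a result dictionary instead of raising."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return {"success": True, "data": func(*args, **kwargs)}
        except Exception as exc:
            return {
                "success": False,
                "error": str(exc),
                "error_type": type(exc).__name__,  # assumption: class name
            }
    return wrapper

def flaky_extract(prompt_text):
    """Hypothetical stand-in for an extraction call."""
    if not prompt_text:
        raise ValueError("prompt_text must not be empty")
    return {"echo": prompt_text}

safe_extract = make_safe(flaky_extract)
ok = safe_extract("hello")
failed = safe_extract("")
print(ok)
print(failed)
```

The caller then branches on the "success" key, exactly as in the example above, and never needs a try/except block.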

Custom Prompting

# Build effective prompts for better extraction
input_text = "John Doe is 30 years old, email: john@example.com"  # text to analyze
prompt = f"""
Please extract the following information from the text below:

REQUIRED FIELDS:
- name: Person's full name
- age: Person's age as a number
- email: Valid email address

TEXT TO ANALYZE:
{input_text}

Please be accurate and only extract information that is clearly stated.
"""

result = call_with_schema(prompt, schema, options)

Batch Processing

def process_documents(documents, schema, options):
    results = []
    for doc in documents:
        try:
            result = call_with_schema(
                f"Extract information from: {doc['content']}", 
                schema, 
                options
            )
            results.append({"doc_id": doc["id"], "data": result})
        except Exception as e:
            results.append({"doc_id": doc["id"], "error": str(e)})
    return results
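Each entry returned by process_documents carries either a "data" key or an "error" key, so a follow-up step can separate the two groups. A minimal sketch, where `split_results` is a hypothetical helper and the sample entries are made up for illustration:

```python
def split_results(results):
    """Split batch results into successes and failures, keyed by doc_id."""
    successes = {r["doc_id"]: r["data"] for r in results if "data" in r}
    failures = {r["doc_id"]: r["error"] for r in results if "error" in r}
    return successes, failures

# Illustrative entries in the shape produced by process_documents.
sample_results = [
    {"doc_id": 1, "data": {"name": "John Doe"}},
    {"doc_id": 2, "error": "Provider call failed"},
]
successes, failures = split_results(sample_results)
print(f"{len(successes)} succeeded, {len(failures)} failed")
```

Keeping failures keyed by document ID makes it straightforward to retry only those documents later.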

🚨 Error Handling

Exception Types

from dynamic_baml import call_with_schema
from dynamic_baml.exceptions import (
    DynamicBAMLError,           # Base exception
    SchemaGenerationError,      # Schema conversion failed
    BAMLCompilationError,       # BAML code compilation failed
    LLMProviderError,           # LLM provider call failed
    ResponseParsingError,       # Response parsing failed
    ConfigurationError,         # Provider configuration invalid
    TimeoutError                # Request timeout (shadows Python's built-in TimeoutError in this module)
)

try:
    result = call_with_schema(prompt, schema, options)
except SchemaGenerationError as e:
    print(f"Schema error: {e.message}")
    print(f"Invalid schema: {e.schema_dict}")
except LLMProviderError as e:
    print(f"Provider error: {e.message}")
    print(f"Provider: {e.provider}")
except ResponseParsingError as e:
    print(f"Parsing error: {e.message}")
    print(f"Raw response: {e.raw_response}")

Error Recovery

def robust_extraction(text, schema, providers):
    """Try multiple providers for reliable extraction."""
    for provider_opts in providers:
        try:
            return call_with_schema(text, schema, provider_opts)
        except LLMProviderError:
            continue  # Try next provider
        except Exception as e:
            print(f"Unexpected error with {provider_opts['provider']}: {e}")

    raise RuntimeError("All providers failed")

# Usage
providers = [
    {"provider": "openai", "model": "gpt-4"},
    {"provider": "anthropic", "model": "claude-3-5-sonnet-20241022"},
    {"provider": "ollama", "model": "gemma3:1b"}
]

result = robust_extraction(text, schema, providers)
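Switching providers is one recovery strategy; another common one, not described in the original text, is to retry the same call with exponential backoff when failures are likely to be transient. The sketch below is generic and makes no assumption about Dynamic BAML itself: `call_with_retries` and `flaky` are hypothetical names, and the demo replaces the real sleep with a list append so that it runs instantly.

```python
import time

def call_with_retries(func, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call func(), retrying on any exception with exponential backoff."""
    for attempt in range(attempts):
        try:
            return func()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...

# Demo with a stand-in that fails twice, then succeeds.
calls = {"count": 0}

def flaky():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary failure")
    return {"name": "John Doe"}

delays = []  # records the delays instead of actually sleeping
retried = call_with_retries(flaky, sleep=delays.append)
print(retried, delays)
```

In real use, func would be a zero-argument callable such as a lambda wrapping the extraction call, and it is usually better to retry only on errors known to be transient rather than on every exception.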

📚 Examples

See the examples/ directory for comprehensive examples.

📖 API Reference

Core Functions

call_with_schema(prompt_text, schema_dict, options=None) -> dict

Extract structured data using a schema.

Parameters:

  • prompt_text (str): Text prompt to send to the LLM
  • schema_dict (dict): Schema definition dictionary
  • options (dict, optional): Provider configuration options

Returns:

  • dict: Extracted data matching the schema structure

Raises:

  • DynamicBAMLError: Base exception for all errors
  • SchemaGenerationError: Schema conversion failed
  • LLMProviderError: Provider call failed
  • ResponseParsingError: Response parsing failed

call_with_schema_safe(prompt_text, schema_dict, options=None) -> dict

Safe version that returns structured results instead of raising exceptions.

Returns:

{
    "success": bool,
    "data": dict,      # Present if success=True
    "error": str,      # Present if success=False
    "error_type": str  # Present if success=False
}

Schema Generator

DictToBAMLGenerator.generate_schema(schema_dict, schema_name) -> str

Generate BAML schema code from dictionary.

Parameters:

  • schema_dict (dict): Schema definition
  • schema_name (str): Name for the generated schema

Returns:

  • str: Valid BAML schema code

Provider Factory

LLMProviderFactory.create_provider(options) -> LLMProvider

Create provider instance based on options.

LLMProviderFactory.get_available_providers() -> List[str]

Get list of currently available providers.

๐Ÿค Contributing

We welcome contributions! Please see CONTRIBUTING.md for guidelines.

๐Ÿ™ Acknowledgments

Dynamic BAML is built on top of BoundaryML and the powerful BAML language. We extend our gratitude to the BoundaryML team for creating the foundational technology that makes structured LLM outputs possible.

Dynamic BAML provides a Python-friendly interface and automatic schema generation on top of the robust BAML foundation.

📄 License

This project is licensed under the MIT License; see the LICENSE file for details.

๐Ÿ† Why Dynamic BAML?

Traditional Approach

import json

# Complex manual prompt engineering
prompt = """
Extract user data and format as JSON with these exact fields:
- name (string)
- age (integer)
- email (string)
- is_active (boolean)

Text: "John Doe is 30 years old..."

Please ensure the output is valid JSON with no extra text.
"""

response = llm.call(prompt)  # llm is a placeholder for any LLM client
data = json.loads(response)  # Hope it's valid JSON!

Dynamic BAML Approach

# Clean, type-safe schema definition
schema = {
    "name": "string",
    "age": "int", 
    "email": "string",
    "is_active": "bool"
}

data = call_with_schema(
    "Extract user info from: John Doe is 30 years old...",
    schema
)  # Guaranteed structured output!

Benefits:

  • ✅ Type Safety: Guaranteed schema compliance
  • ✅ No JSON Parsing: Direct structured output
  • ✅ Better Prompts: Optimized prompt engineering
  • ✅ Error Handling: Comprehensive error management
  • ✅ Multi-Provider: Easy provider switching
  • ✅ Complex Types: Enums, nested objects, arrays
