A standalone library for dynamic BoundaryML schema generation and LLM response parsing
Dynamic BAML
Dynamic BAML is a Python library that extracts structured data from text using Large Language Models (LLMs) with dynamically generated schemas. Built on top of BoundaryML, it provides a high-level Python interface to BAML, BoundaryML's domain-specific language for structured LLM outputs, and generates the BAML schema for you.
Define your desired output structure as a plain Python dictionary; Dynamic BAML generates the corresponding BAML code, calls the LLM, and parses the response into that structure.
Features
- Schema-First Approach: Define output structure with Python dictionaries
- Dynamic BAML Generation: Automatically converts schemas to BAML code
- Multi-Provider Support: Works with OpenAI, Anthropic, Ollama, and OpenRouter
- Type Safety: Parses LLM responses into structured, validated outputs
- Easy Integration: Simple API with comprehensive error handling
- Complex Types: Support for nested objects, enums, arrays, and optional fields
- Performance: Efficient management and cleanup of temporary BAML projects
Quick Start
Installation
pip install dynamic-baml
Basic Usage
from dynamic_baml import call_with_schema
# Define your desired output structure
schema = {
"name": "string",
"age": "int",
"email": "string",
"is_active": "bool"
}
# Extract structured data from text
text = "John Doe is 30 years old, email: john@example.com, currently active user"
result = call_with_schema(
prompt_text=f"Extract user information from: {text}",
schema_dict=schema,
options={"provider": "openai", "model": "gpt-4"}
)
print(result)
# Output (example): {'name': 'John Doe', 'age': 30, 'email': 'john@example.com', 'is_active': True}
Table of Contents
- Installation & Setup
- Core Concepts
- Schema Types
- Provider Configuration
- Logging
- Advanced Usage
- Error Handling
- Examples
- API Reference
Installation & Setup
Requirements
- Python 3.8+
- BAML CLI from BoundaryML, which provides the core BAML compiler and runtime:
npm install -g @boundaryml/baml
- Learn more at docs.boundaryml.com
Provider Setup
OpenAI
export OPENAI_API_KEY="your-openai-api-key"
Anthropic
export ANTHROPIC_API_KEY="your-anthropic-api-key"
Ollama (Local)
# Install and start Ollama
curl -fsSL https://ollama.ai/install.sh | sh
ollama serve
ollama pull gemma3:1b
OpenRouter
export OPENROUTER_API_KEY="your-openrouter-api-key"
Core Concepts
Schema Dictionary
Define your desired output structure using Python dictionaries:
schema = {
"field_name": "field_type",
"nested_object": {
"sub_field": "string"
},
"optional_field": {"type": "string", "optional": True}
}
BAML Generation
Dynamic BAML automatically converts your schema to BAML code:
# Your schema:
{"name": "string", "age": "int"}
# Generated BAML:
class UserInfo {
name string
age int
}
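The README does not show how this conversion is implemented. As a rough illustration only, here is a hypothetical sketch of a converter for the scalar, array, optional, and nested-object forms described in this document. The function names, the naming rule for nested classes (`user_profile` becomes `UserProfile`), and the two-space indentation are assumptions; enums are not handled; and this is not the library's `DictToBAMLGenerator`.

```python
from typing import Any, Dict, List

# Scalar type names used in the schema dictionaries in this README.
BAML_SCALARS = {"string", "int", "float", "bool"}


def _class_name(field: str) -> str:
    """Hypothetical naming rule: 'user_profile' -> 'UserProfile'."""
    return "".join(part.capitalize() for part in field.split("_"))


def _field_type(field: str, spec: Any, classes: List[str]) -> str:
    if isinstance(spec, str):
        if spec not in BAML_SCALARS:
            raise ValueError(f"unsupported type {spec!r} for field {field!r}")
        return spec
    if isinstance(spec, list):
        # ["string"] -> string[]
        if len(spec) != 1:
            raise ValueError(f"array field {field!r} needs exactly one element type")
        return _field_type(field, spec[0], classes) + "[]"
    if isinstance(spec, dict):
        if "type" in spec:
            # {"type": "string", "optional": True} -> string?
            base = _field_type(field, spec["type"], classes)
            return base + "?" if spec.get("optional") else base
        # Any other dict is a nested object and becomes its own class.
        name = _class_name(field)
        _emit_class(name, spec, classes)
        return name
    raise ValueError(f"unsupported spec for field {field!r}: {spec!r}")


def _emit_class(name: str, schema: Dict[str, Any], classes: List[str]) -> None:
    # Field types are resolved first, so nested classes are emitted before their parent.
    body = [f"  {field} {_field_type(field, spec, classes)}" for field, spec in schema.items()]
    classes.append("\n".join([f"class {name} {{", *body, "}"]))


def dict_to_baml(schema: Dict[str, Any], name: str) -> str:
    """Render a schema dictionary as one or more BAML class declarations."""
    classes: List[str] = []
    _emit_class(name, schema, classes)
    return "\n\n".join(classes)
```

For example, `print(dict_to_baml({"name": "string", "age": "int"}, "UserInfo"))` prints a `UserInfo` class with the same two fields as the generated BAML shown above.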
Schema Types
Basic Types
schema = {
"text": "string", # Text data
"number": "int", # Integer
"price": "float", # Decimal number
"active": "bool" # True/False
}
Arrays
schema = {
"tags": ["string"], # Array of strings
"scores": ["int"], # Array of integers
"ratings": ["float"] # Array of floats
}
Enums
schema = {
"status": {
"type": "enum",
"values": ["draft", "published", "archived"]
},
"priority": {
"type": "enum",
"values": ["low", "medium", "high", "urgent"]
}
}
Nested Objects
schema = {
"user": {
"name": "string",
"email": "string",
"profile": {
"bio": "string",
"avatar_url": "string"
}
},
"metadata": {
"created_at": "string",
"updated_at": "string"
}
}
Optional Fields
schema = {
"name": "string", # Required
"email": {"type": "string", "optional": True}, # Optional
"phone": {"type": "string", "optional": True} # Optional
}
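To double-check an extracted result in your own code, a post-call check can walk the same schema dictionary. The following is a hypothetical sketch, not part of Dynamic BAML: `validate` and its error messages are invented for illustration, ints are accepted for `float` fields, and, as with any dict-based schema, a nested object whose own field is literally named `type` would be misread as a field spec.

```python
from typing import Any, Dict, List

# Python types accepted for each scalar name; ints are accepted where a float is expected.
_PY_TYPES = {"string": str, "int": int, "float": (int, float), "bool": bool}


def _check(value: Any, spec: Any, path: str, errors: List[str]) -> None:
    if isinstance(spec, str):
        if value is None:
            errors.append(f"{path}: required value is missing")
        # bool is a subclass of int in Python, so reject it explicitly for numeric fields.
        elif isinstance(value, bool) and spec != "bool":
            errors.append(f"{path}: expected {spec}, got bool")
        elif not isinstance(value, _PY_TYPES[spec]):
            errors.append(f"{path}: expected {spec}, got {type(value).__name__}")
    elif isinstance(spec, list):
        if not isinstance(value, list):
            errors.append(f"{path}: expected array, got {type(value).__name__}")
            return
        for i, item in enumerate(value):
            _check(item, spec[0], f"{path}[{i}]", errors)
    elif "type" in spec:
        # Field with options ({"type": ..., "optional": ...}) or an enum.
        if value is None:
            if not spec.get("optional"):
                errors.append(f"{path}: required value is missing")
        elif spec["type"] == "enum":
            if value not in spec["values"]:
                errors.append(f"{path}: {value!r} is not one of {spec['values']}")
        else:
            _check(value, spec["type"], path, errors)
    else:
        # Nested object: check each declared key; absent keys are treated as None.
        if not isinstance(value, dict):
            errors.append(f"{path}: expected object, got {type(value).__name__}")
            return
        for key, sub_spec in spec.items():
            _check(value.get(key), sub_spec, f"{path}.{key}" if path else key, errors)


def validate(data: Dict[str, Any], schema: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means `data` fits `schema`."""
    errors: List[str] = []
    _check(data, schema, "", errors)
    return errors
```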
Provider Configuration
OpenAI
options = {
"provider": "openai",
"model": "gpt-4",
"temperature": 0.1,
"max_tokens": 2000,
"timeout": 60
}
Anthropic
options = {
"provider": "anthropic",
"model": "claude-3-5-sonnet-20241022",
"temperature": 0.1,
"max_tokens": 2000,
"timeout": 60
}
Ollama (Local)
options = {
"provider": "ollama",
"model": "gemma3:1b",
"base_url": "http://localhost:11434", # Optional
"temperature": 0.1,
"timeout": 120
}
OpenRouter
options = {
"provider": "openrouter",
"model": "google/gemini-2.0-flash-exp",
"temperature": 0.1,
"max_tokens": 2000,
"timeout": 60
}
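Because the hosted providers above read their keys from environment variables, a pre-flight check can catch a missing key before any request is made. This is a hypothetical helper (`check_provider_options` is not a Dynamic BAML function), and it only knows the four providers and variable names listed in this README.

```python
import os
from typing import Any, Mapping, Optional

# Credential variables from "Provider Setup"; Ollama runs locally and needs none.
PROVIDER_ENV_VARS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "openrouter": "OPENROUTER_API_KEY",
    "ollama": None,
}


def check_provider_options(
    options: Mapping[str, Any], env: Optional[Mapping[str, str]] = None
) -> None:
    """Raise ValueError early if the provider is unknown or its API key is unset."""
    env = os.environ if env is None else env
    provider = options.get("provider")
    if provider not in PROVIDER_ENV_VARS:
        raise ValueError(f"unknown provider {provider!r}; expected one of {sorted(PROVIDER_ENV_VARS)}")
    var = PROVIDER_ENV_VARS[provider]
    if var is not None and not env.get(var):
        raise ValueError(f"provider {provider!r} needs the {var} environment variable")
```

Calling `check_provider_options(options)` before `call_with_schema(...)` turns a missing key into an immediate, explicit error; the `env` parameter exists mainly so the check can be tested without touching the real environment.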
Logging
Dynamic BAML provides flexible logging options to control output verbosity and destination.
Quick Start
from dynamic_baml import call_with_schema
# Basic usage with logging to file
options = {
"provider": "openai",
"model": "gpt-4",
"log_level": "info", # Control verbosity
"log_file": "./baml.log" # Output to file
}
result = call_with_schema(prompt, schema, options)
Configuration Options
Log Levels
Control the verbosity of BAML logging output:
| Level | Description | Use Case |
|---|---|---|
| "off" | No logging output | Production where logs aren't needed |
| "error" | Only fatal errors | Production minimal logging |
| "warn" | Errors and warnings (default) | Standard production logging |
| "info" | Detailed execution info | Development and debugging |
| "debug" | Verbose details and requests | Deep debugging |
| "trace" | Everything (very verbose) | Troubleshooting |
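The table reads as cumulative levels, where each setting also shows everything from the levels above it. Assuming that ordering, the filtering rule can be sketched as follows; `is_emitted` is illustrative only and does not describe how the library or the BAML runtime implement logging.

```python
from typing import Optional

# Levels from the table above, ordered from least to most verbose.
LOG_LEVELS = ["off", "error", "warn", "info", "debug", "trace"]


def is_emitted(message_level: str, configured: Optional[str] = None) -> bool:
    """True if a message at `message_level` would appear under the `configured` level."""
    configured = configured or "warn"  # the table lists "warn" as the default
    if message_level == "off" or message_level not in LOG_LEVELS:
        raise ValueError(f"not a message level: {message_level!r}")
    if configured not in LOG_LEVELS:
        raise ValueError(f"unknown log level: {configured!r}")
    # "off" has index 0, so no message level passes it.
    return LOG_LEVELS.index(message_level) <= LOG_LEVELS.index(configured)
```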
Log File Output
Specify where logs should be written:
- Default (no log_file): Logs go to terminal/stdout
- File path: Logs written to specified file
- Directory creation: Parent directories created automatically
- Append mode: Multiple calls append to the same file
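The file behaviour described above (stdout by default, parent directories created automatically, append mode) can be reproduced with the standard library. This is a sketch of the documented behaviour, not the library's code; `log_stream` is a hypothetical name.

```python
import sys
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator, Optional, TextIO


@contextmanager
def log_stream(log_file: Optional[str] = None) -> Iterator[TextIO]:
    """Yield a writable stream: stdout by default, otherwise the given file in append mode."""
    if log_file is None:
        yield sys.stdout  # default: terminal output, never closed here
        return
    path = Path(log_file)
    path.parent.mkdir(parents=True, exist_ok=True)  # create missing parent directories
    with path.open("a", encoding="utf-8") as fh:  # append, never truncate
        yield fh
```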
Usage Examples
1. Log Level Only (Terminal Output)
options = {
"provider": "openai",
"log_level": "info" # Logs to terminal with info level
}
2. Log File Only (Default Level)
options = {
"provider": "openai",
"log_file": "./logs/baml.log" # Uses default log level
}
3. Both Level and File
options = {
"provider": "openai",
"log_level": "debug",
"log_file": "/var/log/baml/debug.log"
}
4. Disable Logging Completely
options = {
"provider": "openai",
"log_level": "off" # No logging output at all
}
5. Nested Log Directories
options = {
"provider": "openai",
"log_level": "info",
"log_file": "./logs/2024/january/extraction.log" # Dirs created automatically
}
Advanced Usage
Safe Calling (No Exceptions)
from dynamic_baml import call_with_schema_safe
result = call_with_schema_safe(
prompt_text="Extract data from this text...",
schema_dict=schema,
options=options
)
if result["success"]:
    data = result["data"]
    print(f"Extracted: {data}")
else:
    print(f"Error: {result['error']}")
    print(f"Error type: {result['error_type']}")
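If you want the same success/error shape around other calls of your own, a small wrapper is enough. This sketch mirrors the return shape documented for `call_with_schema_safe` in the API reference; `call_safely` is a hypothetical helper, and the idea that `error_type` holds the exception class name (for example `"ResponseParsingError"`) is an assumption the API reference does not state.

```python
from typing import Any, Callable, Dict


def call_safely(fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Dict[str, Any]:
    """Run `fn` and report the outcome as a dict instead of raising."""
    try:
        return {"success": True, "data": fn(*args, **kwargs)}
    except Exception as exc:  # any failure becomes an error entry
        # Assumption: error_type is the exception's class name.
        return {"success": False, "error": str(exc), "error_type": type(exc).__name__}
```

For instance, `call_safely(call_with_schema, prompt, schema, options)` would behave much like the safe variant while letting you wrap any other function the same way.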
Custom Prompting
# Build effective prompts for better extraction
prompt = f"""
Please extract the following information from the text below:
REQUIRED FIELDS:
- name: Person's full name
- age: Person's age as a number
- email: Valid email address
TEXT TO ANALYZE:
{input_text}
Please be accurate and only extract information that is clearly stated.
"""
result = call_with_schema(prompt, schema, options)
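Hand-written field lists can drift from the schema. One way to keep them aligned is to generate the REQUIRED FIELDS section from the schema's keys and refuse to build the prompt when a field has no description. `build_extraction_prompt` is a hypothetical helper; its wording simply reproduces the template above.

```python
from typing import Any, Dict


def build_extraction_prompt(schema: Dict[str, Any], field_notes: Dict[str, str], input_text: str) -> str:
    """Assemble the prompt template above, listing every schema field with its note."""
    missing = [name for name in schema if name not in field_notes]
    if missing:
        raise ValueError(f"no description for schema fields: {missing}")
    field_lines = "\n".join(f"- {name}: {field_notes[name]}" for name in schema)
    return (
        "Please extract the following information from the text below:\n"
        "REQUIRED FIELDS:\n"
        f"{field_lines}\n"
        "TEXT TO ANALYZE:\n"
        f"{input_text}\n"
        "Please be accurate and only extract information that is clearly stated."
    )
```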
Batch Processing
from dynamic_baml import call_with_schema

def process_documents(documents, schema, options):
    """Extract data from each document, recording failures instead of stopping."""
    results = []
    for doc in documents:
        try:
            result = call_with_schema(
                f"Extract information from: {doc['content']}",
                schema,
                options
            )
            results.append({"doc_id": doc["id"], "data": result})
        except Exception as e:
            results.append({"doc_id": doc["id"], "error": str(e)})
    return results
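The loop above sends one request at a time. Since LLM calls are mostly I/O-bound, a thread pool can overlap them. This sketch takes the extraction function as a parameter so it can be tested without an API key. Whether `call_with_schema` is safe to call from several threads at once (it manages temporary BAML projects) is not documented here, so treat that as an assumption and keep `max_workers` small to respect provider rate limits. `process_documents_concurrently` is a hypothetical name.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, Iterable, List


def process_documents_concurrently(
    documents: Iterable[Dict[str, Any]],
    extract: Callable[[str], Dict[str, Any]],
    max_workers: int = 4,
) -> List[Dict[str, Any]]:
    """Run `extract` on each document's content in a thread pool, keeping input order."""

    def run(doc: Dict[str, Any]) -> Dict[str, Any]:
        try:
            return {"doc_id": doc["id"], "data": extract(doc["content"])}
        except Exception as exc:  # record the failure and keep going, like the sequential loop
            return {"doc_id": doc["id"], "error": str(exc)}

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in the same order as `documents`.
        return list(pool.map(run, documents))
```

In practice you might pass `extract=lambda content: call_with_schema(f"Extract information from: {content}", schema, options)`.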
Error Handling
Exception Types
from dynamic_baml import call_with_schema
from dynamic_baml.exceptions import (
    DynamicBAMLError,       # Base exception
    SchemaGenerationError,  # Schema conversion failed
    BAMLCompilationError,   # BAML code compilation failed
    LLMProviderError,       # LLM provider call failed
    ResponseParsingError,   # Response parsing failed
    ConfigurationError,     # Provider configuration invalid
    TimeoutError            # Request timeout (shadows Python's built-in TimeoutError)
)

try:
    result = call_with_schema(prompt, schema, options)
except SchemaGenerationError as e:
    print(f"Schema error: {e.message}")
    print(f"Invalid schema: {e.schema_dict}")
except LLMProviderError as e:
    print(f"Provider error: {e.message}")
    print(f"Provider: {e.provider}")
except ResponseParsingError as e:
    print(f"Parsing error: {e.message}")
    print(f"Raw response: {e.raw_response}")
Error Recovery
from dynamic_baml import call_with_schema
from dynamic_baml.exceptions import LLMProviderError

def robust_extraction(text, schema, providers):
    """Try multiple providers for reliable extraction."""
    for provider_opts in providers:
        try:
            return call_with_schema(text, schema, provider_opts)
        except LLMProviderError:
            continue  # Try next provider
        except Exception as e:
            print(f"Unexpected error with {provider_opts['provider']}: {e}")
    raise Exception("All providers failed")
# Usage
providers = [
{"provider": "openai", "model": "gpt-4"},
{"provider": "anthropic", "model": "claude-3-5-sonnet-20241022"},
{"provider": "ollama", "model": "gemma3:1b"}
]
result = robust_extraction(text, schema, providers)
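`robust_extraction` discards the reason each provider failed. A variant that keeps every failure for diagnostics might look like the following. `AllProvidersFailed` and `extract_with_fallback` are hypothetical names, and the call function is injected so the logic can be tested offline; in practice you would pass `call_with_schema` and, to match the function above, `retry_on=(LLMProviderError,)`.

```python
from typing import Any, Callable, Dict, List, Sequence, Tuple, Type


class AllProvidersFailed(Exception):
    """Raised when every provider failed; keeps each (provider, error) pair."""

    def __init__(self, failures: List[Tuple[str, BaseException]]):
        self.failures = failures
        summary = "; ".join(f"{name}: {exc}" for name, exc in failures)
        super().__init__(f"All providers failed ({summary})")


def extract_with_fallback(
    call: Callable[[str, Dict[str, Any], Dict[str, Any]], Dict[str, Any]],
    prompt: str,
    schema: Dict[str, Any],
    providers: Sequence[Dict[str, Any]],
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
) -> Dict[str, Any]:
    """Try each provider's options in order; return the first successful result."""
    failures: List[Tuple[str, BaseException]] = []
    for opts in providers:
        try:
            return call(prompt, schema, opts)
        except retry_on as exc:  # only these error types move on to the next provider
            failures.append((opts.get("provider", "?"), exc))
    raise AllProvidersFailed(failures)
```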
Examples
See the examples/ directory for comprehensive examples:
- Basic Usage
- Complex Schemas
- Multi-Provider Setup
- Error Handling
- Batch Processing
- Real-World Use Cases
API Reference
Core Functions
call_with_schema(prompt_text, schema_dict, options=None) -> dict
Extract structured data using a schema.
Parameters:
- prompt_text (str): Text prompt to send to the LLM
- schema_dict (dict): Schema definition dictionary
- options (dict, optional): Provider configuration options
Returns:
- dict: Extracted data matching the schema structure
Raises:
- DynamicBAMLError: Base exception for all errors
- SchemaGenerationError: Schema conversion failed
- LLMProviderError: Provider call failed
- ResponseParsingError: Response parsing failed
call_with_schema_safe(prompt_text, schema_dict, options=None) -> dict
Safe version that returns structured results instead of raising exceptions.
Returns:
{
"success": bool,
"data": dict, # Present if success=True
"error": str, # Present if success=False
"error_type": str # Present if success=False
}
Schema Generator
DictToBAMLGenerator.generate_schema(schema_dict, schema_name) -> str
Generate BAML schema code from dictionary.
Parameters:
- schema_dict (dict): Schema definition
- schema_name (str): Name for the generated schema
Returns:
str: Valid BAML schema code
Provider Factory
LLMProviderFactory.create_provider(options) -> LLMProvider
Create provider instance based on options.
LLMProviderFactory.get_available_providers() -> List[str]
Get list of currently available providers.
Contributing
We welcome contributions! Please see CONTRIBUTING.md for guidelines.
Acknowledgments
Dynamic BAML is built on top of BoundaryML and the powerful BAML language. We extend our gratitude to the BoundaryML team for creating the foundational technology that makes structured LLM outputs possible.
About BoundaryML:
- Website: boundaryml.com
- BAML Documentation: docs.boundaryml.com
- BAML CLI: npm install -g @boundaryml/baml
Dynamic BAML provides a Python-friendly interface and automatic schema generation on top of the robust BAML foundation.
License
This project is licensed under the MIT License - see LICENSE file for details.
Support
- Documentation
- Issue Tracker
- Discussions
Why Dynamic BAML?
Traditional Approach
import json

# Complex manual prompt engineering (`llm` stands for any LLM client)
prompt = """
Extract user data and format as JSON with these exact fields:
- name (string)
- age (integer)
- email (string)
- is_active (boolean)
Text: "John Doe is 30 years old..."
Please ensure the output is valid JSON with no extra text.
"""
response = llm.call(prompt)
data = json.loads(response) # Hope it's valid JSON!
Dynamic BAML Approach
# Clean, type-safe schema definition
schema = {
"name": "string",
"age": "int",
"email": "string",
"is_active": "bool"
}
data = call_with_schema(
"Extract user info from: John Doe is 30 years old...",
schema
)  # A dict shaped like the schema; unparseable responses raise ResponseParsingError
Benefits:
- Type Safety: Responses are parsed against the schema, and mismatches surface as errors
- No JSON Parsing: Direct structured output
- Better Prompts: Optimized prompt engineering
- Error Handling: Comprehensive error management
- Multi-Provider: Easy provider switching
- Complex Types: Enums, nested objects, arrays