Data Designer utility for loading column configurations from YAML files
Data Designer Declarative Columns Plugin
A Data Designer utility that allows loading multiple column configurations from YAML.
Features
- Multi-column YAML: Load entire column configurations from a YAML file or inline string
- All column types supported: sampler, llm-text, llm-code, llm-structured, llm-judge, expression, validation
- MCP Tool Configs: Define tool_configs in YAML for MCP tool use workflows
- Reusable configurations: Share column definitions across projects via YAML files
- Full parity with Python API: YAML configs are equivalent to programmatic add_column() calls
Installation
Install from PyPI:
pip install data-designer-declarative-columns
Or with uv:
uv add data-designer-declarative-columns
For development (editable install):
git clone https://github.com/webmaxru/data-designer-declarative-columns.git
cd data-designer-declarative-columns
pip install -e .
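To confirm that the install worked, one option is to ask Python for the installed distribution's version. This is a minimal sketch that uses only the standard library; the helper name `installed_version` is made up for this example and is not part of the package.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Distribution name as used in the pip/uv commands above.
    found = installed_version("data-designer-declarative-columns")
    print(found if found else "not installed")
```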
Usage
From YAML File
from data_designer_declarative_columns import DeclarativeColumnsConfig
import data_designer.config as dd
from data_designer.interface import DataDesigner
data_designer = DataDesigner()
config_builder = dd.DataDesignerConfigBuilder()
# Load columns from YAML file
config = DeclarativeColumnsConfig(file="examples/product_reviews.yaml")
config.add_columns_to_builder(config_builder)
# Run preview
preview_results = data_designer.preview(config_builder=config_builder, num_records=3)
preview_results.display_sample_record()
From Inline YAML
from data_designer_declarative_columns import DeclarativeColumnsConfig
import data_designer.config as dd
from data_designer.interface import DataDesigner
data_designer = DataDesigner()
config_builder = dd.DataDesignerConfigBuilder()
# Define columns inline (excerpt from product_reviews.yaml)
config = DeclarativeColumnsConfig(yaml_content="""
columns:
  - name: product_category
    column_type: sampler
    sampler_type: category
    params:
      values: [Electronics, Clothing, Books, Home & Garden]
      weights: [3, 2, 2, 1]
  - name: price
    column_type: sampler
    sampler_type: uniform
    params:
      low: 9.99
      high: 499.99
  - name: customer
    column_type: sampler
    sampler_type: person_from_faker
    params:
      locale: en_US
      age_range: [18, 65]
  - name: review
    column_type: llm-text
    model_alias: nvidia-text
    prompt: |
      Write a realistic customer review for a product in the {{ product_category }} category.
      The product costs ${{ price | round(2) }}.
  - name: customer_name
    column_type: expression
    expr: "{{ customer.first_name }} {{ customer.last_name }}"
""")
config.add_columns_to_builder(config_builder)
preview_results = data_designer.preview(config_builder=config_builder, num_records=3)
preview_results.display_sample_record()
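The weights in the category sampler above are easier to read as probabilities. The sketch below assumes the weights are relative frequencies that get normalized to sum to 1; the README does not state this, so treat it as an assumption. It uses the standard library's `random.choices` as a stand-in and is not the Data Designer sampler itself.

```python
import random
from collections import Counter

values = ["Electronics", "Clothing", "Books", "Home & Garden"]
weights = [3, 2, 2, 1]


def normalize(raw_weights):
    """Scale relative weights so they sum to 1 (assumed sampler behavior)."""
    total = sum(raw_weights)
    return [w / total for w in raw_weights]


# Under the normalization assumption: 3/8, 2/8, 2/8, 1/8.
probabilities = dict(zip(values, normalize(weights)))

# Stand-in draw with the standard library, seeded for repeatability.
rng = random.Random(0)
counts = Counter(rng.choices(values, weights=weights, k=10_000))

if __name__ == "__main__":
    for name in values:
        print(f"{name}: p={probabilities[name]:.3f}, drawn={counts[name]}")
```

Under that assumption, Electronics is expected in roughly three of every eight records and Home & Garden in one of every eight.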
With MCP Tool Configs
For MCP tool use workflows, define tool_configs in the YAML and use get_tool_configs():
import sys

from data_designer_declarative_columns import DeclarativeColumnsConfig
import data_designer.config as dd
from data_designer.interface import DataDesigner
# Load configuration with tool_configs
config = DeclarativeColumnsConfig(file="examples/basic_mcp.yaml")
# Create builder WITH tool configs from YAML
config_builder = dd.DataDesignerConfigBuilder(tool_configs=config.get_tool_configs())
config.add_columns_to_builder(config_builder)
# Define MCP provider (server code must still be Python)
mcp_provider = dd.LocalStdioMCPProvider(
    name="basic-tools",
    command=sys.executable,  # run the server with the current interpreter
    args=["your_mcp_server.py", "serve"],
)
# Create DataDesigner with MCP provider
data_designer = DataDesigner(mcp_providers=[mcp_provider])
preview_results = data_designer.preview(config_builder=config_builder, num_records=2)
YAML with tool_configs:
tool_configs:
  - tool_alias: basic-tools
    providers: [basic-tools]
    allow_tools: [get_fact, add_numbers]
    max_tool_call_turns: 5
    timeout_sec: 30.0
columns:
  - name: topic
    column_type: sampler
    sampler_type: category
    params:
      values: [python, earth, water]
  - name: fact_response
    column_type: llm-text
    model_alias: nvidia-text
    prompt: "Use the get_fact tool to look up '{{ topic }}'"
    tool_alias: basic-tools
    with_trace: true
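In the YAML above, a column's tool_alias has to match a tool_alias defined under tool_configs. A quick pre-flight check for that link could look like the sketch below. It works on a plain dict shaped like the parsed YAML; the function name `find_unknown_tool_aliases` is hypothetical and not part of the plugin, and whether the plugin performs a similar check is not stated here.

```python
from typing import Any


def find_unknown_tool_aliases(config: dict[str, Any]) -> list[tuple[str, str]]:
    """Return (column name, tool_alias) pairs whose alias has no tool_configs entry."""
    defined = {tc["tool_alias"] for tc in config.get("tool_configs", [])}
    problems = []
    for col in config.get("columns", []):
        alias = col.get("tool_alias")
        if alias is not None and alias not in defined:
            problems.append((col["name"], alias))
    return problems


# Plain-dict stand-in for the parsed YAML shown above (trimmed to the relevant keys).
config = {
    "tool_configs": [
        {
            "tool_alias": "basic-tools",
            "providers": ["basic-tools"],
            "allow_tools": ["get_fact", "add_numbers"],
        }
    ],
    "columns": [
        {"name": "topic", "column_type": "sampler"},
        {"name": "fact_response", "column_type": "llm-text", "tool_alias": "basic-tools"},
    ],
}

if __name__ == "__main__":
    print(find_unknown_tool_aliases(config))
```

An empty result means every column that asks for tools points at a defined alias.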
YAML Configuration Format
product_reviews.yaml (excerpt):
columns:
  - name: product_category
    column_type: sampler
    sampler_type: category
    params:
      values: [Electronics, Clothing, Books, Home & Garden]
      weights: [3, 2, 2, 1]
  - name: customer
    column_type: sampler
    sampler_type: person_from_faker
    params:
      locale: en_US
      age_range: [18, 65]
  - name: review
    column_type: llm-text
    model_alias: nvidia-text
    prompt: |
      Write a realistic customer review for a {{ product_subcategory }}
      in the {{ product_category }} category.
  - name: review_analysis
    column_type: llm-structured
    model_alias: nvidia-text
    prompt: |
      Analyze this product review and extract structured information:
      Review: "{{ review }}"
    output_format:
      type: object
      properties:
        sentiment:
          type: string
          enum: [positive, neutral, negative]
        would_recommend:
          type: boolean
      required: [sentiment, would_recommend]
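To make the output_format block above concrete, the sketch below checks sample records against the same schema. It is a hand-rolled illustration that handles only the keywords used in the excerpt (type, enum, required), not a full JSON Schema validator, and it does not describe how Data Designer itself validates model output. The function name `check_record` is made up for this example.

```python
from typing import Any

# The output_format from the excerpt, written as a Python dict.
schema = {
    "type": "object",
    "properties": {
        "sentiment": {"type": "string", "enum": ["positive", "neutral", "negative"]},
        "would_recommend": {"type": "boolean"},
    },
    "required": ["sentiment", "would_recommend"],
}

# Only the JSON types that appear in the excerpt are mapped.
_PY_TYPES = {"string": str, "boolean": bool, "object": dict}


def check_record(record: Any, schema: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the record fits the schema."""
    if not isinstance(record, dict):
        return ["record is not an object"]
    errors = []
    for key in schema.get("required", []):
        if key not in record:
            errors.append(f"missing required field: {key}")
    for key, rule in schema.get("properties", {}).items():
        if key not in record:
            continue
        value = record[key]
        expected = _PY_TYPES.get(rule.get("type"))
        if expected is not None and not isinstance(value, expected):
            errors.append(f"{key}: expected {rule['type']}")
        elif "enum" in rule and value not in rule["enum"]:
            errors.append(f"{key}: {value!r} not in {rule['enum']}")
    return errors


if __name__ == "__main__":
    print(check_record({"sentiment": "positive", "would_recommend": True}, schema))
    print(check_record({"sentiment": "mixed"}, schema))
```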
Inspecting Loaded Columns
config = DeclarativeColumnsConfig(file="examples/product_reviews.yaml")
# Get list of column names
print(config.get_column_names())
# ['request_id', 'product_category', 'product_subcategory', 'price', ...]
# Get number of columns
print(len(config))
# 14
# Access raw column definitions
for col in config.columns:
    print(f"{col['name']}: {col['column_type']}")
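Raw column definitions can also be scanned for the columns each template refers to, which helps catch a prompt that names a column missing from the file. The sketch below runs on a stand-in list of dicts shaped like the entries in the loop above. The regex is a simplification that picks up the first identifier inside each `{{ ... }}` and is not a Jinja2 parser. The helper names and the choice of template fields (prompt, system_prompt, expr) are assumptions for this example.

```python
import re

# First identifier inside "{{ ... }}", e.g. "customer" in "{{ customer.first_name }}".
_REF = re.compile(r"\{\{\s*([A-Za-z_][A-Za-z0-9_]*)")
_TEMPLATE_FIELDS = ("prompt", "system_prompt", "expr")


def referenced_columns(col: dict) -> set[str]:
    """Collect the column names a definition mentions in its template fields."""
    refs: set[str] = set()
    for field in _TEMPLATE_FIELDS:
        text = col.get(field)
        if isinstance(text, str):
            refs.update(_REF.findall(text))
    return refs


def undefined_references(columns: list[dict]) -> dict[str, list[str]]:
    """Map each column to the referenced names that no column in the list defines."""
    names = {c["name"] for c in columns}
    result = {}
    for col in columns:
        missing = sorted(referenced_columns(col) - names)
        if missing:
            result[col["name"]] = missing
    return result


# Stand-in for config.columns; product_category is left out on purpose.
columns = [
    {"name": "price", "column_type": "sampler"},
    {"name": "customer", "column_type": "sampler"},
    {
        "name": "review",
        "column_type": "llm-text",
        "prompt": "Review a {{ product_category }} item costing ${{ price | round(2) }}.",
    },
    {
        "name": "customer_name",
        "column_type": "expression",
        "expr": "{{ customer.first_name }} {{ customer.last_name }}",
    },
]

if __name__ == "__main__":
    print(undefined_references(columns))
```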
Supported Column Types
| Column Type | Description | Example Fields |
|---|---|---|
| sampler | Built-in samplers (UUID, Category, Uniform, Person, etc.) | sampler_type, params |
| llm-text | LLM text generation with Jinja2 templating | model_alias, prompt, system_prompt |
| llm-code | Code generation with language specification | model_alias, code_lang, prompt |
| llm-structured | Structured JSON generation with schema | model_alias, prompt, output_format |
| llm-judge | Quality assessment with scoring rubrics | model_alias, prompt, scores |
| expression | Expression-based derived columns | expr |
| validation | Validation results (Python, SQL, Code validators) | validator_type, target_columns, validator_params |
Examples
See the examples/ folder for complete YAML configurations:
- product_reviews.yaml - Comprehensive example with all column types:
  - Samplers: UUID, Category, Subcategory, Uniform, Gaussian, DateTime, Person (Faker)
  - LLM Text: Product review generation
  - LLM Code: SQL query generation
  - LLM Structured: Review sentiment analysis
  - LLM Judge: Review quality scoring
  - Expressions: Derived values (word count, price tier, customer name)
- text_to_python.yaml - Python code generation with validation
- text_to_sql.yaml - SQL query generation
- multi_turn_chat.yaml - Multi-turn conversations
- product_info_qa.yaml - Product Q&A
- basic_mcp.yaml - MCP tool use
- pdf_qa.yaml - Document Q&A with MCP
License
Apache-2.0