Data Designer plugin adding cache-enabled LLM text and structured column generators
Project description
data-designer-cache-llm-column
A plugin for data-designer that adds cache-enabled LLM column generators.
Why use it?
Generating datasets with data-designer using Large Language Models (LLMs) can be slow and expensive, and while iterating on your data pipelines or prompts you often end up running the exact same model calls multiple times.
This plugin automatically caches your LLM responses locally so that subsequent runs with identical inputs (prompt, variables, model) are instantaneous. This significantly speeds up dataset generation during development and reduces your overall LLM API costs.
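The plugin's internal cache format isn't documented here, but the core idea — keying each stored response on the inputs that determine the LLM output (prompt, variables, model) — can be sketched as follows. All function names in this sketch are hypothetical and only illustrate the technique, not the plugin's actual implementation:

```python
import hashlib
import json
from pathlib import Path

def cache_key(prompt: str, variables: dict, model: str) -> str:
    """Derive a stable filename from the inputs that determine the LLM output."""
    payload = json.dumps(
        {"prompt": prompt, "variables": variables, "model": model},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode()).hexdigest()

def cached_generate(prompt, variables, model, cache_folder="cache_folder",
                    call_llm=None, load_cache=True, save_cache=True):
    """Return a cached response if one exists; otherwise call the model and store it."""
    path = Path(cache_folder) / f"{cache_key(prompt, variables, model)}.json"
    if load_cache and path.exists():
        return json.loads(path.read_text())["response"]
    response = call_llm(prompt, variables, model)  # the expensive API call
    if save_cache:
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(json.dumps({"response": response}))
    return response
```

Because the key is a hash of all inputs, any change to the prompt, the seed variables, or the model produces a cache miss and triggers a fresh generation.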
Features
- Drop-in replacements: Wraps the existing data-designer LLM columns (Text, Structured, Code, Judge) with caching capabilities.
- Cost and Time Savings: Avoids redundant LLM API calls by saving responses locally.
- Flexible Cache Control: Independently control whether to load from cache or save to cache for each column.
Installation
Install using pip:
pip install data-designer-cache-llm-column
Or using uv (recommended):
uv pip install data-designer-cache-llm-column
Available Column Types
This plugin exposes four new cache-enabled column configurations:
- CacheLLMTextColumnConfig (type: "cache-llm-text")
- CacheLLMStructuredColumnConfig (type: "cache-llm-structured")
- CacheLLMCodeColumnConfig (type: "cache-llm-code")
- CacheLLMJudgeColumnConfig (type: "cache-llm-judge")
Configuration Options
All cached columns inherit from their respective base data-designer column configs, meaning you can still use parameters like name, model_alias, prompt, etc. In addition, you get these caching-specific options:
- cache_folder (str): The folder path where cache files will be stored. Default is "cache_folder".
- save_cache (bool): Whether to save new LLM responses to the cache. Default is True.
- load_cache (bool): Whether to attempt loading from the cache before calling the LLM API. Default is True. Setting this to False while save_cache is True forces the model to re-generate and overwrite the cache.
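For instance, after editing a prompt you may want to discard a column's stale cached responses. A config like the following — using only the parameters documented above, with a hypothetical model alias — forces regeneration while refreshing the cache:

```python
# Force a fresh generation: skip the cache lookup, but overwrite the stored entry
CacheLLMTextColumnConfig(
    name="greeting",
    model_alias="nvidia-text",  # whatever model alias you have configured
    prompt="Write a casual {{ language }} greeting. One sentence only.",
    cache_folder="cache_folder",
    load_cache=False,  # ignore any existing cached response
    save_cache=True,   # save the new response, replacing the old cache entry
)
```

Switch load_cache back to True once you are happy with the prompt, and subsequent runs will serve the refreshed responses from cache.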
Example Usage
Caching LLM Text Output
import data_designer.config as dd
import pandas as pd
from data_designer.config.seed_source_dataframe import DataFrameSeedSource
from data_designer.interface import DataDesigner
from data_designer_cache_llm_column.cache_llm_text_column.config import CacheLLMTextColumnConfig
# Create seed dataset
seed_df = pd.DataFrame({"language": ["English", "Spanish", "French"]})
config_builder = dd.DataDesignerConfigBuilder()
config_builder.with_seed_dataset(DataFrameSeedSource(df=seed_df))
# Add a cached LLM text column
config_builder.add_column(
CacheLLMTextColumnConfig(
name="greeting",
model_alias="nvidia-text", # Or whatever model alias you have configured
prompt="Write a casual {{ language }} greeting. One sentence only.",
cache_folder="./llm_cache_storage",
save_cache=True,
load_cache=True, # Hits the API on 1st run, loads from cache on subsequent runs
)
)
data_designer = DataDesigner()
# First run: Hits the Model API and saves to cache
print("Run 1: Generating...")
results1 = data_designer.preview(config_builder, num_records=3)
print(results1.dataset)
# Second run: Instantly loads from cache
print("Run 2: Loading from cache...")
results2 = data_designer.preview(config_builder, num_records=3)
print(results2.dataset)
Caching Structured Output
import data_designer.config as dd
import pandas as pd
from data_designer.config.seed_source_dataframe import DataFrameSeedSource
from data_designer.interface import DataDesigner
from pydantic import BaseModel
from data_designer_cache_llm_column.cache_llm_structured_column.config import CacheLLMStructuredColumnConfig
# Define your expected output structure
class GreetingInfo(BaseModel):
greeting: str
formality: str
seed_df = pd.DataFrame({"language": ["English", "Spanish", "French"]})
config_builder = dd.DataDesignerConfigBuilder()
config_builder.with_seed_dataset(DataFrameSeedSource(df=seed_df))
# Add a cached LLM structured column
config_builder.add_column(
CacheLLMStructuredColumnConfig(
name="greeting_info",
model_alias="nvidia-text",
prompt="Generate a greeting in {{ language }} and classify its formality level.",
output_format=GreetingInfo,
cache_folder="./llm_cache_storage_structured",
save_cache=True,
load_cache=True,
)
)
data_designer = DataDesigner()
results = data_designer.preview(config_builder, num_records=3)
print(results.dataset)
File details
Details for the file data_designer_cache_llm_column-0.1.3.tar.gz.
File metadata
- Download URL: data_designer_cache_llm_column-0.1.3.tar.gz
- Upload date:
- Size: 6.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.8.14
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 1f12c8b80ba33e5cc585956bb2700efd8b546b4056c1ec36b8593b2d1a97781c |
| MD5 | 057b43fdbb7034af327b596c726c2976 |
| BLAKE2b-256 | be1462e3417157565af5fbbbe0b5110f3de6ce363195925faa9ecb262ae1fd60 |
File details
Details for the file data_designer_cache_llm_column-0.1.3-py3-none-any.whl.
File metadata
- Download URL: data_designer_cache_llm_column-0.1.3-py3-none-any.whl
- Upload date:
- Size: 14.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.8.14
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 90be9998d9a8b711ad722db64913f1ba48e39f904e500d0c8e1aa6f824457f31 |
| MD5 | 7b6a9f87315f32fd7037aeef46a5e7f5 |
| BLAKE2b-256 | 5f486ec324c5a39150e68035ba44036f2ff43f8f897da6f5d57ba089d978b681 |