

Project description

Smelt AI

PyPI · Docs · License: MIT · Python 3.10+

LLM-powered structured data transformation. Feed in rows of data, get back strictly typed Pydantic models — batched, concurrent, and validated.

from smelt import Model, Job
from pydantic import BaseModel

class Classification(BaseModel):
    sector: str
    sub_sector: str
    is_public: bool

model = Model(provider="openai", name="gpt-4.1-mini")
job = Job(
    prompt="Classify each company by industry sector and whether it's publicly traded.",
    output_model=Classification,
)

result = job.run(model, data=[
    {"name": "Apple", "desc": "Consumer electronics and software"},
    {"name": "Stripe", "desc": "Payment processing platform"},
    {"name": "Mayo Clinic", "desc": "Nonprofit medical center"},
])

for row in result.data:
    print(row)  # Classification(sector='Technology', sub_sector='Consumer Electronics', is_public=True)

Install

pip install smelt-ai[openai]      # OpenAI models
pip install smelt-ai[anthropic]   # Anthropic models
pip install smelt-ai[google]      # Google Gemini models

Requires Python 3.10+.


How It Works

list[dict] → Tag with row_id → Split into batches → Concurrent LLM calls → Validate → Reorder → SmeltResult[T]
  1. Each input row gets a row_id for tracking
  2. Rows are split into batches of configurable size
  3. Batches run concurrently through the LLM with structured output
  4. Each response is validated (schema, row IDs, count)
  5. Results are reordered to match original input order
  6. Everything is returned as a typed SmeltResult with metrics
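
As a rough mental model, steps 1, 2, and 5 can be sketched in plain Python. The helper names and dict-based row tagging below are illustrative only, not smelt's internal API:

def split_into_batches(rows: list[dict], batch_size: int) -> list[list[dict]]:
    # Steps 1 and 2: tag each row with its original position, then chunk into batches.
    tagged = [{"row_id": i, **row} for i, row in enumerate(rows)]
    return [tagged[i:i + batch_size] for i in range(0, len(tagged), batch_size)]

def reorder(responses: list[dict]) -> list[dict]:
    # Step 5: each response carries its row_id back, so sorting on it restores
    # the original input order regardless of which batch finished first.
    return sorted(responses, key=lambda r: r["row_id"])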

API

Model

Wraps a LangChain chat model provider. Any LangChain-supported provider works.

model = Model(
    provider="openai",          # LangChain provider name
    name="gpt-4.1-mini",       # Model identifier
    api_key="sk-...",           # Optional — falls back to env var (e.g. OPENAI_API_KEY)
    params={"temperature": 0},  # Forwarded to the chat model constructor
)
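
The same constructor works for any other LangChain provider. For example, an Anthropic model (model name taken from the provider table below; the ANTHROPIC_API_KEY environment variable is assumed as the fallback):

model = Model(
    provider="anthropic",
    name="claude-sonnet-4-6",
    params={"temperature": 0},
)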

Job

Defines what transformation to run and how to batch it.

job = Job(
    prompt="Your transformation instructions here",
    output_model=MyPydanticModel,  # Schema for each output row
    batch_size=10,                 # Rows per LLM request (default: 10)
    concurrency=3,                 # Max concurrent requests (default: 3)
    max_retries=3,                 # Retries per failed batch (default: 3)
    shuffle=False,                 # Shuffle rows before batching (default: False)
    stop_on_exhaustion=True,       # Raise on failure vs collect errors (default: True)
)

Run:

result = job.run(model, data=rows)          # Sync
result = await job.arun(model, data=rows)   # Async
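
arun is ordinary asyncio; from a plain script it can be driven with asyncio.run (nothing smelt-specific here):

import asyncio

async def main():
    result = await job.arun(model, data=rows)
    print(result.success)

asyncio.run(main())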

Test with a single row first:

result = job.test(model, data=rows)         # Sync — runs only the first row
result = await job.atest(model, data=rows)  # Async

SmeltResult[T]

result.data       # list[T] — transformed rows in original order
result.errors     # list[BatchError] — failed batches
result.metrics    # SmeltMetrics — tokens, timing, retries
result.success    # bool — True if no errors
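
A typical post-run check might look like this (the individual SmeltMetrics fields aren't enumerated here, so the sketch prints the metrics object as a whole):

result = job.run(model, data=rows)

if result.success:
    for row in result.data:        # back in the original input order
        print(row)
else:
    for err in result.errors:      # one entry per failed batch
        print(err.batch_index, err.message)

print(result.metrics)              # tokens, timing, retries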

Error Handling

All exceptions inherit from SmeltError.

Exception              When
SmeltConfigError       Invalid config (bad provider, empty prompt, etc.)
SmeltValidationError   LLM output fails schema validation
SmeltAPIError          Non-retriable API error (401, 403)
SmeltExhaustionError   Batch exhausted all retries (stop_on_exhaustion=True)

Catch SmeltExhaustionError to recover any rows that did succeed:

from smelt.errors import SmeltExhaustionError

try:
    result = job.run(model, data=rows)
except SmeltExhaustionError as e:
    print(f"Partial: {len(e.partial_result.data)} rows succeeded")

Or collect errors without raising:

job = Job(prompt="...", output_model=MyModel, stop_on_exhaustion=False)
result = job.run(model, data=rows)

if not result.success:
    for err in result.errors:
        print(f"Batch {err.batch_index} failed: {err.message}")

Supported Providers

Provider        provider value    Example models
OpenAI          "openai"          gpt-5.2, gpt-4.1-mini, gpt-4.1, gpt-4o, o4-mini
Anthropic       "anthropic"       claude-sonnet-4-6, claude-opus-4-6, claude-haiku-4-5-20251001
Google Gemini   "google_genai"    gemini-3-flash-preview, gemini-3-pro-preview, gemini-2.5-flash


License

MIT



Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

smelt_ai-0.1.4.tar.gz (226.6 kB)


Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

smelt_ai-0.1.4-py3-none-any.whl (16.4 kB)


File details

Details for the file smelt_ai-0.1.4.tar.gz.

File metadata

  • Download URL: smelt_ai-0.1.4.tar.gz
  • Upload date:
  • Size: 226.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.9.5

File hashes

Hashes for smelt_ai-0.1.4.tar.gz
Algorithm      Hash digest
SHA256         bed1bed81d27988f0e8186fcf68efdf5b4f458211fe9c7f1ceb4a6963786c472
MD5            68684171fe99db7827bbcc39fefe8ed1
BLAKE2b-256    235af882e9a0bf3ac97b270e84605cd4858f272042404131f8f338ebaf1b9817

See more details on using hashes here.

File details

Details for the file smelt_ai-0.1.4-py3-none-any.whl.

File metadata

  • Download URL: smelt_ai-0.1.4-py3-none-any.whl
  • Upload date:
  • Size: 16.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.9.5

File hashes

Hashes for smelt_ai-0.1.4-py3-none-any.whl
Algorithm      Hash digest
SHA256         7e5c9e3764ffb9143bd403dfebe3a88506d1236012d1a39fd350098157cf7d12
MD5            df6f064964354d69ab8f16e5893ac077
BLAKE2b-256    f1761aa3a308fcfeec6e19279122c1cddc2536957dd031967fbaa503a4745f23

See more details on using hashes here.
