Skip to main content

A lightweight pandas accessor for batch OpenAI-compatible LLM extraction

Project description

SilkLoom Core

SilkLoom Core is a small pandas accessor for batch LLM extraction.

DataFrame rows -> Jinja prompt render -> OpenAI-compatible chat call -> repaired JSON -> result DataFrame

Install

pip install silkloom-core

Quick Start

Importing silkloom_core registers df.llm.

import pandas as pd
import silkloom_core

df = pd.DataFrame(
    {
        "title": ["A clear experiment", "A weak evaluation"],
        "abstract": ["Reliable and reproducible.", "Too small to conclude much."],
    }
)

results = df.llm.setup(
    api_key="...",
    base_url="https://api.openai.com/v1",
    cache_path=".llm_cache.db",
).extract(
    "Title: {{ title }}\nAbstract: {{ abstract }}\nReturn JSON with keys label and summary.",
    model="gpt-4o-mini",
    max_workers=8,
    json_mode=True,
)

results contains only the parsed model output columns and keeps the original index, so you can join it back when needed:

df = df.join(results)

Client Setup

You can let SilkLoom create an OpenAI client:

df.llm.setup(api_key="...", base_url="...")

Or pass any OpenAI-compatible client with client.chat.completions.create(...):

from openai import OpenAI

client = OpenAI(api_key="...", base_url="...")
df.llm.setup(client=client)

Extraction

Use Jinja placeholders that match DataFrame columns. Literal JSON braces can stay as normal braces.

out = df.llm.extract(
    'Classify {{ text }} and return JSON like {"label": "positive", "score": 0.9}',
    model="gpt-4o-mini",
    temperature=0.1,
    max_workers=4,
    max_retries=2,
    verbose=True,
)

Malformed JSON is parsed with json_repair. If the model returns a JSON object, its keys become columns. If it returns another JSON value, the value is placed in _llm_raw. Parse or request failures are returned in _llm_error.

Cache And Audit Records

SQLite stores successful responses for cache reuse and also keeps richer request records for inspection. The cache key includes the model, rendered messages, JSON mode, and request options.

df.llm.setup(cache_path="cache/llm.sqlite").extract(...)

The cache table includes:

  • cache_key
  • ok
  • model
  • messages_json
  • params_json
  • request_json
  • response
  • parsed_json
  • error
  • attempts
  • created_at
  • updated_at

Only rows with ok = 1 are reused as cache hits. Failed requests and parse errors are recorded for debugging but are retried on the next run. Use a new cache path or delete the SQLite file when you want a fresh run.

Images

Pass image_column for local image paths, HTTP(S) image URLs, or existing data:image/... URLs. Local files are encoded as base64 data URLs.

out = df.llm.extract(
    "Extract fields from this receipt and return JSON.",
    image_column="receipt_path",
    model="gpt-4o-mini",
)

Rows with missing image values fall back to text-only prompts.

Progress And Cancel

Use progress_callback for UI integration:

def progress(done, total):
    print(done, total)

out = df.llm.extract("Analyze {{ text }}", progress_callback=progress)

From another thread or UI event, call:

df.llm.cancel()

Queued work is cancelled where possible, and running rows stop before the next retry.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

silkloom_core-6.0.2.tar.gz (10.8 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

silkloom_core-6.0.2-py3-none-any.whl (8.1 kB view details)

Uploaded Python 3

File details

Details for the file silkloom_core-6.0.2.tar.gz.

File metadata

  • Download URL: silkloom_core-6.0.2.tar.gz
  • Upload date:
  • Size: 10.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for silkloom_core-6.0.2.tar.gz
Algorithm Hash digest
SHA256 f39f09be98eac7d61e1b2de7fe41900d9b1d3ccfee969c13a30cd0b760d58b69
MD5 f0268522ad7b4557861c41890c7fc300
BLAKE2b-256 75ebcbbfc0ca52cd0dfc1e78a07e7499b11396ea72910935186f85c729c579e0

See more details on using hashes here.

Provenance

The following attestation bundles were made for silkloom_core-6.0.2.tar.gz:

Publisher: publish.yml on LeLiu-GeoAI/silkloom-core

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file silkloom_core-6.0.2-py3-none-any.whl.

File metadata

  • Download URL: silkloom_core-6.0.2-py3-none-any.whl
  • Upload date:
  • Size: 8.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for silkloom_core-6.0.2-py3-none-any.whl
Algorithm Hash digest
SHA256 3062d9f57c3960f7090dc24f57ef137967b8bd0a71d1acb39c0a0a9162f56628
MD5 0a30dbe0dbad122b30bb98aaaa12cec0
BLAKE2b-256 0541a9ad1d56347b5418b93948f16774bfa8afec73288009a62dd2b09ff9364a

See more details on using hashes here.

Provenance

The following attestation bundles were made for silkloom_core-6.0.2-py3-none-any.whl:

Publisher: publish.yml on LeLiu-GeoAI/silkloom-core

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page