predikit

Turn any trained scikit-learn or XGBoost model into an LLM-callable tool — auto-generated JSON schemas, typed I/O, zero boilerplate.

tool = ModelTool(model=reg, name="price_estimate", ...)
tool.to_openai()              # OpenAI function schema, ready to pass to the API
tool.invoke({"sqft": 2200})   # → {"price_usd": 370730}

Install

pip install predikit

# With XGBoost support
pip install predikit[xgboost]

# With LangChain support
pip install predikit[langchain]

30-second example

from pydantic import BaseModel, Field
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from predikit import ModelTool

# Train
X, y = load_iris(return_X_y=True)
clf = LogisticRegression(max_iter=200).fit(X, y)

# Define what the LLM will pass in
class IrisInput(BaseModel):
    sepal_length: float = Field(description="Sepal length in cm")
    sepal_width:  float = Field(description="Sepal width in cm")
    petal_length: float = Field(description="Petal length in cm")
    petal_width:  float = Field(description="Petal width in cm")

# Wrap the model
tool = ModelTool(
    model=clf,
    name="classify_iris",
    description="Classify an iris flower: 0=setosa, 1=versicolor, 2=virginica.",
    input_schema=IrisInput,
    output_name="species",
    output_description="Predicted species index",
)

# Get an OpenAI-ready schema
import json
print(json.dumps(tool.to_openai(), indent=2))

# Call it directly
tool.invoke({
    "sepal_length": 5.1, "sepal_width": 3.5,
    "petal_length": 1.4, "petal_width": 0.2,
})
# → {"species": 0}

Core API

ModelTool

ModelTool(
    model,               # fitted sklearn-compatible estimator
    name: str,           # tool name the LLM sees
    description: str,    # tool description the LLM sees
    input_schema,        # Pydantic BaseModel describing inputs
    output_name: str,    # key for the prediction in the returned dict
    output_description: str,
)

Method               Returns         What it does
.invoke(input_dict)  dict            Validates → predicts → returns {output_name: value}
.to_openai()         dict            OpenAI function-calling schema
.to_langchain()      StructuredTool  LangChain tool
.to_callable()       Callable        Plain Python function
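
For orientation, an OpenAI function-calling tool definition generally has the shape sketched below. This is a hand-built illustration for the iris tool from the 30-second example, not predikit's verbatim output: the helper `build_openai_tool` is hypothetical, and the exact keys that `to_openai()` emits are an assumption.

```python
import json

# Field names and descriptions copied from IrisInput in the 30-second example.
FIELDS = {
    "sepal_length": "Sepal length in cm",
    "sepal_width": "Sepal width in cm",
    "petal_length": "Petal length in cm",
    "petal_width": "Petal width in cm",
}


def build_openai_tool(name, description, fields):
    """Return a dict in the OpenAI `tools=[...]` format for float-only inputs.

    Hypothetical helper for illustration; predikit's real output may differ.
    """
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": {
                "type": "object",
                "properties": {
                    field: {"type": "number", "description": text}
                    for field, text in fields.items()
                },
                "required": list(fields),
            },
        },
    }


schema = build_openai_tool(
    "classify_iris",
    "Classify an iris flower: 0=setosa, 1=versicolor, 2=virginica.",
    FIELDS,
)
print(json.dumps(schema, indent=2))
```

The field descriptions matter here: they are the only information the LLM has about what each argument means.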

ToolRegistry

Group multiple tools for bulk export:

registry = ToolRegistry([price_tool, risk_tool])
registry.to_openai()     # → list[dict], pass directly to OpenAI
registry.to_langchain()  # → list[StructuredTool]
registry.get("name")     # → ModelTool

Field naming rule

Your Pydantic schema field names must exactly match the column names the model was trained on.

predikit maps inputs to features by name, not position. If you trained on a DataFrame with columns ["sqft", "bedrooms"], your schema fields must be sqft and bedrooms — not sq_ft, not Sqft.

# ✓ Columns match: sqft, bedrooms, bathrooms
class GoodInput(BaseModel):
    sqft:      float
    bedrooms:  float
    bathrooms: float

# ✗ Name mismatch — raises ValueError at runtime
class BadInput(BaseModel):
    square_footage: float  # model expects "sqft"
    beds:           float  # model expects "bedrooms"
    bathrooms:      float  # matches the model's column

When there's a mismatch, predikit tells you exactly which names are wrong:

ValueError: Input schema is missing model features: ['sqft', 'bedrooms'].
Schema has: ['square_footage', 'beds', 'bathrooms'], model expects: ['sqft', 'bedrooms', 'bathrooms']

Tip: If you trained with a numpy array (no DataFrame), predikit has no feature names to check — it uses your schema's field definition order instead.
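
The naming rule and the numpy fallback can be sketched in plain Python. This is a minimal illustration of the behaviour described above, not predikit's implementation; `order_inputs` is a hypothetical helper. In scikit-learn, the fitted column names are exposed as `feature_names_in_` when the estimator was trained on a DataFrame with string column names.

```python
def order_inputs(schema_fields, values, feature_names=None):
    """Arrange validated inputs into the column order the model expects.

    schema_fields: field names in schema definition order.
    values:        validated input dict keyed by field name.
    feature_names: names recorded at fit time, or None for array-trained models.
    """
    if feature_names is None:
        # No names to check: fall back to schema definition order.
        return [values[field] for field in schema_fields]

    missing = [name for name in feature_names if name not in schema_fields]
    if missing:
        raise ValueError(
            f"Input schema is missing model features: {missing}. "
            f"Schema has: {list(schema_fields)}, "
            f"model expects: {list(feature_names)}"
        )
    # Map by name, so the schema's own field order does not matter.
    return [values[name] for name in feature_names]


inputs = {"bedrooms": 3.0, "sqft": 2200.0}
by_name = order_inputs(["bedrooms", "sqft"], inputs, ["sqft", "bedrooms"])
by_position = order_inputs(["bedrooms", "sqft"], inputs)
```

With feature names available, `by_name` is reordered to match the model; without them, `by_position` follows the schema order, which is why array-trained models need schema fields declared in training-column order.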

Cookbook

XGBoost regression

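from pydantic import BaseModel
from xgboost import XGBRegressor
from predikit import ModelTool

# X_train: training features with columns sqft, bedrooms, year_built
# (see "Field naming rule"); y_train: sale prices in USD
reg = XGBRegressor().fit(X_train, y_train)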

class HouseInput(BaseModel):
    sqft:       float
    bedrooms:   float
    year_built: float

tool = ModelTool(
    model=reg,
    name="price_estimate",
    description="Predict home price in USD.",
    input_schema=HouseInput,
    output_name="price_usd",
    output_description="Predicted sale price in USD",
)

Multiple tools in one registry

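from langchain.agents import initialize_agent
from openai import OpenAI
from predikit import ToolRegistry

# price_tool, risk_tool, demand_tool are ModelTool instances built as shown above
registry = ToolRegistry([price_tool, risk_tool, demand_tool])

# OpenAI
client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o",
    tools=registry.to_openai(),
    ...
)

# LangChain
agent = initialize_agent(tools=registry.to_langchain(), ...)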
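
Once the LLM answers with a tool call, the call still has to be routed back to the right model. The sketch below shows one way to do that, assuming the tool name and the JSON argument string have already been read from the response (in the OpenAI Python client these are `tool_call.function.name` and `tool_call.function.arguments`). `EchoTool` and `dispatch` are hypothetical stand-ins; with predikit, `registry.get(name)` would take the place of the dict lookup.

```python
import json


class EchoTool:
    """Hypothetical stand-in with the same .invoke(dict) -> dict shape as ModelTool."""

    def __init__(self, name, output_name):
        self.name = name
        self.output_name = output_name

    def invoke(self, inputs):
        # A real tool would run model.predict; this one just sums the inputs.
        return {self.output_name: sum(inputs.values())}


def dispatch(tools_by_name, tool_name, arguments_json):
    """Run the tool the LLM asked for and return a JSON string for the reply."""
    tool = tools_by_name[tool_name]
    result = tool.invoke(json.loads(arguments_json))
    return json.dumps(result)


tools = {"price_estimate": EchoTool("price_estimate", "price_usd")}
reply = dispatch(tools, "price_estimate", '{"sqft": 2200, "bedrooms": 3}')
print(reply)
```

The returned string is what would be sent back to the LLM as the content of the tool message.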

Bool inputs from an LLM

LLMs sometimes return "yes", "true", or "1" for boolean fields. predikit coerces these automatically before Pydantic validation:

class Input(BaseModel):
    has_pool: bool

tool.invoke({"has_pool": "yes"})   # → coerced to True
tool.invoke({"has_pool": "false"}) # → coerced to False
tool.invoke({"has_pool": "maybe"}) # → raises ValueError with clear message

Supported strings: true/false, yes/no, 1/0, on/off.
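
A coercion step of this kind can be sketched as follows. This is an illustration rather than predikit's code: `coerce_bool` is a hypothetical helper, and the case-insensitive, whitespace-trimming match is an assumption not stated above.

```python
TRUE_STRINGS = {"true", "yes", "1", "on"}
FALSE_STRINGS = {"false", "no", "0", "off"}


def coerce_bool(value):
    """Convert common LLM spellings of a boolean into a real bool."""
    if isinstance(value, bool):
        return value
    if isinstance(value, str):
        text = value.strip().lower()  # assumption: matching ignores case and padding
        if text in TRUE_STRINGS:
            return True
        if text in FALSE_STRINGS:
            return False
    raise ValueError(
        f"Cannot interpret {value!r} as a boolean; expected one of "
        f"{sorted(TRUE_STRINGS | FALSE_STRINGS)}"
    )


print(coerce_bool("yes"), coerce_bool("false"))
```

Running the coercion before validation means the schema can keep a strict `bool` type while still tolerating string answers from the LLM.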

Confidence-aware routing

Route uncertain predictions to a fallback tool, or raise an error the agent can catch:

from predikit import ModelTool, LowConfidenceError

tool = ModelTool(
    model=clf,
    name="churn_risk",
    description="Predict member churn risk.",
    input_schema=MemberInput,
    output_name="churn_probability",
    output_description="Probability of churn (0–1)",
    confidence_threshold=0.80,       # classifiers with predict_proba only
    on_low_confidence="warn",        # "warn" | "raise" | "fallback"
    fallback_tool=rule_based_tool,   # used when on_low_confidence="fallback"
)

result = tool.invoke(inputs)
if result.get("_low_confidence"):
    print(f"Uncertain ({result['_confidence']:.2f}) — consider routing to a human")

Mode        Behaviour
"warn"      returns prediction + _confidence + _low_confidence: True
"raise"     raises LowConfidenceError
"fallback"  invokes fallback_tool and returns its result

Only applies to classifiers that implement predict_proba. Regressors are unaffected.
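
The three modes can be sketched as a small routing function. This is an illustration under stated assumptions, not predikit's implementation: confidence is taken to be the highest class probability from `predict_proba`, a prediction exactly at the threshold counts as confident, and `LowConfidence` and `route` are hypothetical names.

```python
class LowConfidence(Exception):
    """Stand-in for predikit's LowConfidenceError in this sketch."""


def route(prediction, probabilities, threshold, mode="warn", fallback=None):
    """Apply a confidence threshold to one prediction.

    probabilities: class probabilities for this row, e.g. one predict_proba row.
    fallback:      zero-argument callable used when mode == "fallback".
    """
    confidence = max(probabilities)  # assumption: confidence = top class probability
    if confidence >= threshold:
        return {"prediction": prediction}
    if mode == "warn":
        return {
            "prediction": prediction,
            "_confidence": confidence,
            "_low_confidence": True,
        }
    if mode == "raise":
        raise LowConfidence(f"confidence {confidence:.2f} below {threshold:.2f}")
    if mode == "fallback":
        if fallback is None:
            raise ValueError("mode='fallback' needs a fallback callable")
        return fallback()
    raise ValueError(f"unknown mode: {mode!r}")


confident = route(1, [0.1, 0.9], threshold=0.8)
uncertain = route(1, [0.4, 0.6], threshold=0.8, mode="warn")
```

In an agent loop, "raise" suits cases where the agent should stop and ask for help, while "fallback" keeps the call signature unchanged for the caller.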

Multi-model ensemble

Call multiple models and reconcile their outputs in one step:

from predikit import ModelEnsemble, ToolRegistry

ensemble = ModelEnsemble(
    tools=[price_tool_a, price_tool_b],
    name="averaged_price",
    description="Ensemble price: mean of two XGBoost models.",
    strategy="mean",              # "collect" | "mean" | "vote"
)

result  = ensemble.invoke(inputs)  # → {"price_usd": 370112}
schema  = ensemble.to_openai()     # works exactly like ModelTool

Strategy   Behaviour
"collect"  merges all outputs into one dict (tools can have different output_name)
"mean"     averages numeric outputs (all tools must share output_name)
"vote"     majority class vote (all tools must share output_name)
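
The three strategies amount to simple reductions over the individual tool results. The sketch below illustrates them on plain dicts; `combine` is a hypothetical helper, and details such as tie-breaking in "vote" (here: the first value to reach the top count) are assumptions rather than documented predikit behaviour.

```python
from collections import Counter
from statistics import mean


def combine(outputs, strategy):
    """Combine a list of single-tool result dicts into one dict."""
    if strategy == "collect":
        merged = {}
        for out in outputs:
            merged.update(out)  # later tools overwrite earlier ones on key clashes
        return merged

    keys = {key for out in outputs for key in out}
    if len(keys) != 1:
        raise ValueError("'mean' and 'vote' need all tools to share one output_name")
    (key,) = keys
    values = [out[key] for out in outputs]

    if strategy == "mean":
        return {key: mean(values)}
    if strategy == "vote":
        return {key: Counter(values).most_common(1)[0][0]}
    raise ValueError(f"unknown strategy: {strategy!r}")


averaged = combine([{"price_usd": 370000}, {"price_usd": 371000}], "mean")
voted = combine([{"species": 0}, {"species": 1}, {"species": 1}], "vote")
collected = combine([{"price_usd": 370000}, {"risk": 0.2}], "collect")
```

"mean" fits regressors, "vote" fits classifiers, and "collect" is the option when the tools answer different questions about the same input.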

Register ensembles alongside individual tools:

registry = ToolRegistry(tools=[price_tool], ensembles=[ensemble])
registry.to_openai()  # includes both tools and ensembles

Orlando real estate demo

See examples/03_orlando_real_estate.py for a full end-to-end walkthrough: synthetic dataset → XGBoost training → ModelTool → registry → OpenAI schema → prediction.

Roadmap

Planned for later releases:

  • MLflow / Snowflake Model Registry integration
  • HuggingFace / PyTorch / TensorFlow support
  • Async invocation

License

MIT © Tejas Tumakuru Ashok
