A framework for building ML models from natural language

smolmodels ✨

Build specialized ML models using natural language.

smolmodels is a Python library that lets you create machine learning models by describing what you want them to do in plain English. Instead of wrestling with model architectures and hyperparameters, you simply describe your intent, define your inputs and outputs, and let smolmodels handle the rest.

import pandas as pd
import smolmodels as sm

# Define a house price predictor in terms of intent
model = sm.Model(
    intent="Predict house prices based on property features",
    # input_schema and output_schema are optional
    input_schema={
        "square_feet": float,
        "bedrooms": int,
        "location": str,
        "year_built": int
    },
    output_schema={
        "predicted_price": float
    }
)

# Build the model, using the backend of your choice; optionally generate synthetic training data
model.build(
    dataset=pd.read_csv("house-prices.csv"),
    generate_samples=1000,
    provider="openai:gpt-4o-mini"
)

# Make predictions
price = model.predict({
    "square_feet": 2500,
    "bedrooms": 4,
    "location": "San Francisco",
    "year_built": 1985
})

# Save the model for later use
sm.save_model(model, "house-price-predictor")

How Does It Work?

smolmodels combines graph search with LLMs to generate candidate models that meet the specified intent, and then selects the best model based on performance and constraints. The process consists of four main phases:

Intent Analysis: the problem description is analyzed to determine the type of model needed and which metric to optimize for.
Data Generation: synthetic data can be generated to enable a model build when no training data is available, or when the existing data has insufficient coverage of the feature space.
Model Building:
1. Selects appropriate model architectures
2. Handles feature engineering
3. Manages training and validation
Validation & Refinement: the model is tested against constraints and refined using directives (like "optimize for speed" or "prioritize model types with better explainability").
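The "generate candidates, then select the best" idea can be illustrated with a small best-first search. This is only a sketch and not the smolmodels implementation: `search_best`, `toy_propose` and `toy_evaluate` are hypothetical names, with `toy_propose` standing in for an LLM that suggests refinements and `toy_evaluate` standing in for a validation metric.

```python
import heapq
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass(order=True)
class Candidate:
    neg_score: float                   # heapq is a min-heap, so store the negated score
    plan: str = field(compare=False)   # textual description of the candidate solution


def search_best(
    root: str,
    propose: Callable[[str], List[str]],
    evaluate: Callable[[str], float],
    budget: int = 20,
) -> Tuple[str, float]:
    """Best-first search: always expand the highest-scoring candidate seen so far."""
    root_score = evaluate(root)
    best_plan, best_score = root, root_score
    frontier = [Candidate(-root_score, root)]
    used = 1  # number of evaluations spent so far
    while frontier and used < budget:
        node = heapq.heappop(frontier)
        for child in propose(node.plan):
            if used >= budget:
                break
            child_score = evaluate(child)
            used += 1
            if child_score > best_score:
                best_plan, best_score = child, child_score
            heapq.heappush(frontier, Candidate(-child_score, child))
    return best_plan, best_score


def toy_propose(plan: str) -> List[str]:
    """Hypothetical stand-in for an LLM proposing refinements (at most two per plan)."""
    if plan.count("+") >= 2:
        return []
    return [plan + "+scaling", plan + "+boosting"]


def toy_evaluate(plan: str) -> float:
    """Hypothetical stand-in for a validation metric (higher is better)."""
    return 0.5 + 0.2 * plan.count("boosting") + 0.05 * plan.count("scaling")


best_plan, best_score = search_best("baseline", toy_propose, toy_evaluate)
print(best_plan, round(best_score, 2))
```

The evaluation budget bounds the search in the same spirit as a runtime limit: the loop returns the best candidate found once the budget is spent or the frontier is empty.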

Key Features

📝 Natural Language Intent

Models are defined using natural language descriptions and schema specifications, abstracting away machine learning specifics.

🎲 Data Generation

Built-in synthetic data generation for training and validation.
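To make the idea of schema-conformant synthetic rows concrete, here is a deliberately naive sketch that samples random values per field type. It does not reflect how smolmodels generates data; `sample_rows` and its value ranges are assumptions chosen purely for illustration.

```python
import random
from typing import Any, Dict, List


def sample_rows(schema: Dict[str, type], n: int, seed: int = 0) -> List[Dict[str, Any]]:
    """Return n rows whose fields match the Python types declared in the schema."""
    rng = random.Random(seed)  # seeded so the output is reproducible
    # Arbitrary, illustrative generators for each supported field type.
    generators = {
        float: lambda: rng.uniform(0.0, 1.0),
        int: lambda: rng.randint(0, 100),
        str: lambda: rng.choice(["a", "b", "c"]),
    }
    return [
        {name: generators[field_type]() for name, field_type in schema.items()}
        for _ in range(n)
    ]


schema = {"square_feet": float, "bedrooms": int, "location": str}
rows = sample_rows(schema, n=5)
```

Each row has the same keys as the schema, which is the property a training pipeline would rely on regardless of how the values are produced.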

🎯 Directives for Fine-Grained Control (Not Yet Implemented - Coming Soon)

Guide the model building process with high-level directives:

from smolmodels import Directive

model.build(directives=[
    Directive("Optimize for inference speed"),
    Directive("Prioritize interpretability")
])

✅ Optional Constraints (Not Yet Implemented - Coming Soon)

Optional declarative constraints for model validation:

from smolmodels import Constraint, Model

# Ensure predictions are always positive
positive_constraint = Constraint(
    lambda inputs, outputs: outputs["predicted_price"] > 0,
    description="Predictions must be positive"
)

model = Model(
    intent="Predict house prices...",
    constraints=[positive_constraint],
    # ... (remaining arguments elided)
)
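Until constraints are available in the library, the same kind of check can be applied by hand to a prediction. The sketch below is a stand-in and not the planned `Constraint` API: `Check` and `violated` are hypothetical names that pair a predicate over `(inputs, outputs)` with a description.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]


@dataclass
class Check:
    """Hypothetical stand-in for a declarative constraint."""
    predicate: Callable[[Record, Record], bool]
    description: str


def violated(checks: List[Check], inputs: Record, outputs: Record) -> List[str]:
    """Return the descriptions of all checks that the prediction fails."""
    return [c.description for c in checks if not c.predicate(inputs, outputs)]


positive = Check(
    lambda inputs, outputs: outputs["predicted_price"] > 0,
    description="Predictions must be positive",
)

inputs = {"square_feet": 2500, "bedrooms": 4}
ok = violated([positive], inputs, {"predicted_price": 850000.0})
bad = violated([positive], inputs, {"predicted_price": -5.0})
```

Here `ok` is an empty list and `bad` contains the description of the failed check, which can be logged or used to reject the prediction.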

🌐 Multi-Provider Support

You can use multiple LLM providers as a backend for model generation. You can specify the provider and model in the format provider:[model] when calling build():

model.build(dataset=pd.read_csv("house-prices.csv"), provider="openai:gpt-4o-mini")

Currently supported providers are openai, anthropic, google and deepseek. You need to configure the appropriate API keys for each provider as environment variables (see installation instructions).
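A provider string can be validated before it is passed to build(). This is a minimal sketch: `parse_provider` is a hypothetical helper, not part of smolmodels, and it assumes the square brackets in provider:[model] mean the model part is optional.

```python
from typing import Optional, Tuple

# Provider names taken from the list of currently supported providers.
SUPPORTED_PROVIDERS = {"openai", "anthropic", "google", "deepseek"}


def parse_provider(spec: str) -> Tuple[str, Optional[str]]:
    """Split 'provider:model' into its parts; the model may be omitted (assumption)."""
    provider, _, model = spec.partition(":")
    provider = provider.strip().lower()
    if provider not in SUPPORTED_PROVIDERS:
        raise ValueError(f"Unsupported provider: {provider!r}")
    return provider, (model.strip() or None)


print(parse_provider("openai:gpt-4o-mini"))
```

Failing early on an unknown provider name avoids starting a long build with a typo in the provider string.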

Installation & Setup

pip install smolmodels

API Keys

Set required API keys as environment variables. Which API keys are required depends on which provider you are using.

export OPENAI_API_KEY=<your-API-key>
export ANTHROPIC_API_KEY=<your-API-key>
export GOOGLE_API_KEY=<your-API-key>
export DEEPSEEK_API_KEY=<your-API-key>
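Before starting a build, it can help to confirm that the key for the chosen provider is actually set. The following is a small sketch using only the standard library; `missing_key` is a hypothetical helper, and the mapping simply mirrors the environment variable names listed above.

```python
import os
from typing import Mapping, Optional

# Environment variable expected for each provider, as listed above.
KEY_FOR_PROVIDER = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "google": "GOOGLE_API_KEY",
    "deepseek": "DEEPSEEK_API_KEY",
}


def missing_key(provider_spec: str, env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the name of the unset variable for this provider, or None if it is set."""
    provider = provider_spec.split(":", 1)[0]
    key = KEY_FOR_PROVIDER[provider]  # raises KeyError for an unknown provider
    return None if env.get(key) else key


missing = missing_key("openai:gpt-4o-mini")
if missing:
    print(f"Set {missing} before calling build()")
```

Passing `env` explicitly makes the helper easy to test without touching the real environment.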

Quick Start

Define model:

import pandas as pd
import smolmodels as sm

model = sm.Model(
    intent="Classify customer feedback as positive, negative, or neutral",
    input_schema={"text": str},
    output_schema={"sentiment": str}
)

Build and save:

# Build with existing data
model.build(dataset=pd.read_csv("feedback.csv"), provider="openai:gpt-4o-mini")

# Or generate synthetic data
model.build(generate_samples=1000)

# Save model for later use
sm.save_model(model, "sentiment_model")

Load and use:

# Load existing model
loaded_model = sm.load_model("sentiment_model")

# Make predictions
result = loaded_model.predict({"text": "Great service, highly recommend!"})
print(result["sentiment"])  # "positive"
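The examples above call predict() on one input dictionary at a time, so scoring several inputs can be wrapped in a small helper. This sketch assumes only that the model object exposes `predict(dict) -> dict` as shown above; `predict_batch` and `KeywordStub` are hypothetical, and the stub exists solely so the example runs without a built model.

```python
from typing import Any, Dict, Iterable, List, Protocol


class Predictor(Protocol):
    """Anything with a predict(dict) -> dict method, such as a loaded model."""
    def predict(self, x: Dict[str, Any]) -> Dict[str, Any]: ...


def predict_batch(model: Predictor, rows: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Call model.predict once per row and collect the results in order."""
    return [model.predict(row) for row in rows]


class KeywordStub:
    """Hypothetical stand-in for a trained model, used only for this example."""
    def predict(self, x: Dict[str, Any]) -> Dict[str, Any]:
        label = "positive" if "great" in x["text"].lower() else "neutral"
        return {"sentiment": label}


results = predict_batch(
    KeywordStub(),
    [{"text": "Great service, highly recommend!"}, {"text": "It arrived on Tuesday."}],
)
```

With a real model, `KeywordStub()` would be replaced by the object returned from `sm.load_model("sentiment_model")`.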

Benchmarks

Performance was evaluated on 20 OpenML benchmark datasets and 12 Kaggle competitions. smolmodels outperformed the baseline on 12 of the 20 OpenML datasets, and on the remaining datasets its performance was within 0.005 of the baseline. Experiments were run on standard infrastructure (8 vCPUs, 30 GB RAM) with a 1-hour runtime limit per dataset.

Complete code and results are available at plexe-ai/plexe-results.

Documentation

For full documentation, visit docs.plexe.ai.

Contributing

We welcome contributions! See CONTRIBUTING.md for guidelines.

License

Apache-2.0 License - see LICENSE for details.

Release history

Version	Release date
0.15.0	Apr 15, 2025
0.14.0	Apr 10, 2025
0.13.0	Apr 4, 2025
0.12.6	Apr 4, 2025
0.12.5	Apr 3, 2025
0.12.4	Mar 27, 2025
0.12.3	Mar 27, 2025
0.12.1	Mar 26, 2025
0.12.0	Mar 19, 2025
0.11.1	Mar 15, 2025
0.11.0	Mar 10, 2025
0.10.0	Mar 9, 2025
0.9.3	Mar 4, 2025
0.9.2	Feb 25, 2025
0.9.1	Feb 25, 2025
0.9.0	Feb 21, 2025
0.8.2	Feb 21, 2025
0.8.1	Feb 21, 2025
0.8.0	Feb 20, 2025
0.7.1	Feb 20, 2025
0.7.0	Feb 15, 2025
0.6.0	Feb 12, 2025
0.5.3	Feb 11, 2025
0.5.2	Feb 9, 2025
0.5.1	Feb 8, 2025
0.5.0	Feb 4, 2025
0.4.0	Feb 4, 2025
0.3.2	Feb 1, 2025
0.3.1 (this version)	Feb 1, 2025
0.3.0	Feb 1, 2025
0.2.0	Jan 31, 2025
0.1.2	Jan 30, 2025
0.1.1	Jan 29, 2025
0.1.0	Jan 5, 2025

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

smolmodels-0.3.1.tar.gz (50.3 kB)

Uploaded Feb 1, 2025 Source

Built Distribution

smolmodels-0.3.1-py3-none-any.whl (73.3 kB)

Uploaded Feb 1, 2025 Python 3

File details

Details for the file smolmodels-0.3.1.tar.gz.

File metadata

Download URL: smolmodels-0.3.1.tar.gz
Upload date: Feb 1, 2025
Size: 50.3 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: poetry/2.0.1 CPython/3.12.8 Linux/6.8.0-1020-azure

File hashes

Hashes for smolmodels-0.3.1.tar.gz
Algorithm	Hash digest
SHA256	`30a73c92c3eef3730ab4b5f95f3a398508e1110284744220b83de58a5097db01`
MD5	`d9727927848ced3a702c5866c083e0ac`
BLAKE2b-256	`b2385ed12f58c55226be08ba4d81138ef66414fad74958902db5da4e1dff3441`

File details

Details for the file smolmodels-0.3.1-py3-none-any.whl.

File metadata

Download URL: smolmodels-0.3.1-py3-none-any.whl
Upload date: Feb 1, 2025
Size: 73.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: poetry/2.0.1 CPython/3.12.8 Linux/6.8.0-1020-azure

File hashes

Hashes for smolmodels-0.3.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`263d408ba6ee13dc27644b63d585f7dab0010e614dad500402ee17bdabef99da`
MD5	`87c1252f0bab7df515cd48a4e72e90de`
BLAKE2b-256	`e77a66dbb7334c7c15a32be0b035c68207189a366bfd0716591375615078005b`
