A framework for building ML models from natural language

smolmodels 🤖✨

Build specialized ML models using natural language.

What is smolmodels?

smolmodels is a Python library that lets you create machine learning models by describing what you want them to do in plain English. Instead of wrestling with model architectures and hyperparameters, you simply describe your intent, define your inputs and outputs, and let smolmodels handle the rest.

from smolmodels import Model

# Create a house price predictor with just a description
model = Model(
    intent="Predict house prices based on property features",
    input_schema={
        "square_feet": float,
        "bedrooms": int,
        "location": str,
        "year_built": int
    },
    output_schema={
        "predicted_price": float
    }
)

# Build the model - optionally generate synthetic training data
model.build(dataset="house-prices.csv", generate_samples=1000)

# Make predictions
price = model.predict({
    "square_feet": 2500,
    "bedrooms": 4,
    "location": "San Francisco",
    "year_built": 1985
})
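Because the schemas are plain dictionaries mapping field names to Python types, a record can be checked against one before it is passed to `predict`. The following is a minimal sketch, not part of smolmodels: `check_record` is a hypothetical helper, and whether the library performs similar validation internally is not stated here. It accepts an `int` where a `float` is declared, since the example above passes `2500` for `square_feet`.

```python
from typing import Any

# Same schema shape as in the example above: field name -> Python type.
INPUT_SCHEMA = {
    "square_feet": float,
    "bedrooms": int,
    "location": str,
    "year_built": int,
}


def check_record(record: dict[str, Any], schema: dict[str, type]) -> list[str]:
    """Return a list of human-readable problems; an empty list means the record fits.

    Hypothetical helper for illustration only; not a smolmodels API.
    """
    problems = []
    for field, expected in schema.items():
        if field not in record:
            problems.append(f"missing field: {field}")
            continue
        value = record[field]
        # bool is a subclass of int, so reject it explicitly for numeric fields.
        if isinstance(value, bool) and expected in (int, float):
            problems.append(f"{field}: expected {expected.__name__}, got bool")
        # Allow an int where a float is declared (e.g. 2500 for square_feet).
        elif expected is float and isinstance(value, int):
            continue
        elif not isinstance(value, expected):
            problems.append(
                f"{field}: expected {expected.__name__}, got {type(value).__name__}"
            )
    # Report fields the schema does not know about.
    for field in record:
        if field not in schema:
            problems.append(f"unexpected field: {field}")
    return problems
```

Running such a check before prediction surfaces missing or mistyped fields as readable messages.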

How Does It Work?

smolmodels uses a multi-step process for model creation:

  1. Intent Analysis: The problem description is analyzed to determine the type of model needed, its key requirements, and its success criteria.

  2. Data Generation: smolmodels can generate synthetic data so that a model can be built when no training data is available.

  3. Model Building: The library:

    • Selects appropriate model architectures
    • Handles feature engineering
    • Manages training and validation
    • Ensures outputs meet the specified constraints
  4. Validation & Refinement: The model is tested against constraints and refined using directives (like "optimize for speed" or "prioritize explainability").

Key Features

Natural Language Intent 📝

Models are defined through natural language descriptions and schema specifications, abstracting away architecture decisions.

Data Generation 🎲

Built-in synthetic data generation for training and validation.

Directives for Fine-Grained Control 🎯

Guide the model building process with high-level directives:

from smolmodels import Directive

model.build(directives=[
    Directive("Optimize for inference speed"),
    Directive("Prioritize interpretability")
])

Optional Constraints ✅

Optional declarative constraints for model validation:

from smolmodels import Constraint

# Ensure predictions are always positive
positive_constraint = Constraint(
    lambda inputs, outputs: outputs["predicted_price"] > 0,
    description="Predictions must be positive"
)

model = Model(
    intent="Predict house prices...",
    constraints=[positive_constraint],
    ...
)
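To make the predicate's shape concrete, the sketch below evaluates a rule of the same form, `(inputs, outputs) -> bool`, over a few input/output pairs. `SketchConstraint` and `violations` are hypothetical stand-ins written for this illustration. They are not the smolmodels `Constraint` class, whose internals are not described here, and the sample values are made up.

```python
from dataclasses import dataclass
from typing import Any, Callable

Record = dict[str, Any]


@dataclass
class SketchConstraint:
    """Hypothetical stand-in for a declarative constraint (not the smolmodels class)."""

    rule: Callable[[Record, Record], bool]
    description: str


def violations(
    constraint: SketchConstraint, pairs: list[tuple[Record, Record]]
) -> list[int]:
    """Return the indices of (inputs, outputs) pairs that break the rule."""
    return [
        i for i, (inputs, outputs) in enumerate(pairs)
        if not constraint.rule(inputs, outputs)
    ]


positive = SketchConstraint(
    rule=lambda inputs, outputs: outputs["predicted_price"] > 0,
    description="Predictions must be positive",
)

# Made-up pairs for illustration only.
samples = [
    ({"square_feet": 2500.0}, {"predicted_price": 1_250_000.0}),
    ({"square_feet": 800.0}, {"predicted_price": -10.0}),
]
print(violations(positive, samples))
```

Keeping the rule as a plain callable next to a human-readable description means the same object can be both executed and reported on when a check fails.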

Installation & Setup

pip install smolmodels

API Keys

Set required API keys as environment variables:

# Required for model generation
export OPENAI_API_KEY=<your-API-key>
export ANTHROPIC_API_KEY=<your-API-key>

# Required for data generation
export GOOGLE_API_KEY=<your-API-key>
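It can be convenient to confirm that these variables are set before starting a build. The following is a minimal sketch using only the standard library. `missing_api_keys` is a hypothetical helper, not part of smolmodels, and the variable names are the ones listed above.

```python
import os
from collections.abc import Mapping

# Variable names taken from the list above.
REQUIRED_KEYS = ("OPENAI_API_KEY", "ANTHROPIC_API_KEY", "GOOGLE_API_KEY")


def missing_api_keys(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required variables that are unset or empty in `env`.

    Hypothetical helper for illustration; not a smolmodels API.
    """
    return [name for name in REQUIRED_KEYS if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_api_keys()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
```

Passing the environment in as a mapping keeps the check easy to exercise with a plain dictionary instead of the real process environment.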

Quick Start

  1. Define the model:
from smolmodels import Model

model = Model(
    intent="Classify customer feedback as positive, negative, or neutral",
    input_schema={"text": str},
    output_schema={"sentiment": str}
)
  2. Build and save:
# Build with existing data
model.build(dataset="feedback.csv")

# Or generate synthetic data
model.build(generate_samples=1000)

# Save model for later use
model.save("sentiment_model")
  3. Load and use:
# Load existing model
loaded_model = Model.load("sentiment_model")

# Make predictions
result = loaded_model.predict({"text": "Great service, highly recommend!"})
print(result["sentiment"])  # e.g. "positive"

Benchmarks

Performance was evaluated on 20 OpenML benchmark datasets and 12 Kaggle competitions. smolmodels scored higher than the baseline on 12 of the 20 OpenML datasets and within 0.005 of the baseline on the remaining 8. Experiments were run on standard infrastructure (8 vCPUs, 30 GB RAM) with a 1-hour runtime limit per dataset.

Complete code and results are available at plexe-ai/plexe-results.

Documentation

For full documentation, visit docs.plexe.ai.

Contributing

We welcome contributions! See CONTRIBUTING.md for guidelines.

License

Apache-2.0 License - see LICENSE for details.
