An agentic framework for building ML models from natural language

plexe ✨

Build machine learning models using natural language.

Quickstart | Features | Installation | Documentation


plexe lets you create machine learning models by describing them in plain language. Simply explain what you want, provide a dataset, and the AI-powered system builds a fully functional model through an automated agentic approach. Also available as a managed cloud service.


Watch the demo on YouTube: Building an ML model with Plexe

1. Quickstart

Installation

pip install plexe
export OPENAI_API_KEY=<your-key>
export ANTHROPIC_API_KEY=<your-key>

Using plexe

Provide a tabular dataset (Parquet, CSV, ORC, or Avro) and a natural language intent:

python -m plexe.main \
    --train-dataset-uri data.parquet \
    --intent "predict whether a passenger was transported" \
    --max-iterations 5

Or call the same entry point from Python:

from plexe.main import main
from pathlib import Path

best_solution, metrics, report = main(
    intent="predict whether a passenger was transported",
    data_refs=["train.parquet"],
    max_iterations=5,
    work_dir=Path("./workdir"),
)
print(f"Performance: {best_solution.performance:.4f}")

2. Features

2.1. 🤖 Multi-Agent Architecture

The system uses 14 specialized AI agents across a 6-phase workflow to:

  • Analyze your data and identify the ML task
  • Select the right evaluation metric
  • Search for the best model through hypothesis-driven iteration
  • Evaluate model performance and robustness
  • Package the model for deployment

2.2. 🎯 Automated Model Building

Build complete models with a single call. Plexe supports XGBoost, CatBoost, LightGBM, Keras, and PyTorch for tabular data:

from plexe.main import main

best_solution, metrics, report = main(
    intent="predict house prices based on property features",
    data_refs=["housing.parquet"],
    max_iterations=10,                    # Search iterations
    allowed_model_types=["xgboost"],      # Or let plexe choose
    enable_final_evaluation=True,         # Evaluate on held-out test set
)

Run python -m plexe.main --help for all CLI options.

The output is a self-contained model package at work_dir/model/ (also archived as model.tar.gz). The package has no dependency on plexe — build the model with plexe, deploy it anywhere:

model/
├── artifacts/          # Trained model + feature pipeline (pickle)
├── src/                # Inference predictor, pipeline code, training template
├── schemas/            # Input/output JSON schemas
├── config/             # Hyperparameters
├── evaluation/         # Metrics and detailed analysis reports
├── model.yaml          # Model metadata
└── README.md           # Usage instructions with example code
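Before shipping a package to another host, it can be useful to confirm that the layout above is complete. The following is a minimal sketch that relies only on the top-level names shown in the tree; `EXPECTED_ENTRIES` and `missing_package_entries` are hypothetical helpers written for this example and are not part of plexe.

```python
from pathlib import Path

# Top-level entries copied from the package layout shown above.
EXPECTED_ENTRIES = (
    "artifacts",
    "src",
    "schemas",
    "config",
    "evaluation",
    "model.yaml",
    "README.md",
)


def missing_package_entries(model_dir):
    """Return the expected top-level entries that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in EXPECTED_ENTRIES if not (root / name).exists()]
```

Calling `missing_package_entries("workdir/model")` returns an empty list when every entry is present, and the names of the missing entries otherwise.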

2.3. 🐳 Batteries-Included Docker Images

Run plexe with everything pre-configured — PySpark, Java, and all dependencies included. A Makefile is provided for common workflows:

make build          # Build the Docker image
make test-quick     # Fast sanity check (~1 iteration)
make run-titanic    # Run on Spaceship Titanic dataset

Or run directly:

docker run --rm \
    -e OPENAI_API_KEY="$OPENAI_API_KEY" \
    -e ANTHROPIC_API_KEY="$ANTHROPIC_API_KEY" \
    -v "$(pwd)/data:/data" -v "$(pwd)/workdir:/workdir" \
    plexe:py3.12 python -m plexe.main \
        --train-dataset-uri /data/dataset.parquet \
        --intent "predict customer churn" \
        --work-dir /workdir \
        --spark-mode local

A config.yaml in the project root is automatically mounted. A Databricks Connect image is also available: docker build --target databricks .

2.4. ⚙️ YAML Configuration

Customize LLM routing, search parameters, Spark settings, and more via a config file:

# config.yaml
max_search_iterations: 5
allowed_model_types: [xgboost, catboost]
spark_driver_memory: "4g"
hypothesiser_llm: "openai/gpt-5-mini"
feature_processor_llm: "anthropic/claude-sonnet-4-5-20250929"

Point plexe at the file with the CONFIG_FILE environment variable:

CONFIG_FILE=config.yaml python -m plexe.main ...

See config.yaml.template for all available options.
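When plexe is launched from a script rather than a shell, the CLI flags and the `CONFIG_FILE` variable can be assembled in one place. The sketch below uses only the flags that appear in this README (`--train-dataset-uri`, `--intent`, `--max-iterations`, `--work-dir`); `build_plexe_command` is a hypothetical helper for illustration, not a plexe API.

```python
import os
import sys


def build_plexe_command(train_uri, intent, work_dir, max_iterations=5, config_file=None):
    """Return (argv, env) for running `python -m plexe.main` as a subprocess."""
    argv = [
        sys.executable, "-m", "plexe.main",
        "--train-dataset-uri", str(train_uri),
        "--intent", intent,
        "--max-iterations", str(max_iterations),
        "--work-dir", str(work_dir),
    ]
    # Copy the current environment so API keys are passed through unchanged.
    env = dict(os.environ)
    if config_file is not None:
        env["CONFIG_FILE"] = str(config_file)
    return argv, env
```

The result can be handed to `subprocess.run(argv, env=env, check=True)`. Passing the arguments as a list avoids shell quoting problems when the intent contains spaces or quotes.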

2.5. 🌐 Multi-Provider LLM Support

Plexe uses LLMs via LiteLLM, so you can use any supported provider:

# Route different agents to different providers
hypothesiser_llm: "openai/gpt-5-mini"
feature_processor_llm: "anthropic/claude-sonnet-4-5-20250929"
model_definer_llm: "ollama/llama3"

Note: Plexe should work with most LiteLLM providers, but we actively test only with openai/* and anthropic/* models. If you encounter issues with other providers, please let us know.
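Since only `openai/*` and `anthropic/*` models are actively tested, it may help to flag routes that point elsewhere before starting a long run. This is a rough sketch that assumes every model string uses the `provider/model` form seen in the examples above; `split_model_id` and `untested_routes` are hypothetical helpers, not part of plexe or LiteLLM.

```python
# Provider prefixes that the note above describes as actively tested.
ACTIVELY_TESTED = {"openai", "anthropic"}


def split_model_id(model_id):
    """Split 'provider/model' at the first slash into (provider, model)."""
    provider, sep, model = model_id.partition("/")
    if not sep or not provider or not model:
        raise ValueError(f"expected 'provider/model', got {model_id!r}")
    return provider, model


def untested_routes(routes):
    """Return the config keys whose provider is outside ACTIVELY_TESTED."""
    return sorted(
        key
        for key, model_id in routes.items()
        if split_model_id(model_id)[0] not in ACTIVELY_TESTED
    )
```

With the three routes from the example config, `untested_routes` would report only `model_definer_llm`, because it points at `ollama/llama3`.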

2.6. 📊 Experiment Dashboard

Visualize experiment results, search trees, and evaluation reports with the built-in Streamlit dashboard:

python -m plexe.viz --work-dir ./workdir

2.7. 🔌 Extensibility

Connect plexe to custom storage, tracking, and deployment infrastructure via the WorkflowIntegration interface:

main(intent="...", data_refs=[...], integration=MyCustomIntegration())

See plexe/integrations/base.py for the full interface.

3. Installation

3.1. Installation Options

pip install plexe                    # Core (XGBoost, CatBoost, LightGBM, Keras, PyTorch, scikit-learn)
pip install "plexe[pyspark]"         # + Local PySpark execution (quotes keep zsh from expanding the brackets)
pip install "plexe[aws]"             # + S3 storage support (boto3)

Requires Python >= 3.10, < 3.13.
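The version constraint can be checked up front in a setup script. A minimal sketch of that check follows; `python_supported` is a hypothetical helper name chosen for this example.

```python
import sys


def python_supported(version_info=None):
    """Return True when the interpreter satisfies >= 3.10, < 3.13."""
    info = sys.version_info if version_info is None else version_info
    major_minor = tuple(info[:2])
    return (3, 10) <= major_minor < (3, 13)
```

Calling `python_supported()` with no argument checks the running interpreter; passing a tuple such as `(3, 13, 0)` checks an arbitrary version.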

3.2. API Keys

export OPENAI_API_KEY=<your-key>
export ANTHROPIC_API_KEY=<your-key>

See the LiteLLM providers documentation for all supported providers.
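A missing key typically only surfaces once an agent makes its first LLM call, so a quick check before launching can save time. The sketch below assumes the two variables exported above are the ones your configuration needs; routes to other providers would need their own variables. `REQUIRED_KEYS` and `missing_api_keys` are hypothetical names, not part of plexe.

```python
import os

# The two variables exported above; adjust for the providers you actually route to.
REQUIRED_KEYS = ("OPENAI_API_KEY", "ANTHROPIC_API_KEY")


def missing_api_keys(env=None, required=REQUIRED_KEYS):
    """Return the names in `required` that are unset or empty in env."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]
```

Calling `missing_api_keys()` with no arguments inspects the current process environment and returns an empty list when both keys are set.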

4. Documentation

For full documentation, visit docs.plexe.ai.

5. Contributing

See CONTRIBUTING.md for guidelines. Join our Discord to connect with the team.

6. License

Apache-2.0 License

7. Citation

If you use Plexe in your research, please cite it as follows:

@software{plexe2025,
  author = {De Bernardi, Marcello AND Dubey, Vaibhav},
  title = {Plexe: Build machine learning models using natural language.},
  year = {2025},
  publisher = {GitHub},
  howpublished = {\url{https://github.com/plexe-ai/plexe}},
}
