Train ML models from a single command — no code required

TrainCLI

TrainCLI is a command-line tool that automates the entire machine learning pipeline: data profiling, cleaning, model selection, training, and export. Perfect for rapid prototyping, baseline models, or learning ML workflows.

Features

  • 🚀 One-command training: train --data data.csv --target column
  • 🤖 Auto model selection — Cross-validation picks the best algorithm
  • 🧹 Smart data cleaning — Handles missing values, outliers, encoding
  • 📊 Data profiling — Detects issues before training
  • 🔍 Leakage detection — Warns about suspiciously correlated features
  • 🧠 LLM-powered insights — Optional AI explanations (watsonx.ai)
  • 📦 Export everything — model.pkl, train.py, report.md, metrics.json
  • 🔧 Reproducible — Generated train.py is standalone
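
As a rough illustration of what "smart data cleaning" can look like in scikit-learn terms, here is a minimal sketch that imputes missing values and one-hot encodes categorical columns. It is not TrainCLI's actual implementation: the function name build_preprocessor, the imputation strategies, and the toy column names are assumptions made for this example, and outlier handling is left out.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler


def build_preprocessor(df: pd.DataFrame) -> ColumnTransformer:
    """Impute and encode columns by dtype (illustrative, not TrainCLI's code)."""
    numeric_cols = df.select_dtypes(include="number").columns.tolist()
    categorical_cols = [c for c in df.columns if c not in numeric_cols]

    # Numeric columns: fill gaps with the median, then standardize.
    numeric = Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ])
    # Categorical columns: fill gaps with the mode, then one-hot encode.
    categorical = Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("encode", OneHotEncoder(handle_unknown="ignore")),
    ])
    return ColumnTransformer([
        ("num", numeric, numeric_cols),
        ("cat", categorical, categorical_cols),
    ])


# Hypothetical toy data with one missing value per column.
features = pd.DataFrame({
    "feature1": [10.0, np.nan, 30.0, 40.0],
    "feature2": ["A", "B", np.nan, "A"],
})
cleaned = build_preprocessor(features).fit_transform(features)
print(cleaned.shape)  # one scaled numeric column plus one column per category
```

Keeping the cleaning steps inside a scikit-learn transformer means the same imputation and encoding are replayed at prediction time, which is one plausible reason to export a whole pipeline rather than a bare model.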

Installation

pip install traincli

Quick Start

1. Initialize (first time only)

train init

This sets up your API keys for LLM features (optional — works offline too).

2. Train a model

train --data sales.csv --target revenue

That's it! TrainCLI will:

  • Profile your data
  • Clean it automatically (or ask for input)
  • Detect if it's classification or regression
  • Try multiple models via cross-validation
  • Pick the best one
  • Export everything to ./output/
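
The task detection and model selection steps above can be sketched with plain scikit-learn. This is an assumption-laden illustration, not TrainCLI's real logic: the detect_task heuristic, the max_classes cutoff, the candidate list, and the scoring metrics are all chosen for the example, and scikit-learn estimators stand in for the full model list.

```python
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.linear_model import LogisticRegression, Ridge
from sklearn.model_selection import cross_val_score


def detect_task(target: pd.Series, max_classes: int = 20) -> str:
    """Guess the problem type from the target column (assumed heuristic)."""
    if pd.api.types.is_bool_dtype(target) or not pd.api.types.is_numeric_dtype(target):
        return "classification"
    values = target.dropna()
    # A handful of distinct whole numbers usually means class labels.
    if values.nunique() <= max_classes and (values % 1 == 0).all():
        return "classification"
    return "regression"


# Hypothetical candidate sets; the names are illustrative.
CANDIDATES = {
    "classification": {
        "logistic_regression": LogisticRegression(max_iter=1000),
        "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
    },
    "regression": {
        "ridge": Ridge(),
        "random_forest": RandomForestRegressor(n_estimators=50, random_state=0),
    },
}


def select_model(X, y, task: str, cv: int = 5):
    """Score every candidate with k-fold cross-validation and pick the best mean."""
    scoring = "accuracy" if task == "classification" else "r2"
    scores = {
        name: cross_val_score(model, X, y, cv=cv, scoring=scoring).mean()
        for name, model in CANDIDATES[task].items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Synthetic, almost perfectly linear data, so a linear model should win.
X, y = make_regression(n_samples=200, n_features=5, noise=0.1, random_state=0)
task = detect_task(pd.Series(y))
best_name, cv_scores = select_model(X, y, task)
print(task, best_name, cv_scores)
```

Comparing mean cross-validated scores rather than a single train/test split makes the choice less sensitive to one lucky or unlucky partition of the data.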

3. Use your model

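import pickle
import pandas as pd

# Load the exported pipeline. Only unpickle files from a source you trust.
with open("output/model.pkl", "rb") as f:
    model = pickle.load(f)

# Column names must match the features used in training (placeholders here).
new_data = pd.DataFrame([{"feature1": 10, "feature2": "A"}])
prediction = model.predict(new_data)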

Or run the generated train.py to retrain from scratch.

Commands

Command                                Description
train --data <file> --target <col>     Train a model on a CSV file
train --dir <folder> --target <col>    Train on all CSVs in a folder
train --model <name>                   Use a specific model (e.g., random_forest)
train --model ?                        Pick model interactively
train --task classification            Override problem type detection
train --drop col1,col2                 Drop columns (e.g., to fix leakage)
train --preview                        Profile data without training
train --auto                           Skip all prompts (for CI/CD)
train init                             Configure API keys
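
For CI/CD jobs it can help to assemble the command from Python instead of hand-writing shell strings. The sketch below uses only the flags listed above. The helper names build_train_command and run_training are hypothetical, combining --drop with --auto in one call is an assumption, and so is the idea that train exits with a non-zero status when training fails.

```python
import shlex
import subprocess
from typing import List, Optional, Sequence


def build_train_command(
    data: str,
    target: str,
    model: Optional[str] = None,
    drop: Sequence[str] = (),
    auto: bool = True,
) -> List[str]:
    """Assemble a `train` invocation as an argument list (no shell quoting needed)."""
    cmd = ["train", "--data", data, "--target", target]
    if model:
        cmd += ["--model", model]
    if drop:
        # --drop takes a comma-separated list of column names.
        cmd += ["--drop", ",".join(drop)]
    if auto:
        # --auto skips all prompts, so the job never waits for input.
        cmd.append("--auto")
    return cmd


def run_training(cmd: List[str]) -> subprocess.CompletedProcess:
    """Run the command; assumes `train` is on PATH and signals failure via exit code."""
    return subprocess.run(cmd, check=True)


command = build_train_command(
    "jobs.csv", "salary", drop=["min_salary", "max_salary"]
)
print(shlex.join(command))
```

Passing an argument list to subprocess.run avoids shell interpolation, so file names with spaces or special characters do not need escaping.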

Examples

Basic training

train --data customers.csv --target churn

Force a specific model

train --data sales.csv --target price --model gradient_boosting

Interactive model selection

train --data data.csv --target label --model ?

Drop leaky columns

train --data jobs.csv --target salary --drop min_salary,max_salary

Batch processing

train --dir ./datasets/ --target outcome --auto

Preview mode

train --data data.csv --target col --preview

Output Files

After training, you'll get:

  • model.pkl — Trained scikit-learn pipeline (ready to use)
  • train.py — Standalone script to retrain the model
  • report.md — Human-readable summary with metrics
  • metrics.json — Machine-readable metrics
  • session.json — Full training session metadata
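
Because metrics.json is machine-readable, a script can pick it up after training. The schema of the file is not documented here, so this sketch only assumes a JSON object at the top level and prints whatever numeric entries it finds. The helper names load_metrics and numeric_metrics are hypothetical.

```python
import json
from pathlib import Path


def load_metrics(output_dir: str = "output") -> dict:
    """Read metrics.json from a TrainCLI output folder."""
    path = Path(output_dir) / "metrics.json"
    with path.open(encoding="utf-8") as f:
        return json.load(f)


def numeric_metrics(metrics: dict) -> dict:
    """Keep only top-level numeric entries; the real schema may be nested."""
    return {
        key: value
        for key, value in metrics.items()
        if isinstance(value, (int, float)) and not isinstance(value, bool)
    }


# Only runs if a training session has already written ./output/metrics.json.
if Path("output/metrics.json").exists():
    for name, value in numeric_metrics(load_metrics()).items():
        print(f"{name}: {value}")
```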

Supported Models

TrainCLI includes:

Classification:

  • Logistic Regression
  • Random Forest
  • Gradient Boosting (XGBoost)
  • Support Vector Machine
  • K-Nearest Neighbors

Regression:

  • Linear Regression
  • Ridge Regression
  • Random Forest
  • Gradient Boosting (XGBoost)
  • Support Vector Regression

LLM Features (Optional)

TrainCLI can use IBM watsonx.ai to:

  • Explain model results in plain English
  • Suggest improvements
  • Generate educational comments in train.py

To enable:

train init
# Choose "watsonx" and enter your API key

If no API key is configured, TrainCLI still works offline, without the LLM features.

Data Leakage Detection

TrainCLI automatically checks for:

  • Features with >0.98 correlation to target
  • Column names containing the target name
  • Linear transformations of the target

Example warning:

⚠ Leakage suspected:
  → min_salary: correlation with target is 0.999 — suspiciously high
  
  Suggested fix: train --drop min_salary ...
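
The checks listed above can be approximated in a few lines of pandas. This is a sketch under assumptions, not the tool's real detector: find_leaky_columns is a hypothetical name, the name check is a simple case-insensitive substring match, and linear transformations of the target are caught only through the correlation test (an exact linear transform has an absolute Pearson correlation of 1).

```python
import numpy as np
import pandas as pd


def find_leaky_columns(df: pd.DataFrame, target: str, threshold: float = 0.98) -> dict:
    """Return {column: reason} for features that look like target leakage."""
    suspects = {}
    numeric_cols = set(df.select_dtypes(include="number").columns)
    for col in df.columns:
        if col == target:
            continue
        # Correlation check (numeric feature and numeric target only).
        if col in numeric_cols and target in numeric_cols:
            corr = abs(df[col].corr(df[target]))
            if corr > threshold:
                suspects[col] = f"correlation with target is {corr:.3f}"
                continue
        # Name check: e.g. target "salary" and feature "salary_band".
        if target.lower() in col.lower():
            suspects[col] = "column name contains the target name"
    return suspects


# Hypothetical toy data: min_salary is nearly a linear function of salary.
rng = np.random.default_rng(0)
salary = rng.normal(70_000, 15_000, 200)
jobs = pd.DataFrame({
    "salary": salary,
    "min_salary": salary * 0.9 + rng.normal(0, 500, 200),
    "years_experience": rng.integers(0, 30, 200),
})
for column, reason in find_leaky_columns(jobs, "salary").items():
    print(f"{column}: {reason}")
```

Columns flagged this way can then be passed to train --drop, as in the suggested fix above.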

Requirements

  • Python 3.9+
  • pandas, scikit-learn, xgboost
  • Optional: ibm-watsonx-ai (for LLM features)

Development

git clone https://github.com/yourusername/traincli
cd traincli
pip install -e .

License

MIT

Contributing

Contributions welcome! Open an issue or PR.

MCP Server (Claude Code Integration)

TrainCLI includes an MCP server that lets Claude Code train models directly:

Setup

  1. Install TrainCLI:
pip install traincli
  2. Add to your Claude Desktop config (~/Library/Application Support/Claude/claude_desktop_config.json on macOS):
{
  "mcpServers": {
    "traincli": {
      "command": "traincli-mcp"
    }
  }
}
  3. Restart Claude Desktop
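
If the config file already lists other MCP servers, pasting the snippet over it would remove them. A minimal sketch of merging the entry instead is shown below. The helper name add_traincli_server and the "other-tool" entry are hypothetical; the "traincli" entry itself is the one from the setup steps.

```python
import json


def add_traincli_server(config: dict) -> dict:
    """Return a copy of the config with the traincli MCP server added."""
    updated = dict(config)
    # Copy the existing servers so other entries are preserved.
    servers = dict(updated.get("mcpServers", {}))
    servers["traincli"] = {"command": "traincli-mcp"}
    updated["mcpServers"] = servers
    return updated


# Hypothetical existing config with one unrelated server.
existing = {"mcpServers": {"other-tool": {"command": "other-mcp"}}}
merged = add_traincli_server(existing)
print(json.dumps(merged, indent=2))
```

To apply it, load the real config file with json.load, pass the result through the helper, and write it back with json.dump before restarting Claude Desktop.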

Available Tools

  • train_model — Train a model on a CSV file
  • profile_dataset — Profile data without training
  • suggest_target — Suggest the best target column

Example Usage in Claude

User: Train a model on sales.csv to predict revenue

Claude: [calls train_model tool]
✓ Model trained: Gradient Boosting
✓ R² = 0.847
✓ Files saved to ./output/

Roadmap

  • Meta-model for smarter algorithm selection
  • More preprocessing options
  • Feature engineering suggestions
  • Hyperparameter tuning

Made with ❤️ for developers who want ML without the boilerplate.
