NeuralSmith CLI - ML Engineering Made Accessible to All

NeuralSmith CLI

A standalone command-line tool for automated neural architecture search and machine learning model training.

Features

  • 🖼️ Image Classification - Train models using Neural Architecture Search (14-model experiments)
  • 📊 Tabular Classification & Regression - Train models on CSV data with EDA
  • 📈 Time Series Classification & Regression - Train models on time series data
  • 🏷️ Auto Labeler - Automatically label unlabeled data using weighted KNN
  • 🤖 CoPilot - Interactive AI assistant for guidance and troubleshooting

Functionality Docs

Each major feature has a dedicated in-package README:

  • neuralsmith/image_classification/README.md
  • neuralsmith/tabular/README.md
  • neuralsmith/timeseries/README.md
  • neuralsmith/auto_labeler/README.md
  • neuralsmith/copilot/README.md
  • neuralsmith/README.md (index + core package internals)

Installation

Prerequisites

  • Python 3.8 or higher
  • pip or pipx

From PyPI (Future)

```bash
pipx install neuralsmith
```

Quick Start

1. Configure API Key (for CoPilot)

neuralsmith --config-key

Or set environment variable:

export NEURALSMITH_GEMINI_API_KEY="your-api-key-here"

2. Try CoPilot

neuralsmith copilot

Ask questions like:

  • "How do I train an image classification model?"
  • "What options does tabular-classification support?"
  • "Help me validate my dataset"

3. Train Your First Model

Image Classification

neuralsmith image-classification \
  --data-path ./my_images \
  --output-dir ./results \
  --epochs 10

Tabular Classification

neuralsmith tabular-classification \
  --data-path ./data.csv \
  --target-column species \
  --mode fast

Commands

Image Classification

Train image classification models using NAS:

neuralsmith image-classification \
  --data-path <folder> \
  --output-dir <dir> \
  [--target-size H,W] \
  [--epochs N] \
  [--val-split 0.1] \
  [--batch-size N] \
  [--learning-rate 0.001] \
  [--device cpu|cuda] \
  [--seed 42]

Example:

neuralsmith image-classification \
  --data-path ./my_images \
  --output-dir ./results \
  --target-size 128,128 \
  --epochs 100 \
  --val-split 0.2

Tabular Classification

Train classification models on CSV data:

neuralsmith tabular-classification \
  --data-path <csv> \
  --target-column <name> \
  [--output-dir <dir>] \
  [--mode fast++|fast|exhaustive] \
  [--train-percent 80.0] \
  [--val-percent 0.0] \
  [--test-percent 20.0] \
  [--no-eda]

Example:

neuralsmith tabular-classification \
  --data-path ./data.csv \
  --target-column species \
  --mode exhaustive \
  --output-dir ./results
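
The --train-percent, --val-percent and --test-percent flags describe how rows are partitioned. As a rough illustration only, a percentage split could be sketched as follows. The helper name split_indices, the seeded shuffle, and the rule that the three percentages must sum to 100 are assumptions made for this sketch, not confirmed NeuralSmith behavior.

```python
import random


def split_indices(n_rows, train_pct=80.0, val_pct=0.0, test_pct=20.0, seed=42):
    """Shuffle row indices and cut them into train/val/test lists.

    Hypothetical helper for illustration; it assumes the three
    percentages must add up to 100.
    """
    total = train_pct + val_pct + test_pct
    if abs(total - 100.0) > 1e-6:
        raise ValueError(f"percentages must sum to 100, got {total}")

    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # seeded so the split is reproducible

    n_train = int(n_rows * train_pct / 100.0)
    n_val = int(n_rows * val_pct / 100.0)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # the remainder goes to the test split
    return train, val, test


train, val, test = split_indices(150)
print(len(train), len(val), len(test))
```

With the default percentages shown in the command reference, a 150-row CSV would yield 120 training rows, no validation rows, and 30 test rows under this sketch.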

Tabular Regression

Takes the same options as tabular-classification, but trains regression models:

neuralsmith tabular-regression \
  --data-path <csv> \
  --target-column <name> \
  [OPTIONS]

Time Series Classification

Train classification models on time series data:

neuralsmith timeseries-classification \
  --data-path <csv> \
  [--time-column <name>] \
  --target-column <name> \
  [--window-size <n>] \
  [--mode fast|fast++|exhaustive] \
  [--split-method temporal|random] \
  [--train-percent 70.0] \
  [--val-percent 15.0] \
  [--test-percent 15.0] \
  [--random-state 42] \
  [--no-normalize] \
  [--epochs 10] \
  [--batch-size 32]

Time Series Regression

Takes the same options as timeseries-classification, but trains regression models:

neuralsmith timeseries-regression \
  --data-path <csv> \
  [--time-column <name>] \
  --target-column <name> \
  [--window-size <n>] \
  [OPTIONS]

Notes:

  • --data-path can point to either a CSV file or a directory containing pre-windowed NumPy splits:
    • X_train.npy, y_train.npy, X_val.npy, y_val.npy, X_test.npy, y_test.npy
  • For CSV input, --time-column and --window-size are required.
  • For NumPy input, --time-column and --window-size are ignored.
  • Default split is --split-method temporal to avoid overlap leakage between train/val/test windows.
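
To make the leakage point in the notes concrete, here is a minimal sketch of one way to build windows without overlap between splits: cut the series chronologically first, then window each segment on its own. The function names and the cut-then-window order are assumptions for illustration; NeuralSmith's internal implementation may differ.

```python
def make_windows(values, window_size):
    """Return every contiguous window of length window_size."""
    return [values[i:i + window_size]
            for i in range(len(values) - window_size + 1)]


def temporal_window_split(values, window_size, train_pct=70.0, val_pct=15.0):
    """Cut the series chronologically, then window each segment separately.

    Windowing after the cut means no window contains rows from two splits.
    """
    n = len(values)
    train_end = int(n * train_pct / 100.0)
    val_end = train_end + int(n * val_pct / 100.0)
    segments = (values[:train_end], values[train_end:val_end], values[val_end:])
    return tuple(make_windows(segment, window_size) for segment in segments)


series = list(range(100))  # stand-in for rows already sorted by time
train, val, test = temporal_window_split(series, window_size=5)
print(len(train), len(val), len(test))
```

A random split over pre-built windows would instead let neighboring, overlapping windows land in different splits, which is the leakage the temporal default is meant to avoid.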

Auto Labeler

Automatically label unlabeled data:

neuralsmith auto-labeler \
  --data-path <path> \
  --data-type image|tabular|timeseries \
  --labeled-column <name> \
  --label-column <name> \
  --output-path <path> \
  [--k 5] \
  [--min-confidence 0.5]
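
The --k and --min-confidence options suggest a distance-weighted vote with a confidence cutoff. The sketch below shows that general technique in plain Python. The function name, the inverse-distance weighting, and the definition of confidence as the winning share of the vote weight are all assumptions; the auto-labeler's actual scoring is not documented here.

```python
import math
from collections import defaultdict


def weighted_knn_label(query, labeled_points, k=5, min_confidence=0.5):
    """Label one point by inverse-distance-weighted voting among k neighbours.

    labeled_points is a list of (feature_vector, label) pairs.
    Returns (label, confidence), or (None, confidence) when the winning
    share of the vote weight is below min_confidence.
    """
    neighbours = sorted(
        ((math.dist(query, features), label) for features, label in labeled_points),
        key=lambda pair: pair[0],
    )[:k]

    votes = defaultdict(float)
    for distance, label in neighbours:
        votes[label] += 1.0 / (distance + 1e-9)  # closer points weigh more

    best_label = max(votes, key=votes.get)
    confidence = votes[best_label] / sum(votes.values())
    if confidence < min_confidence:
        return None, confidence  # leave the point unlabeled
    return best_label, confidence


labeled = [
    ((0.0, 0.0), "a"), ((0.1, 0.0), "a"), ((0.0, 0.2), "a"),
    ((1.0, 1.0), "b"), ((1.1, 1.0), "b"),
]
print(weighted_knn_label((0.05, 0.05), labeled, k=3))
```

Raising min_confidence trades coverage for precision: ambiguous points near a class boundary stay unlabeled rather than receiving a weak guess.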

CoPilot

Start interactive AI assistant:

neuralsmith copilot [--gemini-key <key>]

Modes (default is Ask: plain chat, no autonomous tools):

| Mode | Flag | Behavior |
| --- | --- | --- |
| Ask | (default) | Answers questions; you run !validate, !status, !watch yourself. |
| Agent | --agent | The model can call read-only tools (inspect paths, CSV, validation, run status) and propose full neuralsmith training commands. Each training run is shown as an exact argv and runs only if you type yes. |
| Agent-plus (experimental) | --agent-plus | Same tools as Agent, but proposed training commands run without confirmation. Use only in trusted environments. |

Examples:

neuralsmith copilot --agent
neuralsmith copilot --agent-plus   # experimental

In CoPilot, you can:

  • Ask questions about NeuralSmith commands
  • Get help with workflows
  • Validate datasets: !validate <path>
  • Live status for running wizards (from another terminal):
    • !status [path] - read the newest neuralsmith_run_status.json under --output-dir (or the parent folder of --output-path for auto-labeler)
    • !watch [path] - poll status every ~2s until the run is completed or failed
  • Type help for commands, exit to quit
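
The !status and !watch behavior described above can be approximated outside CoPilot with a few lines of Python. This is a sketch under stated assumptions: the file name neuralsmith_run_status.json comes from the text, but the top-level "status" field and its "completed" / "failed" values are guesses at the schema, and the function names are hypothetical.

```python
import json
import time
from pathlib import Path

STATUS_FILE = "neuralsmith_run_status.json"  # file name taken from the text above


def newest_status(root):
    """Return the parsed contents of the most recently modified status file, or None."""
    candidates = list(Path(root).rglob(STATUS_FILE))
    if not candidates:
        return None
    latest = max(candidates, key=lambda p: p.stat().st_mtime)
    return json.loads(latest.read_text(encoding="utf-8"))


def watch(root, interval=2.0, max_polls=None):
    """Poll until the run reports a terminal state, then return the last snapshot.

    Assumes a top-level "status" field with values such as "completed"
    or "failed"; the real schema may differ.
    """
    polls = 0
    while True:
        snapshot = newest_status(root)
        if snapshot and snapshot.get("status") in ("completed", "failed"):
            return snapshot
        polls += 1
        if max_polls is not None and polls >= max_polls:
            return snapshot  # give up after the requested number of polls
        time.sleep(interval)
```

Calling watch("./results") from a second terminal would then block until the newest status file under that directory reports a terminal state.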

Agent modes use the same ! commands as Ask mode. Training wizards themselves are unchanged; the agent only invokes the existing CLI in a subprocess with an allowlisted set of flags.

Agent / Agent-plus Quick Guide

Use this when you want CoPilot to help prepare and run training commands end-to-end.

Start modes

neuralsmith copilot --agent
neuralsmith copilot --agent-plus   # experimental

How --agent works (the recommended agent mode)

  1. You ask for a task (for example: "train a quick tabular classifier on this CSV").
  2. CoPilot may inspect files / validate data with read-only tools.
  3. CoPilot prints a proposed exact command, for example:
    • python -m neuralsmith tabular-classification ...
  4. Nothing runs until you confirm by typing yes.
  5. Training output streams in the same terminal.
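
The confirm-before-run flow in the steps above, combined with the allowlisted flags mentioned earlier, can be sketched as a small gate in front of the subprocess call. Everything named here is hypothetical: the ALLOWED_FLAGS set is invented for illustration (the real allowlist is not documented in this README), and confirm_and_run is not a NeuralSmith API.

```python
import shlex

# Hypothetical allowlist; the real set of permitted flags is not documented here.
ALLOWED_FLAGS = {"--data-path", "--target-column", "--mode", "--output-dir"}


def disallowed_flags(argv):
    """Return any long flags in argv that are not on the allowlist."""
    return [arg for arg in argv if arg.startswith("--") and arg not in ALLOWED_FLAGS]


def confirm_and_run(argv, ask=input, run=print):
    """Show the exact command and run it only after an explicit 'yes'."""
    bad = disallowed_flags(argv)
    if bad:
        raise ValueError(f"flags not allowlisted: {bad}")
    print("Proposed command:", shlex.join(argv))
    if ask("Run this command? Type yes to confirm: ").strip().lower() != "yes":
        return False
    run(argv)  # a real tool might call subprocess.run(argv, check=True) here
    return True
```

Passing the command as an argv list rather than a shell string keeps the displayed command and the executed command identical, which is what makes the yes prompt meaningful. An agent-plus style mode would correspond to skipping the ask step while keeping the allowlist check.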

How --agent-plus works

  • Same planning/tool behavior as --agent
  • Difference: proposed training commands run immediately without the yes confirmation step
  • Use only in trusted, local environments

What agent modes can do

  • Validate inputs with !validate <path>
  • Check run snapshot with !status [path]
  • Follow live progress with !watch [path]
  • Propose and execute existing NeuralSmith training wizards (image-classification, tabular-*, timeseries-*, auto-labeler)

Safe usage tips

  • Prefer --agent for normal use
  • Provide explicit paths and target columns in your prompt to reduce retries
  • Use --agent-plus only if you are comfortable with automatic execution

Quick test with bundled sample data

!validate tests/data/tabular_classification/iris_like_100.csv
Train a quick tabular classification model on tests/data/tabular_classification/iris_like_100.csv using target column species and mode fast.

Example (two terminals):

  1. Start a wizard with --output-dir (for example ./run_live_test)
  2. In CoPilot, run !watch ./run_live_test to receive status updates while training runs

Common Workflows

Image Classification Workflow

  1. Prepare your images in a folder structure:

    my_images/
    ├── class1/
    │   ├── img1.jpg
    │   └── img2.jpg
    └── class2/
        ├── img3.jpg
        └── img4.jpg
    
  2. Run training:

    neuralsmith image-classification \
      --data-path ./my_images \
      --output-dir ./results \
      --epochs 50 \
      --val-split 0.2
    
  3. Check results in ./results/ directory
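
Before training, it can help to check that the folder layout from step 1 is what you expect. The sketch below walks a one-folder-per-class directory and builds (path, label) pairs. The function name, the alphabetical label assignment, and the extension list are assumptions for illustration, not a description of how NeuralSmith reads the data.

```python
from pathlib import Path

IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}  # assumed set, for illustration


def scan_image_folder(root):
    """Map each class sub-folder to an integer label and list (path, label) pairs."""
    class_dirs = sorted(p for p in Path(root).iterdir() if p.is_dir())
    class_to_index = {d.name: i for i, d in enumerate(class_dirs)}
    samples = [
        (image_path, class_to_index[d.name])
        for d in class_dirs
        for image_path in sorted(d.iterdir())
        if image_path.suffix.lower() in IMAGE_EXTENSIONS
    ]
    return class_to_index, samples
```

For the my_images/ example above, scan_image_folder("./my_images") would map class1 to 0 and class2 to 1 and return four samples, which makes empty or misnamed class folders easy to spot.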

Tabular Classification Workflow

  1. Prepare your CSV with a target column

  2. Run EDA and training:

    neuralsmith tabular-classification \
      --data-path ./data.csv \
      --target-column target \
      --mode exhaustive \
      --output-dir ./results
    
  3. Review models in ./results/models/

Time Series Workflow

  1. Prepare CSV with time column and features

  2. Run training:

    neuralsmith timeseries-classification \
      --data-path ./timeseries.csv \
      --time-column timestamp \
      --target-column label \
      --window-size 20 \
      --mode fast \
      --split-method temporal


Using Your Trained Models

After training completes, NeuralSmith generates a training summary report and provides utilities for loading the trained models.

Training Summary Report

Each training run writes the report in two formats:

  • Markdown Report: {output_dir}/training_summary_report.md
  • JSON Report: {output_dir}/training_summary_report.json

The report includes:

  • Executive Summary: Total models trained, best model identification
  • Model Performance Comparison: Ranked table of all models with metrics
  • Best Model Details: Complete information about the best performing model
  • Model Usage Instructions: Ready-to-use code examples

Loading Models

Image Classification Models

```python
from neuralsmith.model_loader import load_model
import torch
import numpy as np
from PIL import Image

# Load the trained model
model = load_model('results/model_*.pth')

# Preprocess an image
# Force 3 channels; adjust if your model was trained on grayscale images
image = Image.open('your_image.jpg').convert('RGB')
image = image.resize((64, 64))  # Match your training size
img_array = np.array(image).astype(np.float32) / 255.0
img_array = np.transpose(img_array, (2, 0, 1))  # HWC -> CHW
img_tensor = torch.from_numpy(img_array).unsqueeze(0)  # Add batch dimension

# Make prediction
model.eval()
with torch.no_grad():
    prediction = model(img_tensor)
    predicted_class = torch.argmax(prediction, dim=1).item()
    probabilities = torch.softmax(prediction, dim=1)[0]

print(f'Predicted class: {predicted_class}')
print(f'Probabilities: {probabilities.numpy()}')
```

Tabular Classification/Regression Models

from neuralsmith.model_loader import load_model
import pandas as pd

# Load model and preprocessing pipeline
model, preprocessor = load_model('results/models/best_model_*/')

# Load your new data
new_data = pd.read_csv('new_data.csv')

# Preprocess using the same pipeline (handles imputation, scaling, feature selection)
X_processed = preprocessor.transform(new_data)

# Make predictions
predictions = model.predict(X_processed)

# For classification, get probabilities
if hasattr(model, 'predict_proba'):
    probabilities = model.predict_proba(X_processed)
    print(f'Predictions: {predictions}')
    print(f'Probabilities:\n{probabilities}')
else:
    print(f'Predictions: {predictions}')

Using CoPilot for Model Usage

After training, you can ask CoPilot for help using your models:

neuralsmith copilot

Example Questions:

  • "How do I use the model I just trained?"
  • "Generate code to load my model from results/"
  • "Show me how to make predictions on new images"
  • "How do I use my tabular model for batch predictions?"

Best Practices

  1. Always check the training summary report first for model details
  2. Use the same preprocessing that was used during training
  3. Match input shapes - especially for image models (size, channels)
  4. Handle device placement - ensure data and model are on the same device
  5. Use CoPilot for customized code generation based on your specific needs

Configuration

Configuration is stored in ~/.neuralsmith/config.json:

{
  "gemini_api_key": "your-api-key",
  "default_output_dir": "./models",
  "log_level": "INFO"
}

Environment variables (override config):

  • NEURALSMITH_GEMINI_API_KEY - Gemini API key
  • NEURALSMITH_TEST_MODE - Enable test mode (limits epochs/models)
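
The precedence rule above (environment variables override the config file) can be sketched as a small lookup. The config path, the gemini_api_key field, and the environment variable name come from this section; the function name resolve_api_key and its fallback to None are assumptions for illustration.

```python
import json
import os
from pathlib import Path

CONFIG_PATH = Path.home() / ".neuralsmith" / "config.json"


def resolve_api_key(config_path=CONFIG_PATH, environ=os.environ):
    """Return the Gemini API key, preferring the environment over the config file."""
    env_key = environ.get("NEURALSMITH_GEMINI_API_KEY")
    if env_key:
        return env_key  # the environment variable wins when it is set
    try:
        config = json.loads(Path(config_path).read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError):
        return None  # no usable config file
    return config.get("gemini_api_key")
```

This ordering lets you keep a default key in the config file and override it per shell session, for example in CI, without editing the file.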

Getting Help

Command Help

neuralsmith --help
neuralsmith image-classification --help
neuralsmith tabular-classification --help

CoPilot Assistant

neuralsmith copilot

Then type:

  • help - Show available commands
  • !validate <path> - Validate a dataset
  • Ask any question about NeuralSmith

Troubleshooting

API Key Issues

# Configure the API key
neuralsmith --config-key

# Or use environment variable
export NEURALSMITH_GEMINI_API_KEY="your-key"

Python Environment

Make sure you have Python 3.8+:

python --version

Missing Dependencies

Install all dependencies:

pip install -e ".[copilot,auto-labeler]"

Memory Issues

For large datasets, use smaller batch sizes or reduce image sizes:

neuralsmith image-classification \
  --data-path ./large_dataset \
  --target-size 64,64 \
  --batch-size 16

Model Loading Issues

Model Not Found:

  • Check the training summary report for exact model paths
  • Verify the output directory path is correct

Shape Mismatch Errors:

  • For images: Ensure image size matches training size
  • For tabular: Ensure feature names match training features

Preprocessing Errors:

  • Load the preprocessor from the same model directory
  • Use the same preprocessing pipeline that was used during training

Ask CoPilot: If you encounter issues, try prompts such as:

  • "Help me debug my model loading code"
  • "Why am I getting a shape mismatch error?"
  • "How do I preprocess my data correctly?"

Testing

The repository includes a comprehensive test suite in the tests/ directory.

Running Tests

Install test dependencies:

pip install -e ".[dev]"

Run all tests:

pytest tests/ -v

Run fast tests only (skip slow training tests):

pytest tests/ -v -m "not slow"

Run a specific test file:

pytest tests/test_image_classification.py -v

Test Coverage

The test suite includes:

  • CLI entry point and argument parsing tests
  • Image classification wizard tests
  • Tabular classification/regression wizard tests
  • Time series classification/regression wizard tests
  • Auto-labeler wizard tests
  • CoPilot functionality tests
  • Configuration management tests
  • Integration tests for full workflows

Test datasets are stored in tests/data/ and include small synthetic datasets for all wizard types.

Development

Setup

  1. Clone the repository (if not already done)

  2. Navigate to CLI directory:

    cd CLI
    
  3. Install in development mode:

    pip install -e ".[dev]"
    

Project Structure

CLI/
├── neuralsmith/          # Main package
│   ├── cli.py           # CLI entry point
│   ├── config.py        # Configuration management
│   ├── model_loader.py  # Model loading utilities
│   ├── reporting.py     # Report generation
│   ├── image_classification/
│   ├── tabular/
│   ├── timeseries/
│   ├── auto_labeler/
│   └── copilot/
├── Legacy_utils/         # Shared Python scripts
├── scripts/             # Utility scripts
├── pyproject.toml       # Package configuration
└── README.md            # This file

Development Workflow

  1. Make changes to code in neuralsmith/
  2. Add tests for new functionality (if applicable)
  3. Test manually to ensure everything works
  4. Update documentation in README.md if needed

Building the Package

cd CLI
python -m build

This creates dist/ with source distribution and wheel.

Code Style

  • Follow PEP 8
  • Use type hints where possible
  • Add docstrings to public functions
  • Keep functions focused and testable

Adding New Features

  1. Implement the feature in appropriate module
  2. Add CLI command in neuralsmith/cli.py
  3. Update documentation in README.md
  4. Test end-to-end with real data

Requirements

  • Python 3.8+
  • See pyproject.toml for full dependency list

License

NeuralSmith

Support

For issues and questions:

  • Use CoPilot: neuralsmith copilot
  • Check the training summary reports for model-specific guidance
  • Review this README for common workflows and troubleshooting
