
Privacy-first local AI model builder — async DAG workflow, pluggable connectors, guided training pipeline


aimodelground

Python 3.11+ | License: Apache-2.0

Privacy-first, locally-installed ML model builder.

Upload data from any source, let the app guide you step-by-step through training, and get a deployable model — entirely on your machine. No cloud, no telemetry, no data leaving your system.


Installation

pip install aimodelground

Then install ML plugins based on your data type:

| Plugin | Install when you have | Examples |
|---|---|---|
| aimodelground-classical | Tabular / structured data: spreadsheets, SQL exports, CSVs with numeric/categorical columns. Best default choice. Fast, runs on any machine, no GPU needed. | Customer churn, fraud detection, price prediction, sales forecasting |
| aimodelground-dl | Images or sequences: folders of photos/scans, or time-series data where row order matters. Needs more RAM. GPU optional but speeds up training significantly. | Image classification, defect detection, sensor anomaly detection, log sequence analysis |
| aimodelground-llm | Text data: product reviews, support tickets, emails, documents. Fine-tunes an existing language model (GPT-2, Llama, Mistral) on your labels. GPU strongly recommended (8GB+ VRAM for Llama/Mistral; CPU-only works for GPT-2). | Sentiment analysis, topic classification, intent detection, document routing |

# Tabular data (CSV, SQL, Excel) — install this first, covers most use cases
pip install aimodelground-classical

# Image or sequential data — requires PyTorch (~2GB download)
pip install aimodelground-dl

# Text classification with LLM fine-tuning — requires PyTorch + HuggingFace (~500MB + model weights)
pip install aimodelground-llm

# Or install everything at once
pip install aimodelground-classical aimodelground-dl aimodelground-llm

Not sure? Start with aimodelground-classical. The AutoML ranker will tell you which algorithms suit your data after profiling.

Requires Python 3.11+


How it works

aimodelground runs your data through a configurable DAG pipeline with human-in-the-loop gates:

ingest → merge → validate → profile → rank_algos
                        [GATE: review data]
                                ↓
                 train_rf ──┐
                 train_xgb ─┤→ eval_join → [GATE: review results] → export → DEPLOY.md
                 train_lgb ─┘

Every step is a node in the DAG. Gates pause execution and wait for your approval. You can use the CLI (terminal-first) or the Web UI (browser-first) — both share the same project state.
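
To make the gate behaviour concrete, here is a minimal Python sketch of a DAG runner that stops at unapproved gates. It is not aimodelground's executor: the node names are a shortened, hypothetical version of the diagram above, and run_until_gate is an illustrative helper built on the standard-library graphlib module.

```python
from graphlib import TopologicalSorter

# Hypothetical miniature of the pipeline above: node id -> set of upstream ids.
DAG = {
    "ingest": set(),
    "profile": {"ingest"},
    "review_data": {"profile"},  # gate
    "train_rf": {"review_data"},
    "train_xgb": {"review_data"},
    "eval_join": {"train_rf", "train_xgb"},
}
GATES = {"review_data"}


def run_until_gate(dag, gates, done, approved):
    """Finish every node whose upstream nodes are done; stop at unapproved gates.

    Returns the updated set of finished nodes and the gates awaiting approval.
    """
    done = set(done)
    waiting = []
    for node in TopologicalSorter(dag).static_order():
        if node in done:
            continue
        if not dag[node] <= done:
            continue  # something upstream is unfinished (blocked behind a gate)
        if node in gates and node not in approved:
            waiting.append(node)
            continue
        done.add(node)  # a real executor would invoke the node's plugin here
    return done, waiting


# First pass: stops at the gate.
done, waiting = run_until_gate(DAG, GATES, done=set(), approved=set())
print(sorted(done), waiting)

# Second pass, after approval: the remaining nodes complete.
done, waiting = run_until_gate(DAG, GATES, done=done, approved={"review_data"})
print(sorted(done), waiting)
```

Calling the function a second time with the gate approved mirrors the approve-then-run cycle described in the CLI steps.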


Using the CLI — step by step

The CLI is the primary interface. Every action is a single command.

1. Create a project

aimodelground init my-project
cd my-project

Creates pipeline.yaml, data/raw/, .modelbuilder/config.yaml.


2. Add your data

cp customers.csv data/raw/
# or: .parquet, .json, .xlsx, .pdf, .docx

3. Configure the pipeline

Open pipeline.yaml and set:

- id: ingest
  plugin: connectors.file
  config:
    paths: ["data/raw/customers.csv"]   # ← your file

- id: train_rf
  plugin: ml.classical.random_forest
  config:
    target_col: churn                   # ← column to predict

4. Start the pipeline

aimodelground run

Runs until the first gate, prints what to do next.


5. Check progress

aimodelground status
  +  ingest          succeeded
  +  profile         succeeded
  ?  review_data     AWAITING  → aimodelground approve review_data
  .  train_rf        pending

6. Review data, then approve

# See what the profile and algorithm ranking found
cat runs/run_001/artifacts/profile.json
cat runs/run_001/artifacts/ranking.json

# Happy with data quality? Approve the gate
aimodelground approve review_data

# Resume
aimodelground run

If anything is wrong: aimodelground retry ingest to re-run from ingestion.
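
If cat output is hard to scan, a small Python helper can summarise an artifact instead. This sketch makes no assumption about the schema of profile.json or ranking.json (the source does not document it); it only lists the top-level keys and the size of each value. summarize_json is a hypothetical helper, not part of the CLI.

```python
import json
from pathlib import Path


def summarize_json(path):
    """Return one line per top-level key describing the value's type and size."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    if not isinstance(data, dict):
        return [f"(top level) {type(data).__name__}"]
    lines = []
    for key, value in data.items():
        if isinstance(value, (dict, list)):
            lines.append(f"{key}: {type(value).__name__} with {len(value)} items")
        else:
            lines.append(f"{key}: {value!r}")
    return lines


if __name__ == "__main__":
    # Paths follow the run layout used in this guide.
    for artifact in ("profile.json", "ranking.json"):
        path = Path("runs/run_001/artifacts") / artifact
        if path.exists():
            print(f"== {artifact}")
            print("\n".join(summarize_json(path)))
```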


7. Wait for training, then review results

aimodelground status          # watch node states
aimodelground logs train_rf   # tail training log

# Once eval_join completes, review metrics
cat runs/run_001/eval_report.json

# Optionally tune hyperparameters before approving
aimodelground tune --trials 50

# Approve
aimodelground approve review_results
aimodelground run

8. Get deployment guide

aimodelground deploy

Prints the full DEPLOY.md with Python script, FastAPI endpoint, and Dockerfile.


9. Iterate

aimodelground runs                        # list all runs
aimodelground compare run_001 run_002     # diff metrics
aimodelground run --from train_rf         # re-train with new config
aimodelground models update               # update model with new data
aimodelground export --format onnx        # re-export in different format

Using the Web UI — step by step

The Web UI gives a visual view of the pipeline with live updates. Run it alongside the CLI — they share the same state.

1. Start the UI

cd my-project
aimodelground ui
# Opens http://localhost:8765

Keep this running in one terminal. Run aimodelground run in a second terminal.


2. Pipeline tab — monitor execution

  • Each node shows its current state with a color badge.
  • Nodes update live as they complete (no refresh needed).
  • If a node shows failed, click the Retry button. The node resets and re-runs the next time you run aimodelground run.
  • If a gate shows awaiting, a yellow banner appears at the top with instructions. Click Approve or Skip directly in the UI.
  • After approving a gate in the UI, return to your terminal and run aimodelground run to resume.

3. Data tab — upload files and check profile

  • Upload your data file directly from the browser (drag and drop or file picker). Files go to data/raw/.
  • After the profile node runs, this tab shows your column types, row count, and null counts.
  • Columns with >10% nulls are highlighted in orange as a warning.
  • The Next steps hint on this page tells you exactly what to configure in pipeline.yaml.
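
The 10% null warning can be reproduced outside the UI before you upload a file. The following is a rough sketch using only the standard library; the set of strings treated as missing (NULL_TOKENS) and the sample data are assumptions for illustration, and the app's own null detection may differ.

```python
import csv
import io

# Assumed spellings of "missing"; the app's own rules are not documented here.
NULL_TOKENS = {"", "na", "n/a", "null", "none", "nan"}


def null_fractions(csv_text):
    """Return {column: fraction of rows whose value looks missing}."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = {name: 0 for name in reader.fieldnames or []}
    rows = 0
    for row in reader:
        rows += 1
        for name in counts:
            value = row.get(name)
            if value is None or value.strip().lower() in NULL_TOKENS:
                counts[name] += 1
    return {name: (n / rows if rows else 0.0) for name, n in counts.items()}


def columns_over_threshold(fractions, threshold=0.10):
    """Names of columns whose null fraction is strictly above the threshold."""
    return sorted(name for name, frac in fractions.items() if frac > threshold)


# Made-up sample: 10 rows, 2 missing ages, 1 missing income.
sample = (
    "age,income,churn\n"
    "34,52000,0\n,61000,1\n45,,0\n29,48000,1\n51,75000,1\n"
    ",58000,0\n38,49000,0\n42,66000,1\n27,43000,0\n60,82000,1\n"
)
fractions = null_fractions(sample)
print(fractions)
print(columns_over_threshold(fractions))
```

In this sample only age crosses the threshold; income sits exactly at 10% and is not flagged because the comparison is strict.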

4. Results tab — review model performance

  • Shows evaluation metrics (accuracy, F1, RMSE) for the current run.
  • Feature importance chart (SHAP values) shows which columns drive predictions.
  • Click a different run button at the top to switch between runs.
  • Click a vs run_001 link to compare two runs side by side; a green delta means improvement.
  • A What to do next panel on the right tells you the exact next action.

5. Deploy tab — get your model ready for production

  • Shows the auto-generated DEPLOY.md with ready-to-paste code.
  • The Copy button copies the entire guide to the clipboard.
  • The Copy path button copies the exported model file path.
  • Choose between three deployment options shown in the guide:
    • Python script (simplest, runs locally)
    • FastAPI REST endpoint (API server)
    • Dockerfile (containerised deployment)

Step-by-step usage (combined reference)

Step 1 — Create a project

aimodelground init my-churn-model
cd my-churn-model

This creates:

my-churn-model/
  pipeline.yaml      ← DAG definition (edit this)
  data/raw/          ← drop your data files here
  .modelbuilder/     ← project config

Step 2 — Add your data

Drop any supported file into data/raw/:

cp customers.csv my-churn-model/data/raw/
# or: .parquet, .json, .xlsx, .png folder, .wav folder

For SQL databases, S3, GCS, Kafka, or REST APIs, configure the matching connector in pipeline.yaml (see Data connectors).


Step 3 — Configure pipeline.yaml

Open pipeline.yaml. The default template is pre-filled. You only need to set two things:

a) Point to your data:

- id: ingest
  type: task
  plugin: connectors.file
  config:
    paths: ["data/raw/customers.csv"]   # ← your file

b) Set your target column (the column you want to predict):

- id: train_rf
  type: task
  plugin: ml.classical.random_forest
  depends_on: [review_data]
  config:
    target_col: churn    # ← column name to predict

Everything else (merge, validate, profile, rank, eval, export) runs automatically.


Step 4 — Run the pipeline

Using the CLI:

aimodelground run

The pipeline starts. It will run until it hits the first review gate, then print:

GATE: review_data
   Review data profile and algorithm rankings before training
   Run: aimodelground approve review_data

Using the Web UI:

aimodelground ui
# Opens http://localhost:8765 in your browser

The Pipeline tab shows each node with a live status indicator. Nodes turn green as they complete.


Step 5 — Check what the pipeline found (first gate)

Before training starts, aimodelground profiles your data and ranks algorithms. Review what it discovered:

CLI:

aimodelground status

Output:

Pipeline: my-churn-model  run_001  5/8 nodes done

  +  ingest          succeeded
  +  merge           succeeded
  +  validate        succeeded
  +  profile         succeeded
  +  rank_algos      succeeded
  ?  review_data     AWAITING  → aimodelground approve review_data
  .  train_rf        pending
  .  train_xgb       pending
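
If you want to script around this output, the lines can be parsed with a few lines of Python. This is a sketch that assumes the layout shown above (a status symbol, the node id, then the state) stays stable, which this guide does not promise; parse_status and awaiting_gates are hypothetical helpers.

```python
# The node lines from the status output above.
STATUS_TEXT = """\
  +  ingest          succeeded
  +  merge           succeeded
  +  validate        succeeded
  +  profile         succeeded
  +  rank_algos      succeeded
  ?  review_data     AWAITING  → aimodelground approve review_data
  .  train_rf        pending
  .  train_xgb       pending
"""


def parse_status(text):
    """Map node id -> lower-cased state for every line that looks like a node row."""
    nodes = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 3 and parts[0] in {"+", "?", "."}:
            nodes[parts[1]] = parts[2].lower()
    return nodes


def awaiting_gates(nodes):
    """Node ids whose state is 'awaiting', in the order they were listed."""
    return [name for name, state in nodes.items() if state == "awaiting"]


nodes = parse_status(STATUS_TEXT)
print(awaiting_gates(nodes))
```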

To see the full data profile and algorithm rankings:

# Check the profile saved in the run artifacts
cat runs/run_001/artifacts/profile.json

# Check which algorithms were ranked and why
cat runs/run_001/artifacts/ranking.json

Web UI: The Data tab shows your column types, null counts, and distributions. The Pipeline tab shows the ranking results inline on the rank_algos node.

If the data looks wrong (wrong types, too many nulls, wrong file loaded) — fix the issue and retry:

aimodelground retry ingest   # re-runs ingest and all downstream nodes
aimodelground run            # resumes

If everything looks good — approve the gate:

aimodelground approve review_data

Web UI: Click the Approve button on the review_data gate node.

Then resume:

aimodelground run

Step 6 — Wait for training

Training runs in parallel for all selected algorithms. Watch progress:

CLI:

aimodelground status          # check node states
aimodelground logs train_rf   # tail logs for a specific node

Web UI: The Pipeline tab updates live. Click any running node to see its log output in the side panel.

Training time depends on your data size and hardware:

  • Tabular data, 10k–100k rows: typically 30 seconds – 5 minutes
  • Images / sequences: minutes to hours depending on GPU
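
The fan-out and join pattern behind parallel training can be sketched in Python with concurrent.futures. The trainer below is a stub and the scores are invented for illustration; this is not how aimodelground schedules its training nodes internally.

```python
from concurrent.futures import ThreadPoolExecutor


def train_stub(name, score):
    """Stand-in for a training node; returns the metrics a real trainer would report."""
    return {"model": name, "accuracy": score}


# Hypothetical node ids and scores, for illustration only.
jobs = {"train_rf": 0.84, "train_xgb": 0.88, "train_lgb": 0.86}

with ThreadPoolExecutor(max_workers=len(jobs)) as pool:
    futures = {node: pool.submit(train_stub, node, score) for node, score in jobs.items()}
    # The join step: block until every branch has produced a result.
    results = {node: future.result() for node, future in futures.items()}

best = max(results.values(), key=lambda r: r["accuracy"])
print(best["model"])
```

The dictionary comprehension over future.result() plays the role of eval_join: nothing downstream proceeds until every branch has reported.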

Step 7 — Review results (second gate)

After all models finish, the pipeline pauses again:

CLI:

aimodelground status
# shows: review_results  AWAITING

# View the eval report
cat runs/run_001/eval_report.json

Web UI: Go to the Results tab. You'll see:

  • Leaderboard table: each algorithm with accuracy, F1, RMSE
  • Feature importance chart (SHAP values)
  • Option to compare against a previous run

If results are poor:

  • Try tuning hyperparameters first: aimodelground tune --trials 50
  • Or re-run with different data: aimodelground run --from ingest
  • Or skip a poorly-performing algorithm: aimodelground skip train_xgb

When satisfied — approve:

aimodelground approve review_results
aimodelground run

Web UI: Click Approve on the review_results gate.


Step 8 — Export and deploy

After approval, the pipeline exports the best model and generates DEPLOY.md.

CLI:

aimodelground deploy
# Prints the full deployment guide with code examples

Web UI: Go to the Deploy tab. It shows:

  • Model info (algorithm, format, input schema)
  • Python inference script
  • FastAPI REST endpoint (copy-paste ready)
  • Dockerfile

By default the model exports as pickle. To export as ONNX:

# in pipeline.yaml
- id: export
  type: task
  plugin: core.export
  depends_on: [review_results]
  config:
    format: onnx     # or: pickle, safetensors

Or re-export after the fact:

aimodelground export --format onnx

The exported file is at runs/run_001/export/model.onnx (or .pkl).


Step 9 — Iterate

Compare two runs:

aimodelground compare run_001 run_002

Output:

Comparing run_001 vs run_002
 Metric    run_001    run_002    Delta
 accuracy  0.8412     0.8891    +0.0479
 f1        0.8103     0.8654    +0.0551
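
The Delta column is simply the second run's value minus the first. A minimal Python sketch, assuming each eval report can be reduced to a flat mapping of metric name to number (the actual layout of eval_report.json is not documented here):

```python
def diff_metrics(old, new):
    """Return {metric: (old, new, delta)} for metrics present in both reports."""
    shared = sorted(set(old) & set(new))
    return {m: (old[m], new[m], round(new[m] - old[m], 4)) for m in shared}


# Values copied from the comparison output above.
run_001 = {"accuracy": 0.8412, "f1": 0.8103}
run_002 = {"accuracy": 0.8891, "f1": 0.8654}

for metric, (a, b, delta) in diff_metrics(run_001, run_002).items():
    print(f"{metric:<9} {a:<10.4f} {b:<10.4f} {delta:+.4f}")
```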

Replay from a specific node (e.g., re-train with different config without re-ingesting):

# Edit pipeline.yaml — change n_estimators, learning_rate, etc.
aimodelground run --from train_rf

Update an existing model with new data:

aimodelground models list
aimodelground models update run_001/random_forest --data data/raw/new_customers.csv

Common issues

| Problem | Fix |
|---|---|
| Node shows failed | Run aimodelground logs <node> to see the error, fix the issue, then run aimodelground retry <node> |
| Wrong target column | Edit pipeline.yaml, set the correct target_col, then run aimodelground run --from train_rf |
| Too many nulls in data | Fix the source data, then run aimodelground retry ingest |
| Training too slow | Reduce dataset size for prototyping, or add a GPU. For tabular data, n_estimators: 50 trains faster |
| Model accuracy too low | Run aimodelground tune --trials 100 before approving the review_results gate, or add more data |
| Want to skip an algorithm | Run aimodelground skip train_xgb; downstream nodes unblock automatically |
| Web UI not updating | Check that aimodelground run is still running in another terminal |

CLI reference

| Command | Description |
|---|---|
| aimodelground --version | Show version |
| aimodelground init <name> | Create project |
| aimodelground run | Start/resume pipeline |
| aimodelground run --from <node> | Replay from node, reuse upstream |
| aimodelground status | Show DAG node states |
| aimodelground approve <node> | Approve a gate |
| aimodelground skip <node> | Skip a node |
| aimodelground retry <node> | Reset failed node |
| aimodelground logs <node> | Show node logs |
| aimodelground runs | List all runs |
| aimodelground compare <a> <b> | Diff eval metrics |
| aimodelground tune | Optuna hyperparameter search |
| aimodelground export [--format] | Re-export model (pickle/onnx) |
| aimodelground deploy | Print deployment guide |
| aimodelground ui [--port N] | Open web interface |
| aimodelground features list | List saved feature sets |
| aimodelground features info <n> | Feature set details |
| aimodelground features delete <n> | Delete feature set |
| aimodelground models list | View all trained models |
| aimodelground models update [id] | Update model with new data |

Pipeline configuration (pipeline.yaml)

nodes:
  - id: ingest_csv
    type: task
    plugin: connectors.file
    config:
      paths: ["data/raw/*.csv"]

  - id: merge
    type: task
    plugin: core.merge
    depends_on: [ingest_csv]

  - id: validate
    type: task
    plugin: validators.schema
    depends_on: [merge]
    config:
      required_columns: [age, income, label]
      max_null_pct: 0.1

  - id: profile
    type: task
    plugin: core.profile
    depends_on: [merge]

  - id: rank_algos
    type: task
    plugin: core.automl_ranker
    depends_on: [profile]

  - id: review_data
    type: gate
    depends_on: [rank_algos, validate]
    message: "Review data before training"

  - id: train_rf
    type: task
    plugin: ml.classical.random_forest
    depends_on: [review_data]
    config:
      target_col: label

  - id: train_xgb
    type: task
    plugin: ml.classical.xgboost
    depends_on: [review_data]
    config:
      target_col: label

  - id: eval_join
    type: parallel_join
    depends_on: [train_rf, train_xgb]

  - id: review_results
    type: gate
    depends_on: [eval_join]
    message: "Review results and pick model"

  - id: export
    type: task
    plugin: core.export
    depends_on: [review_results]
    config:
      format: onnx

  - id: deploy_advisor
    type: task
    plugin: core.deploy_advisor
    depends_on: [export]
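
Before running a hand-edited pipeline, it can help to check that every depends_on entry names a defined node and that the graph has no cycle. The Python sketch below does this for the node ids above, transcribed by hand into a dict (parsing the YAML itself is left out to keep the example dependency-free). check_and_layer is a hypothetical helper, not an aimodelground API; it also groups nodes into layers that could run in parallel.

```python
from graphlib import CycleError, TopologicalSorter

# The node ids and depends_on lists from the configuration above, as plain Python.
NODES = {
    "ingest_csv": [],
    "merge": ["ingest_csv"],
    "validate": ["merge"],
    "profile": ["merge"],
    "rank_algos": ["profile"],
    "review_data": ["rank_algos", "validate"],
    "train_rf": ["review_data"],
    "train_xgb": ["review_data"],
    "eval_join": ["train_rf", "train_xgb"],
    "review_results": ["eval_join"],
    "export": ["review_results"],
    "deploy_advisor": ["export"],
}


def check_and_layer(nodes):
    """Raise ValueError on unknown or cyclic dependencies; else return parallel layers."""
    for node, deps in nodes.items():
        missing = [d for d in deps if d not in nodes]
        if missing:
            raise ValueError(f"{node} depends on undefined node(s): {missing}")
    sorter = TopologicalSorter(nodes)
    try:
        sorter.prepare()
    except CycleError as exc:
        raise ValueError(f"dependency cycle: {exc.args[1]}") from exc
    layers = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # nodes whose dependencies are all done
        layers.append(ready)
        sorter.done(*ready)
    return layers


for depth, layer in enumerate(check_and_layer(NODES)):
    print(depth, layer)
```

Nodes that share a layer, such as train_rf and train_xgb, have no dependency on each other, which is what allows them to train side by side.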

Data connectors

| Plugin | Source |
|---|---|
| connectors.file | CSV, JSON, Parquet, Excel, Arrow (DuckDB, glob patterns) |
| connectors.document | PDF, DOCX, TXT, MD; extracts text, page numbers, char count |
| connectors.sql | PostgreSQL, MySQL, SQLite (SQLAlchemy DSN) |
| connectors.rest_poll | HTTP API polling |
| connectors.websocket | WebSocket stream |
| connectors.kafka | Kafka topic |
| connectors.image | PNG/JPG/TIFF directory → image_path + label |
| connectors.audio | WAV/MP3/FLAC directory → MFCC features |
| connectors.s3 | Amazon S3 (DuckDB httpfs, IAM/keys/MinIO) |
| connectors.gcs | Google Cloud Storage (DuckDB httpfs) |
| connectors.feature_store | Saved feature sets |

ML plugins

aimodelground-classical

pip install aimodelground-classical

| Plugin | Algorithm | Update support |
|---|---|---|
| ml.classical.random_forest | RandomForest | warm_start |
| ml.classical.xgboost | XGBoost | incremental |
| ml.classical.lightgbm | LightGBM | incremental |

All produce: accuracy/F1/RMSE, SHAP feature importance, pickle + ONNX export.

aimodelground-dl

pip install aimodelground-dl

| Plugin | Architecture |
|---|---|
| ml.dl.cnn_image | 3-layer CNN for image classification |
| ml.dl.lstm_tabular | 2-layer LSTM for sequential/tabular data |

aimodelground-llm

pip install aimodelground-llm

| Plugin | Method |
|---|---|
| ml.llm.lora_text | LoRA fine-tuning on GPT-2, Llama, Mistral, Phi |

Core pipeline plugins

| Plugin | Purpose |
|---|---|
| core.merge | Concat all connector outputs |
| core.profile | Compute DataProfile (row count, column types, nulls) |
| validators.schema | Validate required columns + null thresholds |
| core.automl_ranker | Rank installed ML plugins by suitability |
| core.automl_tuner | Optuna hyperparameter search (CV-based) |
| core.export | Export best model (pickle/ONNX/safetensors) |
| core.deploy_advisor | Generate DEPLOY.md |
| core.feature_store_save | Save processed data as named feature set |
| core.model_update | Update existing model with new data |

Feature store

aimodelground features list
aimodelground features info <name>
aimodelground features versions <name>
aimodelground features delete <name>

# Save features in pipeline
- id: save_features
  type: task
  plugin: core.feature_store_save
  depends_on: [merge]
  config:
    feature_name: customer_features_v1

# Load in future run
- id: load_features
  type: task
  plugin: connectors.feature_store
  config:
    name: customer_features_v1

Model update

aimodelground models list
aimodelground models update --data data/raw/new.csv --target label
aimodelground models update run_001/random_forest --n-estimators 100

Working with PDF and document files

If your data consists of PDFs, Word documents, text files, or markdown, use connectors.document. It extracts text from each file (page by page for PDFs) and produces a DataFrame with filename, source, page, total_pages, text, and char_count columns, plus a label column when label_from_dir is enabled.

Step 1 — Organise your files

Option A — flat folder (all documents, no labels):

data/raw/
  contract_001.pdf
  contract_002.pdf
  report_march.docx
  notes.txt

Option B — labelled subdirectories (for classification):

data/raw/
  approved/
    doc_001.pdf
    doc_002.pdf
  rejected/
    doc_003.pdf
    doc_004.pdf

Step 2 — Configure pipeline.yaml

nodes:
  - id: ingest_docs
    type: task
    plugin: connectors.document
    config:
      paths: ["data/raw/**/*.pdf", "data/raw/**/*.docx"]
      label_from_dir: true   # set true if using labelled subdirectories

  - id: merge
    type: task
    plugin: core.merge
    depends_on: [ingest_docs]

  - id: profile
    type: task
    plugin: core.profile
    depends_on: [merge]

  - id: rank_algos
    type: task
    plugin: core.automl_ranker
    depends_on: [profile]

  - id: review_data
    type: gate
    depends_on: [rank_algos]
    message: "Review extracted text before training"

  - id: train_lora
    type: task
    plugin: ml.llm.lora_text
    depends_on: [review_data]
    config:
      text_col: text          # column produced by the document connector
      label_col: label        # column from label_from_dir, or your own label column
      base_model: gpt2        # or: meta-llama/Llama-2-7b, mistralai/Mistral-7B-v0.1
      epochs: 3
      max_length: 512

  - id: review_results
    type: gate
    depends_on: [train_lora]
    message: "Review fine-tuning results"

  - id: export
    type: task
    plugin: core.export
    depends_on: [review_results]
    config:
      format: safetensors     # adapter weights, compatible with Ollama / vLLM

  - id: deploy_advisor
    type: task
    plugin: core.deploy_advisor
    depends_on: [export]

Step 3 — Run

pip install aimodelground-llm   # required for LLM fine-tuning

aimodelground run

The connector extracts text from every PDF/DOCX, then the LLM plugin fine-tunes a LoRA adapter on your labelled documents.

What the extracted data looks like

| filename | source | page | total_pages | text | char_count | label |
|---|---|---|---|---|---|---|
| contract_001.pdf | data/raw/approved/... | 1 | 4 | "This agreement..." | 3420 | approved |
| contract_001.pdf | data/raw/approved/... | 2 | 4 | "Section 2..." | 2870 | approved |

Each PDF produces one row per page. DOCX and TXT produce one row per file.
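
The row layout above can be illustrated in Python without any PDF library. In this sketch the page texts are supplied by hand, the label is taken from the parent directory name (our reading of label_from_dir, not a confirmed implementation detail), and rows_for_document is a hypothetical helper rather than the connector's code.

```python
from pathlib import PurePosixPath


def rows_for_document(path, pages, label_from_dir=True):
    """Build one row per page, mirroring the columns shown in the table above."""
    p = PurePosixPath(path)
    # Assumption: the label is the name of the directory that holds the file.
    label = p.parent.name if label_from_dir else None
    return [
        {
            "filename": p.name,
            "source": str(p),
            "page": number,
            "total_pages": len(pages),
            "text": text,
            "char_count": len(text),
            "label": label,
        }
        for number, text in enumerate(pages, start=1)
    ]


rows = rows_for_document(
    "data/raw/approved/contract_001.pdf",
    ["This agreement...", "Section 2..."],  # placeholder page texts
)
for row in rows:
    print(row["filename"], row["page"], row["total_pages"], row["char_count"], row["label"])
```

A single-file format such as DOCX or TXT would correspond to calling the helper with a one-element pages list.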

Choosing a base model

| Base model | When to use | GPU required |
|---|---|---|
| gpt2 | Small datasets (<1000 docs), fast iteration, CPU-friendly | No (CPU works) |
| distilbert-base-uncased | Classification tasks, small model, good accuracy | No |
| meta-llama/Llama-2-7b | Large datasets, high accuracy, production use | Yes (8GB+ VRAM) |
| mistralai/Mistral-7B-v0.1 | Best accuracy, multilingual support | Yes (8GB+ VRAM) |

Mixing documents with other data

You can combine document text with structured data in the same pipeline:

nodes:
  - id: ingest_docs
    type: task
    plugin: connectors.document
    config:
      paths: ["data/raw/contracts/**/*.pdf"]
      label_from_dir: true

  - id: ingest_metadata
    type: task
    plugin: connectors.file
    config:
      paths: ["data/raw/contract_metadata.csv"]

  - id: merge
    type: task
    plugin: core.merge
    depends_on: [ingest_docs, ingest_metadata]
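
The plugin table describes core.merge as concatenating connector outputs. The Python sketch below shows what plain concatenation of two sources with different columns looks like; it is an illustration under that assumption, the value_usd column is invented, and whether core.merge also aligns rows by a shared key such as filename is not stated in this guide.

```python
def concat_rows(*tables):
    """Stack row lists from several sources, filling columns a source lacks with None."""
    columns = []
    for table in tables:
        for row in table:
            for name in row:
                if name not in columns:
                    columns.append(name)  # keep first-seen column order
    return [{name: row.get(name) for name in columns} for table in tables for row in table]


docs = [{"filename": "contract_001.pdf", "text": "This agreement...", "label": "approved"}]
metadata = [{"filename": "contract_001.pdf", "value_usd": 12000}]  # hypothetical column

merged = concat_rows(docs, metadata)
for row in merged:
    print(row)
```

Note that concatenation yields two rows for contract_001.pdf, each with gaps. If you need one row per document, inspect the merged output at the review_data gate before training.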

Versioned runs

aimodelground runs
aimodelground compare run_001 run_002
aimodelground run --from validate    # replay, reuse upstream outputs

Web UI

aimodelground ui --port 8765
  • Pipeline — live DAG, approve/skip buttons, SSE real-time updates
  • Data — file upload, schema, null stats
  • Results — leaderboard, Plotly charts, run comparison
  • Deploy — rendered deployment guide

Project structure

my-project/
  pipeline.yaml         # DAG definition
  project.db            # SQLite state
  data/raw/             # Input data
  runs/
    run_001/
      artifacts/        # Models, parquets, ranking.json
      logs/             # Node logs
      eval_report.json
      DEPLOY.md         # Deployment guide
      export/           # Exported model
  .modelbuilder/
    features/           # Feature store data
    feature_store.db

Contributing

See CONTRIBUTING.md.

Releasing

See RELEASING.md.

Changelog

See CHANGELOG.md.

License

Apache 2.0 — see LICENSE
