Privacy-first local AI model builder — async DAG workflow, pluggable connectors, guided training pipeline

aimodelground

Privacy-first, locally-installed ML model builder.

Upload data from any source, let the app guide you step-by-step through training, and get a deployable model — entirely on your machine. No cloud, no telemetry, no data leaving your system.

Installation

pip install aimodelground

Upgrading from a previous version:

# Upgrade to latest
pip install --upgrade aimodelground

# Pin to a specific version
pip install "aimodelground==0.3.0"

Note: pip install aimodelground without flags will print "Requirement already satisfied" if any version is already installed and will NOT upgrade. Use --upgrade or pin the version explicitly.
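
To confirm which version is actually installed before or after an upgrade, the package metadata can be read from Python. This is a minimal sketch using only the standard library; the helper name `installed_version` is illustrative and is not part of aimodelground.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


found = installed_version("aimodelground")
print(found or "aimodelground is not installed")
```

If the printed version is older than the one you expect, re-run pip with --upgrade or an explicit pin as described above.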

Then install ML plugins based on your data type:

Plugin	Install when you have	Examples
`aimodelground-classical`	Tabular / structured data — spreadsheets, SQL exports, CSVs with numeric/categorical columns. Best default choice. Fast, runs on any machine, no GPU needed.	Customer churn, fraud detection, price prediction, sales forecasting
`aimodelground-dl`	Images or sequences — folders of photos/scans, or time-series data where row order matters. Needs more RAM. GPU optional but speeds up training significantly.	Image classification, defect detection, sensor anomaly detection, log sequence analysis
`aimodelground-llm`	Text data — product reviews, support tickets, emails, documents. Fine-tunes an existing language model (GPT-2, Llama, Mistral) on your labels. GPU strongly recommended (8GB+ VRAM for Llama/Mistral; CPU-only works for GPT-2).	Sentiment analysis, topic classification, intent detection, document routing

# Tabular data (CSV, SQL, Excel) — install this first, covers most use cases
pip install aimodelground-classical

# Image or sequential data — requires PyTorch (~2GB download)
pip install aimodelground-dl

# Text classification with LLM fine-tuning — requires PyTorch + HuggingFace (~500MB + model weights)
pip install aimodelground-llm

# Or install everything at once
pip install aimodelground-classical aimodelground-dl aimodelground-llm

Not sure? Start with aimodelground-classical. The AutoML ranker will tell you which algorithms suit your data after profiling.

Requires Python 3.11+

How it works

aimodelground runs your data through a configurable DAG pipeline with human-in-the-loop gates:

ingest → merge → validate → profile → rank_algos
                        [GATE: review data]
                                ↓
                 train_rf ──┐
                 train_xgb ─┤→ eval_join → [GATE: review results] → export → DEPLOY.md
                 train_lgb ─┘

Every step is a node in the DAG. Gates pause execution and wait for your approval. You can use the CLI (terminal-first) or the Web UI (browser-first) — both share the same project state.
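
The gate behaviour can be pictured with a small Python sketch. It is an illustration only, not aimodelground's actual scheduler: the `NODES` table and the `run_until_gate` helper are hypothetical, and the walk is a simplified linear one that stops at the first unapproved gate.

```python
from graphlib import TopologicalSorter

# Hypothetical node table: id -> (type, dependencies).
NODES = {
    "ingest": ("task", []),
    "profile": ("task", ["ingest"]),
    "review_data": ("gate", ["profile"]),
    "train_rf": ("task", ["review_data"]),
}


def run_until_gate(nodes, approved):
    """Walk nodes in dependency order and stop at the first unapproved gate.

    Returns (completed node ids, id of the blocking gate or None).
    """
    graph = {node_id: deps for node_id, (_, deps) in nodes.items()}
    done = []
    for node_id in TopologicalSorter(graph).static_order():
        kind, _ = nodes[node_id]
        if kind == "gate" and node_id not in approved:
            return done, node_id   # pause here until a human approves
        done.append(node_id)       # a real runner would execute the plugin here
    return done, None


print(run_until_gate(NODES, approved=set()))
print(run_until_gate(NODES, approved={"review_data"}))
```

The first call stops at review_data; the second, with the gate approved, runs through to train_rf. This mirrors the run, approve, run cycle used in the steps below.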

Using the CLI — step by step

The CLI is the primary interface. Every action is a single command.

1. Create a project

aimodelground init my-project
cd my-project

Creates pipeline.yaml, data/raw/, .modelbuilder/config.yaml.

2. Add your data

cp customers.csv data/raw/
# or: .parquet, .json, .xlsx, .pdf, .docx

3. Configure the pipeline

Open pipeline.yaml and set:

- id: ingest
  plugin: connectors.file
  config:
    paths: ["data/raw/customers.csv"]   # ← your file

- id: train_rf
  plugin: ml.classical.random_forest
  config:
    target_col: churn                   # ← column to predict

4. Start the pipeline

aimodelground run

The pipeline runs until it reaches the first gate, then prints the command to run next.

5. Check progress

aimodelground status

  +  ingest          succeeded
  +  profile         succeeded
  ?  review_data     AWAITING  → aimodelground approve review_data
  .  train_rf        pending

6. Review data, then approve

# See what the profile and algorithm ranking found
cat runs/run_001/artifacts/profile.json
cat runs/run_001/artifacts/ranking.json

# Happy with data quality? Approve the gate
aimodelground approve review_data

# Resume
aimodelground run

If anything is wrong: aimodelground retry ingest to re-run from ingestion.
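
The artifact files are plain JSON, so they can also be inspected from Python instead of `cat`. The sketch below assumes nothing about the schema of profile.json or ranking.json (it is not documented here); it only lists the top-level keys and the type of each value. The helper name `summarise_json` is illustrative.

```python
import json
from pathlib import Path


def summarise_json(path):
    """Map each top-level key of a JSON file to the type name of its value."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    if not isinstance(data, dict):
        return {"<root>": type(data).__name__}
    return {key: type(value).__name__ for key, value in data.items()}
```

For example, `summarise_json("runs/run_001/artifacts/profile.json")` gives a quick overview of what the profile contains before you open the full file.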

7. Wait for training, then review results

aimodelground status          # watch node states
aimodelground logs train_rf   # tail training log

# Once eval_join completes, review metrics
cat runs/run_001/eval_report.json

# Optionally tune hyperparameters before approving
aimodelground tune --trials 50

# Approve
aimodelground approve review_results
aimodelground run

8. Get deployment guide

aimodelground deploy

Prints the full DEPLOY.md with Python script, FastAPI endpoint, and Dockerfile.

9. Iterate

aimodelground runs                        # list all runs
aimodelground compare run_001 run_002     # diff metrics
aimodelground run --from train_rf         # re-train with new config
aimodelground models update               # update model with new data
aimodelground export --format onnx        # re-export in different format

Using the Web UI — step by step

The Web UI is a guided 6-step wizard. From v0.3.0 you can run the entire pipeline (upload → train → deploy → query) without touching the terminal.

cd my-project
aimodelground ui
# Opens http://localhost:8765

The wizard stepper at the top tracks your progress. Completed steps are clickable (green ✓). Steps unlock as you complete each stage.

 ✓ Upload  →  ✓ Configure  →  ▶ Run  →  · Results  →  · Deploy  →  · Query

Step 1 — Upload

Drag and drop your data file, or click the upload zone to browse.

┌─────────────────────────────────────────────────────────┐
│  Upload Data                                            │
│  Drop a file to get started — CSV, JSON, Parquet...    │
├─────────────────────────────────────────────────────────┤
│  ┌──────────────────────────────────────────────────┐  │
│  │                                                  │  │
│  │              ⇩  Drop file here                   │  │
│  │         or click to browse                       │  │
│  │    CSV · JSON · Parquet · Excel · PDF · DOCX     │  │
│  └──────────────────────────────────────────────────┘  │
│                                                         │
│  Files in data/raw/  (1 file)                          │
│  ┌──────────────────────────────────────────────────┐  │
│  │ 📄 iris.csv               24.1 KB      ready     │  │
│  └──────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────┘

Files land in data/raw/. Move to Configure once your file appears in the list.

Step 2 — Configure

The left pane auto-detects your file's columns. The right pane shows live YAML that updates as you change the form.

┌─────────────────────────────────────────────────────────────────────┐
│  pipeline.yaml                              [Validate]  [Save]      │
├──────────────────────────────┬──────────────────────────────────────┤
│  DATA FILE                   │  Live YAML                           │
│  ▾ iris.csv                  │  nodes:                              │
│    150 rows · 5 cols         │    - id: ingest_files                │
│                              │      plugin: connectors.file         │
│  TARGET COLUMN               │      config:                         │
│  ▾ species (categorical)     │        paths: ["data/raw/iris.csv"]  │
│                              │        target_col: "species"         │
│  ALGORITHMS                  │                                      │
│  [✓ RandomForest] [✓ XGBoost]│    - id: validate                   │
│  [ LightGBM    ] [ LSTM    ] │      plugin: validators.schema       │
│                              │      depends_on: [ingest_files]      │
│  TASK TYPE                   │                                      │
│  [✓ Classification] [Regress]│    - id: review_data                 │
│                              │      type: gate                      │
└──────────────────────────────┴──────────────────────────────────────┘

You can edit the YAML directly too — form and YAML stay in sync. Click Save when done.

Step 3 — Run

Click Run Pipeline — no terminal needed. The pipeline runs in the background with live node updates.

┌─────────────────────────────┐  ┌─────────────────────────────────┐
│  Pipeline Control           │  │  Nodes                          │
│                             │  │                                 │
│  [▶ Run Pipeline] [From: ▾] │  │  ▓ DONE   ingest_files         │
│                             │  │           connectors.file       │
│  Progress                   │  │                                 │
│  ████████░░░░░░  3/8 nodes  │  │  ▓ DONE   validate             │
│                             │  │           validators.schema     │
│  ┌─────────────────────┐   │  │                                 │
│  │ ⏳ Gate: review_data│   │  │  ⏳ GATE  review_data           │
│  │ Review data profile  │   │  │           awaiting approval     │
│  │ before training.    │   │  │                                 │
│  │ [✓ Approve] [Skip]  │   │  │  ·  PEND  profile              │
│  └─────────────────────┘   │  │  ·  PEND  rank_algos           │
│                             │  │  ·  PEND  export_model         │
└─────────────────────────────┘  └─────────────────────────────────┘

Gate cards appear automatically for nodes that need your review. Click Approve to continue — the pipeline resumes without restarting.

Step 4 — Results

Metric summary cards at the top, feature importance bars below. Compare runs side by side.

┌──────────────┐  ┌──────────────┐  ┌──────────────┐
│   94.20%     │  │    0.9412    │  │    0.9780    │
│   ACCURACY   │  │   F1 SCORE   │  │     AUC      │
│  ↑ +2.1%     │  │  ↑ +0.018   │  │  — baseline  │
└──────────────┘  └──────────────┘  └──────────────┘

Feature Importance (SHAP)
  petal_length  ████████████████████████████  0.912
  petal_width   ████████████████████          0.782
  sepal_length  ████████████                  0.421
  sepal_width   ██████                        0.213

Switch between runs using the selector at the top. Click vs run_001 to diff two runs with coloured deltas (green = improvement).

Step 5 — Deploy

Auto-generated deployment guide with copy buttons. Links directly to the Query step.

┌────────────────────────────────────┐  ┌─────────────────────┐
│  DEPLOY.md — run_003    [Copy]     │  │  Export Info        │
│                                    │  │  Algorithm: RF      │
│  ## Option 1 — Python             │  │  Format:  pickle    │
│                                    │  │  runs/.../model.pkl │
│  import joblib                     │  │  [Copy path]        │
│  model = joblib.load("model.pkl")  │  ├─────────────────────┤
│  pred = model.predict([features])  │  │  Quick Actions      │
│                                    │  │  [Query Model →]    │
│  ## Option 2 — FastAPI            │  │  [View Metrics]     │
│  ...                               │  │  [Back to Pipeline] │
└────────────────────────────────────┘  └─────────────────────┘

Step 6 — Query

Two tabs: Predict (run inference) and Explain (SHAP insights). No external API or LLM required — everything runs locally from your exported model.

Predict tab — type feature values and get an instant prediction:

┌──────────────────────────────────────────────────┐
│  🎯 Predict  |  🔍 Explain                       │
├──────────────────────────────────────────────────┤
│  Enter feature values                            │
│                                                  │
│  sepal_length  [5.1    ]   sepal_width  [3.5   ] │
│  petal_length  [1.4    ]   petal_width  [0.2   ] │
│                                                  │
│  [Predict →]  [Clear]                            │
│                                                  │
│  ┌─────────────────────────────────────────────┐ │
│  │  setosa                  Confidence: 99%    │ │
│  │  Top driver: petal_length = 1.4             │ │
│  └─────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────┘

Explain tab — reads SHAP values, metrics, and profile from run artifacts:

METRICS
  accuracy     0.9420
  f1           0.9412

FEATURE IMPORTANCE (SHAP)
  petal_length  ████████████████████  0.912
  petal_width   ████████████████      0.782

INSIGHTS
  💡 'petal_length' dominates predictions (score 0.91) — model may overfit.

Theme

The UI ships with a Deep Space dark theme and supports light mode. Click the ☀ Light button in the top bar to toggle — preference is saved in localStorage.

┌─────────────────────────────────────────────────────────┐
│  model-builder  v0.3.0     ● live  my-project  ☀ Light │
│ ─────────────────────────────────────────────────────── │
│  ✓ Upload  →  ✓ Configure  →  ▶ Run  →  · Results ...  │
└─────────────────────────────────────────────────────────┘

Dark (default): #0a0e1a background, #4f8ef7 accent. Light: white background, #2563eb accent.

Step-by-step usage (combined reference)

Step 1 — Create a project

aimodelground init my-churn-model
cd my-churn-model

This creates:

my-churn-model/
  pipeline.yaml      ← DAG definition (edit this)
  data/raw/          ← drop your data files here
  .modelbuilder/     ← project config

Step 2 — Add your data

Drop any supported file into data/raw/:

cp /path/to/customers.csv data/raw/   # run from inside my-churn-model/
# or: .parquet, .json, .xlsx, .png folder, .wav folder

For SQL databases, S3, GCS, Kafka, REST APIs — configure the connector in pipeline.yaml (see Data connectors).

Step 3 — Configure `pipeline.yaml`

Using the Web UI (recommended): Go to the Configure step. The form auto-detects your file's columns and pre-fills the target column dropdown. Select your target, choose algorithms, and click Save — the YAML is written for you.

Using the CLI: Open pipeline.yaml. The default template is pre-filled. You only need to set two things:

a) Point to your data:

- id: ingest
  type: task
  plugin: connectors.file
  config:
    paths: ["data/raw/customers.csv"]   # ← your file

b) Set your target column (the column you want to predict):

- id: train_rf
  type: task
  plugin: ml.classical.random_forest
  depends_on: [review_data]
  config:
    target_col: churn    # ← column name to predict

Everything else (merge, validate, profile, rank, eval, export) runs automatically.

Step 4 — Run the pipeline

Using the CLI:

aimodelground run

The pipeline starts. It will run until it hits the first review gate, then print:

GATE: review_data
   Review data profile and algorithm rankings before training
   Run: aimodelground approve review_data

Using the Web UI:

aimodelground ui
# Opens http://localhost:8765 in your browser

Go to the Run step (step 3 in the wizard). Click Run Pipeline — the pipeline starts immediately, no terminal needed. Nodes update live as they complete.

Step 5 — Check what the pipeline found (first gate)

Before training starts, aimodelground profiles your data and ranks algorithms. Review what it discovered:

CLI:

aimodelground status

Output:

Pipeline: my-churn-model  run_001  5/8 nodes done

  +  ingest          succeeded
  +  merge           succeeded
  +  validate        succeeded
  +  profile         succeeded
  +  rank_algos      succeeded
  ?  review_data     AWAITING  → aimodelground approve review_data
  .  train_rf        pending
  .  train_xgb       pending

To see the full data profile and algorithm rankings:

# Check the profile saved in the run artifacts
cat runs/run_001/artifacts/profile.json

# Check which algorithms were ranked and why
cat runs/run_001/artifacts/ranking.json
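
To get a feel for what a data profile captures (row count, column types, null counts), here is a rough standard-library sketch. It is not the `core.profile` implementation and does not reproduce the format of profile.json; the function name `profile_csv` and its output shape are assumptions made for illustration.

```python
import csv
import io


def profile_csv(stream):
    """Return the row count plus, per column, a null count and a coarse type."""
    reader = csv.DictReader(stream)
    columns = {name: {"nulls": 0, "type": "numeric"} for name in (reader.fieldnames or [])}
    rows = 0
    for row in reader:
        rows += 1
        for name, info in columns.items():
            value = (row.get(name) or "").strip()
            if value == "":
                info["nulls"] += 1       # empty cells count as nulls
                continue
            try:
                float(value)
            except ValueError:
                info["type"] = "text"    # one non-numeric value makes the column text
    return {"rows": rows, "columns": columns}


sample = io.StringIO("age,churn\n34,yes\n,no\n51,yes\n")
print(profile_csv(sample))
```

In this sample the `age` column has one null and stays numeric, while `churn` is detected as text. A wrong type or an unexpectedly high null count at this stage is the kind of problem the first gate is meant to catch.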

Web UI: The Data tab shows your column types, null counts, and distributions. The Pipeline tab shows the ranking results inline on the rank_algos node.

If the data looks wrong (wrong types, too many nulls, wrong file loaded) — fix the issue and retry:

aimodelground retry ingest   # re-runs ingest and all downstream nodes
aimodelground run            # resumes

If everything looks good — approve the gate:

aimodelground approve review_data

Web UI: Click the Approve button on the review_data gate node.

Then resume:

aimodelground run

Step 6 — Wait for training

Training runs in parallel for all selected algorithms. Watch progress:

CLI:

aimodelground status          # check node states
aimodelground logs train_rf   # tail logs for a specific node

Web UI: The Pipeline tab updates live. Click any running node to see its log output in the side panel.

Training time depends on your data size and hardware:

Tabular data, 10k–100k rows: typically 30 seconds – 5 minutes
Images / sequences: minutes to hours depending on GPU

Step 7 — Review results (second gate)

After all models finish, the pipeline pauses again:

CLI:

aimodelground status
# shows: review_results  AWAITING

# View the eval report
cat runs/run_001/eval_report.json

Web UI: Go to the Results tab. You'll see:

Leaderboard table: each algorithm with accuracy, F1, RMSE
Feature importance chart (SHAP values)
Option to compare against a previous run

If results are poor:

Try tuning hyperparameters first: aimodelground tune --trials 50
Or re-run with different data: aimodelground run --from ingest
Or skip a poorly-performing algorithm: aimodelground skip train_xgb

When satisfied — approve:

aimodelground approve review_results
aimodelground run

Web UI: Click Approve on the review_results gate.

Step 8 — Export and deploy

After approval, the pipeline exports the best model and generates DEPLOY.md.

CLI:

aimodelground deploy
# Prints the full deployment guide with code examples

Web UI: Go to the Deploy tab. It shows:

Model info (algorithm, format, input schema)
Python inference script
FastAPI REST endpoint (copy-paste ready)
Dockerfile

By default the model exports as pickle. To export as ONNX:

# in pipeline.yaml
- id: export
  type: task
  plugin: core.export
  depends_on: [review_results]
  config:
    format: onnx     # or: pickle, safetensors

Or re-export after the fact:

aimodelground export --format onnx

The exported file is at runs/run_001/export/model.onnx (or .pkl).

Step 9 — Iterate

Compare two runs:

aimodelground compare run_001 run_002

Output:

Comparing run_001 vs run_002
 Metric    run_001    run_002    Delta
 accuracy  0.8412     0.8891    +0.0479
 f1        0.8103     0.8654    +0.0551
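
The Delta column is simply the second run's metric minus the first. A minimal sketch of that arithmetic, assuming each eval report can be reduced to a flat mapping of metric name to value (the real layout of eval_report.json is not shown here, and `metric_deltas` is a hypothetical helper):

```python
def metric_deltas(before, after, ndigits=4):
    """Return {metric: after - before} for metrics present in both reports."""
    shared = sorted(set(before) & set(after))
    return {name: round(after[name] - before[name], ndigits) for name in shared}


run_001 = {"accuracy": 0.8412, "f1": 0.8103}
run_002 = {"accuracy": 0.8891, "f1": 0.8654}
print(metric_deltas(run_001, run_002))
```

With the figures from the table above, the result matches the Delta column: +0.0479 for accuracy and +0.0551 for f1.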

Replay from a specific node (e.g., re-train with different config without re-ingesting):

# Edit pipeline.yaml — change n_estimators, learning_rate, etc.
aimodelground run --from train_rf
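
Replaying from a node means upstream outputs are reused and only the chosen node and what depends on it need to run again. The sketch below computes that set for a dependency map; the exact set aimodelground re-runs is an assumption here, and `nodes_to_rerun` and `DEPENDS_ON` are illustrative names.

```python
from collections import deque


def nodes_to_rerun(depends_on, start):
    """Return `start` plus every node that transitively depends on it."""
    children = {}
    for node, parents in depends_on.items():
        for parent in parents:
            children.setdefault(parent, []).append(node)
    seen, queue = {start}, deque([start])
    while queue:
        for child in children.get(queue.popleft(), []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


DEPENDS_ON = {
    "ingest": [],
    "review_data": ["ingest"],
    "train_rf": ["review_data"],
    "train_xgb": ["review_data"],
    "eval_join": ["train_rf", "train_xgb"],
    "export": ["eval_join"],
}
print(sorted(nodes_to_rerun(DEPENDS_ON, "train_rf")))
```

Here ingest, review_data and train_xgb are untouched, which is why replaying from train_rf avoids re-ingesting the data.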

Update an existing model with new data:

aimodelground models list
aimodelground models update run_001/random_forest --data data/raw/new_customers.csv

Common issues

Problem	Fix
Node shows `failed`	`aimodelground logs <node>` to see error. Fix the issue, then `aimodelground retry <node>`
Wrong target column	Edit `pipeline.yaml`, set correct `target_col`, then `aimodelground run --from train_rf`
Too many nulls in data	Fix source data, then `aimodelground retry ingest`
Training too slow	Reduce dataset size for prototyping, or add GPU. For tabular data, `n_estimators: 50` trains faster
Model accuracy too low	Run `aimodelground tune --trials 100` before approving the `review_results` gate, or add more data
Want to skip an algorithm	`aimodelground skip train_xgb` — downstream nodes unblock automatically
Web UI not updating	Check `aimodelground run` is still running in another terminal

CLI reference

Command	Description
`aimodelground --version`	Show version
`aimodelground init <name>`	Create project
`aimodelground run`	Start/resume pipeline
`aimodelground run --from <node>`	Replay from node, reuse upstream
`aimodelground status`	Show DAG node states
`aimodelground approve <node>`	Approve a gate
`aimodelground skip <node>`	Skip a node
`aimodelground retry <node>`	Reset a failed node and re-run it with its downstream nodes
`aimodelground logs <node>`	Show node logs
`aimodelground runs`	List all runs
`aimodelground compare <a> <b>`	Diff eval metrics
`aimodelground tune`	Optuna hyperparameter search
`aimodelground export [--format]`	Re-export model (pickle/onnx)
`aimodelground deploy`	Print deployment guide
`aimodelground ui [--port N]`	Open web interface
`aimodelground features list`	List saved feature sets
`aimodelground features info <n>`	Feature set details
`aimodelground features versions <n>`	List versions of a feature set
`aimodelground features delete <n>`	Delete feature set
`aimodelground models list`	View all trained models
`aimodelground models update [id]`	Update model with new data

Pipeline configuration (`pipeline.yaml`)

nodes:
  - id: ingest_csv
    type: task
    plugin: connectors.file
    config:
      paths: ["data/raw/*.csv"]

  - id: merge
    type: task
    plugin: core.merge
    depends_on: [ingest_csv]

  - id: validate
    type: task
    plugin: validators.schema
    depends_on: [merge]
    config:
      required_columns: [age, income, label]
      max_null_pct: 0.1

  - id: profile
    type: task
    plugin: core.profile
    depends_on: [merge]

  - id: rank_algos
    type: task
    plugin: core.automl_ranker
    depends_on: [profile]

  - id: review_data
    type: gate
    depends_on: [rank_algos, validate]
    message: "Review data before training"

  - id: train_rf
    type: task
    plugin: ml.classical.random_forest
    depends_on: [review_data]
    config:
      target_col: label

  - id: train_xgb
    type: task
    plugin: ml.classical.xgboost
    depends_on: [review_data]
    config:
      target_col: label

  - id: eval_join
    type: parallel_join
    depends_on: [train_rf, train_xgb]

  - id: review_results
    type: gate
    depends_on: [eval_join]
    message: "Review results and pick model"

  - id: export
    type: task
    plugin: core.export
    depends_on: [review_results]
    config:
      format: onnx

  - id: deploy_advisor
    type: task
    plugin: core.deploy_advisor
    depends_on: [export]
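
Before running an edited pipeline, it can help to check that every depends_on entry names a real node and that the graph has no cycles. The following is a sketch of such a check, not a feature of aimodelground. It assumes the `nodes:` list has already been loaded into Python dicts (for example with a YAML parser); `check_pipeline` is a hypothetical helper.

```python
from graphlib import CycleError, TopologicalSorter


def check_pipeline(nodes):
    """Return a list of problems found in a list of node dicts (empty if none)."""
    ids = [n["id"] for n in nodes]
    problems = [f"duplicate id: {i}" for i in sorted(set(ids)) if ids.count(i) > 1]
    known = set(ids)
    graph = {}
    for n in nodes:
        deps = n.get("depends_on", [])
        graph[n["id"]] = [d for d in deps if d in known]
        problems += [f"{n['id']}: unknown dependency {d}" for d in deps if d not in known]
    try:
        TopologicalSorter(graph).prepare()   # raises CycleError on circular dependencies
    except CycleError:
        problems.append("dependency cycle detected")
    return problems


nodes = [
    {"id": "ingest_csv", "type": "task"},
    {"id": "merge", "type": "task", "depends_on": ["ingest_csv"]},
    {"id": "profile", "type": "task", "depends_on": ["merge"]},
]
print(check_pipeline(nodes))
```

An empty list means the dependencies are consistent. A typo such as `depends_on: [ingest]` when the node is called `ingest_csv` is reported as an unknown dependency.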

Data connectors

Plugin	Source
`connectors.file`	CSV, JSON, Parquet, Excel, Arrow (DuckDB, glob patterns)
`connectors.document`	PDF, DOCX, TXT, MD — extracts text, page numbers, char count
`connectors.sql`	PostgreSQL, MySQL, SQLite (SQLAlchemy DSN)
`connectors.rest_poll`	HTTP API polling
`connectors.websocket`	WebSocket stream
`connectors.kafka`	Kafka topic
`connectors.image`	PNG/JPG/TIFF directory → image_path + label
`connectors.audio`	WAV/MP3/FLAC directory → MFCC features
`connectors.s3`	Amazon S3 (DuckDB httpfs, IAM/keys/MinIO)
`connectors.gcs`	Google Cloud Storage (DuckDB httpfs)
`connectors.feature_store`	Saved feature sets

ML plugins

aimodelground-classical

pip install aimodelground-classical

Plugin	Algorithm	Update support
`ml.classical.random_forest`	RandomForest	warm_start
`ml.classical.xgboost`	XGBoost	incremental
`ml.classical.lightgbm`	LightGBM	incremental

All produce: accuracy/F1/RMSE, SHAP feature importance, pickle + ONNX export.

aimodelground-dl

pip install aimodelground-dl

Plugin	Architecture
`ml.dl.cnn_image`	3-layer CNN for image classification
`ml.dl.lstm_tabular`	2-layer LSTM for sequential/tabular data

aimodelground-llm

pip install aimodelground-llm

Plugin	Method
`ml.llm.lora_text`	LoRA fine-tuning on GPT-2, Llama, Mistral, Phi

Core pipeline plugins

Plugin	Purpose
`core.merge`	Concat all connector outputs
`core.profile`	Compute DataProfile (row count, column types, nulls)
`validators.schema`	Validate required columns + null thresholds
`core.automl_ranker`	Rank installed ML plugins by suitability
`core.automl_tuner`	Optuna hyperparameter search (CV-based)
`core.export`	Export best model (pickle/ONNX/safetensors)
`core.deploy_advisor`	Generate DEPLOY.md
`core.feature_store_save`	Save processed data as named feature set
`core.model_update`	Update existing model with new data
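
As an illustration of what a required-columns and null-threshold check involves, here is a small sketch in plain Python. It is not the `validators.schema` plugin. It assumes `max_null_pct` is a fraction (0.1 meaning 10%), which the configuration example suggests but does not state, and `validate_rows` is a hypothetical function.

```python
def validate_rows(rows, required_columns, max_null_pct):
    """Return error strings for missing columns or columns over the null threshold."""
    errors = []
    present = set().union(*(row.keys() for row in rows)) if rows else set()
    for col in required_columns:
        if col not in present:
            errors.append(f"missing required column: {col}")
            continue
        nulls = sum(1 for row in rows if row.get(col) is None)
        if nulls / len(rows) > max_null_pct:
            errors.append(f"{col}: {nulls}/{len(rows)} values are null")
    return errors


rows = [
    {"age": 30, "income": None, "label": 1},
    {"age": 41, "income": 52000, "label": 0},
]
print(validate_rows(rows, ["age", "income", "label"], max_null_pct=0.1))
```

In this sample, half of the `income` values are null, which exceeds the 10% threshold, so one error is reported.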

Feature store

aimodelground features list
aimodelground features info <name>
aimodelground features versions <name>
aimodelground features delete <name>

# Save features in pipeline
- id: save_features
  type: task
  plugin: core.feature_store_save
  depends_on: [merge]
  config:
    feature_name: customer_features_v1

# Load in future run
- id: load_features
  type: task
  plugin: connectors.feature_store
  config:
    name: customer_features_v1

Model update

aimodelground models list
aimodelground models update --data data/raw/new.csv --target label
aimodelground models update run_001/random_forest --n-estimators 100

Working with PDF and document files

If your data is PDFs, Word documents, text files, or markdown, use connectors.document. It extracts text from each file (page-by-page for PDFs) and produces a DataFrame with filename, source, page, total_pages, text, and char_count columns, plus a label column when label_from_dir is enabled.

Step 1 — Organise your files

Option A — flat folder (all documents, no labels):

data/raw/
  contract_001.pdf
  contract_002.pdf
  report_march.docx
  notes.txt

Option B — labelled subdirectories (for classification):

data/raw/
  approved/
    doc_001.pdf
    doc_002.pdf
  rejected/
    doc_003.pdf
    doc_004.pdf

Step 2 — Configure `pipeline.yaml`

nodes:
  - id: ingest_docs
    type: task
    plugin: connectors.document
    config:
      paths: ["data/raw/**/*.pdf", "data/raw/**/*.docx"]
      label_from_dir: true   # set true if using labelled subdirectories

  - id: merge
    type: task
    plugin: core.merge
    depends_on: [ingest_docs]

  - id: profile
    type: task
    plugin: core.profile
    depends_on: [merge]

  - id: rank_algos
    type: task
    plugin: core.automl_ranker
    depends_on: [profile]

  - id: review_data
    type: gate
    depends_on: [rank_algos]
    message: "Review extracted text before training"

  - id: train_lora
    type: task
    plugin: ml.llm.lora_text
    depends_on: [review_data]
    config:
      text_col: text          # column produced by the document connector
      label_col: label        # column from label_from_dir, or your own label column
      base_model: gpt2        # or: meta-llama/Llama-2-7b, mistralai/Mistral-7B-v0.1
      epochs: 3
      max_length: 512

  - id: review_results
    type: gate
    depends_on: [train_lora]
    message: "Review fine-tuning results"

  - id: export
    type: task
    plugin: core.export
    depends_on: [review_results]
    config:
      format: safetensors     # adapter weights, compatible with Ollama / vLLM

  - id: deploy_advisor
    type: task
    plugin: core.deploy_advisor
    depends_on: [export]

Step 3 — Run

pip install aimodelground-llm   # required for LLM fine-tuning

aimodelground run

The connector extracts text from every PDF/DOCX, then the LLM plugin fine-tunes a LoRA adapter on your labelled documents.

What the extracted data looks like

filename	source	page	total_pages	text	char_count	label
contract_001.pdf	data/raw/approved/...	1	4	"This agreement..."	3420	approved
contract_001.pdf	data/raw/approved/...	2	4	"Section 2..."	2870	approved

Each PDF produces one row per page. DOCX and TXT produce one row per file.
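
The row layout can be sketched in a few lines of Python. This is not the connector's code: it assumes the page texts have already been extracted, and `rows_for_document` is a hypothetical helper that only mirrors the columns shown in the table above.

```python
from pathlib import PurePosixPath


def rows_for_document(path, pages, label_from_dir=False):
    """Build one row per page of already-extracted text for a single document."""
    p = PurePosixPath(path)
    rows = []
    for number, text in enumerate(pages, start=1):
        row = {
            "filename": p.name,
            "source": str(p),
            "page": number,
            "total_pages": len(pages),
            "text": text,
            "char_count": len(text),
        }
        if label_from_dir:
            row["label"] = p.parent.name   # "approved" for data/raw/approved/x.pdf
        rows.append(row)
    return rows


for row in rows_for_document("data/raw/approved/doc_001.pdf", ["Page one", "Two"], True):
    print(row)
```

A DOCX or TXT file would be passed as a single-element `pages` list, giving one row per file.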

Choosing a base model

Base model	When to use	GPU required
`gpt2`	Small datasets (<1000 docs), fast iteration, CPU-friendly	No (CPU works)
`distilbert-base-uncased`	Classification tasks, small model, good accuracy	No
`meta-llama/Llama-2-7b`	Large datasets, high accuracy, production use	Yes (8GB+ VRAM)
`mistralai/Mistral-7B-v0.1`	Best accuracy, multilingual support	Yes (8GB+ VRAM)

Mixing documents with other data

You can combine document text with structured data in the same pipeline:

nodes:
  - id: ingest_docs
    type: task
    plugin: connectors.document
    config:
      paths: ["data/raw/contracts/**/*.pdf"]
      label_from_dir: true

  - id: ingest_metadata
    type: task
    plugin: connectors.file
    config:
      paths: ["data/raw/contract_metadata.csv"]

  - id: merge
    type: task
    plugin: core.merge
    depends_on: [ingest_docs, ingest_metadata]
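
The core plugins table describes core.merge as concatenating connector outputs. A rough sketch of what concatenation means when the inputs have different columns is shown below; it is an assumption about the general technique, not the plugin's code, and `concat_rows` is a hypothetical name.

```python
def concat_rows(*tables):
    """Concatenate row lists, giving every row the union of all columns."""
    columns = []
    for table in tables:
        for row in table:
            for name in row:
                if name not in columns:
                    columns.append(name)   # keep first-seen column order
    return [{name: row.get(name) for name in columns} for table in tables for row in table]


docs = [{"filename": "a.pdf", "text": "This agreement..."}]
metadata = [{"filename": "a.pdf", "value": 10}]
for row in concat_rows(docs, metadata):
    print(row)
```

Note that concatenation stacks rows and fills absent columns with None; it does not join the document rows to the metadata rows on `filename`. If you need the metadata aligned with each document, check the merged output at the review_data gate.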

Versioned runs

aimodelground runs
aimodelground compare run_001 run_002
aimodelground run --from validate    # replay, reuse upstream outputs

Web UI

aimodelground ui --port 8765

6-step wizard. No terminal needed for basic use from v0.3.0.

Step	URL	What it does
Upload	`/upload`	Drag-drop data files, see file list
Configure	`/configure`	Smart form + live YAML editor, auto-detects columns
Run	`/`	Run button, live node list, gate approval, progress bar
Results	`/results`	Metric cards, SHAP bars, Plotly chart, run comparison
Deploy	`/deploy`	Deployment guide, export info, copy buttons
Query	`/query`	Predict tab (model inference) + Explain tab (SHAP + insights)

Dark theme default, light mode toggle (preference stored in browser). See the Web UI walkthrough above for screenshots.

Project structure

my-project/
  pipeline.yaml         # DAG definition
  project.db            # SQLite state
  data/raw/             # Input data
  runs/
    run_001/
      artifacts/        # Models, parquets, ranking.json
      logs/             # Node logs
      eval_report.json
      DEPLOY.md         # Deployment guide
      export/           # Exported model
  .modelbuilder/
    features/           # Feature store data
    feature_store.db
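
Because each run lives in its own directory under runs/, the layout is easy to script against. A minimal sketch, assuming run directories follow the `run_NNN` naming shown above (`list_runs` is an illustrative helper, not part of the CLI):

```python
from pathlib import Path


def list_runs(project_dir):
    """Return run directory names under <project_dir>/runs, sorted by name."""
    runs_dir = Path(project_dir) / "runs"
    if not runs_dir.is_dir():
        return []
    return sorted(p.name for p in runs_dir.glob("run_*") if p.is_dir())
```

For example, `list_runs("my-project")[-1]` gives the most recent run name when the numbering is zero-padded, which can then be used to build paths such as runs/<run>/eval_report.json.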

Contributing

See CONTRIBUTING.md.

Releasing

See RELEASING.md.

Changelog

See CHANGELOG.md.

License

Apache 2.0 — see LICENSE

Release history

0.3.0 (this version): May 21, 2026

0.2.0: May 21, 2026

0.1.0: May 21, 2026

Download files

Source Distribution

aimodelground-0.3.0.tar.gz (88.1 kB)

Uploaded May 21, 2026 Source

Built Distribution

aimodelground-0.3.0-py3-none-any.whl (84.6 kB)

Uploaded May 21, 2026 Python 3

File details

Details for the file aimodelground-0.3.0.tar.gz.

File metadata

Download URL: aimodelground-0.3.0.tar.gz
Upload date: May 21, 2026
Size: 88.1 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.13

File hashes

Hashes for aimodelground-0.3.0.tar.gz
Algorithm	Hash digest
SHA256	`4257f2b32236aace3e7c4e43747fdc6f0add73903e6e36da7912bf9d4382da24`
MD5	`85f2083870176d782049156511db64a3`
BLAKE2b-256	`3f68bd14c31423b639e89e3d4c2546569f97421a9dcbd031b7c0ff04c4080b73`

File details

Details for the file aimodelground-0.3.0-py3-none-any.whl.

File metadata

Download URL: aimodelground-0.3.0-py3-none-any.whl
Upload date: May 21, 2026
Size: 84.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.13

File hashes

Hashes for aimodelground-0.3.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`4c86e3c720b7e3dcb2c013ff8c00ae89aa841c8af9d1d56ae8541edd3fa56559`
MD5	`9d8753aab9d043b3846be653d3ec2d38`
BLAKE2b-256	`50efb6b721585d6d5965572f3248735cc15f7bc533cad2b0e8da2c2c836ab13a`
