

Saara

Saara is a local-first CLI for ML dataset workflows:

  • topic-to-dataset generation using Firecrawl-local research
  • foundations for PDF and document ingestion
  • local model provider routing for Ollama and vLLM-compatible servers
  • canonical dataset examples with provenance
  • labeling and distillation commands
  • validation reports
  • exports to JSON, JSONL, CSV, Parquet, Arrow, and Hugging Face Dataset directories

The current implementation is an MVP scaffold intended to be extended into the full CLI.

Quick Start

pip install -e .
saara splash
saara wizard
saara init
saara models health --provider ollama --model qwen
saara generate topic "robotics motion planning" --samples 20 --provider mock --format jsonl --output-dir runs/robotics
saara label .mlforge/datasets/robotics-motion-planning.jsonl --labels useful,not-useful --out labeled.jsonl
saara distill labeled.jsonl --method sft --out distilled.jsonl
saara validate .mlforge/datasets/robotics-motion-planning.jsonl

Running saara without arguments shows the splash screen and command help. Use saara wizard for the interactive guided flow, and direct subcommands for scripts or automation. Interactive sessions include terminal animations for the splash screen, menu headers, long-running operations, and completion states. Scripted or piped output automatically falls back to plain text.

Use --provider mock for deterministic local smoke tests without a running model.

Run a declarative workflow:

saara run examples/topic-dataset.json
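The workflow file is a JSON description of a run. Its exact schema is not documented here; as a sketch, assuming field names that mirror the CLI flags shown above (topic, provider, samples, format, output_dir), a workflow file could be produced like this:

```python
import json

# Hypothetical workflow description; the field names mirror the CLI flags
# (--provider, --samples, --format, --output-dir) and are assumptions,
# not the documented saara schema.
workflow = {
    "task": "generate-topic",
    "topic": "robotics motion planning",
    "provider": "mock",
    "samples": 20,
    "format": "jsonl",
    "output_dir": "runs/robotics",
}

# Write the workflow so it can be passed to `saara run`.
with open("topic-dataset.json", "w") as f:
    json.dump(workflow, f, indent=2)
```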

Installation

Development install:

python3 -m venv .venv
. .venv/bin/activate
pip install -e .

Install optional dataset exporters:

pip install -e '.[data]'

Install all optional local features:

pip install -e '.[all]'

Fresh machine runtime setup:

saara doctor
saara setup docker --dry-run
saara setup ollama --dry-run
saara setup docker ollama

On Debian/Ubuntu, Saara installs Docker Engine from Docker's official apt repository. On Linux, Ollama is installed with the official Ollama installer. Review --dry-run output before running setup commands. Saara does not pull or install models automatically; choose a model based on your hardware tier.

After installation, use saara directly like a traditional CLI. The old mlforge command remains available as a compatibility alias during development.

For an isolated user-level install, use pipx once this project is published or packaged:

pipx install .

Firecrawl Local

Topic generation can use Firecrawl-local at http://localhost:3002:

saara generate topic "dataset distillation" \
  --provider ollama \
  --model qwen \
  --research firecrawl \
  --samples 100

The Firecrawl integration is exposed as a typed agent tool named firecrawl_local. The topic workflow uses a bounded ResearchAgent that calls:

  • firecrawl_local.search(query, limit)
  • firecrawl_local.scrape(url)
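The tool surface can be pictured as a small typed client. Only the tool name and the two method signatures above come from the docs; the class shape, default base URL placement, and return types below are illustrative assumptions:

```python
from dataclasses import dataclass


@dataclass
class FirecrawlLocalTool:
    """Illustrative sketch of a typed Firecrawl-local tool.

    Only the method names (search, scrape) come from the docs; the
    return types and request handling are assumptions. Methods are
    stubbed so the sketch runs without a Firecrawl-local server.
    """

    base_url: str = "http://localhost:3002"

    def search(self, query: str, limit: int = 5) -> list:
        # Would call the Firecrawl-local search endpoint; stubbed here.
        raise NotImplementedError

    def scrape(self, url: str) -> str:
        # Would fetch a page and return its content; stubbed here.
        raise NotImplementedError
```

A typed interface like this keeps agent tool calls auditable: every call site names a concrete method with concrete parameters rather than passing free-form strings.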

LangChain is not required for the core workflow. Saara uses its own small typed tool interface so Firecrawl-local calls are deterministic, auditable, and easy to test. A small adapter is included for projects that want LangChain-compatible tools via the optional saara-ai[agents] extra.

Configurable Dataset Modes

Generation can target multiple training dataset shapes:

  • finetuning: chat/SFT-style message examples
  • pretraining: plain text examples in output.text
  • reasoning: examples with a reasoning field
  • tool-calling: examples with tools and tool_calls
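To make the four shapes concrete, the records below are hypothetical examples of what each mode could emit; only the field names called out above (chat-style messages, output.text, reasoning, tools/tool_calls) are taken from the docs, and the surrounding schemas are assumptions:

```python
# Hypothetical example records for each dataset mode; exact schemas
# beyond the documented field names are assumptions.
finetuning = {
    "messages": [
        {"role": "user", "content": "What is dataset distillation?"},
        {"role": "assistant", "content": "Compressing a dataset into fewer, denser examples."},
    ]
}
pretraining = {"output": {"text": "Plain-text passage used for pretraining."}}
reasoning = {
    "question": "2 + 2?",
    "reasoning": "Add the two operands.",
    "answer": "4",
}
tool_calling = {
    "tools": [{"name": "search", "description": "Web search"}],
    "tool_calls": [{"name": "search", "arguments": {"query": "robotics"}}],
}
```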

Most runtime and prompting behavior is user-configurable from CLI flags or workflow JSON: provider base URLs, model names, API keys, Firecrawl URL, system prompt, prompt template, temperature, max tokens, output format, and output directory. When --output-dir is used, Saara writes datasets, reports, and run artifacts into that directory.

Runtime Providers

  • mock: deterministic development provider
  • ollama: http://localhost:11434
  • vllm: OpenAI-compatible endpoint, default http://localhost:8000/v1
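Because the vllm provider targets an OpenAI-compatible endpoint, a raw request against it is a standard chat-completions call. The sketch below only builds the request (the model name and prompt are placeholders) and deliberately does not send it, so it runs without a server:

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/v1"  # default vLLM endpoint from the docs

# Standard OpenAI-compatible chat-completions payload; "qwen" is a
# placeholder model name.
payload = {
    "model": "qwen",
    "messages": [{"role": "user", "content": "Say hello."}],
    "max_tokens": 32,
}

req = urllib.request.Request(
    f"{BASE_URL}/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
# urllib.request.urlopen(req) would send the request; left unsent here.
```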

Dataset Formats

Supported exports:

  • json
  • jsonl
  • csv
  • parquet (requires optional pyarrow)
  • arrow (requires optional pyarrow)
  • hf (requires optional datasets)
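The jsonl and csv targets are plain stdlib formats. As an independent illustration (not Saara's internal exporter), converting a JSONL file of flat records to CSV takes only a few lines:

```python
import csv
import json


def jsonl_to_csv(jsonl_path: str, csv_path: str) -> int:
    """Convert a JSONL file of flat dicts to CSV; returns the row count."""
    with open(jsonl_path) as f:
        rows = [json.loads(line) for line in f if line.strip()]
    if rows:
        with open(csv_path, "w", newline="") as f:
            # Column order is taken from the first record's keys.
            writer = csv.DictWriter(f, fieldnames=list(rows[0]))
            writer.writeheader()
            writer.writerows(rows)
    return len(rows)
```

This assumes every record shares the first record's keys; nested structures (such as chat messages) would need flattening first, which is one reason columnar targets like parquet exist as optional extras.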

Download files

Source Distribution

  • saara_ai-1.6.9.tar.gz (44.8 kB)

Built Distribution

  • saara_ai-1.6.9-py3-none-any.whl (42.2 kB)
File details: saara_ai-1.6.9.tar.gz

  • Size: 44.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

Hashes for saara_ai-1.6.9.tar.gz

  • SHA256: a9c49fbb1efeee844ea5b84f9ee4e696a22a2e56df18d876be3ab1923dd32972
  • MD5: 8e61e5981c8619bc7a13a0227841ad83
  • BLAKE2b-256: c177a425bcb16205bcc33cc3db87c8fff976275a37f7ca6e2cdbd0c99dcdd939

File details: saara_ai-1.6.9-py3-none-any.whl

  • Size: 42.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

Hashes for saara_ai-1.6.9-py3-none-any.whl

  • SHA256: efdb35e7fb33bb1305169143e763565d9a5cc09e0c5beec40ab2d784493aef33
  • MD5: 0c2b2b58b0499e89e399114f10b6e8f4
  • BLAKE2b-256: 2e388728f9eec7e01327a3ba1297ad5e286f40ba42967b467941abff33b5b388
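To check a downloaded file against the SHA256 digests above, a stdlib comparison is enough; the filename in the commented example is a placeholder for wherever the file was saved:

```python
import hashlib


def sha256_of(path: str) -> str:
    """Stream a file through SHA-256 and return the hex digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()


# Example: compare against the published digest for the sdist
# (path is a placeholder; the full digest appears in the table above).
# assert sha256_of("saara_ai-1.6.9.tar.gz").startswith("a9c49fbb")
```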
