
End-to-end ML pipeline: PDF→JSON datasets, synthetic data, fine-tuning, evaluation, and edge deployment with Gradio GUI


SAARA

SAARA is a Python package for building local LLM workflows around data creation, fine-tuning, evaluation, export, and visualization.

It is aimed at a practical teacher-student workflow:

  1. extract text from PDFs or raw text
  2. generate synthetic supervision with a stronger local model
  3. fine-tune a smaller model
  4. evaluate quality, speed, memory, and optional teacher agreement
  5. export and quantize for deployment
  6. inspect the workflow through CLI or Gradio

Features

  • PDF to text extraction with PyMuPDF and pdfplumber
  • synthetic dataset generation for factual, reasoning, conversational, instruction, code, and creative tasks
  • local provider abstraction for Ollama, vLLM, and llama.cpp
  • fine-tuning helpers for LoRA and QLoRA workflows
  • evaluation helpers for custom datasets, benchmarks, and teacher-student comparison
  • export helpers for safetensors, GGUF, AWQ, GPTQ, ONNX, TensorRT, and PyTorch formats
  • Gradio dashboard and CLI entrypoints
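The provider abstraction above can be pictured roughly like this. Note this is an illustrative sketch only: the names `TextProvider`, `EchoProvider`, and `generate` are assumptions for the example, not SAARA's actual API.

```python
from typing import Protocol


class TextProvider(Protocol):
    """Minimal interface a local provider might expose (illustrative)."""

    def generate(self, prompt: str) -> str: ...


class EchoProvider:
    """Stand-in provider that echoes the prompt, useful for dry runs."""

    def generate(self, prompt: str) -> str:
        return f"echo: {prompt}"


def summarize(provider: TextProvider, text: str) -> str:
    # Any object with a matching generate() satisfies the protocol,
    # so Ollama, vLLM, or llama.cpp backends become interchangeable.
    return provider.generate(f"Summarize: {text}")


print(summarize(EchoProvider(), "attention is all you need"))
```

Structural typing like this is one common way to make backends swappable without a shared base class.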

Install

From PyPI:

pip install saara-ai

Common installs:

pip install "saara-ai[training]"
pip install "saara-ai[export]"
pip install "saara-ai[providers,training,evaluation,export]"

From source:

git clone https://github.com/nikhil49023/saara-ai.git
cd saara-ai
python3 -m venv .venv
source .venv/bin/activate
pip install -e .

Optional extras:

pip install -e ".[dev]"
pip install -e ".[edge]"
pip install -e ".[providers,training,evaluation,export]"

The base saara-ai install is intentionally kept light: heavy native stacks such as auto-gptq, autoawq, llama-cpp-python, vllm, and the training toolchains are opt-in extras.
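Opt-in extras like these are typically handled with a guarded import that points the user at the right extra. A minimal sketch of the pattern (the helper name `require_extra` is an assumption, not a SAARA function):

```python
import importlib


def require_extra(module: str, extra: str):
    """Import an optional dependency, or explain which extra provides it."""
    try:
        return importlib.import_module(module)
    except ImportError as exc:
        raise ImportError(
            f"{module} is not installed; run: pip install 'saara-ai[{extra}]'"
        ) from exc


# json ships with Python, so this import succeeds:
json = require_extra("json", "providers")
print(json.dumps({"ok": True}))
```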

Note: a virtual environment is strongly recommended.

Quick Start

1. Start a local model provider

For Ollama, start the server (if it is not already running as a service), then pull a model:

ollama serve
ollama pull mistral

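Before building datasets, it can help to confirm the provider endpoint is actually up. A small stdlib-only check (not part of SAARA; 11434 is Ollama's default port):

```python
from urllib.request import urlopen
from urllib.error import URLError


def provider_reachable(base_url: str, timeout: float = 2.0) -> bool:
    """Return True if an HTTP server answers at base_url."""
    try:
        with urlopen(base_url, timeout=timeout) as resp:
            return resp.status < 500
    except (URLError, OSError):
        return False


print(provider_reachable("http://localhost:11434"))
```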
2. Build a dataset from a PDF

from saara import DatasetBuilder
from saara.dataset.types import DataType
from saara.providers.ollama_provider import OllamaProvider, ProviderConfig

provider = OllamaProvider(
    ProviderConfig(model="mistral", base_url="http://localhost:11434")
)

builder = DatasetBuilder(provider)
samples = builder.from_pdf(
    "document.pdf",
    data_types=[DataType.INSTRUCTION, DataType.FACTUAL],
    pairs_per_type=5,
    min_quality=0.65,
)

builder.save(samples, "dataset.jsonl")
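Once saved, the JSONL file holds one JSON object per line, so it is easy to inspect with the stdlib. A quick sketch (the `instruction`/`output` field names are assumptions about the sample schema, and the file written here is a stand-in so the snippet is self-contained):

```python
import json


def load_jsonl(path: str) -> list[dict]:
    """Read a JSON-lines dataset into a list of dicts, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Write a tiny stand-in file (schema assumed for illustration).
with open("example.jsonl", "w", encoding="utf-8") as f:
    f.write('{"instruction": "Define attention.", "output": "..."}\n')

samples = load_jsonl("example.jsonl")
print(len(samples), samples[0]["instruction"])
```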

3. Launch the GUI

saara gui

Core Workflow

Dataset creation

from saara import DatasetBuilder
from saara.dataset.types import DataType

builder = DatasetBuilder(provider)
samples = builder.from_text(
    "Transformers use attention mechanisms for sequence modeling.",
    data_types=[DataType.FACTUAL, DataType.INSTRUCTION],
    pairs_per_type=3,
)

Fine-tuning

from saara import FineTuner
from saara.training.config import TrainingConfig

config = TrainingConfig(
    model_name="mistralai/Mistral-7B-v0.1",
    num_train_epochs=3,
    use_lora=True,
)

finetuner = FineTuner(config)
finetuner.train("dataset.jsonl")
finetuner.save("./output/models/my-finetune")
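What `use_lora=True` buys, conceptually: instead of updating a full weight matrix W, LoRA trains a low-rank delta B·A with far fewer parameters and adds it to the frozen base. A toy pure-Python illustration of that arithmetic (not SAARA or PEFT code):

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


d, r = 4, 1                          # model dim 4, LoRA rank 1
W = [[0.0] * d for _ in range(d)]    # frozen base weight (zeros for clarity)
B = [[1.0], [0.0], [0.0], [0.0]]     # d x r, trainable
A = [[0.0, 2.0, 0.0, 0.0]]           # r x d, trainable

delta = matmul(B, A)                 # rank-1 update
W_adapted = [[w + dw for w, dw in zip(wr, dr)] for wr, dr in zip(W, delta)]

# A full update would store d*d = 16 values; LoRA stores only 2*d*r = 8.
print(W_adapted)
```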

Evaluation

from saara import ModelEvaluator
from saara.providers.ollama_provider import OllamaProvider, ProviderConfig

student = OllamaProvider(ProviderConfig(model="mistral-7b-finetuned"))
teacher = OllamaProvider(ProviderConfig(model="llama-3-70b"))

evaluator = ModelEvaluator(student, teacher)
metrics = evaluator.evaluate(
    "test.jsonl",
    run_benchmarks=True,
    benchmark_names=["mmlu", "gsm8k"],
)

print(metrics.summary())
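Teacher agreement can be as simple as token overlap between the student's and teacher's answers. A rough stdlib sketch of such a score, shown only to make the idea concrete (this is not how `ModelEvaluator` necessarily computes it):

```python
def token_agreement(student: str, teacher: str) -> float:
    """Jaccard overlap between lowercase token sets, in [0, 1]."""
    s, t = set(student.lower().split()), set(teacher.lower().split())
    if not s and not t:
        return 1.0
    return len(s & t) / len(s | t)


print(token_agreement("Paris is the capital of France",
                      "the capital of France is Paris"))
```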

Export and quantization

from saara import ModelExporter
from saara.export.formats import ExportFormat
from saara.export.quantization import QuantizationConfig

exporter = ModelExporter(
    model="./output/models/my-finetune",
    config=QuantizationConfig(bits=4),
)

results = exporter.export(
    "./output/exports",
    formats=[ExportFormat.GGUF, ExportFormat.AWQ, ExportFormat.ONNX],
    quantize=True,
)
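As a rough intuition for `bits=4`: each weight is mapped to one of 2⁴ = 16 integer levels and reconstructed with a shared scale. A toy symmetric-quantization sketch in pure Python (the real AWQ/GPTQ algorithms are considerably more sophisticated; this only shows the bit-width trade-off):

```python
def quantize(weights, bits=4):
    """Symmetric uniform quantization to signed ints in [-(2^(b-1)-1), 2^(b-1)-1]."""
    qmax = 2 ** (bits - 1) - 1                 # 7 for 4-bit
    scale = max(abs(w) for w in weights) / qmax
    q = [round(w / scale) for w in weights]
    return q, scale


def dequantize(q, scale):
    return [v * scale for v in q]


q, scale = quantize([0.7, -0.35, 0.1, -0.7])
print(q)                     # small integers, one per weight
print(dequantize(q, scale))  # approximate reconstruction of the originals
```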

CLI

saara dataset from-pdf document.pdf -o ./output --data-types instruction factual
saara train finetune dataset.jsonl --model mistralai/Mistral-7B-v0.1 --epochs 3
saara eval model test.jsonl --model mistral --teacher llama-3-70b --benchmarks mmlu gsm8k
saara export model ./models/final --formats gguf awq --quantize --bits 4
saara gui --port 7860

Documentation

Examples

  • examples/01_pdf_to_dataset.py
  • examples/02_synthetic_data.py
  • examples/03_finetune.py
  • examples/04_evaluate.py
  • examples/05_export.py
  • examples/06_complete_pipeline.py
  • examples/07_gui.py
  • examples/08_saara_demo_notebook.ipynb
  • examples/09_edge_ai_solutions_gemma_vllm.ipynb
  • examples/10_finetune_workflow_notebook.ipynb
  • examples/11_evaluation_workflow_notebook.ipynb
  • examples/12_export_workflow_notebook.ipynb
  • examples/13_gui_workflow_notebook.ipynb

Compatibility imports for older notebooks are also supported:

from saara.core import DatasetBuilder, ProviderConfig
from saara.enums import DataType, DatasetFormat
from saara.providers import VLLMProvider, DemoProvider

Recommended Starting Point

If you're trying SAARA for the first time, start with:

  • OllamaProvider
  • one small PDF or a short text file
  • LoRA or QLoRA fine-tuning
  • GGUF and safetensors export first

Development

pytest tests/
black saara/
ruff check saara/
mypy saara/

PyPI

https://pypi.org/project/saara-ai/

License

MIT. See LICENSE.

Download files


Source Distribution

saara_ai-0.1.5.tar.gz (49.6 kB)


Built Distribution


saara_ai-0.1.5-py3-none-any.whl (47.1 kB)


File details

Details for the file saara_ai-0.1.5.tar.gz.

File metadata

  • Download URL: saara_ai-0.1.5.tar.gz
  • Upload date:
  • Size: 49.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for saara_ai-0.1.5.tar.gz:

  • SHA256: 573c19f435d935e55ba718f4a5f1738b811a794467d9f9c89a50409b6fbf26cd
  • MD5: fc0d2e6848b899baa79d028cc332f174
  • BLAKE2b-256: 49b8d7dc369d7419b432818469332c43e0b096b2a296ff90c476904f974745f1


File details

Details for the file saara_ai-0.1.5-py3-none-any.whl.

File metadata

  • Download URL: saara_ai-0.1.5-py3-none-any.whl
  • Upload date:
  • Size: 47.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for saara_ai-0.1.5-py3-none-any.whl:

  • SHA256: b9bf813a78f8e8fd26a50ba053be0455743a4c3508a2188519382f8a6a8bc98c
  • MD5: 9e14d06c398b845aed78886b6f0775cc
  • BLAKE2b-256: 88338aef7125d644bdaf5bb04dbe1067a4ba79f5fcb942142f57548e20b01490

