ollama-spark

Terminal toolkit for local Ollama model recommendation, benchmarking, and comparison.

ollama-spark is a terminal-first toolkit to help you pick, download, benchmark, and compare Ollama LLM models for local hardware. The project provides:

  • hardware detection (CPU, RAM, GPU)
  • a curated model catalog with metadata and task capabilities
  • model compatibility recommendations for common use-cases (chat, coding, instruct, vision, etc.)
  • an Ollama HTTP client for listing/pulling/generating
  • a lightweight benchmark runner (TTFT, TPS, latency) and aggregation
  • a CLI (ollama-spark) to run everything from your terminal

This repository is organized as a Python package and designed to be released to PyPI as ollama-spark.


Goals

  • Help users identify which Ollama models are compatible with their local hardware.
  • Provide a simple benchmark to measure real-world performance on your machine.
  • Make it easy to pull recommended models via the local Ollama daemon and compare model trade-offs.
  • Be lightweight, well-documented, and easy to extend.

Install

Recommended: create a virtual environment and install from the project root.

python -m venv .venv
source .venv/bin/activate
pip install -e .

If you want development dependencies (tests/lint):

pip install -e .[dev]

Notes:

  • The CLI assumes a running Ollama daemon for list/pull/generate operations (default address: http://127.0.0.1:11434).
  • On macOS with Apple Silicon, MPS availability is detected heuristically; for NVIDIA/AMD GPUs the tool uses nvidia-smi / rocm-smi / lspci where available.

Quick start

Detect hardware:

# show a friendly hardware summary
ollama-spark detect

List models available in your local Ollama daemon:

ollama-spark list-models

Get recommendations for coding tasks:

ollama-spark recommend --task coding --top-k 5

Pull a model (streams download progress from Ollama):

ollama-spark pull "llama3.1:8b"

Run a quick benchmark (TTFT, TPS, latency):

ollama-spark benchmark "llama3.1:8b" \
  --prompt "Write a short Python function that sorts a list" \
  --runs 2 --warmup 1 --timeout 60

Compare models (feature comparison + optional runtime micro-benchmark):

ollama-spark compare llama3.1:8b qwen2.5:7b --task coding --runtime \
  --prompt "Write a function to compute fibonacci numbers efficiently"
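All of the commands above wrap Ollama's HTTP API. For reference, a minimal direct call to the generate endpoint (non-streaming) might look like the sketch below; it assumes a daemon at the default address and uses the standard `/api/generate` request shape, which may vary across Ollama versions.

```python
import json
import urllib.request


def generate(model: str, prompt: str,
             base_url: str = "http://127.0.0.1:11434") -> str:
    """One-shot (non-streaming) call to Ollama's /api/generate endpoint.
    Requires a running Ollama daemon; illustrative only."""
    payload = json.dumps(
        {"model": model, "prompt": prompt, "stream": False}
    ).encode()
    req = urllib.request.Request(
        f"{base_url}/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.loads(resp.read())["response"]


# Only works with a running daemon and the model pulled:
# print(generate("llama3.1:8b", "Say hi in one word"))
```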

Concepts

  • Hardware profile: collected via ollama_spark.hardware (CPU, RAM, GPUs). This is converted into a canonical HardwareProfile used by the recommender.
  • Model spec: each model in the bundled data/models.yaml contains min_ram_gb, recommended_ram_gb, min_vram_gb, parameter_billions, capabilities (task scores), and tags.
  • CompatibilityResult: result of hardware vs model checks (Compatible / Borderline / Incompatible) with reasons and estimated memory needs.
  • Benchmark: the runner captures TTFT (time to first token), total latency, TPS (tokens per second), and lightweight resource samples via psutil. GPU sampling is best-effort and currently limited.
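To make the model-spec and compatibility concepts concrete, here is a sketch of how the pieces fit together. Field names follow the models.yaml schema described above; the class names, thresholds, and three-way rule are illustrative assumptions, and the logic in ollama_spark.recommender may differ.

```python
from dataclasses import dataclass


@dataclass
class ModelSpec:
    # Field names follow the models.yaml schema described above.
    name: str
    parameter_billions: float
    min_ram_gb: float
    recommended_ram_gb: float


def check_compatibility(spec: ModelSpec, ram_gb: float) -> str:
    """Classify hardware vs. model RAM needs into the three verdicts
    (Compatible / Borderline / Incompatible). Illustrative rule only."""
    if ram_gb >= spec.recommended_ram_gb:
        return "Compatible"
    if ram_gb >= spec.min_ram_gb:
        return "Borderline"
    return "Incompatible"


llama = ModelSpec("llama3.1:8b", 8.0, min_ram_gb=8.0, recommended_ram_gb=16.0)
print(check_compatibility(llama, ram_gb=32.0))  # Compatible
```

The real CompatibilityResult also carries reasons and estimated memory needs, and factors in VRAM for GPU offload.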

CLI reference

The package installs a console script ollama-spark with the following commands:

  • detect — detect and display hardware
  • list-models — list models available to local Ollama
  • recommend — recommend models for a task using your hardware
  • pull — pull a model (streams progress)
  • benchmark — run micro-benchmarks for a model
  • compare — feature & optional runtime comparison for 2–4 models

Run ollama-spark --help or ollama-spark <command> --help for details.

Example:

ollama-spark recommend --task instruct --top-k 5

How the benchmark works (brief)

  • Warmup runs (configurable) are executed first (not recorded).
  • Measured runs call Ollama's generate streaming endpoint and:
    • record wall-clock time until the first token (TTFT)
    • record total time the request takes
    • sample CPU usage and resident memory periodically using psutil
    • estimate tokens generated (tries to use server counts if provided; otherwise naive splitting)
  • After all runs the tool computes median and p95 for TTFT and TPS, median latency, error rate, and resource aggregations.
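The aggregation step can be sketched with the standard library. The p95 here is a simple inclusive quantile; the runner's exact method may differ, and the metric names are illustrative.

```python
import statistics


def aggregate(ttfts: list[float], tps: list[float],
              latencies: list[float]) -> dict:
    """Compute the summary statistics described above: median and p95
    for TTFT and TPS, plus median latency."""
    def p95(xs: list[float]) -> float:
        # 19 cut points at 5% steps; the last one is the 95th percentile.
        return statistics.quantiles(xs, n=20, method="inclusive")[-1]

    return {
        "ttft_median": statistics.median(ttfts),
        "ttft_p95": p95(ttfts),
        "tps_median": statistics.median(tps),
        "tps_p95": p95(tps),
        "latency_median": statistics.median(latencies),
    }


print(aggregate([0.2, 0.3, 0.25], [40.0, 42.0, 38.0], [3.1, 2.9, 3.0]))
```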

Limitations:

  • GPU utilization and VRAM peak require polling vendor tools (nvidia-smi, rocm-smi) — these are not yet fully implemented in the main aggregated report.
  • Token counting is approximate unless the Ollama server includes token counts in streaming events.
  • Benchmarks will be affected by other local processes and background CPU/GPU load; run them on as quiet a system as possible for repeatable results.
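The token-counting fallback mentioned above can be sketched as follows. Ollama's streaming generate endpoint emits newline-delimited JSON events; the final event (with "done": true) usually carries an eval_count field with the server-side token count. Field names vary across Ollama versions, so treat this as a sketch of the fallback logic rather than the runner's actual implementation.

```python
import json


def count_tokens(ndjson_lines: list[str]) -> int:
    """Prefer the server-reported token count (eval_count on the final
    event); fall back to naive whitespace splitting of the streamed text."""
    text_parts = []
    for line in ndjson_lines:
        event = json.loads(line)
        text_parts.append(event.get("response", ""))
        if event.get("done") and "eval_count" in event:
            return event["eval_count"]  # authoritative server count
    return len("".join(text_parts).split())  # rough approximation


events = [
    '{"response": "Hello ", "done": false}',
    '{"response": "world", "done": false}',
    '{"response": "", "done": true, "eval_count": 2}',
]
print(count_tokens(events))  # 2
```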

Project layout

Key files and directories:

ollama-spark/
├─ ollama_spark/
│  ├─ __init__.py
│  ├─ cli.py
│  ├─ hardware.py
│  ├─ models.py
│  ├─ ollama_client.py
│  ├─ registry.py
│  ├─ recommender.py
│  ├─ benchmark.py
│  └─ data/
│     └─ models.yaml
├─ tests/
└─ pyproject.toml

Contributing

Contributions are welcome; you can help in several ways:

  • File issues for bugs or feature requests on the repository issue tracker.
  • Improve/extend the data/models.yaml catalog — accuracy of RAM/VRAM values and task scores improves recommendations dramatically.
  • Add tests in tests/ for:
    • registry parsing and validation
    • recommender ranking behavior (unit tests with several hardware profiles)
    • Ollama client error handling (mock HTTP responses)
  • Help implement GPU metrics collection for benchmark aggregation (NVIDIA + ROCm + Apple).
  • Improve the streaming parsing to match your version of Ollama (event formats vary).

Before you create PRs:

  1. Fork the repository.
  2. Create a feature branch.
  3. Make tests for new behavior and ensure pytest passes.
  4. Open a PR with a clear description and link to any issues.

Development & CI

Recommended dev commands:

# run tests
pytest

# run linter (if configured)
ruff check .

# run CLI locally (editable install)
python -m ollama_spark.cli detect

A GitHub Actions workflow to run tests and lint on PRs and pushes to main is planned (Ubuntu + macOS across Python 3.10–3.12 is a typical matrix).


Roadmap / Next steps

Planned work, roughly in priority order:

  1. README + LICENSE: README done, MIT LICENSE pending.
  2. Add unit tests for registry parsing and recommender logic. (High priority)
  3. Add CI workflow (GitHub Actions) for linting and tests. (High priority)
  4. Implement GPU usage & VRAM sampling (NVIDIA / ROCm) in the benchmark runner. (Medium)
  5. Improve token counting (integrate tokenizers or use server-provided token counts). (Medium)
  6. Persist benchmark results to a small local DB and add history CLI. (Lower)
  7. Prepare packaging and PyPI release (bump version and add release workflow). (Lower)

Feedback on prioritization is welcome via the repository issue tracker.


Security & privacy notes

  • By default the tool talks only to a local Ollama daemon; it does not upload hardware information anywhere.
  • If you add remote registries or model repositories, handle credentials carefully and always use secure transport (HTTPS); a secure store for API tokens may be added if needed.

License

This project is intended to be MIT-licensed; a LICENSE file will be added to the repository.


Contact / Maintainer

Questions, bug reports, and feature requests are welcome on the repository issue tracker. Near-term maintenance work includes adding the LICENSE file, implementing tests and CI (GitHub Actions for tests and lint), and preparing a PyPI-ready release with a draft changelog. Open questions tracked for contributors:

  • Which license to use (MIT is the default choice)
  • Which Python versions to target in CI
  • Whether to pull recommended models automatically after a recommendation
