A toolkit for managing and testing LM Studio models with automatic context limit discovery

lmstrix

Manage, test, and run local language models through LM Studio from the command line. Its centrepiece is a binary-search algorithm that finds the true maximum context window of any model — so you stop guessing and stop crashing.

Background: what a context window is

A language model can only "see" a fixed number of tokens at once. This is its context window. Feed it more tokens than it can handle and it either crashes, silently truncates your input, or runs out of GPU memory mid-inference.

Every model has a theoretical maximum stated in its documentation. That number is often optimistic. The real limit depends on your hardware, quantisation level, and the LM Studio version. The only way to know for certain is to test it.

What lmstrix does

  • Scans your LM Studio models directory and builds a registry of available models
  • Tests models using binary search to find their actual maximum context window
  • Persists the results to a JSON registry so you never re-test a model you already know
  • Runs inference via LM Studio's local API with configurable prompts and context sizes
  • Reports test results and model metadata in formatted terminal tables

Install

pip install lmstrix
# or
uv pip install lmstrix

Requires LM Studio to be installed, with its local server running on localhost:1234 (the default).

Quick start

# Discover all models in your LM Studio directory
lmstrix scan

# List discovered models and their tested context limits
lmstrix list

# Find the true context limit for a specific model
lmstrix test "llama-3.2-3b-instruct"

# Run inference at a specific context size
lmstrix infer "llama-3.2-3b-instruct" --prompt "Explain quantum entanglement" --context 8192

How the context test works

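Testing every possible context size would take hours. Binary search needs only a logarithmic number of load attempts: from a stated maximum of 131072 tokens, about 17 halvings would narrow the range to a single token, and fewer are needed when a tolerance is allowed.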

  1. Start with the model's stated maximum (e.g. 131072 tokens).
  2. Try loading the model at that size and running two simple inference checks: "Write 'ninety-six' as a number" and "2+3=".
  3. If it succeeds, record that size as the working maximum and continue searching above it.
  4. If it fails (OOM, crash, timeout, zero tokens returned), halve the search space and try a smaller size.
  5. Repeat until the boundary is found within a small tolerance.
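
The steps above can be sketched as a plain binary search over context sizes. This is an illustration only, not lmstrix's actual implementation: find_max_context, try_load, and tolerance are hypothetical names, and the model-loading step is simulated by a function that succeeds at or below a fixed limit. It also assumes that if a size works, every smaller size works.

```python
from typing import Callable


def find_max_context(
    stated_max: int,
    try_load: Callable[[int], bool],
    tolerance: int = 512,
) -> tuple[int, int]:
    """Return (largest working size found, number of load attempts).

    try_load(size) is a hypothetical callback: load the model at `size`
    tokens, run the checks, and return True on success.
    """
    attempts = 1
    if try_load(stated_max):
        # The stated maximum works, so there is nothing to search.
        return stated_max, attempts

    best = 0                    # largest size known to work (0 = none yet)
    low, high = 0, stated_max   # invariant: high is known to fail
    while high - low > tolerance:
        mid = (low + high) // 2
        attempts += 1
        if try_load(mid):
            best = low = mid    # success: the boundary is above mid
        else:
            high = mid          # failure: the boundary is below mid
    return best, attempts


# Simulated model whose real limit is 40000 tokens (made-up number).
REAL_LIMIT = 40_000
best, attempts = find_max_context(131_072, lambda size: size <= REAL_LIMIT)
print(f"working maximum ~{best} tokens after {attempts} load attempts")
```

The result is a size that is known to work and lies within the tolerance of the real boundary. A smaller tolerance gives a tighter answer at the cost of more model loads.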

The test uses dual prompts because a single "say hello" prompt can succeed even when the model is misconfigured — it is too short to stress the context allocation. The two prompts require the model to produce specific, verifiable output.
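
A minimal sketch of such a verifiable check follows. The two prompts are quoted from the steps above; the expected substrings ("96" and "5"), the function passes_dual_check, and the ask callback are assumptions made for illustration and may not match how lmstrix validates replies.

```python
from typing import Callable

# Prompts quoted from the description above. The expected substrings are
# an assumption about what counts as a correct answer.
CHECKS = [
    ("Write 'ninety-six' as a number", "96"),
    ("2+3=", "5"),
]


def passes_dual_check(ask: Callable[[str], str]) -> bool:
    """Return True only if every prompt yields its expected answer.

    `ask` is a hypothetical callable that sends a prompt to the loaded
    model and returns its text reply ("" if nothing came back).
    """
    for prompt, expected in CHECKS:
        reply = ask(prompt)
        if expected not in reply:
            return False
    return True


# Stand-ins for a healthy model and one that returns zero tokens.
answers = {"Write 'ninety-six' as a number": "96", "2+3=": "5"}
print(passes_dual_check(lambda prompt: answers[prompt]))
print(passes_dual_check(lambda prompt: ""))
```

Because both replies must contain a specific value, an empty or garbled completion counts as a failure instead of slipping through as a "successful" response.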

Results include time-to-first-token (TTFT) and tokens-per-second (TPS) from the successful test run.
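
For reference, both metrics can be derived from three timestamps and a token count. The sketch below uses one common definition (TPS measured over the generation phase only, excluding the wait for the first token). The Timing class and function names are hypothetical, and lmstrix may compute these values differently.

```python
from dataclasses import dataclass


@dataclass
class Timing:
    request_sent: float      # seconds, e.g. from time.perf_counter()
    first_token: float       # when the first token arrived
    last_token: float        # when the last token arrived
    tokens_generated: int


def ttft_seconds(t: Timing) -> float:
    """Time from sending the request to receiving the first token."""
    return t.first_token - t.request_sent


def tokens_per_second(t: Timing) -> float:
    """Generation throughput, measured after the first token arrived."""
    duration = t.last_token - t.first_token
    if duration <= 0:
        return 0.0  # avoid dividing by zero for instant or empty replies
    return t.tokens_generated / duration


# Made-up numbers: first token after 0.5 s, 100 tokens over the next 2 s.
sample = Timing(request_sent=0.0, first_token=0.5, last_token=2.5, tokens_generated=100)
print(f"TTFT: {ttft_seconds(sample):.2f}s, TPS: {tokens_per_second(sample):.1f}")
```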

The model registry

Scan results and test results are persisted to a JSON file (default: ~/.local/share/lmstrix/models.json on Linux, similar paths on macOS/Windows). Later lmstrix scan runs update the registry without discarding test results. lmstrix list reads from the registry without touching LM Studio.
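
The "update without discarding" behaviour can be pictured as a merge keyed on model ID. The sketch below is an assumption about how such a merge could work, not a description of the real models.json layout: the merge_scan function and the path and stated_max_context fields are hypothetical, and only tested_max_context is a name that appears in the Python API.

```python
import json
import tempfile
from pathlib import Path


def merge_scan(registry_path: Path, scanned: dict[str, dict]) -> dict[str, dict]:
    """Merge freshly scanned models into the stored registry.

    New scan fields overwrite old scan fields, while fields the scan does
    not produce (such as test results) are carried over from the old entry.
    In this sketch, models that are no longer found by the scan are dropped.
    """
    old: dict[str, dict] = {}
    if registry_path.exists():
        old = json.loads(registry_path.read_text(encoding="utf-8"))

    merged = {}
    for model_id, scan_entry in scanned.items():
        # Start from the previous entry, then apply the new scan fields.
        merged[model_id] = {**old.get(model_id, {}), **scan_entry}

    registry_path.parent.mkdir(parents=True, exist_ok=True)
    registry_path.write_text(json.dumps(merged, indent=2), encoding="utf-8")
    return merged


# Demo in a temporary directory with a hypothetical schema.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "models.json"
    path.write_text(json.dumps({"model-a": {"path": "/old", "tested_max_context": 8192}}))
    result = merge_scan(path, {"model-a": {"path": "/new", "stated_max_context": 131072}})
    print(result["model-a"])  # keeps tested_max_context from the earlier test
```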

CLI reference

lmstrix scan              Scan LM Studio models directory and update registry
lmstrix list              List all models with context limits and test status
lmstrix test <model-id>   Binary-search for true maximum context window
lmstrix infer <model-id>  Run inference; options: --prompt, --context, --max-tokens

Python API

from lmstrix.api import LMStudioClient
from lmstrix.core.context_tester import ContextTester
from lmstrix.core.scanner import ModelScanner

# Connect to LM Studio and build the registry from the models directory
client = LMStudioClient()
scanner = ModelScanner()
registry = scanner.scan()

# Run the context test for one model and store the result in the registry
tester = ContextTester(client=client, verbose=True)
model = registry.get_model("llama-3.2-3b-instruct")
updated_model = tester.test_model(model, max_context=32768, registry=registry)

# Report the working limit and the timing from the successful run
print(f"Max working context: {updated_model.tested_max_context}")
print(f"TTFT: {updated_model.ttft_seconds:.2f}s")
print(f"TPS: {updated_model.tps:.1f}")

LM Studio setup

LM Studio must be running with its local API server enabled (Settings → Local Server → Start Server). The default address is http://localhost:1234. Set LMSTUDIO_BASE_URL to override.
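
Before running lmstrix, it can help to confirm that the server is reachable. The sketch below reads LMSTUDIO_BASE_URL with the documented default and probes /v1/models, the model-listing route of LM Studio's OpenAI-compatible API. The function names are hypothetical, and whether lmstrix itself expects the base URL with or without a trailing path is not covered here.

```python
import os
import urllib.error
import urllib.request

DEFAULT_BASE_URL = "http://localhost:1234"


def resolve_base_url(env=None) -> str:
    """Return LMSTUDIO_BASE_URL if set, else the default, without a trailing slash."""
    env = os.environ if env is None else env
    return env.get("LMSTUDIO_BASE_URL", DEFAULT_BASE_URL).rstrip("/")


def server_is_up(base_url: str, timeout: float = 2.0) -> bool:
    """Return True if the model-listing endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/v1/models", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, HTTP error, or malformed URL.
        return False


if __name__ == "__main__":
    url = resolve_base_url()
    state = "reachable" if server_is_up(url) else "not reachable"
    print(f"LM Studio server at {url} is {state}")
```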

License

MIT

