A toolkit for managing and testing LM Studio models with automatic context limit discovery
lmstrix
Manage, test, and run local language models through LM Studio from the command line. Its centrepiece is a binary-search algorithm that finds the true maximum context window of any model — so you stop guessing and stop crashing.
Background: what a context window is
A language model can only "see" a fixed number of tokens at once. This is its context window. Feed it more tokens than it can handle and it either crashes, silently truncates your input, or runs out of GPU memory mid-inference.
Every model has a theoretical maximum stated in its documentation. That number is often optimistic. The real limit depends on your hardware, quantisation level, and the LM Studio version. The only way to know for certain is to test it.
What lmstrix does
- Scans your LM Studio models directory and builds a registry of available models
- Tests models using binary search to find their actual maximum context window
- Persists the results to a JSON registry so you never re-test a model you already know
- Runs inference via LM Studio's local API with configurable prompts and context sizes
- Reports test results and model metadata in formatted terminal tables
Install
pip install lmstrix
# or
uv pip install lmstrix
Requires LM Studio to be installed, with its local API server running on localhost:1234 (the default).
Quick start
# Discover all models in your LM Studio directory
lmstrix scan
# List discovered models and their tested context limits
lmstrix list
# Find the true context limit for a specific model
lmstrix test "llama-3.2-3b-instruct"
# Run inference at a specific context size
lmstrix infer "llama-3.2-3b-instruct" --prompt "Explain quantum entanglement" --context 8192
How the context test works
Testing every possible context size one by one would take hours. Binary search reduces this to a number of trials that grows only logarithmically with the stated maximum.
- Start with the model's stated maximum (e.g. 131072 tokens).
- Try loading the model at that size and running two simple inference checks: "Write 'ninety-six' as a number" and "2+3=".
- If it succeeds, record that size as the working maximum.
- If it fails (OOM, crash, timeout, zero tokens returned), halve the search space.
- Repeat until the boundary is found within a small tolerance.
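The steps above can be sketched as a plain binary search over context sizes. This is a minimal illustration, not lmstrix's actual implementation: `works_at` is a hypothetical callback standing in for "load the model at this size and run the checks", and the default `tolerance` of 512 tokens is an assumption. The sketch also assumes that if a size works, every smaller size works too.

```python
from typing import Callable


def find_max_context(
    works_at: Callable[[int], bool],
    stated_max: int,
    tolerance: int = 512,
) -> tuple[int, int]:
    """Return (largest size known to work, number of trials run).

    works_at is a hypothetical predicate: True if the model loads and
    answers correctly at the given context size. The returned size is 0
    if nothing in the tested range worked.
    """
    trials = 1
    if works_at(stated_max):
        # The stated maximum works outright; no search needed.
        return stated_max, trials

    low, high = 0, stated_max  # low: known good (0 = none yet), high: known bad
    while high - low > tolerance:
        mid = (low + high) // 2
        trials += 1
        if works_at(mid):
            low = mid   # mid works: the limit is at or above mid
        else:
            high = mid  # mid fails: the limit is below mid
    return low, trials


# Simulated model whose real limit is 40000 tokens, advertised as 131072.
limit, trials = find_max_context(lambda n: n <= 40000, stated_max=131072)
print(limit, trials)
```

With a stated maximum of 131072 and a tolerance of 512, the loop needs log2(131072 / 512) = 8 halvings after the initial attempt, compared with hundreds of trials for a linear sweep at the same granularity.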
The test uses dual prompts because a single "say hello" prompt can succeed even when the model is misconfigured — it is too short to stress the context allocation. The two prompts require the model to produce specific, verifiable output.
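A sketch of what a dual-prompt check might look like. The prompts are the two quoted above; the `ask` callback, the expected answers "96" and "5", and the substring matching are assumptions made for illustration, not lmstrix's documented behaviour.

```python
from typing import Callable

# Prompt -> substring the reply is expected to contain (assumed answers).
CHECKS = {
    "Write 'ninety-six' as a number": "96",
    "2+3=": "5",
}


def passes_dual_check(ask: Callable[[str], str]) -> bool:
    """Return True only if every prompt yields its expected answer.

    ask is a hypothetical function that sends one prompt to the loaded
    model and returns the reply text. An exception or an empty reply
    counts as a failure.
    """
    for prompt, expected in CHECKS.items():
        try:
            reply = ask(prompt)
        except Exception:
            return False
        if not reply or expected not in reply:
            return False
    return True


# Stand-in models for demonstration only.
def healthy(prompt: str) -> str:
    return {"Write 'ninety-six' as a number": "96", "2+3=": "5"}[prompt]


def silent(prompt: str) -> str:
    return ""  # simulates zero tokens returned


print(passes_dual_check(healthy), passes_dual_check(silent))
```

Requiring both answers means a model that returns nothing, or returns text unrelated to the prompt, is treated as a failed load rather than a success.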
Results include time-to-first-token (TTFT) and tokens-per-second (TPS) from the successful test run.
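For reference, both metrics can be derived from three timestamps and a token count. The sketch below uses common definitions (TTFT as the delay before the first token, TPS as generated tokens divided by the generation time after the first token). Whether lmstrix measures them in exactly this way is an assumption, as are the `Timing` class and `compute_timing` helper.

```python
from dataclasses import dataclass


@dataclass
class Timing:
    ttft_seconds: float
    tps: float


def compute_timing(start: float, first_token: float, end: float, n_tokens: int) -> Timing:
    """Derive TTFT and TPS from timestamps in seconds (e.g. time.perf_counter())."""
    ttft = first_token - start
    generation_time = end - first_token
    # Guard against division by zero for instantaneous or empty generations.
    tps = n_tokens / generation_time if generation_time > 0 else 0.0
    return Timing(ttft_seconds=ttft, tps=tps)


# Request sent at t=0.0 s, first token at 0.5 s, 120 tokens finished by 4.5 s.
timing = compute_timing(start=0.0, first_token=0.5, end=4.5, n_tokens=120)
print(f"TTFT: {timing.ttft_seconds:.2f}s, TPS: {timing.tps:.1f}")
```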
The model registry
Scan and test results are persisted to a JSON file (default: ~/.local/share/lmstrix/models.json on Linux, with similar paths on macOS and Windows). Subsequent lmstrix scan runs update the registry without discarding test results, and lmstrix list reads from the registry without touching LM Studio.
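One way to update a registry without discarding test results is to merge each fresh scan into the stored entries key by key. The sketch below is illustrative only: the JSON layout, the `merge_scan` helper, and the field names (`path`, `stated_max_context`, `tested_max_context`) are assumptions, not the actual schema of models.json.

```python
import json
import tempfile
from pathlib import Path


def merge_scan(registry_path: Path, scanned: dict[str, dict]) -> dict[str, dict]:
    """Merge freshly scanned models into the registry file and return the result.

    Scan fields overwrite stale values, while keys the scan does not
    provide (such as earlier test results) are left untouched.
    """
    registry = {}
    if registry_path.exists():
        registry = json.loads(registry_path.read_text(encoding="utf-8"))
    for model_id, scan_fields in scanned.items():
        entry = registry.setdefault(model_id, {})
        entry.update(scan_fields)
    registry_path.write_text(json.dumps(registry, indent=2), encoding="utf-8")
    return registry


# Demonstration against a temporary file, with hypothetical field names.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "models.json"
    path.write_text(
        json.dumps({"model-a": {"path": "/old", "tested_max_context": 40000}}),
        encoding="utf-8",
    )
    merged = merge_scan(path, {
        "model-a": {"path": "/new", "stated_max_context": 131072},
        "model-b": {"path": "/b", "stated_max_context": 8192},
    })
    print(merged["model-a"])
```

In this sketch, models that have disappeared from disk stay in the registry; pruning them would be a separate design decision.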
CLI reference
lmstrix scan              Scan LM Studio models directory and update registry
lmstrix list              List all models with context limits and test status
lmstrix test <model-id>   Binary-search for true maximum context window
lmstrix infer <model-id>  Run inference; options: --prompt, --context, --max-tokens
Python API
from lmstrix.api import LMStudioClient
from lmstrix.core.context_tester import ContextTester
from lmstrix.core.scanner import ModelScanner

# Connect to the local LM Studio server and build the model registry
client = LMStudioClient()
scanner = ModelScanner()
registry = scanner.scan()

# Search for the working context limit of one model, up to 32768 tokens
tester = ContextTester(client=client, verbose=True)
model = registry.get_model("llama-3.2-3b-instruct")
updated_model = tester.test_model(model, max_context=32768, registry=registry)

# Report the limit and the timing from the successful test run
print(f"Max working context: {updated_model.tested_max_context}")
print(f"TTFT: {updated_model.ttft_seconds:.2f}s")
print(f"TPS: {updated_model.tps:.1f}")
LM Studio setup
LM Studio must be running with its local API server enabled (Settings → Local Server → Start Server). The default address is http://localhost:1234. Set LMSTUDIO_BASE_URL to override.
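A minimal sketch of how a client could resolve the server address. The environment variable name and the default come from the text above; the helper name `resolve_base_url` and the trailing-slash handling are assumptions, not part of lmstrix's documented API.

```python
import os

DEFAULT_BASE_URL = "http://localhost:1234"


def resolve_base_url(env=None) -> str:
    """Return LMSTUDIO_BASE_URL if set and non-empty, else the default."""
    env = os.environ if env is None else env
    url = env.get("LMSTUDIO_BASE_URL", "").strip()
    # Drop a trailing slash so paths can be appended consistently.
    return (url or DEFAULT_BASE_URL).rstrip("/")


print(resolve_base_url({}))
print(resolve_base_url({"LMSTUDIO_BASE_URL": "http://192.168.1.20:1234/"}))
```

Passing a dictionary instead of reading `os.environ` directly keeps the helper easy to test; calling it with no argument uses the real environment.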
License
MIT