# openhost

Run local LLMs from Python. LangChain-compatible. No desktop app required.
openhost is a thin Python SDK that manages llama.cpp and mlx-lm servers as subprocesses, handles model downloads from HuggingFace, and plugs into LangChain like any other provider.
## Install

```shell
pip install openhost

# Whisper backend (pick one based on your hardware)
pip install 'openhost[whisper-mlx]'     # Apple Silicon (fast, Neural Engine)
pip install 'openhost[whisper-faster]'  # CPU or CUDA GPUs
```

Runtime backends are installed separately:

```shell
brew install llama.cpp   # or build from source
pip install mlx-lm       # Apple Silicon only
```
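Which backend you need depends on your hardware. The snippet below is a hypothetical sketch of how such a runtime check could look; `available_backends` is an illustrative helper, not part of openhost's API.

```python
import platform
import shutil

def available_backends():
    """Return names of runtime backends that appear to be installed (illustrative check)."""
    backends = []
    # llama.cpp installs a `llama-server` binary on PATH (e.g. via `brew install llama.cpp`)
    if shutil.which("llama-server"):
        backends.append("llama.cpp")
    # mlx-lm only works on Apple Silicon, and only if it was pip-installed
    if platform.system() == "Darwin" and platform.machine() == "arm64":
        try:
            import mlx_lm  # noqa: F401
            backends.append("mlx-lm")
        except ImportError:
            pass
    return backends

print(available_backends())
```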
## Usage

### Quickest path: chat

```python
import openhost

llm = openhost.make_chat("qwen3.6-35b-mlx-turbo", streaming=True)
for chunk in llm.stream("Write a haiku about subprocess management."):
    print(chunk.content, end="", flush=True)
```
That one call auto-downloads the model on first run, starts the server, picks a free port, and returns a fully wired `ChatOpenAI`. No ports, no YAML, no gateway.
### Model management

```python
openhost.list_presets()                   # all known presets
openhost.pull("qwen3.5-35b-uncensored")   # just download
openhost.run("qwen3.5-35b-uncensored")    # start (auto-pulls if needed)
openhost.running()                        # list active runners
openhost.stop("qwen3.5-35b-uncensored")
openhost.stop_all()                       # kill everything
```
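Stopping a runner means terminating its server subprocess along with any children. A common pattern for this (a sketch of the standard POSIX technique, not openhost's actual code) is to start the server in its own process group, then signal the whole group:

```python
import os
import signal
import subprocess
import sys
import time

# Start a stand-in "server" in its own process group so the whole group can be signalled.
proc = subprocess.Popen(
    [sys.executable, "-c", "import time; time.sleep(60)"],
    start_new_session=True,  # child becomes leader of a new process group
)
time.sleep(0.5)  # give the child a moment to finish starting

# SIGTERM the group; fall back to SIGKILL if it doesn't exit in time.
os.killpg(proc.pid, signal.SIGTERM)
try:
    proc.wait(timeout=5)
except subprocess.TimeoutExpired:
    os.killpg(proc.pid, signal.SIGKILL)
    proc.wait()

print(proc.returncode)  # negative signal number, e.g. -15 for SIGTERM
```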
Register your own model:
```python
from openhost import ModelPreset, register_preset

register_preset(ModelPreset(
    id="llama-3.1-8b-instruct-q6",
    display_name="Llama 3.1 8B Instruct (Q6_K)",
    backend="llama.cpp",
    hf_repo="bartowski/Meta-Llama-3.1-8B-Instruct-GGUF",
    primary_file="Meta-Llama-3.1-8B-Instruct-Q6_K.gguf",
    command_template=(
        "llama-server", "-m", "{path}/{primary_file}",
        "-c", "{context_length}", "--host", "127.0.0.1", "--port", "{port}",
        "--jinja", "-ngl", "99", "-fa", "on",
    ),
    context_length=8192,
))
```
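The `{path}`, `{primary_file}`, `{context_length}` and `{port}` placeholders suggest `str.format`-style substitution into each template element. A sketch of how that expansion could work (assumed behavior, not verified against openhost's source; the paths and port are made up):

```python
# Hypothetical expansion of a command_template into a concrete argv list.
template = (
    "llama-server", "-m", "{path}/{primary_file}",
    "-c", "{context_length}", "--host", "127.0.0.1", "--port", "{port}",
)
values = {
    "path": "/models/llama-3.1-8b",                            # illustrative download dir
    "primary_file": "Meta-Llama-3.1-8B-Instruct-Q6_K.gguf",
    "context_length": 8192,
    "port": 49152,                                             # illustrative free port
}
argv = [part.format(**values) for part in template]
print(argv)
# The expanded argv list can then be handed to subprocess.Popen(argv).
```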
### Web search (LangChain tool)

```python
from openhost import OpenHostSearchTool

tool = OpenHostSearchTool()  # keyless DuckDuckGo by default
print(tool.invoke("macOS 26 release date"))

# Use a different provider
from openhost.search import TavilyProvider
tool = OpenHostSearchTool(provider=TavilyProvider("tvly-..."))

# Plug into a LangGraph agent
from langgraph.prebuilt import create_react_agent
agent = create_react_agent(llm, tools=[OpenHostSearchTool()])
```
### Transcription

```python
import openhost

# Auto-picks mlx-whisper on Apple Silicon, faster-whisper elsewhere
result = openhost.transcribe("meeting.mp3")
print(result.text)

# As a LangChain document loader (verbose = per-segment Documents)
from openhost import OpenHostWhisper
docs = OpenHostWhisper("meeting.mp3", verbose=True).load()
for doc in docs:
    print(f"[{doc.metadata['start']:.1f}s] {doc.page_content}")
```
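The segment start times are plain seconds. If you want subtitle-style timestamps instead, a small helper does the conversion (a sketch assuming `metadata['start']` is a float of seconds; `to_timestamp` is not an openhost function):

```python
def to_timestamp(seconds: float) -> str:
    """Format seconds as HH:MM:SS.mmm, e.g. for SRT-style output."""
    ms = int(round(seconds * 1000))
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d}.{ms:03d}"

print(to_timestamp(3725.5))  # 01:02:05.500
```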
## CLI

```shell
openhost list                          # show presets
openhost pull qwen3.5-35b-uncensored   # download
openhost run qwen3.5-35b-uncensored    # foreground until Ctrl-C
```
## Built-in presets

| id | backend | size |
|---|---|---|
| qwen3.6-35b-mlx-turbo | mlx-lm | ~20 GB |
| qwen3.5-35b-uncensored | llama.cpp | ~30 GB |
| qwen3-8b-gguf | llama.cpp | ~5 GB |
## How it works

- **No HTTP gateway.** `make_chat()` returns a `ChatOpenAI` pointed straight at the model's own OpenAI-compatible endpoint. Zero proxy overhead.
- **Automatic port allocation.** Each runner picks a free localhost port. Users never touch ports.
- **Process-scoped lifecycle.** When your Python process exits, all runners it started get cleaned up (SIGTERM on the process group, SIGKILL fallback).
- **Platform support.** macOS + Linux. MLX is Apple Silicon only; llama.cpp is cross-platform.
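Picking a free localhost port is conventionally done by binding to port 0 and letting the OS assign one. A minimal sketch of the standard technique (not necessarily openhost's exact code):

```python
import socket

def pick_free_port() -> int:
    """Ask the OS for an unused localhost TCP port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]

# The socket closes when the with-block exits; a runner would pass the
# returned number straight to the server (e.g. llama-server's --port).
print(pick_free_port())
```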
## License

MIT