
openhost

Run local LLMs from Python. LangChain-compatible. No desktop app required.

openhost is a thin Python SDK that manages llama.cpp and mlx-lm servers as subprocesses, handles model downloads from HuggingFace, and plugs into LangChain like any other provider.

Install

pip install openhost

# Whisper backend (pick one based on your hardware)
pip install 'openhost[whisper-mlx]'     # Apple Silicon (fast, Neural Engine)
pip install 'openhost[whisper-faster]'  # CPU or CUDA GPUs

The runtime backends themselves are installed separately:

brew install llama.cpp        # or build from source
pip install mlx-lm            # Apple Silicon only
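Since the backends are external binaries, it can be worth confirming one is actually on PATH before the first run. A minimal stdlib sketch (the helper name `backend_available` is ours, not part of openhost):

```python
import shutil

def backend_available(executable: str) -> bool:
    """Return True if the given runtime binary is resolvable on PATH."""
    return shutil.which(executable) is not None

# llama.cpp installs a `llama-server` binary; mlx-lm ships `mlx_lm.server`.
for exe in ("llama-server", "mlx_lm.server"):
    print(f"{exe}: {'found' if backend_available(exe) else 'missing'}")
```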

Usage

Quickest path: chat

import openhost

llm = openhost.make_chat("qwen3.6-35b-mlx-turbo", streaming=True)
for chunk in llm.stream("Write a haiku about subprocess management."):
    print(chunk.content, end="", flush=True)

That single make_chat() call auto-downloads the model on first run, starts the server, picks a free port, and returns a fully wired ChatOpenAI. No ports to manage, no YAML, no gateway.
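Under the hood, openhost presumably waits for the spawned server to start accepting requests before handing back the client. If you ever manage a server yourself, a readiness poll against an OpenAI-compatible endpoint might look like this (a sketch; the helper name and URL are ours):

```python
import time
import urllib.error
import urllib.request

def wait_ready(base_url: str, timeout: float = 30.0) -> bool:
    """Poll the server's /v1/models endpoint until it responds or timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(f"{base_url}/v1/models", timeout=1.0):
                return True
        except (urllib.error.URLError, OSError):
            time.sleep(0.25)
    return False

# e.g. a llama-server started on port 8080:
# wait_ready("http://127.0.0.1:8080")
```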

Model management

openhost.list_presets()                         # all known presets
openhost.pull("qwen3.5-35b-uncensored")         # just download
openhost.run("qwen3.5-35b-uncensored")          # start (auto-pulls if needed)
openhost.running()                              # list active runners
openhost.stop("qwen3.5-35b-uncensored")
openhost.stop_all()                             # kill everything

Register your own model:

from openhost import ModelPreset, register_preset

register_preset(ModelPreset(
    id="llama-3.1-8b-instruct-q6",
    display_name="Llama 3.1 8B Instruct (Q6_K)",
    backend="llama.cpp",
    hf_repo="bartowski/Meta-Llama-3.1-8B-Instruct-GGUF",
    primary_file="Meta-Llama-3.1-8B-Instruct-Q6_K.gguf",
    command_template=(
        "llama-server", "-m", "{path}/{primary_file}",
        "-c", "{context_length}", "--host", "127.0.0.1", "--port", "{port}",
        "--jinja", "-ngl", "99", "-fa", "on",
    ),
    context_length=8192,
))
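The command_template fields suggest straightforward placeholder substitution into the launch argv. One way such a template could be rendered (a sketch of the idea, not openhost's actual implementation):

```python
def render_command(template: tuple, **fields) -> list:
    """Substitute {placeholder} fields into each element of a command template."""
    return [part.format(**fields) for part in template]

template = (
    "llama-server", "-m", "{path}/{primary_file}",
    "-c", "{context_length}", "--host", "127.0.0.1", "--port", "{port}",
)
argv = render_command(
    template,
    path="/models/llama-3.1-8b",
    primary_file="Meta-Llama-3.1-8B-Instruct-Q6_K.gguf",
    context_length=8192,
    port=50123,
)
print(argv)
```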

Web search (LangChain tool)

from openhost import OpenHostSearchTool

tool = OpenHostSearchTool()  # keyless DuckDuckGo by default
print(tool.invoke("macOS 26 release date"))

# Use a different provider
from openhost.search import TavilyProvider
tool = OpenHostSearchTool(provider=TavilyProvider("tvly-..."))

# Plug into a LangGraph agent
from langgraph.prebuilt import create_react_agent
agent = create_react_agent(llm, tools=[OpenHostSearchTool()])

Transcription

import openhost

# Auto-picks mlx-whisper on Apple Silicon, faster-whisper elsewhere
result = openhost.transcribe("meeting.mp3")
print(result.text)

# As a LangChain document loader (verbose = per-segment Documents)
from openhost import OpenHostWhisper
docs = OpenHostWhisper("meeting.mp3", verbose=True).load()
for doc in docs:
    print(f"[{doc.metadata['start']:.1f}s] {doc.page_content}")
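The per-segment start times make subtitle export straightforward. For instance, converting float seconds into SRT-style timestamps (the helper is ours, not part of openhost):

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

print(srt_timestamp(3.5))       # 00:00:03,500
print(srt_timestamp(3725.042))  # 01:02:05,042
```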

CLI

openhost list                            # show presets
openhost pull qwen3.5-35b-uncensored     # download
openhost run qwen3.5-35b-uncensored      # foreground until Ctrl-C

Built-in presets

id                       backend     size
qwen3.6-35b-mlx-turbo    mlx-lm      ~20 GB
qwen3.5-35b-uncensored   llama.cpp   ~30 GB
qwen3-8b-gguf            llama.cpp   ~5 GB

How it works

  • No HTTP gateway. make_chat() returns a ChatOpenAI pointed straight at the model's own OpenAI-compatible endpoint. Zero proxy overhead.
  • Automatic port allocation. Each runner picks a free localhost port. Users never touch ports.
  • Process-scoped lifecycle. When your Python process exits, all runners it started get cleaned up (SIGTERM on the process group, SIGKILL fallback).
  • Platform support. macOS + Linux. MLX is Apple Silicon only; llama.cpp is cross-platform.
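The free-port trick is standard: bind a socket to port 0 and let the OS assign an ephemeral port. A minimal sketch of how a runner might pick its port:

```python
import socket

def pick_free_port(host: str = "127.0.0.1") -> int:
    """Ask the OS for an ephemeral port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))
        return sock.getsockname()[1]

print(pick_free_port())
```

Note the inherent race: the socket is closed before the server rebinds the port, so another process could grab it in between. In practice the window is tiny and retrying on bind failure covers it.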

License

MIT
