
openhost

Run local LLMs from Python. LangChain-compatible. No desktop app required.

openhost is a thin Python SDK that manages llama.cpp and mlx-lm servers as subprocesses, handles model downloads from HuggingFace, and plugs into LangChain like any other provider.

Install

pip install openhost

# Whisper backend (pick one based on your hardware)
pip install 'openhost[whisper-mlx]'     # Apple Silicon (fast, Neural Engine)
pip install 'openhost[whisper-faster]'  # CPU or CUDA GPUs

Runtime backends you install separately:

brew install llama.cpp        # or build from source
pip install mlx-lm            # Apple Silicon only

Usage

Quickest path: chat

import openhost

llm = openhost.make_chat("qwen3.6-35b-mlx-turbo", streaming=True)
for chunk in llm.stream("Write a haiku about subprocess management."):
    print(chunk.content, end="", flush=True)

That single call auto-downloads the model on first run, starts the server, picks a free port, and returns a fully wired ChatOpenAI. No ports, no YAML, no gateway.

Model management

openhost.list_presets()                         # all built-in presets
openhost.pull("qwen3.5-35b-uncensored")         # just download
openhost.run("qwen3.5-35b-uncensored")          # start (auto-pulls if needed)
openhost.running()                              # list active runners
openhost.stop("qwen3.5-35b-uncensored")
openhost.stop_all()                             # kill everything

Any HuggingFace model — auto-detect from the repo id

If the model isn't in the built-in presets, just pass an HF repo string. openhost will inspect the repo, pick the right backend (GGUF → llama.cpp, safetensors → MLX), pick a quant, and register the model on the fly.

# Llama 3.1 8B Q4_K_M (default quant pick) — downloads + runs in one call
llm = openhost.make_chat("bartowski/Meta-Llama-3.1-8B-Instruct-GGUF")

# Pick a specific quant
llm = openhost.make_chat("bartowski/Meta-Llama-3.1-8B-Instruct-GGUF:Q5_K_M")

# MLX model on Apple Silicon
llm = openhost.make_chat("mlx-community/Qwen2.5-7B-Instruct-4bit")

# More control
from openhost import from_hf
preset = from_hf(
    "bartowski/Meta-Llama-3.1-8B-Instruct-GGUF",
    filename="Meta-Llama-3.1-8B-Instruct-Q8_0.gguf",  # explicit file
    context_length=8192,
)

Register your own model:

from openhost import ModelPreset, register_preset

register_preset(ModelPreset(
    id="llama-3.1-8b-instruct-q6",
    display_name="Llama 3.1 8B Instruct (Q6_K)",
    backend="llama.cpp",
    hf_repo="bartowski/Meta-Llama-3.1-8B-Instruct-GGUF",
    primary_file="Meta-Llama-3.1-8B-Instruct-Q6_K.gguf",
    command_template=(
        "llama-server", "-m", "{path}/{primary_file}",
        "-c", "{context_length}", "--host", "127.0.0.1", "--port", "{port}",
        "--jinja", "-ngl", "99", "-fa", "on",
    ),
    context_length=8192,
))
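The `{...}` placeholders in `command_template` are presumably substituted with the preset's fields and the allocated port before launch. Roughly (a sketch under that assumption; `render_command` is not a real openhost function):

```python
def render_command(template: tuple[str, ...], **fields) -> list[str]:
    # Fill {path}, {primary_file}, {context_length}, {port} into each argument.
    return [part.format(**fields) for part in template]
```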

Web search (LangChain tool)

from openhost import OpenHostSearchTool

tool = OpenHostSearchTool()  # keyless DuckDuckGo by default
print(tool.invoke("macOS 26 release date"))

# Use a different provider
from openhost.search import TavilyProvider
tool = OpenHostSearchTool(provider=TavilyProvider("tvly-..."))

# Plug into a LangGraph agent
from langgraph.prebuilt import create_react_agent
agent = create_react_agent(llm, tools=[OpenHostSearchTool()])

Transcription

import openhost

# Auto-picks mlx-whisper on Apple Silicon, faster-whisper elsewhere
result = openhost.transcribe("meeting.mp3")
print(result.text)

# As a LangChain document loader (verbose = per-segment Documents)
from openhost import OpenHostWhisper
docs = OpenHostWhisper("meeting.mp3", verbose=True).load()
for doc in docs:
    print(f"[{doc.metadata['start']:.1f}s] {doc.page_content}")
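The backend auto-pick can be pictured as a simple platform check (a sketch of the idea, not the library's exact logic; `pick_whisper_backend` is a hypothetical name):

```python
import platform

def pick_whisper_backend() -> str:
    # Apple Silicon shows up as an arm64 machine on a Darwin system.
    if platform.system() == "Darwin" and platform.machine() == "arm64":
        return "mlx-whisper"
    return "faster-whisper"
```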

CLI

openhost list                            # show presets
openhost pull qwen3.5-35b-uncensored     # download
openhost run qwen3.5-35b-uncensored      # foreground until Ctrl-C

Built-in presets

id                       backend     size
qwen3.6-35b-mlx-turbo    mlx-lm      ~20 GB
qwen3.5-35b-uncensored   llama.cpp   ~30 GB
qwen3-8b-gguf            llama.cpp   ~5 GB

How it works

  • No HTTP gateway. make_chat() returns a ChatOpenAI pointed straight at the model's own OpenAI-compatible endpoint. Zero proxy overhead.
  • Automatic port allocation. Each runner picks a free localhost port. Users never touch ports.
  • Process-scoped lifecycle. When your Python process exits, all runners it started get cleaned up (SIGTERM on the process group, SIGKILL fallback).
  • Platform support. macOS + Linux. MLX is Apple Silicon only; llama.cpp is cross-platform.

License

MIT
