
openhost

Run local LLMs from Python. LangChain-compatible. No desktop app required.

openhost is a thin Python SDK that manages llama.cpp and mlx-lm servers as subprocesses, handles model downloads from HuggingFace, and plugs into LangChain like any other provider.

Install

pip install openhost

That one command installs the bundled llama.cpp backend (llama-cpp-python) and, on Apple Silicon, mlx-lm for native MLX inference. You can start running models immediately, with no extra setup, on:

  • macOS (Apple Silicon) — Metal GPU acceleration out of the box
  • Linux (CPU) — CPU baseline works
  • Windows (CPU) — CPU baseline works

GPU acceleration (NVIDIA / AMD)

pip can't pick the right CUDA/ROCm wheel for you at install time. After pip install openhost, run one additional line for your toolkit:

# NVIDIA CUDA 12.4
pip install --upgrade --force-reinstall llama-cpp-python \
  --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu124

# AMD ROCm 5.7
pip install --upgrade --force-reinstall llama-cpp-python \
  --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/rocm5.7

Once that wheel is installed, openhost auto-detects the GPU and tunes -ngl (number of layers offloaded) based on available VRAM.
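How the -ngl value is tuned isn't documented; a plausible sketch of a VRAM-based heuristic (the function name, per-layer size, and 1 GiB reserve are all assumptions, not openhost's actual code):

```python
def pick_ngl(vram_bytes: int, layer_count: int, bytes_per_layer: int,
             reserve_bytes: int = 1 << 30) -> int:
    """Offload as many layers as fit in VRAM, keeping a reserve
    (e.g. for the KV cache). Hypothetical heuristic, not openhost's code."""
    usable = max(0, vram_bytes - reserve_bytes)
    return min(layer_count, usable // bytes_per_layer)
```

With 200 MiB layers, an 8 GiB card fits all 32 layers of a typical 8B Q4 model, while a smaller card gets a partial offload.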

Optional extras

pip install 'openhost[whisper-mlx]'     # Apple Neural Engine whisper
pip install 'openhost[whisper-faster]'  # CUDA or CPU whisper (faster-whisper)

Power-user: use your own llama.cpp

If you already have an external llama-server binary on PATH, openhost prefers it over the bundled Python backend: faster startup and a more current llama.cpp build. No action needed; it is detected automatically.

Usage

Quickest path: chat

import openhost

llm = openhost.make_chat("qwen3.6-35b-mlx-turbo", streaming=True)
for chunk in llm.stream("Write a haiku about subprocess management."):
    print(chunk.content, end="", flush=True)

That one call auto-downloads the model on first run, starts the server, picks a free port, and returns a fully wired ChatOpenAI. No ports, no YAML, no gateway.
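The free-port pick needs no configuration: binding to port 0 asks the kernel for an unused port. A minimal sketch (openhost's internals may differ):

```python
import socket

def pick_free_port() -> int:
    """Let the OS choose an unused localhost port (port 0 = kernel's pick)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]
```

There is a small race between closing this probe socket and the server binding the port, which is why runners typically retry on bind failure.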

Model management

openhost.list_presets()                         # all built-in presets
openhost.pull("qwen3.5-35b-uncensored")         # just download
openhost.run("qwen3.5-35b-uncensored")          # start (auto-pulls if needed)
openhost.running()                              # list active runners
openhost.stop("qwen3.5-35b-uncensored")
openhost.stop_all()                             # kill everything

Any HuggingFace model — auto-detect from the repo id

If the model isn't in the built-in presets, pass a Hugging Face repo id. openhost inspects the repo, picks the right backend (GGUF → llama.cpp, safetensors → MLX), picks a quant, and registers the model on the fly.

# Llama 3.1 8B Q4_K_M (default quant pick) — downloads + runs in one call
llm = openhost.make_chat("bartowski/Meta-Llama-3.1-8B-Instruct-GGUF")

# Pick a specific quant
llm = openhost.make_chat("bartowski/Meta-Llama-3.1-8B-Instruct-GGUF:Q5_K_M")

# MLX model on Apple Silicon
llm = openhost.make_chat("mlx-community/Qwen2.5-7B-Instruct-4bit")

# More control
from openhost import from_hf
preset = from_hf(
    "bartowski/Meta-Llama-3.1-8B-Instruct-GGUF",
    filename="Meta-Llama-3.1-8B-Instruct-Q8_0.gguf",  # explicit file
    context_length=8192,
)

Register your own model:

from openhost import ModelPreset, register_preset

register_preset(ModelPreset(
    id="llama-3.1-8b-instruct-q6",
    display_name="Llama 3.1 8B Instruct (Q6_K)",
    backend="llama.cpp",
    hf_repo="bartowski/Meta-Llama-3.1-8B-Instruct-GGUF",
    primary_file="Meta-Llama-3.1-8B-Instruct-Q6_K.gguf",
    command_template=(
        "llama-server", "-m", "{path}/{primary_file}",
        "-c", "{context_length}", "--host", "127.0.0.1", "--port", "{port}",
        "--jinja", "-ngl", "99", "-fa", "on",
    ),
    context_length=8192,
))
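Rendering a command_template into argv is presumably plain str.format substitution over each part; a sketch with illustrative field values:

```python
# A template like the one registered above, with {placeholders}
template = (
    "llama-server", "-m", "{path}/{primary_file}",
    "-c", "{context_length}", "--port", "{port}",
)
fields = {
    "path": "/models/llama-3.1-8b",
    "primary_file": "Meta-Llama-3.1-8B-Instruct-Q6_K.gguf",
    "context_length": 8192,
    "port": 49152,
}
# Substitute every field into every part; parts without braces pass through
argv = [part.format(**fields) for part in template]
```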

Web search (LangChain tool)

from openhost import OpenHostSearchTool

tool = OpenHostSearchTool()  # keyless DuckDuckGo by default
print(tool.invoke("macOS 26 release date"))

# Use a different provider
from openhost.search import TavilyProvider
tool = OpenHostSearchTool(provider=TavilyProvider("tvly-..."))

# Plug into a LangGraph agent
from langgraph.prebuilt import create_react_agent
agent = create_react_agent(llm, tools=[OpenHostSearchTool()])

Transcription

import openhost

# Auto-picks mlx-whisper on Apple Silicon, faster-whisper elsewhere
result = openhost.transcribe("meeting.mp3")
print(result.text)

# As a LangChain document loader (verbose = per-segment Documents)
from openhost import OpenHostWhisper
docs = OpenHostWhisper("meeting.mp3", verbose=True).load()
for doc in docs:
    print(f"[{doc.metadata['start']:.1f}s] {doc.page_content}")

CLI

openhost list                            # show presets
openhost pull qwen3.5-35b-uncensored     # download
openhost run qwen3.5-35b-uncensored      # foreground until Ctrl-C

Built-in presets

id                       backend    size
qwen3.6-35b-mlx-turbo    mlx-lm     ~20 GB
qwen3.5-35b-uncensored   llama.cpp  ~30 GB
qwen3-8b-gguf            llama.cpp  ~5 GB

How it works

  • No HTTP gateway. make_chat() returns a ChatOpenAI pointed straight at the model's own OpenAI-compatible endpoint. Zero proxy overhead.
  • Automatic port allocation. Each runner picks a free localhost port. Users never touch ports.
  • Process-scoped lifecycle. When your Python process exits, all runners it started get cleaned up (SIGTERM on the process group, SIGKILL fallback).
  • Platform support. macOS, Linux, and Windows (CPU). MLX is Apple Silicon only; llama.cpp is cross-platform.
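The process-scoped lifecycle can be sketched with a process group plus an atexit hook (a POSIX-only sketch; openhost's internals may differ):

```python
import atexit
import os
import signal
import subprocess

def start_runner(cmd: list[str]) -> subprocess.Popen:
    """Start a server in its own process group and register cleanup so it
    dies with the Python process: SIGTERM first, SIGKILL as a fallback."""
    proc = subprocess.Popen(cmd, start_new_session=True)

    def _cleanup() -> None:
        if proc.poll() is not None:
            return  # already exited
        pgid = os.getpgid(proc.pid)
        os.killpg(pgid, signal.SIGTERM)      # polite shutdown
        try:
            proc.wait(timeout=5)
        except subprocess.TimeoutExpired:
            os.killpg(pgid, signal.SIGKILL)  # hard fallback

    atexit.register(_cleanup)
    return proc
```

start_new_session puts the server (and any children it spawns) in its own process group, so killpg reaches the whole tree, not just the direct child.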

License

MIT
