Solar Host
A multi-backend process manager for model inference servers with REST API and WebSocket log streaming.
Features
- Multi-Backend Support:
  - llama.cpp (llama-server) for GGUF models
  - HuggingFace AutoModelForCausalLM for text generation
  - HuggingFace AutoModelForSequenceClassification for classification
  - HuggingFace AutoModel for embeddings (last hidden state with mean pooling; sketched after this list)
- Socket.IO control client - Connects to solar-control's `/hosts` namespace for registration, heartbeat, and instance lifecycle (start/stop/restart, config updates). Supports pending-host and rejection events with post-approval sync.
- Robust instance lifecycle - Non-blocking process wait, state re-check after startup to avoid start/stop races, and full cleanup of log/state buffers on stop or delete.
- Automatic port assignment starting from 3500
- Persistent configuration with auto-restart on boot
- Real-time log streaming via WebSocket
- REST API for instance management
- API key authentication
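For the embedding backend, "last hidden state with mean pooling" means token vectors are averaged under the attention mask so padding does not dilute the result. A minimal sketch of that computation using transformers directly (illustrative only, not solar-host's internal code):

```python
import torch
from transformers import AutoModel, AutoTokenizer

name = "sentence-transformers/all-MiniLM-L6-v2"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModel.from_pretrained(name)

batch = tokenizer(["hello world"], return_tensors="pt", padding=True, truncation=True)
with torch.no_grad():
    hidden = model(**batch).last_hidden_state      # (batch, seq_len, dim)

# Mean pooling: average token embeddings, masking out padding positions
mask = batch["attention_mask"].unsqueeze(-1)       # (batch, seq_len, 1)
embeddings = (hidden * mask).sum(dim=1) / mask.sum(dim=1)

# L2 normalization, as the normalize_embeddings option suggests
embeddings = torch.nn.functional.normalize(embeddings, p=2, dim=1)
```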
Installation
```bash
# Basic install (llama.cpp backend only)
pip install solar-host

# With HuggingFace backend support
pip install solar-host[huggingface]

# With NVIDIA GPU monitoring
pip install solar-host[nvidia]

# Everything
pip install solar-host[all]

# Development (editable install with test dependencies)
pip install -e ".[all,dev]"
```
Backend-Specific Requirements
For llama.cpp backend:
- Install `llama-server` and ensure it's in your PATH
For HuggingFace backends:
- Install with the `huggingface` extra: `pip install solar-host[huggingface]`
Setup
1. Create .env file
Create a `.env` file in the `solar-host/` directory:
```
API_KEY=your-secret-key-here
HOST=0.0.0.0
PORT=8001
MODELS_DIR=./models

# Solar-control connection (for Socket.IO registration and lifecycle)
SOLAR_CONTROL_URL=http://localhost:8000
SOLAR_CONTROL_API_KEY=your-solar-control-management-api-key
```
- `API_KEY` - Used by solar-control (and other callers) to access this host's REST API.
- `MODELS_DIR` - Path to the models directory. Used for disk space reporting in the `/health` endpoint. Defaults to `./models`.
- `SOLAR_CONTROL_URL` - Base URL of solar-control (HTTP; Socket.IO connects to the same origin).
- `SOLAR_CONTROL_API_KEY` - Management API key from solar-control. The host uses it to connect to the `/hosts` namespace; the host must be approved via the management API or WebUI before it appears in the gateway pool.
2. Start the server
```bash
# Start the server (reads HOST and PORT from .env)
solar-host

# Or with uvicorn directly (e.g. for --reload during development)
uvicorn solar_host.main:app --host 0.0.0.0 --port 8001 --reload
```
The server will:
- Create `config.json` automatically (if it doesn't exist)
- Create a `logs/` directory for instance logs
- Auto-restart any instances that were running before shutdown
3. Verify it's running
```bash
curl http://localhost:8001/health
# Should return: {"status":"healthy","service":"solar-host","version":"2.0.0","disk":{"total_gb":500,"used_gb":120,"available_gb":380}}
```
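When scripting deployments, it can help to wait for the health endpoint before proceeding. A small poll loop using only the standard library (the `/health` URL and response shape come from this section; the rest is illustrative):

```python
import json
import time
import urllib.request

def wait_for_healthy(url: str = "http://localhost:8001/health", timeout: float = 60.0) -> dict:
    """Poll /health until it reports healthy or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                body = json.load(resp)
            if body.get("status") == "healthy":
                return body
        except OSError:
            pass  # server not listening yet
        time.sleep(1)
    raise TimeoutError(f"{url} did not report healthy within {timeout}s")

print(wait_for_healthy())
```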
4. Access Swagger UI
Open your browser to: http://localhost:8001/docs
- Click the "Authorize" button
- Enter your API key from the `.env` file
- Click "Authorize" and then "Close"
- Now you can use the interactive API documentation!
Backend Types
Solar Host supports four backend types:
| Backend Type | Model Type | Endpoints Supported |
|---|---|---|
| `llamacpp` | GGUF models via llama-server | `/v1/chat/completions`, `/v1/completions` |
| `huggingface_causal` | HuggingFace AutoModelForCausalLM | `/v1/chat/completions`, `/v1/completions` |
| `huggingface_classification` | HuggingFace AutoModelForSequenceClassification | `/v1/classify` |
| `huggingface_embedding` | HuggingFace AutoModel (last hidden state) | `/v1/embeddings` |
Managing Instances
Creating a llama.cpp Instance
```bash
curl -X POST http://localhost:8001/instances \
  -H "X-API-Key: your-secret-key-here" \
  -H "Content-Type: application/json" \
  -d '{
    "config": {
      "backend_type": "llamacpp",
      "model": "/path/to/model.gguf",
      "alias": "llama-3:8b",
      "threads": 4,
      "n_gpu_layers": 999,
      "temp": 0.7,
      "top_p": 0.9,
      "top_k": 40,
      "min_p": 0.05,
      "ctx_size": 8192,
      "host": "0.0.0.0",
      "api_key": "instance-key"
    }
  }'
```
Creating a HuggingFace Causal LM Instance
```bash
curl -X POST http://localhost:8001/instances \
  -H "X-API-Key: your-secret-key-here" \
  -H "Content-Type: application/json" \
  -d '{
    "config": {
      "backend_type": "huggingface_causal",
      "model_id": "meta-llama/Llama-2-7b-chat-hf",
      "alias": "llama2-hf:7b",
      "device": "auto",
      "dtype": "auto",
      "max_length": 4096,
      "trust_remote_code": false,
      "use_flash_attention": true,
      "host": "0.0.0.0",
      "api_key": "instance-key"
    }
  }'
```
Creating a HuggingFace Classification Instance
```bash
curl -X POST http://localhost:8001/instances \
  -H "X-API-Key: your-secret-key-here" \
  -H "Content-Type: application/json" \
  -d '{
    "config": {
      "backend_type": "huggingface_classification",
      "model_id": "distilbert-base-uncased-finetuned-sst-2-english",
      "alias": "sentiment:distilbert",
      "device": "auto",
      "dtype": "auto",
      "max_length": 512,
      "labels": ["negative", "positive"],
      "host": "0.0.0.0",
      "api_key": "instance-key"
    }
  }'
```
Creating a HuggingFace Embedding Instance
```bash
curl -X POST http://localhost:8001/instances \
  -H "X-API-Key: your-secret-key-here" \
  -H "Content-Type: application/json" \
  -d '{
    "config": {
      "backend_type": "huggingface_embedding",
      "model_id": "sentence-transformers/all-MiniLM-L6-v2",
      "alias": "embed:minilm",
      "device": "auto",
      "dtype": "auto",
      "max_length": 512,
      "normalize_embeddings": true,
      "host": "0.0.0.0",
      "api_key": "instance-key"
    }
  }'
```
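The same create calls work from any HTTP client. A sketch of the llama.cpp example above using the `requests` package (the response shape is not documented here, so inspect it in Swagger UI; it presumably includes the instance id used by the start/stop endpoints below):

```python
import requests

HOST = "http://localhost:8001"
HEADERS = {"X-API-Key": "your-secret-key-here"}

resp = requests.post(
    f"{HOST}/instances",
    headers=HEADERS,
    json={
        "config": {
            "backend_type": "llamacpp",
            "model": "/path/to/model.gguf",
            "alias": "llama-3:8b",
            "ctx_size": 8192,
            "api_key": "instance-key",
        }
    },
    timeout=30,
)
resp.raise_for_status()
print(resp.json())
```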
Starting an Instance
```bash
curl -X POST http://localhost:8001/instances/{instance-id}/start \
  -H "X-API-Key: your-secret-key-here"
```
Viewing All Instances
```bash
curl http://localhost:8001/instances \
  -H "X-API-Key: your-secret-key-here"
```
Stopping an Instance
```bash
curl -X POST http://localhost:8001/instances/{instance-id}/stop \
  -H "X-API-Key: your-secret-key-here"
```
API Endpoints
Instance Management
- `POST /instances` - Create new instance
- `GET /instances` - List all instances
- `GET /instances/{id}` - Get instance details
- `PUT /instances/{id}` - Update instance config
- `DELETE /instances/{id}` - Remove instance
- `POST /instances/{id}/start` - Start instance
- `POST /instances/{id}/stop` - Stop instance
- `POST /instances/{id}/restart` - Restart instance
- `GET /instances/{id}/state` - Get runtime state
- `GET /instances/{id}/last-generation` - Get last generation metrics
WebSocket
- `WS /instances/{id}/logs` - Stream logs with sequence numbers
- `WS /instances/{id}/state` - Stream runtime state updates
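A sketch of consuming the log stream with the `websockets` package (`pip install websockets`). How authentication is passed to the WebSocket endpoints is not documented here, so the header below is an assumption to verify in Swagger UI; `additional_headers` requires a recent websockets release:

```python
import asyncio
import websockets

async def stream_logs(instance_id: str) -> None:
    uri = f"ws://localhost:8001/instances/{instance_id}/logs"
    # Assumption: the API key is sent as the same X-API-Key header
    headers = {"X-API-Key": "your-secret-key-here"}
    async with websockets.connect(uri, additional_headers=headers) as ws:
        async for message in ws:
            print(message)  # each message carries a sequence number, per the docs

asyncio.run(stream_logs("your-instance-id"))
```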
System
- `GET /health` - Health check
- `GET /memory` - GPU/RAM memory usage
Authentication
All requests require an `X-API-Key` header with your configured API key from the `.env` file.
Configuration Reference
llama.cpp Config Parameters
| Parameter | Required | Default | Description |
|---|---|---|---|
| `backend_type` | No | `"llamacpp"` | Backend type identifier |
| `model` | Yes | - | Full path to the GGUF model file |
| `alias` | Yes | - | Model alias (e.g., "llama-3:8b") used for routing |
| `threads` | No | 1 | Number of CPU threads to use |
| `n_gpu_layers` | No | 999 | Number of layers to offload to GPU (999 = all) |
| `temp` | No | 1.0 | Sampling temperature (0.0-2.0) |
| `top_p` | No | 1.0 | Top-p sampling (0.0-1.0) |
| `top_k` | No | 0 | Top-k sampling (0 = disabled) |
| `min_p` | No | 0.0 | Min-p sampling (0.0-1.0) |
| `ctx_size` | No | 131072 | Context window size |
| `chat_template_file` | No | - | Path to Jinja chat template file |
| `special` | No | false | Enable the llama-server `--special` flag |
| `ot` | No | - | Override-tensor string (passed as the `-ot` flag to llama-server) |
| `model_type` | No | `"llm"` | Model type: `"llm"`, `"embedding"`, or `"reranker"` |
| `pooling` | No | - | Pooling strategy for embedding models: `"none"`, `"mean"`, `"cls"`, `"last"`, `"rank"` (only valid when `model_type` is `"embedding"`) |
| `host` | No | `"0.0.0.0"` | Host to bind to |
| `port` | No | auto | Port (auto-assigned if not specified) |
| `api_key` | Yes | - | API key for this instance |
HuggingFace Causal LM Config Parameters
| Parameter | Required | Default | Description |
|---|---|---|---|
| `backend_type` | Yes | - | Must be `"huggingface_causal"` |
| `model_id` | Yes | - | HuggingFace model ID or local path |
| `alias` | Yes | - | Model alias for routing |
| `device` | No | `"auto"` | Device: `auto`, `cuda`, `mps`, `cpu` |
| `dtype` | No | `"auto"` | Data type: `auto`, `float16`, `bfloat16`, `float32` |
| `max_length` | No | 4096 | Maximum sequence length |
| `trust_remote_code` | No | false | Trust remote code from HuggingFace |
| `use_flash_attention` | No | true | Use Flash Attention 2 if available |
| `host` | No | `"0.0.0.0"` | Host to bind to |
| `port` | No | auto | Port (auto-assigned if not specified) |
| `api_key` | Yes | - | API key for this instance |
HuggingFace Classification Config Parameters
| Parameter | Required | Default | Description |
|---|---|---|---|
| `backend_type` | Yes | - | Must be `"huggingface_classification"` |
| `model_id` | Yes | - | HuggingFace model ID or local path |
| `alias` | Yes | - | Model alias for routing |
| `device` | No | `"auto"` | Device: `auto`, `cuda`, `mps`, `cpu` |
| `dtype` | No | `"auto"` | Data type: `auto`, `float16`, `bfloat16`, `float32` |
| `max_length` | No | 512 | Maximum sequence length |
| `labels` | No | auto | Label names (auto-detected from model if not provided) |
| `trust_remote_code` | No | false | Trust remote code from HuggingFace |
| `host` | No | `"0.0.0.0"` | Host to bind to |
| `port` | No | auto | Port (auto-assigned if not specified) |
| `api_key` | Yes | - | API key for this instance |
HuggingFace Embedding Config Parameters
| Parameter | Required | Default | Description |
|---|---|---|---|
| `backend_type` | Yes | - | Must be `"huggingface_embedding"` |
| `model_id` | Yes | - | HuggingFace model ID or local path |
| `alias` | Yes | - | Model alias for routing |
| `device` | No | `"auto"` | Device: `auto`, `cuda`, `mps`, `cpu` |
| `dtype` | No | `"auto"` | Data type: `auto`, `float16`, `bfloat16`, `float32` |
| `max_length` | No | 512 | Maximum sequence length |
| `normalize_embeddings` | No | true | L2-normalize output embedding vectors |
| `trust_remote_code` | No | false | Trust remote code from HuggingFace |
| `host` | No | `"0.0.0.0"` | Host to bind to |
| `port` | No | auto | Port (auto-assigned if not specified) |
| `api_key` | Yes | - | API key for this instance |
Device Options
| Device | Description |
|---|---|
| `auto` | Automatically select the best available (CUDA > MPS > CPU) |
| `cuda` | NVIDIA GPU (requires CUDA) |
| `mps` | Apple Silicon GPU (macOS) |
| `cpu` | CPU only |
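The `auto` preference order is easy to reproduce in torch; a sketch of the CUDA > MPS > CPU fallback (illustrative, not solar-host's internal code):

```python
import torch

def pick_device() -> str:
    """Mirror the documented auto order: CUDA, then MPS, then CPU."""
    if torch.cuda.is_available():
        return "cuda"
    if torch.backends.mps.is_available():
        return "mps"
    return "cpu"

print(pick_device())
```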
Example Configurations
llama.cpp - Small Model
```json
{
  "backend_type": "llamacpp",
  "model": "/models/llama-3-7b.gguf",
  "alias": "llama-3:7b",
  "threads": 4,
  "n_gpu_layers": 999,
  "temp": 0.7,
  "top_p": 0.9,
  "ctx_size": 8192,
  "api_key": "llama3-7b-key"
}
```
llama.cpp - Large Model with Custom Template
```json
{
  "backend_type": "llamacpp",
  "model": "/models/gpt-oss-120b-F16.gguf",
  "alias": "gpt-oss:120b",
  "threads": 1,
  "n_gpu_layers": 999,
  "ctx_size": 131072,
  "chat_template_file": "/models/templates/harmony.jinja",
  "api_key": "gpt-oss-key"
}
```
HuggingFace - Text Generation
```json
{
  "backend_type": "huggingface_causal",
  "model_id": "microsoft/phi-2",
  "alias": "phi-2:2.7b",
  "device": "cuda",
  "dtype": "float16",
  "max_length": 2048,
  "api_key": "phi2-key"
}
```
HuggingFace - Sentiment Classification
```json
{
  "backend_type": "huggingface_classification",
  "model_id": "cardiffnlp/twitter-roberta-base-sentiment-latest",
  "alias": "sentiment:roberta",
  "device": "cuda",
  "max_length": 512,
  "labels": ["negative", "neutral", "positive"],
  "api_key": "sentiment-key"
}
```
HuggingFace - Embedding Model
```json
{
  "backend_type": "huggingface_embedding",
  "model_id": "sentence-transformers/all-MiniLM-L6-v2",
  "alias": "embed:minilm",
  "device": "cuda",
  "max_length": 512,
  "normalize_embeddings": true,
  "api_key": "embed-key"
}
```
File Structure
```
solar-host/
├── .env                     # Configuration (not in git)
├── config.json              # Auto-generated instance storage (not in git)
├── logs/                    # Auto-generated log directory (not in git)
├── pyproject.toml           # Package metadata and dependencies
├── solar_host/
│   ├── backends/            # Backend runners
│   │   ├── base.py          # Abstract BackendRunner
│   │   ├── llamacpp.py      # llama.cpp runner
│   │   └── huggingface.py   # HuggingFace runner
│   ├── models/              # Pydantic models
│   │   ├── base.py          # Base models
│   │   ├── llamacpp.py      # llama.cpp config
│   │   └── huggingface.py   # HuggingFace configs
│   ├── servers/             # Standalone server processes
│   │   └── hf_server.py     # HuggingFace model server
│   ├── routes/              # API routes
│   ├── config.py            # Configuration management
│   ├── main.py              # FastAPI application
│   ├── models_manager.py    # Managed models directory and manifest
│   └── process_manager.py   # Process lifecycle management
├── tests/
└── README.md
```
Troubleshooting
Solar-host won't start
Error: "Address already in use"
- Another service is using port 8001
- Solution: Change
PORTin.envor stop the other service
Error: "No module named 'solar_host'"
- The package is not installed
- Solution:
pip install solar-hostorpip install -e .for development
llama.cpp Instance fails to start
1. Verify llama-server is installed: `which llama-server`
2. Check the model path: `ls -lh /path/to/your/model.gguf`
3. Check the instance logs in the `logs/` directory
HuggingFace Instance fails to start
1. Verify dependencies: `python -c "import torch; import transformers; print('OK')"`
2. Check CUDA availability (if using GPU): `python -c "import torch; print(f'CUDA: {torch.cuda.is_available()}')"`
3. Check MPS availability (macOS): `python -c "import torch; print(f'MPS: {torch.backends.mps.is_available()}')"`
4. Check the instance logs in the `logs/` directory
Instance keeps retrying and failing
- Solar-host will retry starting an instance up to 2 times
- Check the `error_message` field:

```bash
curl http://localhost:8001/instances/{instance-id} \
  -H "X-API-Key: your-key" | jq '.error_message'
```
Conda Environment
When solar-host runs inside a conda environment, HuggingFace server subprocesses automatically inherit it. Just ensure all dependencies are installed there:
```bash
conda activate your-env
pip install torch transformers accelerate
```
Integration with Solar Control
Solar-host connects to solar-control over Socket.IO (namespace `/hosts`). Set `SOLAR_CONTROL_URL` and `SOLAR_CONTROL_API_KEY` in `.env`. On startup the host registers and appears in solar-control's pending list until approved.
Approve the host (via solar-control management API or WebUI):
```bash
# List pending hosts
curl http://your-control-server:8000/api/hosts/pending \
  -H "X-API-Key: your-management-api-key"

# Approve (use pending_id from the list)
curl -X POST http://your-control-server:8000/api/hosts/pending/{pending_id}/approve \
  -H "X-API-Key: your-management-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "GPU Server 1",
    "url": "http://192.168.1.100:8001",
    "api_key": "your-solar-host-api-key"
  }'
```
Alternatively, create a host directly (no pending step) with `POST /api/hosts` and the same JSON body.
Once approved, instances are accessible through solar-control’s OpenAI-compatible gateway:
- `/v1/chat/completions` - Chat completion (llamacpp, huggingface_causal)
- `/v1/completions` - Text completion (llamacpp, huggingface_causal)
- `/v1/classify` - Classification (huggingface_classification)
- `/v1/embeddings` - Embeddings (huggingface_embedding)
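Because the gateway is OpenAI-compatible, standard clients can talk to it. A sketch using the official `openai` package pointed at solar-control (the base URL path and key below are placeholders to adapt to your deployment):

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://your-control-server:8000/v1",  # assumed gateway base path
    api_key="your-gateway-api-key",
)

resp = client.chat.completions.create(
    model="llama-3:8b",  # the instance alias is what routes the request
    messages=[{"role": "user", "content": "Hello!"}],
)
print(resp.choices[0].message.content)
```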
Backward Compatibility
Existing configurations without `backend_type` are automatically treated as `llamacpp` instances. No migration required.
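One plausible way to picture this (a sketch, not solar_host's actual models): the llama.cpp config simply defaults `backend_type`, so older JSON entries parse unchanged:

```python
from pydantic import BaseModel

class LlamaCppConfig(BaseModel):
    # Old configs omit backend_type; the default keeps them valid
    backend_type: str = "llamacpp"
    model: str
    alias: str
    api_key: str

# A pre-backend_type config still validates:
cfg = LlamaCppConfig(model="/models/llama-3-7b.gguf", alias="llama-3:7b", api_key="k")
print(cfg.backend_type)  # "llamacpp"
```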