Solar Host
A multi-backend process manager for model inference servers with REST API and WebSocket log streaming.
Features
- Multi-Backend Support:
  - llama.cpp (llama-server) for GGUF models
  - HuggingFace AutoModelForCausalLM for text generation
  - HuggingFace AutoModelForSequenceClassification for classification
  - HuggingFace AutoModel for embeddings (last hidden state with mean pooling; sketched after this list)
- Socket.IO control client - Connects to solar-control's `/hosts` namespace for registration, heartbeat, and instance lifecycle (start/stop/restart, config updates). Supports pending-host and rejection events with post-approval sync.
- Robust instance lifecycle - Non-blocking process wait, state re-check after startup to avoid start/stop races, and full cleanup of log/state buffers on stop or delete.
- Automatic port assignment starting from 3500
- Persistent configuration with auto-restart on boot
- Real-time log streaming via WebSocket
- REST API for instance management
- API key authentication
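For the embedding backend, "last hidden state with mean pooling" means token vectors are averaged under the attention mask so padding does not dilute the result. A minimal sketch of that computation using transformers directly (illustrative only, not solar-host's internal code):

```python
import torch
from transformers import AutoModel, AutoTokenizer

name = "sentence-transformers/all-MiniLM-L6-v2"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModel.from_pretrained(name)

batch = tokenizer(["hello world"], return_tensors="pt", padding=True, truncation=True)
with torch.no_grad():
    hidden = model(**batch).last_hidden_state      # (batch, seq_len, dim)

# Mean pooling: average token embeddings, masking out padding positions
mask = batch["attention_mask"].unsqueeze(-1)       # (batch, seq_len, 1)
embeddings = (hidden * mask).sum(dim=1) / mask.sum(dim=1)

# L2 normalization, as the normalize_embeddings option suggests
embeddings = torch.nn.functional.normalize(embeddings, p=2, dim=1)
```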
Installation
```bash
# Basic install (llama.cpp backend only)
pip install solar-host

# With HuggingFace backend support
pip install solar-host[huggingface]

# With NVIDIA GPU monitoring
pip install solar-host[nvidia]

# Everything
pip install solar-host[all]

# Development (editable install with test dependencies)
pip install -e ".[all,dev]"
```
Backend-Specific Requirements
For llama.cpp backend:
- Install `llama-server` and ensure it's in your PATH
For HuggingFace backends:
- Install with the `huggingface` extra: `pip install solar-host[huggingface]`
Setup
1. Create .env file
Create a `.env` file in the `solar-host/` directory:
```
API_KEY=your-secret-key-here
HOST=0.0.0.0
PORT=8001
MODELS_DIR=./models

# Solar-control connection (for Socket.IO registration and lifecycle)
SOLAR_CONTROL_URL=http://localhost:8000
SOLAR_CONTROL_API_KEY=your-solar-control-management-api-key
```
- `API_KEY` - Used by solar-control (and other callers) to access this host's REST API.
- `MODELS_DIR` - Path to the models directory. Used for disk space reporting in the `/health` endpoint. Defaults to `./models`.
- `SOLAR_CONTROL_URL` - Base URL of solar-control (HTTP; Socket.IO connects to the same origin).
- `SOLAR_CONTROL_API_KEY` - Management API key from solar-control. The host uses it to connect to the `/hosts` namespace; the host must be approved via the management API or WebUI before it appears in the gateway pool.
2. Start the server
```bash
# Start the server (reads HOST and PORT from .env)
solar-host

# Or with uvicorn directly (e.g. for --reload during development)
uvicorn solar_host.main:app --host 0.0.0.0 --port 8001 --reload
```
The server will:
- Create `config.json` automatically (if it doesn't exist)
- Create a `logs/` directory for instance logs
- Auto-restart any instances that were running before shutdown
3. Verify it's running
```bash
curl http://localhost:8001/health
# Should return: {"status":"healthy","service":"solar-host","version":"2.0.0","disk":{"total_gb":500,"used_gb":120,"available_gb":380}}
```
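When scripting deployments, it can help to wait for the health endpoint before proceeding. A small poll loop using only the standard library (the `/health` URL and response shape come from this section; the rest is illustrative):

```python
import json
import time
import urllib.request

def wait_for_healthy(url: str = "http://localhost:8001/health", timeout: float = 60.0) -> dict:
    """Poll /health until it reports healthy or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                body = json.load(resp)
            if body.get("status") == "healthy":
                return body
        except OSError:
            pass  # server not listening yet
        time.sleep(1)
    raise TimeoutError(f"{url} did not report healthy within {timeout}s")

print(wait_for_healthy())
```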
4. Access Swagger UI
Open your browser to: http://localhost:8001/docs
- Click the "Authorize" button
- Enter your API key from the `.env` file
- Click "Authorize" and then "Close"
- Now you can use the interactive API documentation!
Backend Types
Solar Host supports four backend types:
| Backend Type | Model Type | Endpoints Supported |
|---|---|---|
| `llamacpp` | GGUF models via llama-server | `/v1/chat/completions`, `/v1/completions` |
| `huggingface_causal` | HuggingFace AutoModelForCausalLM | `/v1/chat/completions`, `/v1/completions` |
| `huggingface_classification` | HuggingFace AutoModelForSequenceClassification | `/v1/classify` |
| `huggingface_embedding` | HuggingFace AutoModel (last hidden state) | `/v1/embeddings` |
Managing Instances
Creating a llama.cpp Instance
```bash
curl -X POST http://localhost:8001/instances \
  -H "X-API-Key: your-secret-key-here" \
  -H "Content-Type: application/json" \
  -d '{
    "config": {
      "backend_type": "llamacpp",
      "model": "/path/to/model.gguf",
      "alias": "llama-3:8b",
      "threads": 4,
      "n_gpu_layers": 999,
      "temp": 0.7,
      "top_p": 0.9,
      "top_k": 40,
      "min_p": 0.05,
      "ctx_size": 8192,
      "host": "0.0.0.0",
      "api_key": "instance-key"
    }
  }'
```
Creating a HuggingFace Causal LM Instance
```bash
curl -X POST http://localhost:8001/instances \
  -H "X-API-Key: your-secret-key-here" \
  -H "Content-Type: application/json" \
  -d '{
    "config": {
      "backend_type": "huggingface_causal",
      "model_id": "meta-llama/Llama-2-7b-chat-hf",
      "alias": "llama2-hf:7b",
      "device": "auto",
      "dtype": "auto",
      "max_length": 4096,
      "trust_remote_code": false,
      "use_flash_attention": true,
      "host": "0.0.0.0",
      "api_key": "instance-key"
    }
  }'
```
Creating a HuggingFace Classification Instance
```bash
curl -X POST http://localhost:8001/instances \
  -H "X-API-Key: your-secret-key-here" \
  -H "Content-Type: application/json" \
  -d '{
    "config": {
      "backend_type": "huggingface_classification",
      "model_id": "distilbert-base-uncased-finetuned-sst-2-english",
      "alias": "sentiment:distilbert",
      "device": "auto",
      "dtype": "auto",
      "max_length": 512,
      "labels": ["negative", "positive"],
      "host": "0.0.0.0",
      "api_key": "instance-key"
    }
  }'
```
Creating a HuggingFace Embedding Instance
```bash
curl -X POST http://localhost:8001/instances \
  -H "X-API-Key: your-secret-key-here" \
  -H "Content-Type: application/json" \
  -d '{
    "config": {
      "backend_type": "huggingface_embedding",
      "model_id": "sentence-transformers/all-MiniLM-L6-v2",
      "alias": "embed:minilm",
      "device": "auto",
      "dtype": "auto",
      "max_length": 512,
      "normalize_embeddings": true,
      "host": "0.0.0.0",
      "api_key": "instance-key"
    }
  }'
```
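The same create calls work from any HTTP client. A sketch of the llama.cpp example above using the `requests` package (the response shape is not documented here, so inspect it in Swagger UI; it presumably includes the instance id used by the start/stop endpoints below):

```python
import requests

HOST = "http://localhost:8001"
HEADERS = {"X-API-Key": "your-secret-key-here"}

resp = requests.post(
    f"{HOST}/instances",
    headers=HEADERS,
    json={
        "config": {
            "backend_type": "llamacpp",
            "model": "/path/to/model.gguf",
            "alias": "llama-3:8b",
            "ctx_size": 8192,
            "api_key": "instance-key",
        }
    },
    timeout=30,
)
resp.raise_for_status()
print(resp.json())
```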
Starting an Instance
```bash
curl -X POST http://localhost:8001/instances/{instance-id}/start \
  -H "X-API-Key: your-secret-key-here"
```
Viewing All Instances
```bash
curl http://localhost:8001/instances \
  -H "X-API-Key: your-secret-key-here"
```
Stopping an Instance
```bash
curl -X POST http://localhost:8001/instances/{instance-id}/stop \
  -H "X-API-Key: your-secret-key-here"
```
API Endpoints
Instance Management
- `POST /instances` - Create new instance
- `GET /instances` - List all instances
- `GET /instances/{id}` - Get instance details
- `PUT /instances/{id}` - Update instance config
- `DELETE /instances/{id}` - Remove instance
- `POST /instances/{id}/start` - Start instance
- `POST /instances/{id}/stop` - Stop instance
- `POST /instances/{id}/restart` - Restart instance
- `GET /instances/{id}/state` - Get runtime state
- `GET /instances/{id}/last-generation` - Get last generation metrics
WebSocket
- `WS /instances/{id}/logs` - Stream logs with sequence numbers
- `WS /instances/{id}/state` - Stream runtime state updates
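A sketch of consuming the log stream with the `websockets` package (`pip install websockets`). How authentication is passed to the WebSocket endpoints is not documented here, so the header below is an assumption to verify in Swagger UI; `additional_headers` requires a recent websockets release:

```python
import asyncio
import websockets

async def stream_logs(instance_id: str) -> None:
    uri = f"ws://localhost:8001/instances/{instance_id}/logs"
    # Assumption: the API key is sent as the same X-API-Key header
    headers = {"X-API-Key": "your-secret-key-here"}
    async with websockets.connect(uri, additional_headers=headers) as ws:
        async for message in ws:
            print(message)  # each message carries a sequence number, per the docs

asyncio.run(stream_logs("your-instance-id"))
```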
System
- `GET /health` - Health check
- `GET /memory` - GPU/RAM memory usage
Authentication
All requests require an `X-API-Key` header with your configured API key from the `.env` file.
Configuration Reference
llama.cpp Config Parameters
| Parameter | Required | Default | Description |
|---|---|---|---|
| `backend_type` | No | `"llamacpp"` | Backend type identifier |
| `model` | Yes | - | Full path to the GGUF model file |
| `alias` | Yes | - | Model alias (e.g., "llama-3:8b") used for routing |
| `threads` | No | 1 | Number of CPU threads to use |
| `n_gpu_layers` | No | 999 | Number of layers to offload to GPU (999 = all) |
| `temp` | No | 1.0 | Sampling temperature (0.0-2.0) |
| `top_p` | No | 1.0 | Top-p sampling (0.0-1.0) |
| `top_k` | No | 0 | Top-k sampling (0 = disabled) |
| `min_p` | No | 0.0 | Min-p sampling (0.0-1.0) |
| `ctx_size` | No | 131072 | Context window size |
| `chat_template_file` | No | - | Path to Jinja chat template file |
| `special` | No | false | Enable the llama-server `--special` flag |
| `ot` | No | - | Override-tensor string (passed as the `-ot` flag to llama-server) |
| `model_type` | No | `"llm"` | Model type: `"llm"`, `"embedding"`, or `"reranker"` |
| `pooling` | No | - | Pooling strategy for embedding models: `"none"`, `"mean"`, `"cls"`, `"last"`, `"rank"` (only valid when `model_type` is `"embedding"`) |
| `host` | No | `"0.0.0.0"` | Host to bind to |
| `port` | No | auto | Port (auto-assigned if not specified) |
| `api_key` | Yes | - | API key for this instance |
HuggingFace Causal LM Config Parameters
| Parameter | Required | Default | Description |
|---|---|---|---|
| `backend_type` | Yes | - | Must be `"huggingface_causal"` |
| `model_id` | Yes | - | HuggingFace model ID or local path |
| `alias` | Yes | - | Model alias for routing |
| `device` | No | `"auto"` | Device: `auto`, `cuda`, `mps`, `cpu` |
| `dtype` | No | `"auto"` | Data type: `auto`, `float16`, `bfloat16`, `float32` |
| `max_length` | No | 4096 | Maximum sequence length |
| `trust_remote_code` | No | false | Trust remote code from HuggingFace |
| `use_flash_attention` | No | true | Use Flash Attention 2 if available |
| `host` | No | `"0.0.0.0"` | Host to bind to |
| `port` | No | auto | Port (auto-assigned if not specified) |
| `api_key` | Yes | - | API key for this instance |
HuggingFace Classification Config Parameters
| Parameter | Required | Default | Description |
|---|---|---|---|
| `backend_type` | Yes | - | Must be `"huggingface_classification"` |
| `model_id` | Yes | - | HuggingFace model ID or local path |
| `alias` | Yes | - | Model alias for routing |
| `device` | No | `"auto"` | Device: `auto`, `cuda`, `mps`, `cpu` |
| `dtype` | No | `"auto"` | Data type: `auto`, `float16`, `bfloat16`, `float32` |
| `max_length` | No | 512 | Maximum sequence length |
| `labels` | No | auto | Label names (auto-detected from model if not provided) |
| `trust_remote_code` | No | false | Trust remote code from HuggingFace |
| `host` | No | `"0.0.0.0"` | Host to bind to |
| `port` | No | auto | Port (auto-assigned if not specified) |
| `api_key` | Yes | - | API key for this instance |
HuggingFace Embedding Config Parameters
| Parameter | Required | Default | Description |
|---|---|---|---|
| `backend_type` | Yes | - | Must be `"huggingface_embedding"` |
| `model_id` | Yes | - | HuggingFace model ID or local path |
| `alias` | Yes | - | Model alias for routing |
| `device` | No | `"auto"` | Device: `auto`, `cuda`, `mps`, `cpu` |
| `dtype` | No | `"auto"` | Data type: `auto`, `float16`, `bfloat16`, `float32` |
| `max_length` | No | 512 | Maximum sequence length |
| `normalize_embeddings` | No | true | L2-normalize output embedding vectors |
| `trust_remote_code` | No | false | Trust remote code from HuggingFace |
| `host` | No | `"0.0.0.0"` | Host to bind to |
| `port` | No | auto | Port (auto-assigned if not specified) |
| `api_key` | Yes | - | API key for this instance |
Device Options
| Device | Description |
|---|---|
| `auto` | Automatically select the best available (CUDA > MPS > CPU) |
| `cuda` | NVIDIA GPU (requires CUDA) |
| `mps` | Apple Silicon GPU (macOS) |
| `cpu` | CPU only |
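The `auto` preference order is easy to reproduce in torch; a sketch of the CUDA > MPS > CPU fallback (illustrative, not solar-host's internal code):

```python
import torch

def pick_device() -> str:
    """Mirror the documented auto order: CUDA, then MPS, then CPU."""
    if torch.cuda.is_available():
        return "cuda"
    if torch.backends.mps.is_available():
        return "mps"
    return "cpu"

print(pick_device())
```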
Example Configurations
llama.cpp - Small Model
```json
{
  "backend_type": "llamacpp",
  "model": "/models/llama-3-7b.gguf",
  "alias": "llama-3:7b",
  "threads": 4,
  "n_gpu_layers": 999,
  "temp": 0.7,
  "top_p": 0.9,
  "ctx_size": 8192,
  "api_key": "llama3-7b-key"
}
```
llama.cpp - Large Model with Custom Template
```json
{
  "backend_type": "llamacpp",
  "model": "/models/gpt-oss-120b-F16.gguf",
  "alias": "gpt-oss:120b",
  "threads": 1,
  "n_gpu_layers": 999,
  "ctx_size": 131072,
  "chat_template_file": "/models/templates/harmony.jinja",
  "api_key": "gpt-oss-key"
}
```
HuggingFace - Text Generation
```json
{
  "backend_type": "huggingface_causal",
  "model_id": "microsoft/phi-2",
  "alias": "phi-2:2.7b",
  "device": "cuda",
  "dtype": "float16",
  "max_length": 2048,
  "api_key": "phi2-key"
}
```
HuggingFace - Sentiment Classification
```json
{
  "backend_type": "huggingface_classification",
  "model_id": "cardiffnlp/twitter-roberta-base-sentiment-latest",
  "alias": "sentiment:roberta",
  "device": "cuda",
  "max_length": 512,
  "labels": ["negative", "neutral", "positive"],
  "api_key": "sentiment-key"
}
```
HuggingFace - Embedding Model
```json
{
  "backend_type": "huggingface_embedding",
  "model_id": "sentence-transformers/all-MiniLM-L6-v2",
  "alias": "embed:minilm",
  "device": "cuda",
  "max_length": 512,
  "normalize_embeddings": true,
  "api_key": "embed-key"
}
```
File Structure
```
solar-host/
├── .env                     # Configuration (not in git)
├── config.json              # Auto-generated instance storage (not in git)
├── logs/                    # Auto-generated log directory (not in git)
├── pyproject.toml           # Package metadata and dependencies
├── solar_host/
│   ├── backends/            # Backend runners
│   │   ├── base.py          # Abstract BackendRunner
│   │   ├── llamacpp.py      # llama.cpp runner
│   │   └── huggingface.py   # HuggingFace runner
│   ├── models/              # Pydantic models
│   │   ├── base.py          # Base models
│   │   ├── llamacpp.py      # llama.cpp config
│   │   └── huggingface.py   # HuggingFace configs
│   ├── servers/             # Standalone server processes
│   │   └── hf_server.py     # HuggingFace model server
│   ├── routes/              # API routes
│   ├── config.py            # Configuration management
│   ├── main.py              # FastAPI application
│   ├── models_manager.py    # Managed models directory and manifest
│   └── process_manager.py   # Process lifecycle management
├── tests/
└── README.md
```
Troubleshooting
Solar-host won't start
Error: "Address already in use"
- Another service is using port 8001
- Solution: Change
PORTin.envor stop the other service
Error: "No module named 'solar_host'"
- The package is not installed
- Solution:
pip install solar-hostorpip install -e .for development
llama.cpp Instance fails to start
1. Verify llama-server is installed: `which llama-server`
2. Check the model path: `ls -lh /path/to/your/model.gguf`
3. Check the instance logs in the `logs/` directory
HuggingFace Instance fails to start
1. Verify dependencies: `python -c "import torch; import transformers; print('OK')"`
2. Check CUDA availability (if using GPU): `python -c "import torch; print(f'CUDA: {torch.cuda.is_available()}')"`
3. Check MPS availability (macOS): `python -c "import torch; print(f'MPS: {torch.backends.mps.is_available()}')"`
4. Check the instance logs in the `logs/` directory
Instance keeps retrying and failing
- Solar-host will retry starting an instance up to 2 times
- Check the `error_message` field:

```bash
curl http://localhost:8001/instances/{instance-id} \
  -H "X-API-Key: your-key" | jq '.error_message'
```
Conda Environment
When solar-host runs inside a conda environment, HuggingFace server subprocesses automatically inherit it. Just ensure all dependencies are installed there:
```bash
conda activate your-env
pip install torch transformers accelerate
```
Integration with Solar Control
Solar-host connects to solar-control over Socket.IO (namespace `/hosts`). Set `SOLAR_CONTROL_URL` and `SOLAR_CONTROL_API_KEY` in `.env`. On startup the host registers and appears in solar-control's pending list until approved.
Approve the host (via solar-control management API or WebUI):
```bash
# List pending hosts
curl http://your-control-server:8000/api/hosts/pending \
  -H "X-API-Key: your-management-api-key"

# Approve (use pending_id from the list)
curl -X POST http://your-control-server:8000/api/hosts/pending/{pending_id}/approve \
  -H "X-API-Key: your-management-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "GPU Server 1",
    "url": "http://192.168.1.100:8001",
    "api_key": "your-solar-host-api-key"
  }'
```
Alternatively, create a host directly (no pending step) with `POST /api/hosts` and the same JSON body.
Once approved, instances are accessible through solar-control’s OpenAI-compatible gateway:
- `/v1/chat/completions` - Chat completion (llamacpp, huggingface_causal)
- `/v1/completions` - Text completion (llamacpp, huggingface_causal)
- `/v1/classify` - Classification (huggingface_classification)
- `/v1/embeddings` - Embeddings (huggingface_embedding)
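Because the gateway is OpenAI-compatible, standard clients can talk to it. A sketch using the official `openai` package pointed at solar-control (the base URL path and key below are placeholders to adapt to your deployment):

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://your-control-server:8000/v1",  # assumed gateway base path
    api_key="your-gateway-api-key",
)

resp = client.chat.completions.create(
    model="llama-3:8b",  # the instance alias is what routes the request
    messages=[{"role": "user", "content": "Hello!"}],
)
print(resp.choices[0].message.content)
```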
Backward Compatibility
Existing configurations without `backend_type` are automatically treated as `llamacpp` instances. No migration required.
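One plausible way to picture this (a sketch, not solar_host's actual models): the llama.cpp config simply defaults `backend_type`, so older JSON entries parse unchanged:

```python
from pydantic import BaseModel

class LlamaCppConfig(BaseModel):
    # Old configs omit backend_type; the default keeps them valid
    backend_type: str = "llamacpp"
    model: str
    alias: str
    api_key: str

# A pre-backend_type config still validates:
cfg = LlamaCppConfig(model="/models/llama-3-7b.gguf", alias="llama-3:7b", api_key="k")
print(cfg.backend_type)  # "llamacpp"
```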