Simple client to interact with regolo.ai

These details have not been verified by PyPI

Project links

Homepage

Project description

Regolo.ai Python Client

A comprehensive Python client for interacting with Regolo.ai's LLM-based API and Model Management platform.

Installation
Basic Usage
CLI Usage
Model Management & Deployment
Advanced Usage
Environment Variables

Installation

Install the regolo package using pip:

pip install regolo

Basic Usage

1. Import the regolo module

import regolo

2. Set Up Default API Key and Model

To avoid manually passing the API key and model in every request, you can set them globally:

regolo.default_key = "<YOUR_API_KEY>"
regolo.default_chat_model = "Llama-3.3-70B-Instruct"

This ensures that all RegoloClient instances and static functions will use the specified API key and model.

You can still override these defaults by passing parameters directly to methods.

Chat and Completions

Text Completion:

# Using static method
response = regolo.static_completions(prompt="Tell me something about Rome.")
print(response)

# Using client instance
client = regolo.RegoloClient()
response = client.completions(prompt="Tell me something about Rome.")
print(response)

Chat Completion:

# Using static method
role, content = regolo.static_chat_completions(
    messages=[{"role": "user", "content": "Tell me something about Rome"}]
)
print(f"{role}: {content}")

# Using client instance
client = regolo.RegoloClient()
role, content = client.run_chat(user_prompt="Tell me something about Rome")
print(f"{role}: {content}")

Handling Conversation History:

client = regolo.RegoloClient()

# Add prompts to conversation
client.add_prompt_to_chat(role="user", prompt="Tell me about Rome!")
print(client.run_chat())

# Continue the conversation
client.add_prompt_to_chat(role="user", prompt="Tell me more about its history!")
print(client.run_chat())

# View full conversation
print(client.instance.get_conversation())

# Clear conversation to start fresh
client.clear_conversations()

Streaming Responses:

With full output:

client = regolo.RegoloClient()
response = client.run_chat(
    user_prompt="Tell me about Rome",
    stream=True,
    full_output=True
)

while True:
    try:
        print(next(response))
    except StopIteration:
        break

Without full output (text only):

client = regolo.RegoloClient()
response = client.run_chat(
    user_prompt="Tell me about Rome",
    stream=True,
    full_output=False
)

while True:
    try:
        role, content = next(response)
        if role:
            print(f"{role}:")
        print(content, end="", flush=True)
    except StopIteration:
        break

Image Generation

Without client:

from io import BytesIO
from PIL import Image
import regolo

regolo.default_image_generation_model = "Qwen-Image"
regolo.default_key = "<YOUR_API_KEY>"

img_bytes = regolo.static_image_create(prompt="a cat")[0]
image = Image.open(BytesIO(img_bytes))
image.show()

With client:

from io import BytesIO
from PIL import Image
import regolo

client = regolo.RegoloClient(
    image_generation_model="Qwen-Image",
    api_key="<YOUR_API_KEY>"
)

img_bytes = client.create_image(prompt="A cat in Rome")[0]
image = Image.open(BytesIO(img_bytes))
image.show()

Generate multiple images:

client = regolo.RegoloClient()
images = client.create_image(
    prompt="Beautiful landscape",
    n=3,  # Generate 3 images
    quality="hd",
    size="1024x1024",
    style="realistic"
)

for i, img_bytes in enumerate(images):
    image = Image.open(BytesIO(img_bytes))
    image.save(f"output_{i}.png")

Audio Transcription

Without client:

import regolo

regolo.default_key = "<YOUR_API_KEY>"
regolo.default_audio_transcription_model = "faster-whisper-large-v3"

transcribed_text = regolo.static_audio_transcription(
    file="path/to/audio.mp3",
    full_output=True
)
print(transcribed_text)

With client:

import regolo

client = regolo.RegoloClient(
    api_key="<YOUR_API_KEY>",
    audio_transcription_model="faster-whisper-large-v3"
)

transcribed_text = client.audio_transcription(
    file="path/to/audio.mp3",
    language="en",  # Optional: specify language
    response_format="json"  # Options: json, text, srt, verbose_json, vtt
)
print(transcribed_text)

Streaming transcription:

client = regolo.RegoloClient()
response = client.audio_transcription(
    file="path/to/audio.mp3",
    stream=True
)

for chunk in response:
    print(chunk, end="", flush=True)

Text Embeddings

Without client:

import regolo

regolo.default_key = "<YOUR_API_KEY>"
regolo.default_embedder_model = "gte-Qwen2"

embeddings = regolo.static_embeddings(input_text=["test", "test1"])
print(embeddings)

With client:

import regolo

client = regolo.RegoloClient(
    api_key="<YOUR_API_KEY>",
    embedder_model="gte-Qwen2"
)

# Single text
embedding = client.embeddings(input_text="Hello world")
print(embedding)

# Multiple texts
embeddings = client.embeddings(input_text=["text1", "text2", "text3"])
for i, emb in enumerate(embeddings):
    print(f"Embedding {i}: {emb['embedding'][:5]}...")  # First 5 dimensions

Document Reranking

Without client:

import regolo

regolo.default_key = "<YOUR_API_KEY>"
regolo.default_reranker_model = "jina-reranker-v2"

documents = [
    "Paris is the capital of France",
    "Berlin is the capital of Germany",
    "Rome is the capital of Italy"
]

results = regolo.RegoloClient.static_rerank(
    query="What is the capital of France?",
    documents=documents,
    api_key=regolo.default_key,
    model=regolo.default_reranker_model,
    top_n=2
)

for result in results:
    print(f"Document {result['index']}: {result['relevance_score']:.4f}")
    if 'document' in result:
        print(f"  Content: {result['document']}")

With client:

import regolo

client = regolo.RegoloClient(
    api_key="<YOUR_API_KEY>",
    reranker_model="jina-reranker-v2"
)

documents = [
    {"title": "Doc1", "text": "Paris is the capital of France"},
    {"title": "Doc2", "text": "Berlin is the capital of Germany"}
]

results = client.rerank(
    query="French capital",
    documents=documents,
    top_n=1,
    rank_fields=["text"],  # For structured documents
    return_documents=True
)

print(f"Most relevant: {results[0]['document']}")

CLI Usage

The Regolo CLI provides a comprehensive interface for model management, inference deployment, and API interactions.

Chat Interface

Start an interactive chat session:

regolo chat

Options:

--no-hide: Display API key while typing
--disable-newlines: Replace newlines with spaces in responses
--api-key <key>: Provide API key directly instead of being prompted

Example:

regolo chat --api-key <YOUR_API_KEY> --disable-newlines

Model Management

Authentication

Before using model management features, authenticate with your credentials:

regolo auth login

You'll be prompted for your username and password. The CLI will save your authentication tokens automatically.

Logout:

regolo auth logout

List Available Models

Get models accessible with your API key:

regolo get-available-models --api-key <YOUR_API_KEY>

Filter by model type:

regolo get-available-models --api-key <YOUR_API_KEY> --model-type chat
# Options: chat, image_generation, embedding, audio_transcription, rerank

Register Models

Register a HuggingFace model:

regolo models register \
  --name my-llama-model \
  --type huggingface \
  --url meta-llama/Llama-2-7b-hf \
  --api-key <HF_TOKEN>  # Optional, for private models

Register a custom model:

regolo models register \
  --name my-custom-model \
  --type custom

This creates a GitLab repository at git@gitlab.regolo.ai:<username>/my-custom-model.git

List Registered Models

regolo models list

JSON output:

regolo models list --format json

Get Model Details

regolo models details my-llama-model

Delete a Model

regolo models delete my-llama-model --confirm

Inference Management

View Available GPUs

regolo inference gpus

JSON output:

regolo inference gpus --format json

Load Model for Inference

Interactive (will prompt for GPU selection):

regolo inference load my-llama-model

With specific GPU:

regolo inference load my-llama-model --gpu required-gpu

With vLLM configuration:

regolo inference load my-llama-model \
  --gpu ECS1GPU11 \
  --max-model-len 4096 \
  --gpu-memory-utilization 0.9 \
  --tensor-parallel-size 1

Using vLLM config file:

# Create vllm_config.json
cat > vllm_config.json << EOF
{
  "max_model_len": 4096,
  "gpu_memory_utilization": 0.9,
  "tensor_parallel_size": 1,
  "disable_log_requests": true
}
EOF

regolo inference load my-llama-model \
  --gpu ECS1GPU11 \
  --vllm-config-file vllm_config.json

Force overwrite existing configuration:

regolo inference load my-llama-model --gpu ECS1GPU11 --force

View Loaded Models

regolo inference status

This shows:

Session IDs
Model names
GPU assignments
Load times
Current costs

Unload Model

Interactive (will show loaded models):

regolo inference unload

By session ID:

regolo inference unload --session-id 12345

By model name:

regolo inference unload --model-name my-llama-model

Monitor Costs

Current month:

regolo inference user-status

Specific month (MMYYYY format):

regolo inference user-status --month 012025

Time range:

regolo inference user-status \
  --time-range-start 2025-01-01T00:00:00Z \
  --time-range-end 2025-01-15T23:59:59Z

JSON output:

regolo inference user-status --format json

SSH Key Management

SSH keys are required to push custom model files to GitLab repositories.

Add SSH Key

From file:

regolo ssh add \
  --title "My Development Key" \
  --key-file ~/.ssh/id_rsa.pub

Direct key content:

regolo ssh add \
  --title "My Development Key" \
  --key "ssh-rsa AAAAB3NzaC1yc2E... user@example.com"

List SSH Keys

regolo ssh list

JSON output:

regolo ssh list --format json

Delete SSH Key

regolo ssh delete <KEY_ID> --confirm

Complete Workflow Command

The workflow command automates the entire model deployment process:

regolo workflow workflow my-custom-model \
  --type custom \
  --ssh-key-file ~/.ssh/id_rsa.pub \
  --ssh-key-title "Dev Key" \
  --local-model-path ./my_model_files \
  --auto-load

This will:

Register the model
Add your SSH key
Guide you through uploading files to GitLab
Automatically load the model for inference (if --auto-load)

For HuggingFace models:

regolo workflow workflow my-gpt2 \
  --type huggingface \
  --url gpt2 \
  --auto-load

Other CLI Commands

Create Images

regolo create-image \
  --api-key <YOUR_API_KEY> \
  --model Qwen-Image \
  --prompt "A beautiful sunset" \
  --n 2 \
  --size 1024x1024 \
  --quality hd \
  --style realistic \
  --save-path ./images \
  --output-file-format png

Transcribe Audio

regolo transcribe-audio \
  --api-key <YOUR_API_KEY> \
  --model faster-whisper-large-v3 \
  --file-path audio.mp3 \
  --language en \
  --response-format json \
  --save-path transcription.txt

Streaming transcription:

regolo transcribe-audio \
  --api-key <YOUR_API_KEY> \
  --model faster-whisper-large-v3 \
  --file-path audio.mp3 \
  --stream

Rerank Documents

regolo rerank \
  --api-key <YOUR_API_KEY> \
  --model jina-reranker-v2 \
  --query "capital of France" \
  --documents "Paris is the capital" \
  --documents "Berlin is the capital" \
  --documents "Rome is the capital" \
  --top-n 2

Using a documents file:

# Create documents.json
cat > documents.json << EOF
[
  "Paris is the capital of France",
  "Berlin is the capital of Germany",
  "Rome is the capital of Italy"
]
EOF

regolo rerank \
  --api-key <YOUR_API_KEY> \
  --model jina-reranker-v2 \
  --query "capital of France" \
  --documents-file documents.json \
  --format table

Model Management & Deployment

The Regolo platform provides comprehensive model management capabilities, allowing you to register, deploy, and monitor both HuggingFace and custom models on GPU infrastructure.

Authentication

Before using model management features, authenticate using the CLI:

regolo auth login

Or in Python:

from regolo.cli import ModelManagementClient

client = ModelManagementClient(base_url="https://devmid.regolo.ai")
auth_response = client.authenticate("username", "password")
print(f"Token expires in {auth_response['expires_in']} seconds")

# Save tokens for future use
access_token = auth_response['access_token']
refresh_token = auth_response['refresh_token']

Registering Models

HuggingFace Models

regolo models register \
  --name my-bert-model \
  --type huggingface \
  --url bert-base-uncased

For private models, include your HuggingFace token:

regolo models register \
  --name my-private-model \
  --type huggingface \
  --url organization/private-model \
  --api-key hf_xxxxxxxxxxxxx

Supported URL formats:

Full URL: https://huggingface.co/bert-base-uncased
Short format: bert-base-uncased
Organization format: BAAI/bge-small-en-v1.5

Custom Models

For custom models, the platform creates a GitLab repository where you can push your model files:

# 1. Register the model
regolo models register \
  --name my-custom-model \
  --type custom

# 2. Add SSH key for repository access
regolo ssh add \
  --title "Development Key" \
  --key-file ~/.ssh/id_rsa.pub

# 3. Clone the repository
git clone git@gitlab.regolo.ai:<username>/my-custom-model.git
cd my-custom-model

# 4. Add your model files
# Directory structure example:
# my-custom-model/
# ├── config.json
# ├── tokenizer.json
# ├── tokenizer_config.json
# ├── special_tokens_map.json
# ├── pytorch_model.bin (or model.safetensors)
# └── vocab.txt

cp -r /path/to/your/model/* .

# 5. Commit and push
git add .
git commit -m "Add model files"
git push origin main

Using Git LFS for large files:

# Initialize Git LFS
git lfs install
git lfs track "*.bin"
git lfs track "*.safetensors"
git add .gitattributes
git commit -m "Configure Git LFS"
git push origin main

In Python:

from regolo.cli import ModelManagementClient

client = ModelManagementClient()
client.authenticate("username", "password")

# Register HuggingFace model
hf_result = client.register_model(
    name="my-gpt2",
    is_huggingface=True,
    url="gpt2"
)

# Register custom model
custom_result = client.register_model(
    name="my-custom-llm",
    is_huggingface=False
)

# Add SSH key
ssh_result = client.add_ssh_key(
    title="Dev Key",
    key="ssh-rsa AAAAB3NzaC1yc2E... user@example.com"
)

Loading Models for Inference

Once registered, load models onto GPU infrastructure for inference:

# View available GPUs
regolo inference gpus

# Load model with specific configuration
regolo inference load my-bert-model \
  --gpu ECS1GPU11 \
  --max-model-len 2048 \
  --gpu-memory-utilization 0.9 \
  --tensor-parallel-size 1

vLLM Configuration Options:

--max-model-len: Maximum sequence length
--gpu-memory-utilization: GPU memory fraction (0.0-1.0)
--tensor-parallel-size: Number of GPUs for tensor parallelism
--disable-log-requests: Disable request logging
--enable-auto-tool-choice: Enable automatic tool choice
--tool-call-parser: Tool call parser (e.g., llama3_json)
--chat-template: Path to chat template file

In Python:

from regolo.cli import ModelManagementClient

client = ModelManagementClient()
client.authenticate("username", "password")

# Get available GPUs
gpus = client.get_available_gpus()
gpu_instance = gpus['gpus'][0]['InstanceType']

# Load model with vLLM configuration
vllm_config = {
    "max_model_len": 4096,
    "gpu_memory_utilization": 0.9,
    "tensor_parallel_size": 1
}

result = client.load_model_for_inference(
    model_name="my-bert-model",
    gpu=gpu_instance,
    vllm_config=vllm_config
)

Monitoring and Billing

View Loaded Models

regolo inference status

Output includes:

Session ID (required for unloading)
Model name
GPU assignment
Load time
Current cost

Monitor Costs

# Current month
regolo inference user-status

# Specific month (MMYYYY format)
regolo inference user-status --month 012025

# Custom time range
regolo inference user-status \
  --time-range-start 2025-01-01T00:00:00Z \
  --time-range-end 2025-01-15T23:59:59Z

Billing Details:

Hourly billing, rounded up to next full hour
Minimum charge: 1 hour
Cost = duration_hours × hourly_price (in EUR)

In Python:

# Get loaded models
loaded = client.get_loaded_models()
for model in loaded['loaded_models']:
    print(f"Model: {model['model_name']}")
    print(f"Session: {model['session_id']}")
    print(f"Cost: €{model['cost']:.2f}")

# Get cost status
status = client.get_user_inference_status()
print(f"Total sessions: {status['total']}")
print(f"Total cost: €{status.get('total_cost', 0):.2f}")

# Generate monthly report
status = client.get_user_inference_status(month="012025")
for inference in status['inferences']:
    print(f"{inference['model_name']}: "
          f"{inference['duration_hours']}h, "
          f"€{inference['cost_euro']:.2f}")

Unload Models

Stop billing by unloading models when not in use:

# Interactive (shows loaded models)
regolo inference unload

# By session ID
regolo inference unload --session-id 12345

# By model name
regolo inference unload --model-name my-bert-model

In Python:

# Get loaded models
loaded = client.get_loaded_models()

# Unload specific model
for model in loaded['loaded_models']:
    if model['model_name'] == 'my-bert-model':
        client.unload_model_from_inference(model['session_id'])
        break

# Unload all models
for model in loaded['loaded_models']:
    client.unload_model_from_inference(model['session_id'])

Complete Workflow Example

# 1. Authenticate
regolo auth login

# 2. Register HuggingFace model
regolo models register \
  --name llama-2-7b \
  --type huggingface \
  --url meta-llama/Llama-2-7b-hf

# 3. View available GPUs
regolo inference gpus

# 4. Load model for inference
regolo inference load llama-2-7b \
  --gpu ECS1GPU11 \
  --max-model-len 4096 \
  --gpu-memory-utilization 0.9

# 5. Monitor status (wait for loading to complete)
regolo inference status

# 6. Use the model via API
# (Model is now available through Regolo inference endpoints)

# 7. Check costs
regolo inference user-status

# 8. Unload when done
regolo inference unload --model-name llama-2-7b

# 9. Logout
regolo auth logout

Python equivalent:

from regolo.cli import ModelManagementClient
import time

# Initialize and authenticate
client = ModelManagementClient()
client.authenticate("username", "password")

# Register model
client.register_model(
    name="llama-2-7b",
    is_huggingface=True,
    url="meta-llama/Llama-2-7b-hf"
)

# Get GPU and load model
gpus = client.get_available_gpus()
gpu = gpus['gpus'][0]['InstanceType']

vllm_config = {
    "max_model_len": 4096,
    "gpu_memory_utilization": 0.9
}

client.load_model_for_inference(
    model_name="llama-2-7b",
    gpu=gpu,
    vllm_config=vllm_config
)

# Wait for model to load
print("Waiting for model to load...")
time.sleep(60)  # Adjust based on model size

# Check status
loaded = client.get_loaded_models()
print(f"Loaded models: {loaded['total']}")

# Use model through API...

# Monitor costs
status = client.get_user_inference_status()
print(f"Current cost: €{status.get('total_cost', 0):.2f}")

# Unload when done
for model in loaded['loaded_models']:
    if model['model_name'] == 'llama-2-7b':
        client.unload_model_from_inference(model['session_id'])

Advanced Usage

Switching Models

client = regolo.RegoloClient(chat_model="Llama-3.3-70B-Instruct")

# Use the model
response = client.run_chat(user_prompt="Hello")

# Switch to a different model
client.change_model("gpt-4o")

# Now using the new model
response = client.run_chat(user_prompt="Hello again")

Managing Conversation State

from regolo.instance.structures.conversation_model import Conversation, ConversationLine

client = regolo.RegoloClient()

# Build conversation manually
conversation = Conversation(lines=[
    ConversationLine(role="user", content="What is Python?"),
    ConversationLine(role="assistant", content="Python is a programming language."),
    ConversationLine(role="user", content="Tell me more.")
])

# Use existing conversation
client.instance.overwrite_conversation(conversation)
response = client.run_chat()

# Or create a new client from an existing instance
new_client = regolo.RegoloClient.from_instance(client.instance)

Custom Base URL

# Use a custom Regolo server
client = regolo.RegoloClient(
    api_key="<YOUR_API_KEY>",
    alternative_url="https://custom.regolo-server.com"
)

Reusing HTTP Client

import httpx

# Create a persistent HTTP client
http_client = httpx.Client()

# Reuse across multiple RegoloClient instances
client1 = regolo.RegoloClient(pre_existent_client=http_client)
client2 = regolo.RegoloClient(pre_existent_client=http_client)

# Don't forget to close when done
http_client.close()

Working with Full Response Objects

client = regolo.RegoloClient()

# Get full API response
response = client.run_chat(
    user_prompt="Hello",
    full_output=True
)

print(response)  # Full API response dict
# {
#   "choices": [...],
#   "usage": {...},
#   "model": "...",
#   ...
# }

Environment Variables

Default Values

Configure default settings via environment variables:

API_KEY

Set your default API key:

export API_KEY="<YOUR_API_KEY>"

Load in Python:

import regolo
regolo.key_load_from_env_if_exists()
# Now regolo.default_key is set

LLM

Set your default chat model:

export LLM="Llama-3.3-70B-Instruct"

Load in Python:

import regolo
regolo.default_chat_model_load_from_env_if_exists()
# Now regolo.default_chat_model is set

IMAGE_MODEL

Set your default image generation model:

export IMAGE_MODEL="Qwen-Image"

Load in Python:

import regolo
regolo.default_image_load_from_env_if_exists()
# Now regolo.default_image_generation_model is set

EMBEDDER_MODEL

Set your default embedder model:

export EMBEDDER_MODEL="gte-Qwen2"

Load in Python:

import regolo
regolo.default_embedder_load_from_env_if_exists()
# Now regolo.default_embedder_model is set

Load All Defaults

Load all default environment variables at once:

import regolo
regolo.try_loading_from_env()

Endpoints

Configure API endpoints (usually not needed):

REGOLO_URL

export REGOLO_URL="https://api.regolo.ai"

COMPLETIONS_URL_PATH

export COMPLETIONS_URL_PATH="/v1/completions"

CHAT_COMPLETIONS_URL_PATH

export CHAT_COMPLETIONS_URL_PATH="/v1/chat/completions"

IMAGE_GENERATION_URL_PATH

export IMAGE_GENERATION_URL_PATH="/v1/images/generations"

EMBEDDINGS_URL_PATH

export EMBEDDINGS_URL_PATH="/v1/embeddings"

AUDIO_TRANSCRIPTION_URL_PATH

export AUDIO_TRANSCRIPTION_URL_PATH="/v1/audio/transcriptions"

RERANK_URL_PATH

export RERANK_URL_PATH="/v1/rerank"

[!TIP] Endpoint environment variables can be changed during execution since the client works directly with them. However, you typically won't need to change these as they're tied to the official Regolo API structure.

Complete Examples

Multi-Model Workflow

import regolo
from io import BytesIO
from PIL import Image

# Configure defaults
regolo.default_key = "<YOUR_API_KEY>"
regolo.default_chat_model = "Llama-3.3-70B-Instruct"
regolo.default_image_generation_model = "Qwen-Image"
regolo.default_embedder_model = "gte-Qwen2"

# 1. Chat about a topic
client = regolo.RegoloClient()
response = client.run_chat(user_prompt="Describe a futuristic city")
description = response[1] if isinstance(response, tuple) else response

print(f"Description: {description}")

# 2. Generate image based on description
img_client = regolo.RegoloClient()
img_bytes = img_client.create_image(
    prompt=description[:500],  # Use first 500 chars
    n=1,
    quality="hd"
)[0]

# Save image
image = Image.open(BytesIO(img_bytes))
image.save("futuristic_city.png")
print("Image saved!")

# 3. Create embeddings for search
emb_client = regolo.RegoloClient()
texts = [
    "futuristic city with flying cars",
    "modern urban landscape",
    "ancient historical architecture"
]
embeddings = emb_client.embeddings(input_text=texts)
print(f"Generated {len(embeddings)} embeddings")

# 4. Rerank documents by relevance
rerank_client = regolo.RegoloClient(reranker_model="jina-reranker-v2")
results = rerank_client.rerank(
    query="futuristic technology",
    documents=texts,
    top_n=2
)

print("\nMost relevant documents:")
for result in results:
    print(f"  {result['relevance_score']:.4f}: {texts[result['index']]}")

Batch Processing

import regolo

regolo.default_key = "<YOUR_API_KEY>"
regolo.default_chat_model = "Llama-3.3-70B-Instruct"

client = regolo.RegoloClient()

# Process multiple prompts
prompts = [
    "Summarize machine learning in one sentence",
    "Explain quantum computing briefly",
    "What is blockchain technology?"
]

responses = []
for prompt in prompts:
    role, content = client.run_chat(user_prompt=prompt)
    responses.append(content)
    client.clear_conversations()  # Start fresh for next prompt

# Display results
for i, (prompt, response) in enumerate(zip(prompts, responses), 1):
    print(f"\n{i}. {prompt}")
    print(f"   Answer: {response[:100]}...")

Audio Processing Pipeline

import regolo
import os

regolo.default_key = "<YOUR_API_KEY>"
regolo.default_audio_transcription_model = "faster-whisper-large-v3"
regolo.default_chat_model = "Llama-3.3-70B-Instruct"

# 1. Transcribe audio
audio_client = regolo.RegoloClient()
transcription = audio_client.audio_transcription(
    file="meeting_recording.mp3",
    language="en",
    response_format="json"
)

print("Transcription:", transcription)

# 2. Summarize transcription
chat_client = regolo.RegoloClient()
summary = chat_client.run_chat(
    user_prompt=f"Summarize this meeting transcript: {transcription}"
)

print("\nSummary:", summary[1])

# 3. Extract action items
action_items = chat_client.run_chat(
    user_prompt="List the action items from this meeting as bullet points"
)

print("\nAction Items:", action_items[1])

Model Management Automation

from regolo.cli import ModelManagementClient
import time

def deploy_model_workflow(model_name, hf_url, gpu_preference="ECS1GPU11"):
    """Complete workflow to deploy a HuggingFace model"""

    client = ModelManagementClient()

    # 1. Authenticate
    print("Authenticating...")
    client.authenticate("username", "password")

    # 2. Register model
    print(f"Registering model: {model_name}")
    try:
        client.register_model(
            name=model_name,
            is_huggingface=True,
            url=hf_url
        )
        print("✓ Model registered")
    except Exception as e:
        if "already exists" in str(e):
            print("⚠ Model already registered")
        else:
            raise

    # 3. Check GPU availability
    print("Checking GPU availability...")
    gpus = client.get_available_gpus()

    available_gpu = None
    for gpu in gpus['gpus']:
        if gpu['InstanceType'] == gpu_preference:
            available_gpu = gpu_preference
            break

    if not available_gpu and gpus['gpus']:
        available_gpu = gpus['gpus'][0]['InstanceType']

    if not available_gpu:
        raise Exception("No GPUs available")

    print(f"Using GPU: {available_gpu}")

    # 4. Load model for inference
    print("Loading model for inference...")
    vllm_config = {
        "max_model_len": 2048,
        "gpu_memory_utilization": 0.85,
        "tensor_parallel_size": 1
    }

    result = client.load_model_for_inference(
        model_name=model_name,
        gpu=available_gpu,
        vllm_config=vllm_config
    )

    if result.get('success'):
        print("✓ Model loading initiated")
    else:
        print("⚠ Model may already be loaded")

    # 5. Wait and verify
    print("Waiting for model to load (60s)...")
    time.sleep(60)

    loaded = client.get_loaded_models()
    model_loaded = any(
        m['model_name'] == model_name
        for m in loaded.get('loaded_models', [])
    )

    if model_loaded:
        print("✓ Model successfully loaded and ready for inference")
    else:
        print("⚠ Model not yet loaded, may need more time")

    # 6. Show status
    status = client.get_user_inference_status()
    print(f"\nCurrent status:")
    print(f"  Active models: {loaded.get('total', 0)}")
    print(f"  Month cost: €{status.get('total_cost', 0):.2f}")

    return client

# Usage
client = deploy_model_workflow(
    model_name="my-gpt2",
    hf_url="gpt2",
    gpu_preference="ECS1GPU11"
)

Cost Monitoring Script

from regolo.cli import ModelManagementClient
from datetime import datetime
import time

def monitor_costs_continuously(threshold_eur=100, check_interval=3600):
    """Monitor costs and alert when threshold exceeded"""

    client = ModelManagementClient()
    client.authenticate("username", "password")

    print(f"Monitoring costs. Alert threshold: €{threshold_eur}")
    print(f"Check interval: {check_interval}s")

    while True:
        try:
            # Get current month status
            status = client.get_user_inference_status()
            current_cost = status.get('total_cost', 0)

            print(f"\n[{datetime.now().strftime('%Y-%m-%d %H:%M:%S')}]")
            print(f"Current month cost: €{current_cost:.2f}")

            # Check loaded models
            loaded = client.get_loaded_models()
            if loaded['loaded_models']:
                print(f"Active models: {loaded['total']}")
                for model in loaded['loaded_models']:
                    print(f"  - {model['model_name']}: €{model['cost']:.2f}")

            # Alert if threshold exceeded
            if current_cost >= threshold_eur:
                print(f"\n⚠️  ALERT: Cost threshold exceeded!")
                print(f"Current: €{current_cost:.2f} / Threshold: €{threshold_eur}")

                # List recommendations
                if loaded['loaded_models']:
                    print("\nConsider unloading these models:")
                    for model in loaded['loaded_models']:
                        print(f"  - {model['model_name']} (Session: {model['session_id']})")

                break

            # Sleep until next check
            time.sleep(check_interval)

        except KeyboardInterrupt:
            print("\n\nMonitoring stopped by user")
            break
        except Exception as e:
            print(f"Error during monitoring: {e}")
            time.sleep(60)  # Wait a minute before retrying

# Run monitoring (checks every hour)
# monitor_costs_continuously(threshold_eur=100, check_interval=3600)

Best Practices

API Key Security

Never hardcode API keys in your source code
Use environment variables or secure key management
Rotate keys regularly for security

import os
import regolo

# Load key from environment
regolo.default_key = os.getenv("REGOLO_API_KEY")

# Hardcoded key
regolo.default_key = "sk-xxxxxxxxxxxxx"

Error Handling

import regolo
from httpx import HTTPStatusError

regolo.default_key = "<YOUR_API_KEY>"

client = regolo.RegoloClient()

try:
    response = client.run_chat(user_prompt="Hello")
    print(response)
except HTTPStatusError as e:
    if e.response.status_code == 401:
        print("Authentication failed. Check your API key.")
    elif e.response.status_code == 429:
        print("Rate limit exceeded. Please wait before retrying.")
    else:
        print(f"HTTP error: {e}")
except Exception as e:
    print(f"Unexpected error: {e}")

Resource Management

import regolo

# Use context managers when working with files
client = regolo.RegoloClient()

# Audio transcription
with open("audio.mp3", "rb") as audio_file:
    transcription = client.audio_transcription(file=audio_file)

# Always clear conversations when starting new topics
client.run_chat(user_prompt="Tell me about Python")
client.clear_conversations()  # Clear before new topic
client.run_chat(user_prompt="Tell me about JavaScript")

Model Naming Conventions

Use descriptive, lowercase names: my-bert-model
Include version numbers: gpt2-v1, llama-2-7b
Avoid special characters except hyphens and underscores
Keep names under 200 characters
Don't use GitLab reserved names

Troubleshooting

Common Issues

Authentication Errors

# Problem: "API key is required"
# Solution: Set the API key
import regolo
regolo.default_key = "<YOUR_API_KEY>"

# Or pass directly
client = regolo.RegoloClient(api_key="<YOUR_API_KEY>")

Model Not Found

# Problem: Model not found when loading for inference
# Solution: Register the model first
regolo models register --name my-model --type huggingface --url gpt2

SSH Authentication Failed

# Problem: Cannot push to GitLab repository
# Solution: Add your SSH key
regolo ssh add --title "My Key" --key-file ~/.ssh/id_rsa.pub

# Test SSH connection
ssh -T git@gitlab.regolo.ai

Model Loading Timeout

# Problem: Model takes too long to load
# Solution: Large models need time, check status periodically
regolo inference status

# Wait and check again
sleep 60
regolo inference status

Getting Help

For additional support:

Check the API documentation
View CLI help: regolo --help or regolo <command> --help
Contact Regolo support through your organization

API Reference

RegoloClient Methods

Method	Description
`completions(prompt, stream, max_tokens, ...)`	Generate text completion
`run_chat(user_prompt, stream, max_tokens, ...)`	Run chat completion
`add_prompt_to_chat(prompt, role)`	Add message to conversation
`clear_conversations()`	Clear conversation history
`create_image(prompt, n, quality, size, ...)`	Generate images
`audio_transcription(file, language, ...)`	Transcribe audio
`embeddings(input_text)`	Generate text embeddings
`rerank(query, documents, top_n, ...)`	Rerank documents by relevance
`change_model(model)`	Switch to different model

Static Methods

Method	Description
`static_completions(prompt, model, api_key, ...)`	Static completion method
`static_chat_completions(messages, model, ...)`	Static chat method
`static_image_create(prompt, model, ...)`	Static image generation
`static_audio_transcription(file, model, ...)`	Static transcription
`static_embeddings(input_text, model, ...)`	Static embeddings
`get_available_models(api_key, model_info)`	List available models

CLI Commands Reference

Command	Description
`regolo auth login`	Authenticate with credentials
`regolo auth logout`	Clear authentication tokens
`regolo models register`	Register a new model
`regolo models list`	List registered models
`regolo models details <name>`	Get model details
`regolo models delete <name>`	Delete a model
`regolo ssh add`	Add SSH key
`regolo ssh list`	List SSH keys
`regolo ssh delete <id>`	Delete SSH key
`regolo inference gpus`	List available GPUs
`regolo inference load <model>`	Load model for inference
`regolo inference unload`	Unload model
`regolo inference status`	Show loaded models
`regolo inference user-status`	Show cost/billing info
`regolo chat`	Interactive chat
`regolo get-available-models`	List API models
`regolo create-image`	Generate images
`regolo transcribe-audio`	Transcribe audio
`regolo rerank`	Rerank documents

Version Information

Client Version: Check with pip show regolo
API Version: 1.0.0
Python Requirement: >= 3.8
Management API Base URL: https://devmid.regolo.ai
Inference API Base URL: https://api.regolo.ai

For more information, visit the Regolo.ai documentation or contact support through your organization.

Project details

These details have not been verified by PyPI

Project links

Homepage

Release history Release notifications | RSS feed

This version

1.10.0

Mar 2, 2026

1.9.1

Nov 18, 2025

1.9.0

Nov 18, 2025

1.8.1

Nov 11, 2025

1.8.0

Nov 4, 2025

1.7.3

Oct 16, 2025

1.7.2

Oct 7, 2025

1.7.1

Sep 25, 2025

1.7.0

Sep 25, 2025

1.6.0

Sep 3, 2025

1.5.0

Aug 26, 2025

1.2.0

Mar 30, 2025

1.1.0

Mar 14, 2025

1.0.3

Feb 10, 2025

1.0.2

Feb 4, 2025

1.0.1

Feb 3, 2025

1.0.0

Feb 3, 2025

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

regolo-1.10.0.tar.gz (56.4 kB view details)

Uploaded Mar 2, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

regolo-1.10.0-py3-none-any.whl (37.8 kB view details)

Uploaded Mar 2, 2026 Python 3

File details

Details for the file regolo-1.10.0.tar.gz.

File metadata

Download URL: regolo-1.10.0.tar.gz
Upload date: Mar 2, 2026
Size: 56.4 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.12

File hashes

Hashes for regolo-1.10.0.tar.gz
Algorithm	Hash digest
SHA256	`15cc1f5e4273d70d411e2250e9bcb47af8f215d5e1ae8dcf29c69358c1c4c093`
MD5	`1d7d386799ac657cd4b0461646a82140`
BLAKE2b-256	`603e49f40c64666a7891fa17114f55c78df6d6db3b1318383da2b64a6a862cd9`

See more details on using hashes here.

File details

Details for the file regolo-1.10.0-py3-none-any.whl.

File metadata

Download URL: regolo-1.10.0-py3-none-any.whl
Upload date: Mar 2, 2026
Size: 37.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.12

File hashes

Hashes for regolo-1.10.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`70edbd9bc2c315d264808514f9adc454050c7a730c2c8c31d4d047d9d71488b7`
MD5	`1d74829884b0b8081d51b3dbc421926d`
BLAKE2b-256	`4b87f3e37fd44d6ac8b1b60b0d54f08e1dd08addaf0ae2591f35ae381e5b19ab`

See more details on using hashes here.

regolo 1.10.0

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

Regolo.ai Python Client

Table of Contents

Installation

Basic Usage

1. Import the regolo module

2. Set Up Default API Key and Model

Chat and Completions

Text Completion:

Chat Completion:

Handling Conversation History:

Streaming Responses:

Image Generation

Audio Transcription

Text Embeddings

Document Reranking

CLI Usage

Chat Interface

Model Management

Authentication

List Available Models

Register Models

List Registered Models

Get Model Details

Delete a Model

Inference Management

View Available GPUs

Load Model for Inference

View Loaded Models

Unload Model

Monitor Costs

SSH Key Management

Add SSH Key

List SSH Keys

Delete SSH Key

Complete Workflow Command

Other CLI Commands

Create Images

Transcribe Audio

Rerank Documents

Model Management & Deployment

Authentication

Registering Models

HuggingFace Models

Custom Models

Loading Models for Inference

Monitoring and Billing

View Loaded Models

Monitor Costs

Unload Models

Complete Workflow Example

Advanced Usage

Switching Models

Managing Conversation State

Custom Base URL

Reusing HTTP Client

Working with Full Response Objects

Environment Variables

Default Values

API_KEY

LLM

IMAGE_MODEL

EMBEDDER_MODEL

Load All Defaults

Endpoints

REGOLO_URL

COMPLETIONS_URL_PATH

CHAT_COMPLETIONS_URL_PATH

IMAGE_GENERATION_URL_PATH

EMBEDDINGS_URL_PATH

AUDIO_TRANSCRIPTION_URL_PATH

RERANK_URL_PATH