Client to interact with COE AI-hosted LLM models

coeai LLM Inference API Client

Interact with high-capacity multimodal LLMs hosted on the COE AI GPU cluster from any Python environment.

coeai is a Python package for LLM inference over the UPES local network (LAN/Wi-Fi). It supports text-to-text and image-to-text requests, with optional response streaming.


Features

| Feature | Description |
| --- | --- |
| Text-to-Text | Support for all available LLMs |
| Image-to-Text | Multimodal support with Llama4 models |
| Streaming | Real-time response streaming |
| Custom Messages | Advanced conversation handling |
| Multiple Images | Process multiple images per request |
| Parameter Control | Full generation parameter customization |
| LAN Optimized | FastAPI deployment over local UPES network |

Installation

pip install coeai

Quick Start

from coeai import LLMinfer

# Initialize the client
llm = LLMinfer(api_key="your-api-key", host="http://10.9.6.165:8001")

# Simple text generation
response = llm.generate(
    model="llama4-16x17b",
    prompt="Explain quantum computing in simple terms.",
    max_tokens=256
)
print(response)

API Reference

LLMinfer Class

Initialization

LLMinfer(api_key: str, host: str = "http://127.0.0.1:8001")

| Parameter | Type | Description |
| --- | --- | --- |
| api_key | str | Your API authentication key |
| host | str | The FastAPI server endpoint URL |

Methods

generate()
generate(
    model: str,
    inference_type: str = "text-to-text",
    prompt: Optional[str] = None,
    messages: Optional[List[Dict]] = None,
    files: Optional[List[str]] = None,
    max_tokens: int = 512,
    temperature: float = 0.7,
    top_p: float = 1.0,
    stream: bool = False,
    print_stream: bool = True
) -> Dict

| Parameter | Type | Description |
| --- | --- | --- |
| model | str | Model name (e.g., "llama4-16x17b") |
| inference_type | str | "text-to-text" or "image-to-text" |
| prompt | str (optional) | Text prompt for generation |
| messages | list (optional) | Custom conversation messages |
| files | list (optional) | List of image file paths |
| max_tokens | int | Maximum number of tokens to generate |
| temperature | float | Sampling temperature (0.0–2.0) |
| top_p | float | Nucleus sampling parameter |
| stream | bool | Enable streaming response |
| print_stream | bool | Print streaming output to console |
| Returns | Dict | API response dictionary |
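
The parameter constraints above can be checked on the client before a request is sent. The following is a minimal sketch, not part of coeai: `build_generate_kwargs` is a hypothetical helper, and two of its rules are assumptions (that a request needs a prompt or messages, and that image-to-text needs at least one file). The server may enforce different rules.

```python
from typing import Dict, List, Optional

VALID_INFERENCE_TYPES = {"text-to-text", "image-to-text"}


def build_generate_kwargs(
    model: str,
    inference_type: str = "text-to-text",
    prompt: Optional[str] = None,
    messages: Optional[List[Dict]] = None,
    files: Optional[List[str]] = None,
    max_tokens: int = 512,
    temperature: float = 0.7,
    top_p: float = 1.0,
) -> Dict:
    """Hypothetical helper: validate arguments and return kwargs for llm.generate()."""
    if inference_type not in VALID_INFERENCE_TYPES:
        raise ValueError(f"inference_type must be one of {sorted(VALID_INFERENCE_TYPES)}")
    # Assumption: a useful request carries a prompt, messages, or both.
    if prompt is None and messages is None:
        raise ValueError("provide prompt or messages")
    # Assumption: image-to-text requests need at least one image path.
    if inference_type == "image-to-text" and not files:
        raise ValueError("image-to-text requires at least one file")
    # Range taken from the parameter table (0.0 to 2.0).
    if not 0.0 <= temperature <= 2.0:
        raise ValueError("temperature must be between 0.0 and 2.0")
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    kwargs = {
        "model": model,
        "inference_type": inference_type,
        "prompt": prompt,
        "messages": messages,
        "files": files,
        "max_tokens": max_tokens,
        "temperature": temperature,
        "top_p": top_p,
    }
    # Drop optional arguments that were left unset.
    return {key: value for key, value in kwargs.items() if value is not None}


kwargs = build_generate_kwargs(model="tinyllama-latest", prompt="Hello", max_tokens=64)
print(kwargs)
# With a configured client: response = llm.generate(**kwargs)
```

Validating locally gives an immediate, readable error instead of a round trip to the server.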

Available Models

Text-Only Models

  • tinyllama-latest: Compact model for basic tasks
  • tinyllama-1.1b: Small efficient model
  • deepseek-r1-70b: Advanced reasoning model
  • gpt-oss-120b: Large general-purpose model

Multimodal Models (Image + Text)

  • llama4-16x17b: Recommended for image-to-text inference
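
A small lookup can guard against pairing a text-only model with an image request. This is a sketch built only from the lists above; `supports` is a hypothetical helper, and the server's /models endpoint remains the authoritative source. It treats llama4-16x17b as valid for text-to-text because the usage examples in this document call it that way.

```python
TEXT_ONLY_MODELS = {
    "tinyllama-latest",
    "tinyllama-1.1b",
    "deepseek-r1-70b",
    "gpt-oss-120b",
}
MULTIMODAL_MODELS = {"llama4-16x17b"}


def supports(model: str, inference_type: str) -> bool:
    """Return True if the model is listed for the given inference type."""
    if inference_type == "image-to-text":
        return model in MULTIMODAL_MODELS
    if inference_type == "text-to-text":
        return model in TEXT_ONLY_MODELS or model in MULTIMODAL_MODELS
    return False


print(supports("llama4-16x17b", "image-to-text"))
print(supports("tinyllama-latest", "image-to-text"))
```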

Usage Examples

1. Basic Text Generation

from coeai import LLMinfer
llm = LLMinfer(api_key="your-api-key", host="http://10.9.6.165:8001")
response = llm.generate(
    model="tinyllama-latest",
    inference_type="text-to-text",
    prompt="Write a short story about a robot learning to paint.",
    max_tokens=256,
    temperature=0.7,
    top_p=1.0
)
print(response)

2. Custom Conversation Messages

messages = [
    {"role": "system", "content": [{"type": "text", "text": "You are a helpful assistant."}]},
    {"role": "user", "content": [{"type": "text", "text": "Explain quantum computing in simple terms."}]}
]
response = llm.generate(
    model="llama4-16x17b",
    inference_type="text-to-text",
    messages=messages,
    max_tokens=512,
    temperature=0.6
)
print(response)
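
Each message above nests its text inside a list of typed content parts, which is easy to mistype by hand. A minimal sketch of a builder for that shape follows; `text_message` is a hypothetical helper, not a coeai function.

```python
from typing import Dict, List


def text_message(role: str, text: str) -> Dict:
    """Build one message in the content-parts format used in the example above."""
    return {"role": role, "content": [{"type": "text", "text": text}]}


messages: List[Dict] = [
    text_message("system", "You are a helpful assistant."),
    text_message("user", "Explain quantum computing in simple terms."),
]
print(messages)
```

The resulting list can be passed as the messages argument of generate().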

3. Single Image Analysis

response = llm.generate(
    model="llama4-16x17b",
    inference_type="image-to-text",
    files=["/path/to/image.jpeg"],
    prompt="Describe this image in detail.",
    max_tokens=512,
    temperature=0.7
)
print(response)

4. Multiple Image Comparison

image_paths = [
    "/Users/coe-ai/Downloads/image1.jpeg",
    "/Users/coe-ai/Downloads/image2.jpeg"
]
response = llm.generate(
    model="llama4-16x17b",
    inference_type="image-to-text",
    files=image_paths,
    prompt="Compare these images and describe similarities and differences.",
    max_tokens=512,
    temperature=0.7,
    top_p=1.0
)
print(response)
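
Because files takes local paths, a quick existence check before the call can catch typos early. This is a sketch using only the standard library; `missing_files` is a hypothetical helper, and how the client itself reacts to a missing path is not documented here.

```python
from pathlib import Path
from typing import Iterable, List


def missing_files(paths: Iterable[str]) -> List[str]:
    """Return the paths that do not point to an existing regular file."""
    return [path for path in paths if not Path(path).is_file()]


image_paths = [
    "/Users/coe-ai/Downloads/image1.jpeg",
    "/Users/coe-ai/Downloads/image2.jpeg",
]
missing = missing_files(image_paths)
if missing:
    print("Missing image files:", missing)
else:
    print("All image files found.")
```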

5. Streaming Text Generation

response = llm.generate(
    model="tinyllama-latest",
    inference_type="text-to-text",
    prompt="Tell a story about AI and creativity.",
    max_tokens=300,
    temperature=0.8,
    stream=True,
    print_stream=True
)
print("\nFinal response:", response)

6. Advanced Parameters

response = llm.generate(
    model="deepseek-r1-70b",
    inference_type="text-to-text",
    prompt="Solve this math problem step by step: What is 2^10 * 3^5?",
    max_tokens=400,
    temperature=0.1,
    top_p=0.9,
    stream=False
)
print(response)

cURL Commands Reference

List Available Models

curl -X GET http://10.9.6.165:8001/models \
  -H "X-API-Key: your-api-key"
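
The same request can be built from Python with the standard library. The sketch below only constructs the request (URL, method, X-API-Key header) and does not send it, since the server is reachable only from the UPES network; `build_models_request` is a hypothetical helper, and the response format is not documented here, so it is not parsed.

```python
import urllib.request


def build_models_request(host: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) GET {host}/models with the X-API-Key header."""
    url = host.rstrip("/") + "/models"
    return urllib.request.Request(url, headers={"X-API-Key": api_key}, method="GET")


req = build_models_request("http://10.9.6.165:8001", "your-api-key")
print(req.get_method(), req.full_url)

# To send it from a machine on the UPES network:
# with urllib.request.urlopen(req, timeout=10) as resp:
#     print(resp.read().decode())
```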

Text-to-Text Inference

curl -X POST http://10.9.6.165:8001/generate \
  -H "X-API-Key: your-api-key" \
  -F "model=tinyllama-latest" \
  -F "inference_type=text-to-text" \
  -F "prompt=Write a short story about a robot learning to paint." \
  -F "max_tokens=256" \
  -F "temperature=0.7" \
  -F "top_p=1.0"

Text-to-Text with Custom Messages

curl -X POST http://10.9.6.165:8001/generate \
  -H "X-API-Key: your-api-key" \
  -F "model=llama4-16x17b" \
  -F "inference_type=text-to-text" \
  -F 'messages=[{"role":"user","content":[{"type":"text","text":"Explain quantum computing in simple terms."}]}]' \
  -F "max_tokens=512" \
  -F "temperature=0.6"

Image-to-Text Inference (Single Image)

curl -X POST http://10.9.6.165:8001/generate \
  -H "X-API-Key: your-api-key" \
  -F "model=llama4-16x17b" \
  -F "inference_type=image-to-text" \
  -F "prompt=Describe the contents of this image" \
  -F "files=@/Users/coe-ai/Downloads/image.jpeg" \
  -F "max_tokens=512" \
  -F "temperature=0.7"

Image-to-Text Inference (Multiple Images)

curl -X POST http://10.9.6.165:8001/generate \
  -H "X-API-Key: your-api-key" \
  -F "model=llama4-16x17b" \
  -F "inference_type=image-to-text" \
  -F "prompt=Compare these two images and describe similarities and differences" \
  -F "files=@/Users/coe-ai/Downloads/image1.jpeg" \
  -F "files=@/Users/coe-ai/Downloads/image2.jpeg" \
  -F "max_tokens=512" \
  -F "temperature=0.7"

Streaming Response

curl -N -X POST http://10.9.6.165:8001/generate \
  -H "X-API-Key: your-api-key" \
  -F "model=tinyllama-latest" \
  -F "inference_type=text-to-text" \
  -F "prompt=Write a motivational quote about persistence." \
  -F "stream=true"

Note: The -N flag ensures curl doesn't buffer the streaming response.

Test Files

Create these test files to validate all functionality:

test_text_prompt.py

from coeai import LLMinfer

api_key = "your-api-key"
llm = LLMinfer(api_key, host="http://10.9.6.165:8001")

response = llm.generate(
    model="llama4-16x17b",
    inference_type="text-to-text",
    prompt="Explain the difference between supervised and unsupervised learning.",
    max_tokens=256,
    temperature=0.5,
    top_p=0.9,
    stream=False
)
print(response)

test_custom_messages.py

from coeai import LLMinfer

api_key = "your-api-key"
llm = LLMinfer(api_key)

messages = [
    {"role": "system", "content": [{"type": "text", "text": "You are a helpful assistant."}]},
    {"role": "user", "content": [{"type": "text", "text": "Give a short summary of COVID-19 impact."}]}
]

response = llm.generate(
    model="llama4-16x17b",
    inference_type="text-to-text",
    messages=messages,
    max_tokens=300,
    temperature=0.6,
    top_p=0.95
)
print(response)

test_streaming_text.py

from coeai import LLMinfer

api_key = "your-api-key"
llm = LLMinfer(api_key)

response = llm.generate(
    model="llama4-16x17b",
    inference_type="text-to-text",
    prompt="Write a 5-line poem about AI.",
    max_tokens=150,
    temperature=0.8,
    top_p=0.9,
    stream=True,
    print_stream=True  # Prints partial outputs
)
print("\nCollected response:", response)

test_image_to_text.py

from coeai import LLMinfer

api_key = "your-api-key"
llm = LLMinfer(api_key)

response = llm.generate(
    model="llama4-16x17b",
    inference_type="image-to-text",
    files=["/Users/coe-ai/Downloads/image.jpeg"],
    prompt="Describe this image in detail",
    max_tokens=512
)
print(response)

test_multiple_images.py

from coeai import LLMinfer

api_key = "your-api-key"
llm = LLMinfer(api_key)

image_paths = [
    "/Users/coe-ai/Downloads/image1.jpeg",
    "/Users/coe-ai/Downloads/image2.jpeg"
]

response = llm.generate(
    model="llama4-16x17b",
    inference_type="image-to-text",
    files=image_paths,
    prompt="Compare the images and describe similarities and differences",
    max_tokens=512,
    temperature=0.7,
    top_p=1.0
)
print(response)

test_all_parameters.py

from coeai import LLMinfer

api_key = "your-api-key"
llm = LLMinfer(api_key)

# Test with all available parameters
response = llm.generate(
    model="deepseek-r1-70b",
    inference_type="text-to-text",
    prompt="Write a technical explanation of blockchain technology.",
    max_tokens=400,
    temperature=0.3,  # Low temperature for technical accuracy
    top_p=0.85,
    stream=False
)
print(response)

Error Handling

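HTTP errors from the server can be caught as requests.exceptions.HTTPError; the response body carries the server's error details:

import requests  # needed for requests.exceptions.HTTPError

try:
    response = llm.generate(
        model="llama4-16x17b",
        inference_type="image-to-text",
        files=["nonexistent.jpg"],
        prompt="Describe this image"
    )
except requests.exceptions.HTTPError as e:
    # The server's error payload is read as JSON
    print(f"HTTP Error: {e.response.json()}")
except Exception as e:
    # Any other failure, e.g. a connection problem
    print(f"Error: {e}")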

Performance Tips

  1. Model Selection: Use llama4-16x17b for image-to-text to avoid memory issues
  2. Temperature Settings: Lower values (0.1-0.3) for factual/technical content, higher (0.7-1.0) for creative tasks
  3. Token Limits: Set appropriate max_tokens to balance response quality and generation time
  4. Streaming: Use streaming for long responses to see progress in real-time
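
The temperature guidance in tip 2 can be captured as named presets. This is an illustrative sketch: `sampling_for` and the preset names are hypothetical, and the values 0.2 and 0.8 are simply picked from inside the suggested ranges.

```python
from typing import Dict

# Values chosen from the ranges in the tips: 0.1-0.3 factual, 0.7-1.0 creative.
SAMPLING_PRESETS: Dict[str, Dict[str, float]] = {
    "factual": {"temperature": 0.2},
    "creative": {"temperature": 0.8},
}


def sampling_for(task: str) -> Dict[str, float]:
    """Return a copy of the sampling parameters for a task type."""
    if task not in SAMPLING_PRESETS:
        raise ValueError(f"unknown task {task!r}; expected one of {sorted(SAMPLING_PRESETS)}")
    return dict(SAMPLING_PRESETS[task])


print(sampling_for("factual"))
# With a configured client:
# llm.generate(model="deepseek-r1-70b", prompt="...", **sampling_for("factual"))
```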

Troubleshooting

Common Issues

  1. 400 Bad Request: Check model name and inference type compatibility
  2. 401 Unauthorized: Verify API key is correct
  3. 500 Internal Server Error: Usually indicates insufficient GPU memory for large models
  4. Connection Refused: Confirm with the COE AI team that the FastAPI server is running and reachable from your network
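
The list above maps naturally onto a lookup from status code to hint, which can be called from an except block. A minimal sketch follows; `hint_for_status` is a hypothetical helper, and passing None stands in for a failed connection with no HTTP status.

```python
from typing import Optional

STATUS_HINTS = {
    400: "Check that the model name and inference type are compatible.",
    401: "Verify that the API key is correct.",
    500: "The server may not have enough GPU memory for the requested model.",
}


def hint_for_status(status_code: Optional[int]) -> str:
    """Return a troubleshooting hint for an HTTP status (None = no connection)."""
    if status_code is None:
        return "Connection failed: confirm with COE AI that the server is running and reachable."
    return STATUS_HINTS.get(status_code, f"Unexpected HTTP status {status_code}.")


print(hint_for_status(401))
print(hint_for_status(None))
```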

Debug Mode

Enable detailed error reporting:

import requests  # needed for requests.exceptions.HTTPError

try:
    response = llm.generate(...)
except requests.exceptions.HTTPError as e:
    print("Detailed error:", e.response.json())

Authentication: API key

All requests must include an API key issued by the COE AI team. Pass the key when constructing LLMinfer; the client attaches it to each request as an authentication header behind the scenes (the cURL examples in this document send it as X-API-Key).

Requesting an API Key

  1. Send an email to hpc-access@ddn.upes.ac.in from your official UPES account using this template:
Subject: API Key Request for COE AI LLM Access

Dear COE AI Team,

I am requesting access to the LLM API for my project work.

Project Details:
- Project Name: <Your Project Name>
- Project Description: <Brief description>
- Expected Usage: <How you plan to use the LLM>
- Duration: <Timeline>

Reason for API Access:
<Research objectives or academic requirements>

Additional Information:
- Name: <Your Name>
- Email: <Your Email>
- Department/Affiliation: <Dept/Organisation>
- Student/Faculty ID: <If applicable>

Thank you for considering my request.

Best regards,
<Your Name>
  2. Allow 2-3 business days for processing. The team will reply with your API key.

License

coeai is released under the MIT License.


Changelog

v2.1.0

  • Production Release
  • Text-to-text and image-to-text inference
  • Streaming support
  • Multiple image processing
  • Comprehensive parameter control
  • Full cURL command compatibility

Authors

  • Konal Puri
  • Sawai Pratap Khatri

Centre of Excellence: AI (COE AI), HPC Project, UPES.

PyPI: https://pypi.org/project/coeai
GitHub: https://github.com/pkonal23/COE-AI-HPC-Project.git
