Client to interact with COE AI-hosted LLM models

coeai LLM Inference API Client

Interact with high-capacity multimodal LLMs hosted on the COE AI GPU cluster from any Python environment.

coeai is a Python package for LLM inference over the UPES local network (LAN or campus Wi-Fi). It supports text-to-text and image-to-text operations, with optional response streaming.


Features

Feature             Description
Text-to-Text        Support for all available LLMs
Image-to-Text       Multimodal support with Llama4 models
Streaming           Real-time response streaming
Custom Messages     Advanced conversation handling
Multiple Images     Process multiple images per request
Parameter Control   Full generation parameter customization
LAN Optimized       FastAPI deployment over local UPES network

Installation

pip install coeai

Quick Start

from coeai import LLMinfer

# Initialize the client
llm = LLMinfer(api_key="your-api-key", host="http://10.9.6.165:8001")

# Simple text generation
response = llm.generate(
    model="llama4:16x17b",
    prompt="Explain quantum computing in simple terms.",
    max_tokens=256
)
print(response)

API Reference

LLMinfer Class

Initialization

LLMinfer(api_key: str, host: str)

Parameter   Type   Description
api_key     str    Your API authentication key
host        str    The FastAPI server endpoint URL

Methods

generate()
generate(
    model: str,
    inference_type: str = "text-to-text",
    prompt: Optional[str] = None,
    messages: Optional[List[Dict]] = None,
    files: Optional[List[str]] = None,
    max_tokens: int = 512,
    temperature: float = 0.7,
    top_p: float = 1.0,
    stream: bool = False,
    print_stream: bool = True
) -> Dict

Parameter        Type              Description
model            str               Model name (e.g., "llama4:16x17b")
inference_type   str               "text-to-text" or "image-to-text"
prompt           str (optional)    Text prompt for generation
messages         list (optional)   Custom conversation messages
files            list (optional)   List of image file paths
max_tokens       int               Maximum number of tokens to generate
temperature      float             Sampling temperature (0.0–2.0)
top_p            float             Nucleus sampling parameter
stream           bool              Enable streaming response
print_stream     bool              Print streaming output to console
Returns          Dict              API response dictionary
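
The parameter table above can be turned into a small client-side check that runs before any request is sent. This is a minimal sketch, not part of coeai: the function name check_generate_args is hypothetical, and the rule that either prompt or messages must be present is an assumption drawn from the examples rather than documented behaviour.

```python
from typing import Dict, List, Optional

ALLOWED_INFERENCE_TYPES = {"text-to-text", "image-to-text"}


def check_generate_args(
    inference_type: str = "text-to-text",
    prompt: Optional[str] = None,
    messages: Optional[List[Dict]] = None,
    files: Optional[List[str]] = None,
    temperature: float = 0.7,
) -> List[str]:
    """Return a list of problems; an empty list means the arguments look sane."""
    problems = []
    if inference_type not in ALLOWED_INFERENCE_TYPES:
        problems.append(f"unknown inference_type: {inference_type!r}")
    # Assumption: the server needs some text input, either prompt or messages.
    if prompt is None and not messages:
        problems.append("provide either prompt or messages")
    if inference_type == "image-to-text" and not files:
        problems.append("image-to-text needs at least one file path")
    # Range taken from the temperature row of the table.
    if not 0.0 <= temperature <= 2.0:
        problems.append(f"temperature {temperature} is outside 0.0-2.0")
    return problems
```

Calling it with the same keyword arguments you intend to pass to generate() gives a chance to fail early, before a round trip to the server.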

Available Models

Text-Only Models

  • tinyllama:latest: Compact model for basic tasks
  • tinyllama:1.1b: Small efficient model
  • deepseek-r1:70b: Advanced reasoning model
  • gpt-oss:120b: Large general-purpose model

Multimodal Models (Image + Text)

  • llama4:16x17b: Recommended for image-to-text inference
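
The two lists above can be encoded as a quick compatibility check. The sketch below is illustrative only: the names TEXT_ONLY_MODELS, MULTIMODAL_MODELS and supports are hypothetical, and the model names are copied from this page, so the server may expose a different set.

```python
# Model names copied from the lists above; the live server may differ.
TEXT_ONLY_MODELS = [
    "tinyllama:latest",
    "tinyllama:1.1b",
    "deepseek-r1:70b",
    "gpt-oss:120b",
]
MULTIMODAL_MODELS = ["llama4:16x17b"]


def supports(model: str, inference_type: str) -> bool:
    """Return True if the model is listed for the given inference type."""
    if inference_type == "image-to-text":
        return model in MULTIMODAL_MODELS
    if inference_type == "text-to-text":
        # The usage examples also run text-to-text on the multimodal model.
        return model in TEXT_ONLY_MODELS or model in MULTIMODAL_MODELS
    return False
```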

Usage Examples

1. Basic Text Generation

from coeai import LLMinfer
llm = LLMinfer(api_key="your-api-key", host="http://10.9.6.165:8001")
response = llm.generate(
    model="tinyllama:latest",
    inference_type="text-to-text",
    prompt="Write a short story about a robot learning to paint.",
    max_tokens=256,
    temperature=0.7,
    top_p=1.0
)
print(response)

2. Custom Conversation Messages

messages = [
    {"role": "system", "content": [{"type": "text", "text": "You are a helpful assistant."}]},
    {"role": "user", "content": [{"type": "text", "text": "Explain quantum computing in simple terms."}]}
]
response = llm.generate(
    model="llama4:16x17b",
    inference_type="text-to-text",
    messages=messages,
    max_tokens=512,
    temperature=0.6
)
print(response)
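
Writing the nested content lists by hand gets verbose for longer conversations. The helpers below are a sketch under stated assumptions: text_message and build_messages are hypothetical names, and the "assistant" role is assumed to be accepted by the server (the example above only shows "system" and "user").

```python
from typing import Dict, List


def text_message(role: str, text: str) -> Dict:
    """Wrap plain text in the content-list format used in the example above."""
    return {"role": role, "content": [{"type": "text", "text": text}]}


def build_messages(system: str, *turns: str) -> List[Dict]:
    """One system message, then turns alternating user / assistant."""
    messages = [text_message("system", system)]
    for i, turn in enumerate(turns):
        # Assumption: even-indexed turns are the user's, odd-indexed the model's.
        role = "user" if i % 2 == 0 else "assistant"
        messages.append(text_message(role, turn))
    return messages
```

With a single user turn, build_messages produces the same list as the hand-written messages above and can be passed as messages= to generate().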

3. Single Image Analysis

response = llm.generate(
    model="llama4:16x17b",
    inference_type="image-to-text",
    files=["/path/to/image.jpeg"],
    prompt="Describe this image in detail.",
    max_tokens=512,
    temperature=0.7
)
print(response)

4. Multiple Image Comparison

image_paths = [
    "/Users/coe-ai/Downloads/image1.jpeg",
    "/Users/coe-ai/Downloads/image2.jpeg"
]
response = llm.generate(
    model="llama4:16x17b",
    inference_type="image-to-text",
    files=image_paths,
    prompt="Compare these images and describe similarities and differences.",
    max_tokens=512,
    temperature=0.7,
    top_p=1.0
)
print(response)
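
Because files takes local paths, a typo in one path can waste a request. The following sketch checks the paths first; existing_images is a hypothetical helper, and the accepted suffixes are an assumption, since this page does not list the image formats the server supports.

```python
from pathlib import Path
from typing import Iterable, List

# Assumption: adjust to whatever formats the server actually accepts.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def existing_images(paths: Iterable[str]) -> List[str]:
    """Return the expanded paths, or raise if one is missing or not an image."""
    checked = []
    for raw in paths:
        path = Path(raw).expanduser()
        if not path.is_file():
            raise FileNotFoundError(f"image not found: {raw}")
        if path.suffix.lower() not in IMAGE_SUFFIXES:
            raise ValueError(f"unexpected file type: {raw}")
        checked.append(str(path))
    return checked
```

The result can be passed directly as files=existing_images(image_paths).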

5. Streaming Text Generation

response = llm.generate(
    model="tinyllama:latest",
    inference_type="text-to-text",
    prompt="Tell a story about AI and creativity.",
    max_tokens=300,
    temperature=0.8,
    stream=True,
    print_stream=True
)
print("\nFinal response:", response)

6. Advanced Parameters

response = llm.generate(
    model="deepseek-r1:70b",
    inference_type="text-to-text",
    prompt="Solve this math problem step by step: What is 2^10 * 3^5?",
    max_tokens=400,
    temperature=0.1,
    top_p=0.9,
    stream=False
)
print(response)

Error Handling

The client provides detailed error handling:

import requests

try:
    response = llm.generate(
        model="llama4:16x17b",
        inference_type="image-to-text",
        files=["nonexistent.jpg"],
        prompt="Describe this image"
    )
except requests.exceptions.HTTPError as e:
    # The server answered with an error status; show its JSON body
    print(f"HTTP Error: {e.response.json()}")
except Exception as e:
    # Anything else, e.g. a missing file or a connection failure
    print(f"Error: {e}")
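
An error response is not guaranteed to carry a JSON body, in which case calling .json() on it raises a second exception. The sketch below is one defensive way to format any exception; describe_error is a hypothetical helper and relies only on the response attribute that requests exceptions expose.

```python
def describe_error(exc: Exception) -> str:
    """Build a readable message, using the HTTP response body if there is one."""
    response = getattr(exc, "response", None)
    if response is None:
        # Not an HTTP error, or no response was received at all.
        return f"{type(exc).__name__}: {exc}"
    try:
        detail = response.json()
    except ValueError:
        # Body was not valid JSON; fall back to the raw text.
        detail = response.text
    return f"HTTP {response.status_code}: {detail}"
```

Inside an except block, print(describe_error(e)) then works for both HTTP errors and local failures.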

cURL Commands Reference

List Available Models

curl -X GET http://10.9.6.165:8001/models \
  -H "X-API-Key: your-api-key"
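
The same request can be prepared from Python with only the standard library. This is a sketch, not a coeai API: models_request is a hypothetical helper that builds the request without sending it, and the shape of the server's reply is not documented on this page.

```python
import urllib.request


def models_request(host: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the GET /models request from the cURL command."""
    url = host.rstrip("/") + "/models"
    return urllib.request.Request(
        url,
        headers={"X-API-Key": api_key},  # same header as the cURL command
        method="GET",
    )
```

Passing the returned object to urllib.request.urlopen() sends it; that only succeeds from a machine that can reach the server.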

Text-to-Text Inference

curl -X POST http://10.9.6.165:8001/generate \
  -H "X-API-Key: your-api-key" \
  -F "model=tinyllama:latest" \
  -F "inference_type=text-to-text" \
  -F "prompt=Write a short story about a robot learning to paint." \
  -F "max_tokens=256" \
  -F "temperature=0.7" \
  -F "top_p=1.0"

Text-to-Text with Custom Messages

curl -X POST http://10.9.6.165:8001/generate \
  -H "X-API-Key: your-api-key" \
  -F "model=llama4:16x17b" \
  -F "inference_type=text-to-text" \
  -F 'messages=[{"role":"user","content":[{"type":"text","text":"Explain quantum computing in simple terms."}]}]' \
  -F "max_tokens=512" \
  -F "temperature=0.6"

Image-to-Text Inference (Single Image)

curl -X POST http://10.9.6.165:8001/generate \
  -H "X-API-Key: your-api-key" \
  -F "model=llama4:16x17b" \
  -F "inference_type=image-to-text" \
  -F "prompt=Describe the contents of this image" \
  -F "files=@/Users/coe-ai/Downloads/image.jpeg" \
  -F "max_tokens=512" \
  -F "temperature=0.7"

Image-to-Text Inference (Multiple Images)

curl -X POST http://10.9.6.165:8001/generate \
  -H "X-API-Key: your-api-key" \
  -F "model=llama4:16x17b" \
  -F "inference_type=image-to-text" \
  -F "prompt=Compare these two images and describe similarities and differences" \
  -F "files=@/Users/coe-ai/Downloads/image1.jpeg" \
  -F "files=@/Users/coe-ai/Downloads/image2.jpeg" \
  -F "max_tokens=512" \
  -F "temperature=0.7"

Streaming Response

curl -N -X POST http://10.9.6.165:8001/generate \
  -H "X-API-Key: your-api-key" \
  -F "model=tinyllama:latest" \
  -F "inference_type=text-to-text" \
  -F "prompt=Write a motivational quote about persistence." \
  -F "stream=true"

Note: The -N flag ensures curl doesn't buffer the streaming response.

Performance Tips

  1. Model Selection: Use llama4:16x17b for image-to-text to avoid memory issues
  2. Temperature Settings: Lower values (0.1-0.3) for factual/technical content, higher (0.7-1.0) for creative tasks
  3. Token Limits: Set appropriate max_tokens to balance response quality and generation time
  4. Streaming: Use streaming for long responses to see progress in real-time
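
The temperature guidance in tip 2 can be captured as named presets so that calls stay consistent across a project. This is only a sketch: SAMPLING_PRESETS and sampling_for are hypothetical names, the specific values are picked from inside the ranges given above, and the fallback mirrors the documented generate() defaults.

```python
SAMPLING_PRESETS = {
    # Within the 0.1-0.3 range suggested for factual/technical content.
    "factual": {"temperature": 0.2, "top_p": 0.9},
    # Within the 0.7-1.0 range suggested for creative tasks.
    "creative": {"temperature": 0.8, "top_p": 1.0},
}

# Same values as the generate() defaults in the API reference.
DEFAULT_SAMPLING = {"temperature": 0.7, "top_p": 1.0}


def sampling_for(task: str) -> dict:
    """Return a fresh dict of sampling keyword arguments for a task type."""
    return dict(SAMPLING_PRESETS.get(task, DEFAULT_SAMPLING))
```

The result can be unpacked into a call, for example llm.generate(model=..., prompt=..., **sampling_for("factual")).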

Troubleshooting

Common Issues

  1. 400 Bad Request: Check model name and inference type compatibility
  2. 401 Unauthorized: Verify API key is correct
  3. 500 Internal Server Error: Usually indicates insufficient GPU memory for large models
  4. Connection Refused: Confirm with the COE AI team that the FastAPI server is running and reachable from your network
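
The status codes in the list above can be mapped to their hints in code, which is handy when logging failures from a script. A minimal sketch, with STATUS_HINTS and hint_for as hypothetical names and the hint text paraphrased from the list:

```python
STATUS_HINTS = {
    400: "Check that the model name and inference type are compatible.",
    401: "Verify that the API key is correct.",
    500: "The server may lack GPU memory for the requested model.",
}


def hint_for(status_code: int) -> str:
    """Return the troubleshooting hint for an HTTP status code."""
    return STATUS_HINTS.get(
        status_code, "No specific hint; inspect the response body."
    )
```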

Debug Mode

Enable detailed error reporting:

import requests

try:
    response = llm.generate(...)
except requests.exceptions.HTTPError as e:
    # Print the JSON error body returned by the server
    print("Detailed error:", e.response.json())

Authentication: API key

All requests must include an API key issued by the COE AI team. Pass the key when constructing LLMinfer and the client attaches it to every request for you; raw HTTP calls send it in the X-API-Key header, as in the cURL commands above.
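
To keep the key out of source files, it can be read from the environment at start-up. The sketch below is one way to do that; load_api_key and the variable name COEAI_API_KEY are hypothetical and not something coeai defines.

```python
import os
from typing import Mapping, Optional


def load_api_key(
    env: Optional[Mapping[str, str]] = None,
    name: str = "COEAI_API_KEY",  # hypothetical variable name
) -> str:
    """Read the API key from an environment mapping, failing loudly if unset."""
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"set the {name} environment variable to your API key")
    return key
```

The client can then be constructed with LLMinfer(api_key=load_api_key(), host=...) instead of a hard-coded string.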

Requesting an API Key

  1. Send an email to hpc-access@ddn.upes.ac.in from your official UPES account using this template:
Subject: API Key Request for COE AI LLM Access

Dear COE AI Team,

I am requesting access to the LLM API for my project work.

Project Details:
- Project Name: <Your Project Name>
- Project Description: <Brief description>
- Expected Usage: <How you plan to use the LLM>
- Duration: <Timeline>

Reason for API Access:
<Research objectives or academic requirements>

Additional Information:
- Name: <Your Name>
- Email: <Your Email>
- Department/Affiliation: <Dept/Organisation>
- Student/Faculty ID: <If applicable>

Thank you for considering my request.

Best regards,
<Your Name>
  2. Allow 2-3 business days for processing. The team will reply with your API key.

License

coeai is released under the MIT License.


Changelog

v3.0.0

  • Production Release
  • Text-to-text and image-to-text inference
  • Streaming support
  • Multiple image processing
  • Comprehensive parameter control
  • Full cURL command compatibility

Authors

Konal Puri
Sawai Pratap Khatri
Centre of Excellence: AI (COE AI), HPC Project, UPES.

PyPI: https://pypi.org/project/coeai
GitHub: https://github.com/pkonal23/COE-AI-HPC-Project.git
