Client to interact with COE AI-hosted LLM models
coeai LLM Inference API Client
Interact with high-capacity multimodal LLMs hosted on the COE AI GPU cluster from any Python environment.
coeai is a Python package for LLM inference over the UPES local network (LAN or campus Wi-Fi). It supports text-to-text and image-to-text requests, with optional response streaming.
Table of Contents
- Features
- Installation
- Quick Start
- API Reference
- Available Models
- Usage Examples
- Error Handling
- cURL Commands Reference
- Performance Tips
- Troubleshooting
- Authentication: API key
- License
- Changelog
- Authors
Features
| Feature | Description |
|---|---|
| Text-to-Text | Support for all available LLMs |
| Image-to-Text | Multimodal support with Llama4 models |
| Streaming | Real-time response streaming |
| Custom Messages | Advanced conversation handling |
| Multiple Images | Process multiple images per request |
| Parameter Control | Full generation parameter customization |
| LAN Optimized | FastAPI deployment over local UPES network |
Installation
pip install coeai
Quick Start
from coeai import LLMinfer
# Initialize the client
llm = LLMinfer(api_key="your-api-key", host="http://10.9.6.165:8001")
# Simple text generation
response = llm.generate(
model="llama4:16x17b",
prompt="Explain quantum computing in simple terms.",
max_tokens=256
)
print(response)
API Reference
LLMinfer Class
Initialization
LLMinfer(api_key: str, host: str)
| Parameter | Type | Description |
|---|---|---|
| api_key | str | Your API authentication key |
| host | str | The FastAPI server endpoint URL |
Methods
generate()
generate(
model: str,
inference_type: str = "text-to-text",
prompt: Optional[str] = None,
messages: Optional[List[Dict]] = None,
files: Optional[List[str]] = None,
max_tokens: int = 512,
temperature: float = 0.7,
top_p: float = 1.0,
stream: bool = False,
print_stream: bool = True
) -> Dict
| Parameter | Type | Description |
|---|---|---|
| model | str | Model name (e.g., "llama4:16x17b") |
| inference_type | str | "text-to-text" or "image-to-text" |
| prompt | str (optional) | Text prompt for generation |
| messages | list (optional) | Custom conversation messages |
| files | list (optional) | List of image file paths |
| max_tokens | int | Maximum number of tokens to generate |
| temperature | float | Sampling temperature (0.0–2.0) |
| top_p | float | Nucleus sampling parameter |
| stream | bool | Enable streaming response |
| print_stream | bool | Print streaming output to console |
| Returns | Dict | API response dictionary |
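The parameter table above can be checked locally before a request reaches the GPU cluster. The following is a minimal sketch, not part of coeai: `build_generate_kwargs` is a hypothetical helper, and the rules that either `prompt` or `messages` must be set and that image-to-text needs at least one file are assumptions drawn from the examples in this document, not documented server behaviour.

```python
from typing import Any, Dict, List, Optional

VALID_INFERENCE_TYPES = ("text-to-text", "image-to-text")


def build_generate_kwargs(
    model: str,
    inference_type: str = "text-to-text",
    prompt: Optional[str] = None,
    messages: Optional[List[Dict[str, Any]]] = None,
    files: Optional[List[str]] = None,
    max_tokens: int = 512,
    temperature: float = 0.7,
    top_p: float = 1.0,
) -> Dict[str, Any]:
    """Hypothetical helper: validate arguments, then return them as kwargs."""
    if inference_type not in VALID_INFERENCE_TYPES:
        raise ValueError(f"inference_type must be one of {VALID_INFERENCE_TYPES}")
    # Assumption: a request needs some input text, as in every example here.
    if prompt is None and messages is None:
        raise ValueError("provide either prompt or messages")
    # Assumption: image-to-text requests carry at least one image path.
    if inference_type == "image-to-text" and not files:
        raise ValueError("image-to-text requires at least one file path")
    # Range taken from the parameter table (0.0 to 2.0).
    if not 0.0 <= temperature <= 2.0:
        raise ValueError("temperature must be between 0.0 and 2.0")
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    kwargs = {
        "model": model,
        "inference_type": inference_type,
        "prompt": prompt,
        "messages": messages,
        "files": files,
        "max_tokens": max_tokens,
        "temperature": temperature,
        "top_p": top_p,
    }
    # Drop unset optional values so the client's own defaults apply.
    return {key: value for key, value in kwargs.items() if value is not None}
```

The result can then be passed on as `llm.generate(**build_generate_kwargs(...))`.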
Available Models
Text-Only Models
- tinyllama:latest: Compact model for basic tasks
- tinyllama:1.1b: Small efficient model
- deepseek-r1:70b: Advanced reasoning model
- gpt-oss:120b: Large general-purpose model
Multimodal Models (Image + Text)
llama4:16x17b: Recommended for image-to-text inference
Usage Examples
1. Basic Text Generation
from coeai import LLMinfer
llm = LLMinfer(api_key="your-api-key", host="http://10.9.6.165:8001")
response = llm.generate(
model="tinyllama:latest",
inference_type="text-to-text",
prompt="Write a short story about a robot learning to paint.",
max_tokens=256,
temperature=0.7,
top_p=1.0
)
print(response)
2. Custom Conversation Messages
messages = [
{"role": "system", "content": [{"type": "text", "text": "You are a helpful assistant."}]},
{"role": "user", "content": [{"type": "text", "text": "Explain quantum computing in simple terms."}]}
]
response = llm.generate(
model="llama4:16x17b",
inference_type="text-to-text",
messages=messages,
max_tokens=512,
temperature=0.6
)
print(response)
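Writing the nested message structure by hand gets repetitive in longer conversations. A small sketch of a builder for the shape used above; `text_message` is a hypothetical helper name, not something coeai exports.

```python
from typing import Any, Dict, List


def text_message(role: str, text: str) -> Dict[str, Any]:
    """Build one message in the role/content shape shown in this section."""
    return {"role": role, "content": [{"type": "text", "text": text}]}


# Same conversation as the example above, built with the helper.
messages: List[Dict[str, Any]] = [
    text_message("system", "You are a helpful assistant."),
    text_message("user", "Explain quantum computing in simple terms."),
]
```

The resulting list is passed unchanged as the `messages` argument of `generate()`.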
3. Single Image Analysis
response = llm.generate(
model="llama4:16x17b",
inference_type="image-to-text",
files=["/path/to/image.jpeg"],
prompt="Describe this image in detail.",
max_tokens=512,
temperature=0.7
)
print(response)
4. Multiple Image Comparison
image_paths = [
"/Users/coe-ai/Downloads/image1.jpeg",
"/Users/coe-ai/Downloads/image2.jpeg"
]
response = llm.generate(
model="llama4:16x17b",
inference_type="image-to-text",
files=image_paths,
prompt="Compare these images and describe similarities and differences.",
max_tokens=512,
temperature=0.7,
top_p=1.0
)
print(response)
5. Streaming Text Generation
response = llm.generate(
model="tinyllama:latest",
inference_type="text-to-text",
prompt="Tell a story about AI and creativity.",
max_tokens=300,
temperature=0.8,
stream=True,
print_stream=True
)
print("\nFinal response:", response)
6. Advanced Parameters
response = llm.generate(
model="deepseek-r1:70b",
inference_type="text-to-text",
prompt="Solve this math problem step by step: What is 2^10 * 3^5?",
max_tokens=400,
temperature=0.1,
top_p=0.9,
stream=False
)
print(response)
Error Handling
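Failed requests raise requests.exceptions.HTTPError, and the server's error details are available in the response body:
import requests

try:
    response = llm.generate(
        model="llama4:16x17b",
        inference_type="image-to-text",
        files=["nonexistent.jpg"],
        prompt="Describe this image"
    )
except requests.exceptions.HTTPError as e:
    # The server rejected the request; show its JSON error payload.
    print(f"HTTP Error: {e.response.json()}")
except Exception as e:
    # Anything else, e.g. a missing local file or a connection problem.
    print(f"Error: {e}")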
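A missing image path, as in the example above, can also be caught before any request is sent. This is a minimal sketch using only the standard library; `missing_files` is a hypothetical helper and not part of coeai.

```python
from pathlib import Path
from typing import Iterable, List


def missing_files(paths: Iterable[str]) -> List[str]:
    """Return the paths that do not point to an existing regular file."""
    return [path for path in paths if not Path(path).is_file()]


# Example: report bad paths instead of sending the request.
not_found = missing_files(["nonexistent.jpg"])
if not_found:
    print("Skipping request, files not found:", not_found)
```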
cURL Commands Reference
List Available Models
curl -X GET http://10.9.6.165:8001/models \
  -H "X-API-Key: your-api-key"
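The same call can be made from Python without curl. The sketch below uses only the standard library and mirrors the `/models` path and `X-API-Key` header from the command above; the function names are hypothetical, and the assumption that the endpoint returns a JSON body is not confirmed by this document.

```python
import json
import urllib.request
from typing import Any


def build_models_request(host: str, api_key: str) -> urllib.request.Request:
    """Prepare a GET request for the /models endpoint."""
    url = host.rstrip("/") + "/models"
    return urllib.request.Request(url, headers={"X-API-Key": api_key}, method="GET")


def list_models(host: str, api_key: str, timeout: float = 10.0) -> Any:
    """Send the request and decode the body (assumed to be JSON)."""
    request = build_models_request(host, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `list_models("http://10.9.6.165:8001", "your-api-key")` only works from inside the UPES network.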
Text-to-Text Inference
curl -X POST http://10.9.6.165:8001/generate \
  -H "X-API-Key: your-api-key" \
  -F "model=tinyllama:latest" \
  -F "inference_type=text-to-text" \
  -F "prompt=Write a short story about a robot learning to paint." \
  -F "max_tokens=256" \
  -F "temperature=0.7" \
  -F "top_p=1.0"
Text-to-Text with Custom Messages
curl -X POST http://10.9.6.165:8001/generate \
  -H "X-API-Key: your-api-key" \
  -F "model=llama4:16x17b" \
  -F "inference_type=text-to-text" \
  -F 'messages=[{"role":"user","content":[{"type":"text","text":"Explain quantum computing in simple terms."}]}]' \
  -F "max_tokens=512" \
  -F "temperature=0.6"
Image-to-Text Inference (Single Image)
curl -X POST http://10.9.6.165:8001/generate \
  -H "X-API-Key: your-api-key" \
  -F "model=llama4:16x17b" \
  -F "inference_type=image-to-text" \
  -F "prompt=Describe the contents of this image" \
  -F "files=@/Users/coe-ai/Downloads/image.jpeg" \
  -F "max_tokens=512" \
  -F "temperature=0.7"
Image-to-Text Inference (Multiple Images)
curl -X POST http://10.9.6.165:8001/generate \
  -H "X-API-Key: your-api-key" \
  -F "model=llama4:16x17b" \
  -F "inference_type=image-to-text" \
  -F "prompt=Compare these two images and describe similarities and differences" \
  -F "files=@/Users/coe-ai/Downloads/image1.jpeg" \
  -F "files=@/Users/coe-ai/Downloads/image2.jpeg" \
  -F "max_tokens=512" \
  -F "temperature=0.7"
Streaming Response
curl -N -X POST http://10.9.6.165:8001/generate \
  -H "X-API-Key: your-api-key" \
  -F "model=tinyllama:latest" \
  -F "inference_type=text-to-text" \
  -F "prompt=Write a motivational quote about persistence." \
  -F "stream=true"
Note: The -N flag ensures curl doesn't buffer the streaming response.
Performance Tips
- Model Selection: Use llama4:16x17b for image-to-text to avoid memory issues
- Temperature Settings: Lower values (0.1-0.3) for factual/technical content, higher (0.7-1.0) for creative tasks
- Token Limits: Set appropriate max_tokens to balance response quality and generation time
- Streaming: Use streaming for long responses to see progress in real-time
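The temperature guidance above can be captured as named presets so that callers do not repeat magic numbers. A small sketch under stated assumptions: the preset names and `sampling_for` are hypothetical, the temperatures are picked from inside the ranges given in the tips, and the `top_p` values are illustrative choices borrowed from the usage examples.

```python
from typing import Dict

# Temperatures sit inside the ranges from the tips: 0.1-0.3 and 0.7-1.0.
SAMPLING_PRESETS: Dict[str, Dict[str, float]] = {
    "factual": {"temperature": 0.2, "top_p": 0.9},
    "creative": {"temperature": 0.8, "top_p": 1.0},
}


def sampling_for(task: str) -> Dict[str, float]:
    """Return a copy of the sampling parameters for a task type."""
    if task not in SAMPLING_PRESETS:
        raise ValueError(f"unknown task {task!r}; expected one of {sorted(SAMPLING_PRESETS)}")
    return dict(SAMPLING_PRESETS[task])
```

A call then reads `llm.generate(model=..., prompt=..., **sampling_for("factual"))`.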
Troubleshooting
Common Issues
- 400 Bad Request: Check model name and inference type compatibility
- 401 Unauthorized: Verify API key is correct
- 500 Internal Server Error: Usually indicates insufficient GPU memory for large models
- Connection Refused: Confirm with COE AI that the FastAPI server is running and accessible
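The status codes listed above can be turned into readable hints inside an error handler. This is a minimal sketch: `STATUS_HINTS` and `hint_for_status` are hypothetical names, and the hint text simply restates the list.

```python
from typing import Dict

STATUS_HINTS: Dict[int, str] = {
    400: "Check model name and inference type compatibility.",
    401: "Verify that the API key is correct.",
    500: "Likely insufficient GPU memory for the requested model.",
}


def hint_for_status(status_code: int) -> str:
    """Map an HTTP status code to the troubleshooting hint from this section."""
    return STATUS_HINTS.get(status_code, "No specific hint; inspect the response body.")
```

Inside an `except requests.exceptions.HTTPError as e:` block, `hint_for_status(e.response.status_code)` gives the matching hint.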
Debug Mode
Enable detailed error reporting:
import requests

try:
    response = llm.generate(...)
except requests.exceptions.HTTPError as e:
    # Print the full JSON error payload returned by the server.
    print("Detailed error:", e.response.json())
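Note that `e.response.json()` itself raises a `ValueError` subclass when the body is not valid JSON, for example an HTML error page. A hedged sketch of a fallback that works on any response-like object with `.json()` and `.text`; `error_detail` is a hypothetical helper, not part of coeai or requests.

```python
from typing import Any


def error_detail(response: Any) -> Any:
    """Return the parsed JSON error body, or the raw text if it is not JSON."""
    try:
        return response.json()
    except ValueError:
        # requests raises a ValueError subclass when the body is not valid JSON.
        return response.text
```

In the handler above, `print("Detailed error:", error_detail(e.response))` then never fails on a non-JSON body.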
Authentication: API key
All requests must include an API key issued by the COE AI team. Pass the key when constructing LLMinfer; the client attaches it to every request for you. When calling the HTTP API directly, send it in the X-API-Key header, as in the cURL commands.
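To keep the key out of source files, it can be read from the environment at start-up. A minimal sketch: the variable name `COEAI_API_KEY` and the helper `load_api_key` are assumptions made for illustration, not conventions defined by coeai.

```python
import os


def load_api_key(env_var: str = "COEAI_API_KEY") -> str:
    """Read the API key from an environment variable (name is an assumption)."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable to your API key.")
    return key
```

The client is then created with `LLMinfer(api_key=load_api_key(), host="http://10.9.6.165:8001")`.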
Requesting an API Key
- Send an email to hpc-access@ddn.upes.ac.in from your official UPES account using this template:
Subject: API Key Request for COE AI LLM Access
Dear COE AI Team,
I am requesting access to the LLM API for my project work.
Project Details:
- Project Name: <Your Project Name>
- Project Description: <Brief description>
- Expected Usage: <How you plan to use the LLM>
- Duration: <Timeline>
Reason for API Access:
<Research objectives or academic requirements>
Additional Information:
- Name: <Your Name>
- Email: <Your Email>
- Department/Affiliation: <Dept/Organisation>
- Student/Faculty ID: <If applicable>
Thank you for considering my request.
Best regards,
<Your Name>
- Allow 2-3 business days for processing. The team will reply with your API key.
License
coeai is released under the MIT License.
Changelog
v3.0.0
- Production Release
- Text-to-text and image-to-text inference
- Streaming support
- Multiple image processing
- Comprehensive parameter control
- Full cURL command compatibility
Authors
Konal Puri
Sawai Pratap Khatri
Centre of Excellence: AI (COE AI), HPC Project, UPES.
PyPI: https://pypi.org/project/coeai
GitHub: https://github.com/pkonal23/COE-AI-HPC-Project.git