Client to interact with COE AI-hosted LLM models
coeai PyPI Package
Interact with high-capacity multimodal LLMs hosted on the COE AI GPU cluster from any Python environment.
coeai is a lightweight Python wrapper, currently built around the LLaMA-4 16x17B model (128K context, vision-enabled) deployed on the Centre of Excellence for AI (COE AI) servers at UPES. The server exposes a single /generate HTTP endpoint, and the wrapper makes it straightforward to run both text-only and image+text inference from notebooks, scripts, or backend services connected to the UPES Wi-Fi.
- Text and image input
- 128,000-token context
- Streaming or batch
- Runs on the COE AI GPU node
Table of Contents
- Features
- Requirements
- Installation
- Quick Start
- API Usage
- Model Parameters
- Authentication
- Joining COE AI
- Troubleshooting
- License
- Author
Features
- Ultra-long context: up to 128K tokens per request for long documents or multi-turn chats
- Vision support: send images along with text for multimodal reasoning
- High performance: queries are served by a dedicated GPU node inside the COE AI HPC cluster
- Simple auth: authenticate with a short-lived API key (valid 30 days) sent in the request header
- Drop-in wrapper: minimal Python API; no need to handle HTTP manually
Requirements
- Python 3.8 or newer
- Network access to http://10.16.1.50:8000 from the UPES campus Wi-Fi
- A valid API key issued by the COE AI team
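Because the endpoint is only reachable from the campus network, a quick TCP check can rule out connectivity problems before anything else is debugged. The sketch below is not part of coeai; the helper name `can_reach` and the 3-second timeout are illustrative assumptions.

```python
import socket


def can_reach(host: str = "10.16.1.50", port: int = 8000, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable networks.
        return False
```

Calling `can_reach()` with no arguments probes the default address listed above; a `False` result usually means the machine is not on the UPES network.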
Installation
pip install coeai
This pulls the latest release from PyPI.
Quick Start
The wrapper exposes a single LLMinfer class. Initialize it with the API URL and your API key, then call infer().
Text-to-Text
from coeai import LLMinfer
llm = LLMinfer(
    api_url="http://10.16.1.50:8000/generate",
    api_key="API_KEY"
)
response = llm.infer(
    mode="text-to-text",
    prompt_text="Summarize the key points of general relativity.",
    max_tokens=500,
    temperature=0.6,
    top_p=0.95,
    stream=False
)
print(response)
Image + Text
from coeai import LLMinfer
# Initialize the client
llm = LLMinfer(
    api_url="http://10.16.1.50:8000/generate",
    api_key="API_KEY"
)
# Run inference with image and prompt
response = llm.infer(
    mode="image-text-to-text",
    prompt_text="Describe what's happening in the image.",
    image_path="/home/konal.106904/sample.jpg",  # <-- update to a valid path
    max_tokens=512,
    temperature=0.7,
    top_p=1.0,
    stream=False
)
# Print the response
print(response)
API Usage
Using the Python Wrapper
The examples above show the recommended approach using the LLMinfer class.
Direct API Access with cURL
You can also interact directly with the /generate endpoint using cURL.
Prerequisites
| Requirement | Purpose |
|---|---|
| A running instance of the API | Default URL: http://10.16.1.50:8000/generate |
| Valid API key | Supply in the X-API-Key request header |
| cURL 7.68+ | Supports --data @- JSON piping |
Text-Only Request
curl -X POST http://10.16.1.50:8000/generate \
  -H "Content-Type: application/json" \
  -H "X-API-Key: YOUR_API_KEY_HERE" \
  -d '{
    "model": "llama4",
    "messages": [
      {
        "role": "system",
        "content": "This is a chat between a user and an assistant. The assistant is helping the user with general questions."
      },
      {
        "role": "user",
        "content": "Explain what a black hole is."
      }
    ],
    "max_tokens": 512,
    "temperature": 0.7,
    "top_p": 1.0,
    "stream": false
  }'
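Where cURL is not available, the same request can be sketched with Python's standard library. The header and body fields mirror the cURL call above; the helper name `build_request` is an assumption for illustration, and the response is printed raw because its JSON schema is not documented here.

```python
import json
import urllib.request

API_URL = "http://10.16.1.50:8000/generate"


def build_request(api_key: str, user_prompt: str, max_tokens: int = 512) -> urllib.request.Request:
    """Build (but do not send) a POST request equivalent to the cURL call."""
    payload = {
        "model": "llama4",
        "messages": [{"role": "user", "content": user_prompt}],
        "max_tokens": max_tokens,
        "temperature": 0.7,
        "top_p": 1.0,
        "stream": False,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
        method="POST",
    )


# Sending requires campus network access and a valid key:
# req = build_request("YOUR_API_KEY_HERE", "Explain what a black hole is.")
# with urllib.request.urlopen(req, timeout=60) as resp:
#     print(resp.read().decode("utf-8"))
```

Keeping request construction separate from sending makes the payload easy to inspect or log before it goes over the network.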
Image + Text Request
For multimodal requests, include the image as a Base64-encoded Data URI in the content array:
Note: Replace YOUR_API_KEY_HERE with your own API key and PUT_BASE64_IMAGE_STRING_HERE with the Base64-encoded contents of your image file.
How it Works:
- Inline Image: The image_url object embeds the entire image as a Data URI so no separate file upload is required
- Multi-Modal Prompt: The content field is an array containing both the image and the accompanying text question, preserving ordering
- Response: The server returns a JSON object containing the assistant's interpretation of the supplied image
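The points above can be sketched as a small payload builder. This is a hedged illustration, not the documented schema: the `type`, `image_url` and `text` keys inside the content array are assumed to follow the common OpenAI-style chat format, and the helper names are hypothetical. Verify the exact field names against the server before relying on them.

```python
import base64
import mimetypes
from pathlib import Path


def image_to_data_uri(path: str) -> str:
    """Encode an image file as a Base64 Data URI."""
    mime, _ = mimetypes.guess_type(path)
    mime = mime or "application/octet-stream"  # fallback for unknown extensions
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def build_multimodal_payload(image_path: str, question: str, max_tokens: int = 512) -> dict:
    """Build a request body with the image first and the text question second."""
    # ASSUMPTION: the "type" / "image_url" / "text" keys follow the common
    # OpenAI-style chat format; they are not confirmed by this README.
    return {
        "model": "llama4",
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "image_url", "image_url": {"url": image_to_data_uri(image_path)}},
                    {"type": "text", "text": question},
                ],
            }
        ],
        "max_tokens": max_tokens,
        "stream": False,
    }
```

Base64 encoding inflates the image by roughly a third, so large images add noticeably to request size.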
Model Parameters
Default Parameters
| Field | Description | Default |
|---|---|---|
| model | Model name (currently fixed to llama4) | llama4 |
| stream | Return tokens incrementally | false |
| max_tokens | Maximum new tokens to generate | 1024 |
| temperature | Sampling temperature (creativity) | 0.7 |
| top_p | Nucleus sampling | 1.0 |
| stop | List of stop sequences | null |
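As a rough illustration of how these defaults combine with per-request overrides, the sketch below merges the two into one request body. The helper name `with_defaults` is hypothetical, and whether the server itself fills in omitted fields is not stated here, so treat this as a client-side convenience only.

```python
# Defaults copied from the table above.
DEFAULTS = {
    "model": "llama4",
    "stream": False,
    "max_tokens": 1024,
    "temperature": 0.7,
    "top_p": 1.0,
    "stop": None,
}


def with_defaults(**overrides) -> dict:
    """Return a request body where unspecified fields fall back to the documented defaults."""
    unknown = set(overrides) - set(DEFAULTS) - {"messages"}
    if unknown:
        # Catch typos such as "max_token" before the request is sent.
        raise ValueError(f"Unknown parameter(s): {sorted(unknown)}")
    return {**DEFAULTS, **overrides}
```

For example, `with_defaults(temperature=0.2)` changes only the temperature and leaves the other five fields at their table values.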
Parameter Details
| Parameter | Description |
|---|---|
| model | The model identifier exposed by your server (here llama4) |
| messages | Conversation history, each entry containing a role and content |
| max_tokens | Upper bound on tokens in the assistant reply |
| temperature | Controls randomness; lower values yield more deterministic output |
| top_p | Nucleus sampling; keep at 1.0 for default behavior |
| stream | When true, the API will send incremental responses via Server-Sent Events (SSE) |
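For streamed responses, the SSE wire format itself is standard: each event is a group of `data:` lines terminated by a blank line. The sketch below extracts those payloads from an iterable of text lines. The function name is hypothetical, and the structure of each payload (for example, whether it is JSON) is not documented here, so it is returned as a raw string.

```python
from typing import Iterable, Iterator


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each Server-Sent Event found in `lines`."""
    buffer = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("data:"):
            # The SSE format allows one optional space after the colon.
            value = line[5:]
            buffer.append(value[1:] if value.startswith(" ") else value)
        elif line == "" and buffer:
            # A blank line ends the event; multi-line data is joined with "\n".
            yield "\n".join(buffer)
            buffer = []
        # Other fields (event:, id:) and ":" comment lines are ignored here.
    if buffer:
        # Lenient: flush a trailing event even without a closing blank line.
        yield "\n".join(buffer)
```

The generator works on any line iterator, so it can be fed decoded lines from an HTTP response object or from a saved transcript during testing.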
Note: The server enforces a total context of 128K tokens (prompt + generated). Adjust max_tokens accordingly.
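The budget arithmetic can be sketched as follows. It assumes the limit is exactly 128,000 tokens, as stated in the overview, and that the prompt's token count is already known; this README does not say how to count tokens, so `prompt_tokens` is an input you must supply or estimate.

```python
CONTEXT_LIMIT = 128_000  # total tokens: prompt + generated (assumed exact)


def clamp_max_tokens(prompt_tokens: int, requested: int = 1024) -> int:
    """Return the largest max_tokens value that still fits in the context window."""
    remaining = CONTEXT_LIMIT - prompt_tokens
    if remaining <= 0:
        raise ValueError("Prompt alone fills or exceeds the context window")
    return min(requested, remaining)
```

With a 127,500-token prompt, a request for 1,024 new tokens is reduced to 500, since 128,000 - 127,500 = 500.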
Authentication
All requests must include an API key issued by the COE AI team. Pass the key when constructing LLMinfer (it is added as an Authorization header behind the scenes).
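To keep the key out of source files and notebooks, one option is to read it from an environment variable. This is a minimal sketch; the variable name `COEAI_API_KEY` and the helper `load_api_key` are assumptions, not something the package defines.

```python
import os


def load_api_key(var_name: str = "COEAI_API_KEY") -> str:
    """Read the API key from the environment, failing loudly if it is missing."""
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable to your COE AI API key")
    return key
```

The result can then be passed straight to the constructor, for example `LLMinfer(api_url="http://10.16.1.50:8000/generate", api_key=load_api_key())`.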
Requesting an API Key
- Send an email to hpc-access@ddn.upes.ac.in from your official UPES account using this template:
Subject: API Key Request for COE AI LLM Access
Dear COE AI Team,
I am requesting access to the LLM API for my project work.
Project Details:
- Project Name: <Your Project Name>
- Project Description: <Brief description>
- Expected Usage: <How you plan to use the LLM>
- Duration: <Timeline>
Reason for API Access:
<Research objectives or academic requirements>
Additional Information:
- Name: <Your Name>
- Email: <Your Email>
- Department/Affiliation: <Dept/Organisation>
- Student/Faculty ID: <If applicable>
Thank you for considering my request.
Best regards,
<Your Name>
- Allow 2-3 business days for processing. The team will reply with your API key.
Key Renewal
Keys expire after 30 days. Email the same address with the subject:
Subject: API Key Renewal Request for COE AI LLM Access
Include your previous key and a brief usage summary.
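Since renewal takes a few business days, it can help to track how long a key has left. The sketch below assumes the 30 days are counted from the date the key was issued, which this README does not state explicitly; the function name is hypothetical.

```python
from datetime import date, timedelta

KEY_LIFETIME = timedelta(days=30)  # assumption: counted from the issue date


def days_until_expiry(issued_on: date, today: date) -> int:
    """Days left before a key issued on `issued_on` expires (negative if already expired)."""
    return (issued_on + KEY_LIFETIME - today).days
```

A key issued on 1 January has 10 days left on 21 January, which is a reasonable point to send the renewal email.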
Troubleshooting
| Symptom | Possible Cause | Fix |
|---|---|---|
| ConnectionError | Not on UPES network | Connect to campus Wi-Fi or VPN |
| 401 Unauthorized | Missing/expired API key | Request or renew your key |
| Long latency | Very large prompts or high max_tokens | Reduce prompt size or output length |
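When calling the endpoint directly with Python's standard library, the table above can be turned into a small diagnostic helper. This sketch applies to `urllib` exceptions only; which exception types the coeai wrapper itself raises is not documented here, and the function name `diagnose` is hypothetical.

```python
import urllib.error


def diagnose(exc: Exception) -> str:
    """Map a urllib or connection exception to a hint from the troubleshooting table."""
    # HTTPError is a subclass of URLError, so it must be checked first.
    if isinstance(exc, urllib.error.HTTPError):
        if exc.code == 401:
            return "Missing or expired API key: request or renew your key."
        return f"Server returned HTTP {exc.code}."
    if isinstance(exc, (urllib.error.URLError, ConnectionError, TimeoutError)):
        return "Cannot reach the server: connect to the UPES campus Wi-Fi or VPN."
    return "Unrecognised error; see the exception message."
```

A typical use is to wrap the request in `try`/`except Exception as exc` and print `diagnose(exc)` alongside the original message.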
License
coeai is released under the MIT License.
Author
Konal Puri
Centre of Excellence: AI (COE AI), HPC Project, UPES