Client to interact with COE AI-hosted LLM models

coeai PyPI Package

Interact with high-capacity multimodal LLMs hosted on the COE AI GPU cluster from any Python environment.

coeai is a lightweight Python wrapper, currently built around the LLaMA-4 16x17B model (128K context, vision-enabled) deployed on the Centre of Excellence for AI (COE AI) servers at UPES. The server exposes a single /generate HTTP endpoint, and the wrapper calls it for you, so you can run both text-only and image+text inference from notebooks, scripts, or backend services connected to the UPES Wi-Fi.

Text and image input | 128,000-token context | Streaming or batch | Runs on the COE AI GPU node


Table of Contents

  1. Features
  2. Requirements
  3. Installation
  4. Quick Start
  5. API Usage
  6. Model Parameters
  7. Authentication
  8. Joining COE AI
  9. Troubleshooting
  10. License
  11. Author

Features

  • Ultra-long context: up to 128K tokens per request for long documents or multi-turn chats
  • Vision support: send images along with text for multimodal reasoning
  • High performance: queries are served by a dedicated GPU node inside the COE AI HPC cluster
  • Simple auth: authenticate with a short-lived API key (valid for 30 days) sent in a request header
  • Drop-in wrapper: minimal Python API; no need to handle HTTP manually

Requirements

  • Python 3.8 or newer
  • Network access to http://10.16.1.50:8000 from the UPES campus Wi-Fi
  • A valid API key issued by the COE AI team
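
Before going further, it can help to confirm that the API host is reachable from your machine. The sketch below is not part of coeai: can_reach is a hypothetical helper that attempts a plain TCP connection to the host and port listed above, using only the standard library.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the address and connects; opening and
        # immediately closing the socket is enough to show the port is reachable.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable networks.
        return False


# Example (host and port taken from the requirement above):
# print("COE AI API reachable:", can_reach("10.16.1.50", 8000))
```

A False result usually means the machine is not on the campus network; it says nothing about whether your API key is valid.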

Installation

pip install coeai

This pulls the latest release from PyPI.
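
To check which release ended up in your environment, you can query the installed package metadata. This is a minimal sketch using the standard library's importlib.metadata (available from Python 3.8); installed_version is a hypothetical helper, not something coeai provides.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str = "coeai") -> Optional[str]:
    """Return the installed version of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Example:
# print(installed_version())  # a version string, or None if coeai is not installed
```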


Quick Start

The wrapper exposes a single LLMinfer class. Initialize it with the API URL and your API key, then call infer().

Text-to-Text

from coeai import LLMinfer

llm = LLMinfer(
    api_url="http://10.16.1.50:8000/generate",
    api_key="API_KEY"
)

response = llm.infer(
    mode="text-to-text",
    prompt_text="Summarize the key points of general relativity.",
    max_tokens=500,
    temperature=0.6,
    top_p=0.95,
    stream=False
)

print(response)

Image + Text

from coeai import LLMinfer

# Initialize the client
llm = LLMinfer(
    api_url="http://10.16.1.50:8000/generate",
    api_key="API_KEY"
)

# Run inference with image and prompt
response = llm.infer(
    mode="image-text-to-text",
    prompt_text="Describe what's happening in the image.",
    image_path="/home/konal.106904/sample.jpg",  # <-- update to a valid path
    max_tokens=512,
    temperature=0.7,
    top_p=1.0,
    stream=False
)

# Print the response
print(response)

API Usage

Using the Python Wrapper

The examples above show the recommended approach using the LLMinfer class.

Direct API Access with cURL

You can also interact directly with the /generate endpoint using cURL.

Prerequisites

Requirement                   | Purpose
A running instance of the API | Default URL: http://10.16.1.50:8000/generate
Valid API key                 | Supply in the X-API-Key request header
cURL 7.68+                    | Supports --data @- JSON piping

Text-Only Request

curl -X POST http://10.16.1.50:8000/generate \
  -H "Content-Type: application/json" \
  -H "X-API-Key: YOUR_API_KEY_HERE" \
  -d '{
    "model": "llama4",
    "messages": [
      {
        "role": "system",
        "content": "This is a chat between a user and an assistant. The assistant is helping the user with general questions."
      },
      {
        "role": "user",
        "content": "Explain what a black hole is."
      }
    ],
    "max_tokens": 512,
    "temperature": 0.7,
    "top_p": 1.0,
    "stream": false
  }'
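
The same request can be built from Python without the wrapper. The sketch below uses only the standard library's urllib and mirrors the headers and JSON body of the cURL command; build_generate_request is a hypothetical helper and is not how coeai itself is implemented. The response schema is not documented on this page, so the usage comment simply prints the raw body.

```python
import json
import urllib.request


def build_generate_request(api_url: str, api_key: str, prompt: str,
                           max_tokens: int = 512) -> urllib.request.Request:
    """Build (but do not send) a POST request mirroring the cURL example."""
    payload = {
        "model": "llama4",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": 0.7,
        "top_p": 1.0,
        "stream": False,
    }
    return urllib.request.Request(
        api_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
        method="POST",
    )


# To send it (requires campus network access and a valid key):
# req = build_generate_request("http://10.16.1.50:8000/generate",
#                              "YOUR_API_KEY_HERE",
#                              "Explain what a black hole is.")
# with urllib.request.urlopen(req, timeout=60) as resp:
#     print(resp.read().decode("utf-8"))
```

Separating request construction from sending keeps the payload easy to inspect or log before anything goes over the network.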

Image + Text Request

For multimodal requests, include the image as a Base64-encoded Data URI in the content array of the user message.

Note: Replace YOUR_API_KEY_HERE with your own API key and PUT_BASE64_IMAGE_STRING_HERE with the Base64-encoded contents of your image file.

How it Works:

  1. Inline Image: The image_url object embeds the entire image as a Data URI so no separate file upload is required
  2. Multi-Modal Prompt: The content field is an array containing both the image and the accompanying text question, preserving ordering
  3. Response: The server returns a JSON object containing the assistant's interpretation of the supplied image
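
The steps above can be sketched in Python. The exact JSON shape here is an assumption: it follows the widely used OpenAI-style chat format (content parts of type "image_url" and "text"), which matches the image_url object and content array described above, but this page does not show the server's schema. Both helper names are hypothetical.

```python
import base64
import mimetypes
from pathlib import Path


def image_to_data_uri(image_path: str) -> str:
    """Encode an image file as a Base64 Data URI."""
    # Guess the MIME type from the file extension; fall back to JPEG (assumption).
    mime, _ = mimetypes.guess_type(image_path)
    encoded = base64.b64encode(Path(image_path).read_bytes()).decode("ascii")
    return f"data:{mime or 'image/jpeg'};base64,{encoded}"


def build_image_payload(image_path: str, question: str,
                        max_tokens: int = 512) -> dict:
    """Build a request body with the image first and the question second."""
    content = [
        # Assumed part layout: OpenAI-style image_url and text entries.
        {"type": "image_url", "image_url": {"url": image_to_data_uri(image_path)}},
        {"type": "text", "text": question},
    ]
    return {
        "model": "llama4",
        "messages": [{"role": "user", "content": content}],
        "max_tokens": max_tokens,
        "stream": False,
    }
```

The resulting dictionary can be serialized with json.dumps and posted to /generate with the X-API-Key header, in the same way as the text-only request.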

Model Parameters

Default Parameters

Field       | Description                            | Default
model       | Model name (currently fixed to llama4) | llama4
stream      | Return tokens incrementally            | false
max_tokens  | Maximum new tokens to generate         | 1024
temperature | Sampling temperature (creativity)      | 0.7
top_p       | Nucleus sampling                       | 1.0
stop        | List of stop sequences                 | null

Parameter Details

Parameter   | Description
model       | The model identifier exposed by your server (here llama4)
messages    | Conversation history, each entry containing a role and content
max_tokens  | Upper bound on tokens in the assistant reply
temperature | Controls randomness; lower values yield more deterministic output
top_p       | Nucleus sampling; keep at 1.0 for default behavior
stream      | When true, the API will send incremental responses via Server-Sent Events (SSE)
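
When stream is true, a client has to pick the payloads out of the SSE line stream. The sketch below shows one way to do that. The data: field is part of the SSE format itself; the [DONE] end marker and the content of each payload are assumptions borrowed from OpenAI-style servers and are not documented on this page. iter_sse_data is a hypothetical helper.

```python
from typing import Iterable, Iterator


def iter_sse_data(lines: Iterable[str],
                  done_marker: str = "[DONE]") -> Iterator[str]:
    """Yield the payload of each `data:` line from a stream of SSE text lines.

    Simplification: every data line is treated as its own event; multi-line
    data fields are not joined.
    """
    for raw in lines:
        line = raw.rstrip("\r\n")
        # Skip blank event separators, comments (":"), and non-data fields.
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):]
        # The SSE format allows one optional space after the colon.
        if payload.startswith(" "):
            payload = payload[1:]
        if payload == done_marker:  # assumed end-of-stream sentinel
            return
        yield payload
```

Each yielded string would typically be passed to json.loads, assuming the server sends JSON chunks.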

Note: The server enforces a total context of 128K tokens (prompt + generated). Set max_tokens so that it leaves room for the prompt.
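
The budget arithmetic can be made explicit. This sketch assumes the limit is 128,000 tokens, as stated at the top of this page, and that you already have a token count for your prompt from some tokenizer (the page does not provide one). clamp_max_tokens is a hypothetical helper.

```python
CONTEXT_LIMIT = 128_000  # total tokens (prompt + generated), assumed from this page


def clamp_max_tokens(prompt_tokens: int, requested: int,
                     context_limit: int = CONTEXT_LIMIT) -> int:
    """Return the largest max_tokens <= requested that fits in the context."""
    if prompt_tokens < 0 or requested < 0:
        raise ValueError("token counts must be non-negative")
    remaining = context_limit - prompt_tokens
    if remaining <= 0:
        raise ValueError("prompt alone fills or exceeds the context window")
    return min(requested, remaining)
```

For example, a 127,500-token prompt leaves room for at most 500 generated tokens, so a requested max_tokens of 1024 would be reduced to 500.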


Authentication

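All requests must include an API key issued by the COE AI team. Pass the key when constructing LLMinfer; the wrapper attaches it to each request as a header, so you do not set it yourself. When calling the endpoint directly, send it in the X-API-Key header as in the cURL example.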
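
Hard-coding the key in scripts makes it easy to leak through version control or shared notebooks. One common alternative, sketched below, reads it from an environment variable. The variable name COEAI_API_KEY and the helper load_api_key are hypothetical; coeai does not define either.

```python
import os


def load_api_key(env_var: str = "COEAI_API_KEY") -> str:
    """Read the API key from an environment variable, failing loudly if unset."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable to your API key")
    return key


# Usage with the wrapper (requires coeai to be installed):
# from coeai import LLMinfer
# llm = LLMinfer(api_url="http://10.16.1.50:8000/generate", api_key=load_api_key())
```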

Requesting an API Key

  1. Send an email to hpc-access@ddn.upes.ac.in from your official UPES account using this template:
Subject: API Key Request for COE AI LLM Access

Dear COE AI Team,

I am requesting access to the LLM API for my project work.

Project Details:
- Project Name: <Your Project Name>
- Project Description: <Brief description>
- Expected Usage: <How you plan to use the LLM>
- Duration: <Timeline>

Reason for API Access:
<Research objectives or academic requirements>

Additional Information:
- Name: <Your Name>
- Email: <Your Email>
- Department/Affiliation: <Dept/Organisation>
- Student/Faculty ID: <If applicable>

Thank you for considering my request.

Best regards,
<Your Name>
  2. Allow 2-3 business days for processing. The team will reply with your API key.

Key Renewal

Keys expire after 30 days. Email the same address with the subject:

Subject: API Key Renewal Request for COE AI LLM Access

Include your previous key and a brief usage summary.
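
Because renewal takes an email round trip, it is worth knowing when a key will lapse. The sketch below assumes the 30 days are counted from the date the key was issued, which this page does not state explicitly; both helpers are hypothetical.

```python
from datetime import date, timedelta

KEY_LIFETIME_DAYS = 30  # key validity stated above


def key_expiry(issued_on: date) -> date:
    """Return the date on which a key issued on `issued_on` expires."""
    return issued_on + timedelta(days=KEY_LIFETIME_DAYS)


def days_until_expiry(issued_on: date, today: date) -> int:
    """Days remaining before expiry; negative once the key has expired."""
    return (key_expiry(issued_on) - today).days
```

A script could call days_until_expiry(issued_on, date.today()) at startup and print a reminder when only a few days remain.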


Troubleshooting

Symptom          | Possible Cause                        | Fix
ConnectionError  | Not on UPES network                   | Connect to campus Wi-Fi or VPN
401 Unauthorized | Missing/expired API key               | Request or renew your key
Long latency     | Very large prompts or high max_tokens | Reduce prompt size or output length
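
The first two rows of the table can be turned into a small error-to-hint mapping for scripts that call the endpoint directly. This sketch assumes requests are made with the standard library's urllib; the exception types raised by the coeai wrapper itself may differ. explain_failure is a hypothetical helper.

```python
import urllib.error


def explain_failure(exc: Exception) -> str:
    """Map a request failure to the matching troubleshooting hint."""
    # HTTPError subclasses URLError, so it must be checked first.
    if isinstance(exc, urllib.error.HTTPError):
        if exc.code == 401:
            return "Missing or expired API key: request or renew your key."
        return f"Server returned HTTP {exc.code}."
    if isinstance(exc, (urllib.error.URLError, ConnectionError, TimeoutError)):
        return "Cannot reach the API: connect to campus Wi-Fi or VPN."
    return f"Unexpected error: {exc!r}"


# Example:
# try:
#     urllib.request.urlopen(req, timeout=60)
# except Exception as exc:
#     print(explain_failure(exc))
```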

License

coeai is released under the MIT License.


Author

Konal Puri
Centre of Excellence: AI (COE AI), HPC Project, UPES

PyPI: https://pypi.org/project/coeai
