
LLM API Documentation

This API provides access to a distributed LLM architecture built on RabbitMQ and Redis. Requests are processed asynchronously by a worker system (LLM-core) that generates responses and writes them to Redis; the API then retrieves the results from Redis and returns them to the client.


Endpoints

/generate

  • Method: POST
  • Description: Sends a prompt for single-message generation.
  • Request Body:
    {
      "job_id": "string",
      "meta": {
        "temperature": 0.2,
        "tokens_limit": 8096,
        "stop_words": [
          "string"
        ],
        "model": "string"
      },
      "content": "string"
    }
    
    • job_id (string): Unique identifier for the task.
    • meta (object): Metadata for generation:
      • temperature (float): The degree of randomness in generation (default 0.2).
      • tokens_limit (integer): Maximum tokens for the response (default 8096).
      • stop_words (list of strings): Words at which generation stops.
      • model (string): Model to use for generation.
    • content (string): The input text for generation.
  • Response:
    {
      "content": "string"
    }
    
    • content (string): The generated text.
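
For illustration, the same request can be made from Python with the third-party requests library; the values below mirror the curl example in the Usage section, and the host and port assume a local deployment:

import requests

payload = {
    "job_id": "12345",
    "meta": {
        "temperature": 0.5,
        "tokens_limit": 1000,
        "stop_words": ["stop"],
        "model": "gpt-model",
    },
    "content": "What is AI?",
}

resp = requests.post("http://localhost:8000/generate", json=payload, timeout=60)
resp.raise_for_status()
print(resp.json()["content"])  # the generated text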

/chat_completion

  • Method: POST
  • Description: Sends a conversation history for chat-based completions.
  • Request Body:
    {
      "job_id": "string",
      "meta": {
        "temperature": 0.2,
        "tokens_limit": 8096,
        "stop_words": [
          "string"
        ],
        "model": "string"
      },
      "messages": [
        {
          "role": "string",
          "content": "string"
        }
      ]
    }
    
    • job_id (string): Unique identifier for the task.
    • meta (object): Metadata for chat completion:
      • temperature (float): The degree of randomness in responses (default 0.2).
      • tokens_limit (integer): Maximum tokens for the response (default 8096).
      • stop_words (list of strings): Words at which generation stops.
      • model (string): Model to use for chat completion.
    • messages (list of objects): Conversation history:
      • role (string): Role of the message sender ("user", "assistant", etc.).
      • content (string): Message content.
  • Response:
    {
      "content": "string"
    }
    
    • content (string): The generated response.
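
Similarly, a minimal Python call for this endpoint (same assumptions as the /generate example above; the conversation itself is illustrative):

import requests

payload = {
    "job_id": "12345",
    "meta": {
        "temperature": 0.5,
        "tokens_limit": 1000,
        "stop_words": ["stop"],
        "model": "gpt-model",
    },
    "messages": [
        {"role": "user", "content": "What is AI?"},
        {"role": "assistant", "content": "Artificial Intelligence is..."},
        {"role": "user", "content": "Give me an example."},
    ],
}

resp = requests.post("http://localhost:8000/chat_completion", json=payload, timeout=60)
resp.raise_for_status()
print(resp.json()["content"])  # the model's reply to the last user message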

Environment Variables

These variables must be configured and synchronized with the LLM-core system:

RabbitMQ Configuration

  • RABBIT_MQ_HOST: RabbitMQ server hostname or IP.
  • RABBIT_MQ_PORT: RabbitMQ server port.
  • RABBIT_MQ_LOGIN: RabbitMQ login username.
  • RABBIT_MQ_PASSWORD: RabbitMQ login password.
  • QUEUE_NAME: Name of the RabbitMQ queue to process tasks.

Redis Configuration

  • REDIS_HOST: Redis server hostname or IP.
  • REDIS_PORT: Redis server port.
  • REDIS_PREFIX: Key prefix for task results in Redis.

Internal LLM-core Configuration

  • INNER_LLM_URL: URL for the LLM-core worker service.

Example .env File

INNER_LLM_URL=localhost:8670
REDIS_HOST=localhost
REDIS_PORT=6379
REDIS_PREFIX=llm-api
RABBIT_MQ_HOST=localhost
RABBIT_MQ_PORT=5672
RABBIT_MQ_LOGIN=admin
RABBIT_MQ_PASSWORD=admin
QUEUE_NAME=llm-api-queue
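
For reference, a minimal sketch of reading this configuration in Python is shown below; the package's actual Config.read_from_env() may use different attribute names and defaults. The fallback values mirror the example .env file above.

import os

# Illustrative only -- the package's Config.read_from_env() may differ.
class ExampleConfig:
    def __init__(self):
        self.inner_llm_url = os.getenv("INNER_LLM_URL", "localhost:8670")
        self.redis_host = os.getenv("REDIS_HOST", "localhost")
        self.redis_port = int(os.getenv("REDIS_PORT", "6379"))
        self.redis_prefix = os.getenv("REDIS_PREFIX", "llm-api")
        self.rabbit_host = os.getenv("RABBIT_MQ_HOST", "localhost")
        self.rabbit_port = int(os.getenv("RABBIT_MQ_PORT", "5672"))
        self.rabbit_login = os.getenv("RABBIT_MQ_LOGIN", "admin")
        self.rabbit_password = os.getenv("RABBIT_MQ_PASSWORD", "admin")
        self.queue_name = os.getenv("QUEUE_NAME", "llm-api-queue")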

System Architecture

Below is the architecture diagram for the interaction between the API, RabbitMQ, LLM-core, and Redis:

+-----------+        +------------+        +------------+        +-----------+
|           |        |            |        |            |        |           |
|    API    +------->+  RabbitMQ  +------->+  LLM-core  +------->+   Redis   |
|           |        |            |        |            |        |           |
+-----+-----+        +------------+        +------------+        +-----+-----+
      ^                                                                |
      |                   API polls Redis for results                  |
      +----------------------------------------------------------------+

  API -> RabbitMQ:       requests are queued
  RabbitMQ -> LLM-core:  worker retrieves tasks
  LLM-core -> Redis:     results are stored in Redis
  Redis -> API:          results are polled

Flow

  1. API:
     • Receives requests via the /generate and /chat_completion endpoints.
     • Publishes tasks to RabbitMQ.
     • Polls Redis for results by task ID.
  2. RabbitMQ:
     • Acts as a queue for task distribution.
     • LLM-core workers subscribe to the queue to process tasks.
  3. LLM-core:
     • Retrieves tasks from RabbitMQ.
     • Processes prompts or chat completions using LLM models (see the worker sketch after this list).
     • Stores results in Redis.
  4. Redis:
     • Acts as the result storage.
     • The API retrieves results from Redis when tasks are completed.
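
To make this flow concrete, here is an illustrative worker-side sketch using the pika and redis Python clients. It is not the actual LLM-core implementation: the payload fields, the run_model() call, and the Redis key format are assumptions based on the description above.

import json
import os

import pika   # RabbitMQ client
import redis  # Redis client

# Connect to RabbitMQ using the same environment variables as the API.
connection = pika.BlockingConnection(pika.ConnectionParameters(
    host=os.getenv("RABBIT_MQ_HOST", "localhost"),
    port=int(os.getenv("RABBIT_MQ_PORT", "5672")),
    credentials=pika.PlainCredentials(
        os.getenv("RABBIT_MQ_LOGIN", "admin"),
        os.getenv("RABBIT_MQ_PASSWORD", "admin"),
    ),
))
channel = connection.channel()
queue = os.getenv("QUEUE_NAME", "llm-api-queue")
channel.queue_declare(queue=queue)

store = redis.Redis(
    host=os.getenv("REDIS_HOST", "localhost"),
    port=int(os.getenv("REDIS_PORT", "6379")),
)
prefix = os.getenv("REDIS_PREFIX", "llm-api")

def handle_task(ch, method, properties, body):
    task = json.loads(body)
    result = run_model(task)  # hypothetical call into the LLM backend
    # Store the result under a prefixed key so the API can poll for it;
    # the exact key format is an assumption.
    store.set(f"{prefix}:{task['job_id']}", json.dumps({"content": result}))
    ch.basic_ack(delivery_tag=method.delivery_tag)

channel.basic_consume(queue=queue, on_message_callback=handle_task)
channel.start_consuming()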

Usage

Running the API

  1. Configure environment variables in the .env file.
  2. Start the API using:
     from fastapi import FastAPI

     # Config and get_router are provided by this package; the exact
     # import path below is an assumption and may differ in your install.
     from protollm_api import Config, get_router

     app = FastAPI()
     config = Config.read_from_env()
     app.include_router(get_router(config))
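The snippet above only constructs the application object; it still needs an ASGI server. Assuming it lives in a module named main.py (the file name is an assumption), it can be served with uvicorn:

     # main.py (continued) -- serve the app; the host and port are
     # assumptions matching the curl examples below.
     if __name__ == "__main__":
         import uvicorn
         uvicorn.run(app, host="0.0.0.0", port=8000)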

Example Request

Generate

curl -X POST "http://localhost:8000/generate" -H "Content-Type: application/json" -d '{
  "job_id": "12345",
  "meta": {
    "temperature": 0.5,
    "tokens_limit": 1000,
    "stop_words": ["stop"],
    "model": "gpt-model"
  },
  "content": "What is AI?"
}'

Chat Completion

curl -X POST "http://localhost:8000/chat_completion" -H "Content-Type: application/json" -d '{
  "job_id": "12345",
  "meta": {
    "temperature": 0.5,
    "tokens_limit": 1000,
    "stop_words": ["stop"],
    "model": "gpt-model"
  },
  "messages": [
    {"role": "user", "content": "What is AI?"},
    {"role": "assistant", "content": "Artificial Intelligence is..."}
  ]
}'

Notes

  • Ensure that RABBIT_MQ_HOST, RABBIT_MQ_PORT, REDIS_HOST, and other variables are synchronized between the API and LLM-core containers.
  • The system supports distributed scaling by adding more LLM-core workers to the RabbitMQ queue.
