Introduction

This repository provides a template for deploying large language models (LLMs) with Docker Compose. The setup integrates one or more models with GPU support, Redis for result storage, and RabbitMQ for task queuing. The provided main.py script demonstrates how to initialize the wrapper and start a connection that processes tasks using any model inheriting from the base model class.


Table of Contents

  1. Docker Compose Setup
  2. Main Script Overview
  3. Environment Variable Configuration
  4. Synchronization with API
  5. Running the System

Docker Compose Setup

The provided docker-compose.yml template can be used to deploy your LLM model(s). It supports GPU execution and integration with a shared network (llm_wrap_network).

Example docker-compose.yml

version: '3.8'

services:
  llm:
    container_name: <your_container_name>
    image: <your_image_name>:latest
    runtime: nvidia
    deploy:
      resources:
        limits:
          memory: 100G
    build:
      context: ..
      dockerfile: Dockerfile
    env_file: .env
    environment:
      FORCE_CMAKE: 1
    volumes:
      - <your_host_data_path>:/data
    ports:
      - "8677:8672"
    networks:
      - llm_wrap_network
    restart: unless-stopped

networks:
  llm_wrap_network:
    name: llm_wrap_network
    driver: bridge

Adding Multiple Models

You can define multiple models by duplicating the service block in docker-compose.yml and adjusting the relevant parameters (e.g., container name, ports, GPUs). For example:

services:
  llm_1:
    container_name: llm_model_1
    image: llm_image:latest
    runtime: nvidia
    environment:
      MODEL_PATH: /data/model_1
      NVIDIA_VISIBLE_DEVICES: "0"
    ports:
      - "8677:8672"
    networks:
      - llm_wrap_network

  llm_2:
    container_name: llm_model_2
    image: llm_image:latest
    runtime: nvidia
    environment:
      MODEL_PATH: /data/model_2
      NVIDIA_VISIBLE_DEVICES: "1"
    ports:
      - "8678:8672"
    networks:
      - llm_wrap_network

networks:
  llm_wrap_network:
    name: llm_wrap_network
    driver: bridge

By assigning separate GPUs and ports, you can scale your infrastructure to serve multiple models simultaneously.


Main Script Overview

The provided main.py script demonstrates how to initialize and run the LLM wrapper (LLMWrap) with a selected model. The wrapper uses RabbitMQ for task queuing and Redis for result storage.

Key Components

  1. Model Initialization:

    llm_model = VllMModel(model_path=MODEL_PATH)
    
    • Any model inheriting from BaseLLM can be used here.
    • Replace VllMModel with your custom model class if needed; a sketch of a complete script appears after this list.
  2. LLMWrap Initialization:

    llm_wrap = LLMWrap(
        llm_model=llm_model,
        redis_host=REDIS_HOST,
        redis_port=REDIS_PORT,
        queue_name=QUEUE_NAME,
        rabbit_host=RABBIT_MQ_HOST,
        rabbit_port=RABBIT_MQ_PORT,
        rabbit_login=RABBIT_MQ_LOGIN,
        rabbit_password=RABBIT_MQ_PASSWORD,
        redis_prefix=REDIS_PREFIX
    )
    
    • This connects the model to the task queue and result storage.
    • Ensure the environment variables match the corresponding API configuration.
  3. Starting the Connection:

    llm_wrap.start_connection()
    
    • Begins consuming tasks from RabbitMQ and processes them using the selected model.
    • Results are saved to Redis.
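
Putting the pieces together, a complete main.py might look like the sketch below. It is a minimal illustration, not the package's verbatim API: the import paths and the BaseLLM interface (assumed here to be a single generate method) should be checked against the actual package, and the configuration constants come from a small config module like the one sketched under Environment Variable Configuration below.

# main.py — minimal sketch; the import paths and the BaseLLM method
# signature are assumptions, adjust them to the actual package layout.
from protollm_worker.models import BaseLLM, VllMModel  # assumed module path
from protollm_worker.services import LLMWrap           # assumed module path

from config import (MODEL_PATH, REDIS_HOST, REDIS_PORT, QUEUE_NAME,
                    RABBIT_MQ_HOST, RABBIT_MQ_PORT, RABBIT_MQ_LOGIN,
                    RABBIT_MQ_PASSWORD, REDIS_PREFIX)

class EchoModel(BaseLLM):
    """Toy stand-in for a real model: echoes the prompt back.
    Any class inheriting from BaseLLM can be passed to LLMWrap."""
    def generate(self, prompt: str, **kwargs) -> str:
        return f"echo: {prompt}"

if __name__ == "__main__":
    # Use the vLLM-backed model; swap in EchoModel for a dry run.
    llm_model = VllMModel(model_path=MODEL_PATH)

    llm_wrap = LLMWrap(
        llm_model=llm_model,
        redis_host=REDIS_HOST,
        redis_port=REDIS_PORT,
        queue_name=QUEUE_NAME,
        rabbit_host=RABBIT_MQ_HOST,
        rabbit_port=RABBIT_MQ_PORT,
        rabbit_login=RABBIT_MQ_LOGIN,
        rabbit_password=RABBIT_MQ_PASSWORD,
        redis_prefix=REDIS_PREFIX,
    )

    # Blocks and consumes tasks from RabbitMQ until interrupted;
    # results are written to Redis under REDIS_PREFIX.
    llm_wrap.start_connection()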

Environment Variable Configuration

Environment variables are defined in the .env file and passed to Docker Compose. Below are the required variables:

TOKENS_LEN=16384
GPU_MEMORY_UTILISATION=0.9
TENSOR_PARALLEL_SIZE=2
MODEL_PATH=/data/<your_model_path>
NVIDIA_VISIBLE_DEVICES=<your_GPUs>
REDIS_HOST=localhost
REDIS_PORT=6379
QUEUE_NAME=<your_queue_name>
RABBIT_MQ_HOST=<rabbitmq_host>
RABBIT_MQ_PORT=<rabbitmq_port>
RABBIT_MQ_LOGIN=<rabbitmq_login>
RABBIT_MQ_PASSWORD=<rabbitmq_password>
REDIS_PREFIX=<redis_key_prefix>
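
In Python, these variables can be collected into a small configuration module. The sketch below is one way to do it (the module name config.py is illustrative, not part of the package); numeric values are converted from their string form, and a missing required variable fails fast with a KeyError.

# config.py — illustrative helper module (the name is hypothetical);
# it reads the variables above and fails fast if one is missing.
import os

TOKENS_LEN = int(os.environ["TOKENS_LEN"])
GPU_MEMORY_UTILISATION = float(os.environ["GPU_MEMORY_UTILISATION"])
TENSOR_PARALLEL_SIZE = int(os.environ["TENSOR_PARALLEL_SIZE"])
MODEL_PATH = os.environ["MODEL_PATH"]

REDIS_HOST = os.environ["REDIS_HOST"]
REDIS_PORT = int(os.environ["REDIS_PORT"])
REDIS_PREFIX = os.environ["REDIS_PREFIX"]

QUEUE_NAME = os.environ["QUEUE_NAME"]
RABBIT_MQ_HOST = os.environ["RABBIT_MQ_HOST"]
RABBIT_MQ_PORT = int(os.environ["RABBIT_MQ_PORT"])
RABBIT_MQ_LOGIN = os.environ["RABBIT_MQ_LOGIN"]
RABBIT_MQ_PASSWORD = os.environ["RABBIT_MQ_PASSWORD"]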

Synchronization with API

If you are deploying an API in another container, ensure the following:

  1. Environment Variables:

    • Match the Redis and RabbitMQ configuration between the API and the LLM containers (e.g., REDIS_HOST, RABBIT_MQ_HOST).
  2. Network:

    • Both containers must be on the same Docker network (llm_wrap_network in this example).
  3. Shared Queues:

    • The QUEUE_NAME variable should be consistent across containers to ensure tasks are properly routed.

Running the System

  1. Set up Docker Compose:

    • Adjust docker-compose.yml and .env with your specific configuration.
    • Start the system:
      docker-compose up -d
      
  2. Verify Running Containers:

    • Check active containers:
      docker ps
      
  3. Monitor Logs:

    • To view logs for a specific container:
      docker logs -f <container_name>
      
  4. Submit Tasks:

    • Tasks can be submitted to the RabbitMQ queue, and results will be saved in Redis; a worked example follows this list.
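
As a concrete version of the last step, the snippet below publishes a task with pika and polls Redis for the result. The payload schema (task_id / prompt fields) and the result key format (<REDIS_PREFIX>:<task_id>) are assumptions for illustration; match them to whatever contract your worker and API actually use.

# submit_task.py — hedged example; the payload schema and the Redis
# key format are assumptions, not the package's documented contract.
import json
import time
import uuid

import pika
import redis

task_id = str(uuid.uuid4())
payload = {"task_id": task_id, "prompt": "Hello, LLM!"}  # assumed schema

# Publish the task; assumes the worker has already declared the queue.
connection = pika.BlockingConnection(pika.ConnectionParameters(
    host="localhost", port=5672,
    credentials=pika.PlainCredentials("guest", "guest")))
channel = connection.channel()
channel.basic_publish(exchange="", routing_key="llm_tasks",
                      body=json.dumps(payload))
connection.close()

# Poll Redis until the worker stores the result.
r = redis.Redis(host="localhost", port=6379)
key = f"llm:{task_id}"  # assumed key format: <REDIS_PREFIX>:<task_id>
while (result := r.get(key)) is None:
    time.sleep(1)
print(result.decode())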

For more details, refer to the comments in docker-compose.yml and main.py. If you encounter any issues, feel free to open an issue in the repository!



Download files

Download the file for your platform.

Source Distribution

protollm_worker-1.0.3.tar.gz (8.6 kB)


Built Distribution


protollm_worker-1.0.3-py3-none-any.whl (13.6 kB)


File details

Details for the file protollm_worker-1.0.3.tar.gz.

File metadata

  • Download URL: protollm_worker-1.0.3.tar.gz
  • Size: 8.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.0.1 CPython/3.10.11 Windows/10

File hashes

Hashes for protollm_worker-1.0.3.tar.gz:

  • SHA256: f45f964100664b03aed1e6af6144e739641cfd5abda149acc10236702399afa2
  • MD5: d8b0114cd48a74549de2da9d2f2969eb
  • BLAKE2b-256: 8fb9ae279cb2c546a861f1a80dbcdc89848dbfded9e97fbc707ead0c2b686e68


File details

Details for the file protollm_worker-1.0.3-py3-none-any.whl.

File metadata

  • Download URL: protollm_worker-1.0.3-py3-none-any.whl
  • Size: 13.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.0.1 CPython/3.10.11 Windows/10

File hashes

Hashes for protollm_worker-1.0.3-py3-none-any.whl:

  • SHA256: 80ab5a4bbe4ef3312f33b1983a12d0406ce09673301acf2d20f4d37df3fd5a2e
  • MD5: 42601d3da9fb9795f184d0693c02f492
  • BLAKE2b-256: 49ae1dd508b1a406b9c35e30bcd32b1a8590970b24f0997acde6454555ee5a26

