## Introduction

This repository provides a template for deploying large language models (LLMs) with Docker Compose. The setup is designed to integrate multiple models with GPU support, using Redis for result storage and RabbitMQ for task queuing. The provided `main.py` script demonstrates how to initialize and run a connection that processes tasks with any model inheriting from the base model class.
## Table of Contents

- [Docker Compose Setup](#docker-compose-setup)
- [Main Script Overview](#main-script-overview)
- [Environment Variable Configuration](#environment-variable-configuration)
- [Synchronization with the API](#synchronization-with-the-api)
- [Running the System](#running-the-system)
## Docker Compose Setup

The provided `docker-compose.yml` template can be used to deploy your LLM model(s). It supports GPU execution and integration with a shared network (`llm_wrap_network`).

### Example `docker-compose.yml`
```yaml
version: '3.8'

services:
  llm:
    container_name: <your_container_name>
    image: <your_image_name>:latest
    runtime: nvidia
    deploy:
      resources:
        limits:
          memory: 100G
    build:
      context: ..
      dockerfile: Dockerfile
    env_file: .env
    environment:
      FORCE_CMAKE: 1
    volumes:
      - <your_path_to_data_in_docker>:/data
    ports:
      - "8677:8672"
    networks:
      - llm_wrap_network
    restart: unless-stopped

networks:
  llm_wrap_network:
    name: llm_wrap_network
    driver: bridge
```
### Adding Multiple Models

You can define multiple models by duplicating the service block in `docker-compose.yml` and adjusting the relevant parameters (e.g., container name, ports, GPUs). For example:
```yaml
services:
  llm_1:
    container_name: llm_model_1
    image: llm_image:latest
    runtime: nvidia
    environment:
      MODEL_PATH: /data/model_1
      NVIDIA_VISIBLE_DEVICES: "GPU-1"
    ports:
      - "8677:8672"

  llm_2:
    container_name: llm_model_2
    image: llm_image:latest
    runtime: nvidia
    environment:
      MODEL_PATH: /data/model_2
      NVIDIA_VISIBLE_DEVICES: "GPU-2"
    ports:
      - "8678:8672"

networks:
  llm_wrap_network:
    name: llm_wrap_network
    driver: bridge
```
By assigning separate GPUs and ports, you can scale your infrastructure to serve multiple models simultaneously.
## Main Script Overview

The provided `main.py` script demonstrates how to initialize and run the LLM wrapper (`LLMWrap`) with a selected model. The wrapper uses RabbitMQ for task queuing and Redis for result storage.

### Key Components
1. **Model initialization:**

   ```python
   llm_model = VllMModel(model_path=MODEL_PATH)
   ```

   - Any model inheriting from `BaseLLM` can be used here.
   - Replace `VllMModel` with your custom model class if needed.

2. **`LLMWrap` initialization:**

   ```python
   llm_wrap = LLMWrap(
       llm_model=llm_model,
       redis_host=REDIS_HOST,
       redis_port=REDIS_PORT,
       queue_name=QUEUE_NAME,
       rabbit_host=RABBIT_MQ_HOST,
       rabbit_port=RABBIT_MQ_PORT,
       rabbit_login=RABBIT_MQ_LOGIN,
       rabbit_password=RABBIT_MQ_PASSWORD,
       redis_prefix=REDIS_PREFIX,
   )
   ```

   - This connects the model to the task queue and result storage.
   - Ensure the environment variables match the corresponding API configuration.

3. **Starting the connection:**

   ```python
   llm_wrap.start_connection()
   ```

   - Begins consuming tasks from RabbitMQ and processes them using the selected model.
   - Results are saved to Redis.
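Putting these pieces together, a complete `main.py` might look like the sketch below. The import paths (`protollm_worker.models`, `protollm_worker.services`) and the environment-variable handling are assumptions for illustration; check the package's actual module layout before using it.

```python
import os

# NOTE: the import paths below are assumptions for illustration;
# check the protollm_worker package for the actual module layout.
from protollm_worker.models import VllMModel
from protollm_worker.services import LLMWrap

# Configuration passed in by Docker Compose via the .env file.
MODEL_PATH = os.environ["MODEL_PATH"]
REDIS_HOST = os.environ.get("REDIS_HOST", "localhost")
REDIS_PORT = int(os.environ.get("REDIS_PORT", "6379"))
QUEUE_NAME = os.environ["QUEUE_NAME"]
RABBIT_MQ_HOST = os.environ["RABBIT_MQ_HOST"]
RABBIT_MQ_PORT = int(os.environ["RABBIT_MQ_PORT"])
RABBIT_MQ_LOGIN = os.environ["RABBIT_MQ_LOGIN"]
RABBIT_MQ_PASSWORD = os.environ["RABBIT_MQ_PASSWORD"]
REDIS_PREFIX = os.environ["REDIS_PREFIX"]

if __name__ == "__main__":
    # Any model inheriting from BaseLLM can be used here.
    llm_model = VllMModel(model_path=MODEL_PATH)

    # Wire the model to RabbitMQ (task queue) and Redis (result storage).
    llm_wrap = LLMWrap(
        llm_model=llm_model,
        redis_host=REDIS_HOST,
        redis_port=REDIS_PORT,
        queue_name=QUEUE_NAME,
        rabbit_host=RABBIT_MQ_HOST,
        rabbit_port=RABBIT_MQ_PORT,
        rabbit_login=RABBIT_MQ_LOGIN,
        rabbit_password=RABBIT_MQ_PASSWORD,
        redis_prefix=REDIS_PREFIX,
    )

    # Blocks, consuming tasks from RabbitMQ and writing results to Redis.
    llm_wrap.start_connection()
```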
## Environment Variable Configuration

Environment variables are defined in the `.env` file and passed to Docker Compose. The following variables are required:
```env
TOKENS_LEN=16384
GPU_MEMORY_UTILISATION=0.9
TENSOR_PARALLEL_SIZE=2
MODEL_PATH=/data/<your_model_path>
NVIDIA_VISIBLE_DEVICES=<your_GPUs>
REDIS_HOST=localhost
REDIS_PORT=6379
QUEUE_NAME=<your_queue_name>
RABBIT_MQ_HOST=<rabbitmq_host>
RABBIT_MQ_PORT=<rabbitmq_port>
RABBIT_MQ_LOGIN=<rabbitmq_login>
RABBIT_MQ_PASSWORD=<rabbitmq_password>
REDIS_PREFIX=<redis_key_prefix>
```
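Because a missing or misspelled variable typically only surfaces as a crash when the worker starts, it can be worth failing fast with an explicit check. A minimal sketch (the list simply mirrors the variables above):

```python
import os

# Mirrors the variables documented above.
REQUIRED_VARS = [
    "TOKENS_LEN", "GPU_MEMORY_UTILISATION", "TENSOR_PARALLEL_SIZE",
    "MODEL_PATH", "NVIDIA_VISIBLE_DEVICES", "REDIS_HOST", "REDIS_PORT",
    "QUEUE_NAME", "RABBIT_MQ_HOST", "RABBIT_MQ_PORT",
    "RABBIT_MQ_LOGIN", "RABBIT_MQ_PASSWORD", "REDIS_PREFIX",
]

missing = [name for name in REQUIRED_VARS if not os.environ.get(name)]
if missing:
    raise RuntimeError("Missing required environment variables: " + ", ".join(missing))
```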
## Synchronization with the API

If you are deploying an API in another container, ensure the following:
1. **Environment variables:** match the Redis and RabbitMQ configuration between the API and the LLM containers (e.g., `REDIS_HOST`, `RABBIT_MQ_HOST`).
2. **Network:** both containers must be on the same Docker network (`llm_wrap_network` in this example).
3. **Shared queues:** the `QUEUE_NAME` variable must be consistent across containers so that tasks are properly routed.
## Running the System
1. **Set up Docker Compose:**
   - Adjust `docker-compose.yml` and `.env` with your specific configuration.
   - Start the system:

     ```bash
     docker-compose up -d
     ```

2. **Verify running containers:**

   ```bash
   docker ps
   ```

3. **Monitor logs** for a specific container:

   ```bash
   docker logs -f <container_name>
   ```

4. **Submit tasks:** tasks can be submitted to the RabbitMQ queue, and results will be saved in Redis (see the sketch below).
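To illustrate the last step, the sketch below publishes a task to the shared queue with `pika` and later reads the result back from Redis. The JSON body, queue name, hostnames, credentials, and the `llm_result:<task_id>` key layout are all placeholders for illustration; the real message schema and result-key format are defined by the worker library.

```python
import json
import uuid

import pika   # pip install pika
import redis  # pip install redis

task_id = str(uuid.uuid4())

# Publish a task to the queue the worker consumes from. The queue is
# assumed to already exist (the worker declares it), and the JSON body
# is a placeholder -- use the schema the worker actually expects.
connection = pika.BlockingConnection(
    pika.ConnectionParameters(
        host="rabbitmq",  # must match RABBIT_MQ_HOST
        port=5672,        # must match RABBIT_MQ_PORT
        credentials=pika.PlainCredentials("login", "password"),
    )
)
channel = connection.channel()
channel.basic_publish(
    exchange="",
    routing_key="llm_tasks",  # must match QUEUE_NAME
    body=json.dumps({"id": task_id, "prompt": "Hello!"}),
)
connection.close()

# Later, look the result up in Redis. The "<REDIS_PREFIX>:<task_id>"
# key layout is an assumption; check the worker's code for the real format.
r = redis.Redis(host="redis", port=6379)
result = r.get(f"llm_result:{task_id}")
print(result)
```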
For more details, refer to the comments in `docker-compose.yml` and `main.py`. If you encounter any issues, feel free to open an issue in the repository!