This repository contains a docker-compose file that can be used to run a Ray cluster on a single machine.
DiLLeMa
DiLLeMa is a distributed Large Language Model (LLM) that can be used to generate text. It is built on top of the Ray framework and vLLM. The purpose of this project is to provide an easy-to-use interface for deploying and using LLMs in a distributed setting.
Installation
pip install dillema
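After installing, one way to confirm that the package is visible to the active interpreter is to query its metadata. The sketch below uses only the standard library; the helper name `installed_version` is illustrative and is not part of DiLLeMa.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("dillema")
    print(f"dillema version: {version}" if version else "dillema is not installed")
```

A `None` result usually means that `pip install` ran against a different environment than the one currently active.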
Project Structure
/dillema
│
├── api_gateway/ # API Layer (FastAPI)
│ ├── __init__.py
│ ├── main.py # Entry point for the API
│ ├── endpoints.py # API endpoint definitions
│ └── utils.py # Utility functions (e.g., request validation)
│
├── ray_cluster/ # Ray cluster manager & task scheduler
│ ├── __init__.py
│ ├── ray_manager.py # Ray cluster manager
│ ├── task_scheduler.py # Distributes tasks to workers
│ └── worker_manager.py # Handles Ray worker management
│
├── workers/ # Worker nodes that run LLM inference
│ ├── __init__.py
│ ├── worker.py # Code for each worker (Ray actor)
│ ├── preprocessing.py # Data preprocessing before inference
│ ├── llm_inference.py # Code that performs LLM inference
│ └── postprocessing.py # Postprocessing of inference results
│
├── models/ # LLM models and storage
│ ├── __init__.py
│ ├── model_loader.py # Manages model loading
│ ├── model_storage.py # Manages access to model storage (e.g., S3)
│ └── model_config.py # Configuration of the model in use
│
├── vllm/ # vLLM implementation for optimization
│ ├── __init__.py
│ ├── vllm_batching.py # Batching optimization using vLLM
│ └── vllm_inference.py # vLLM integration for inference
│
├── tests/ # Unit tests and integration tests
│ ├── __init__.py
│ ├── test_api.py # API Gateway tests
│ ├── test_ray.py # Tests for task distribution to workers
│ └── test_inference.py # Tests for LLM inference and vLLM optimization
│
├── requirements.txt # Library dependencies (Ray, vLLM, FastAPI, etc.)
├── Dockerfile # Dockerfile for deployment
└── README.md # Project documentation
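The workers/ package splits a request into preprocessing, inference, and postprocessing. A minimal sketch of how such stages could be chained is shown below. All function names here are hypothetical, and the "model" is a placeholder that echoes its input; none of this is taken from DiLLeMa's actual source.

```python
from typing import Callable


def preprocess(prompt: str) -> str:
    # Hypothetical step: collapse runs of whitespace before inference.
    return " ".join(prompt.split())


def fake_inference(prompt: str) -> str:
    # Placeholder for the real LLM call; it only echoes the prompt.
    return f"echo: {prompt}"


def postprocess(text: str) -> str:
    # Hypothetical step: trim surrounding whitespace from the model output.
    return text.strip()


def run_pipeline(prompt: str, infer: Callable[[str], str] = fake_inference) -> str:
    """Run preprocess -> infer -> postprocess on a single prompt."""
    return postprocess(infer(preprocess(prompt)))


if __name__ == "__main__":
    print(run_pipeline("  hello   world "))
```

Passing the inference step in as a callable keeps the surrounding stages testable without loading a model.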
Flow Diagram
+------------------------+
|          User          |
+------------------------+
            |
            v
+------------------------+     +------------------------+
|  API Server (FastAPI)  |<--->|  Ray Worker (Client)   |
+------------------------+     +------------------------+
            |                           ^
            v                           |
+--------------------+    +--------------------+
|   Head Node Ray    |----|    Ray Cluster     |
|  (Ray Management)  |    |   (Worker Nodes)   |
+--------------------+    +--------------------+
            |
            v
+------------------------+
|     Model Loading      |
|      (LLM Model)       |
+------------------------+
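The diagram shows requests passing from the API server to the head node, which hands them to worker nodes. The sketch below mimics that hand-off in plain Python with no Ray dependency. The round-robin policy and the class names `Worker` and `HeadNode` are assumptions made for illustration; the project's real scheduling logic may differ.

```python
from itertools import cycle
from typing import Sequence


class Worker:
    """Stand-in for a worker node that holds a loaded model."""

    def __init__(self, name: str) -> None:
        self.name = name

    def infer(self, prompt: str) -> str:
        # Placeholder for model inference on this worker.
        return f"[{self.name}] {prompt}"


class HeadNode:
    """Stand-in for the head node: forwards each request to the next worker."""

    def __init__(self, workers: Sequence[Worker]) -> None:
        if not workers:
            raise ValueError("at least one worker is required")
        # cycle() yields the workers in order, repeating forever (round-robin).
        self._next_worker = cycle(workers)

    def dispatch(self, prompt: str) -> str:
        return next(self._next_worker).infer(prompt)


if __name__ == "__main__":
    head = HeadNode([Worker("w1"), Worker("w2")])
    for prompt in ["a", "b", "c"]:
        print(head.dispatch(prompt))
```

In a real deployment the workers would be Ray actors on separate nodes; plain objects are used here only to keep the example self-contained.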
Usage
PRE-REQUISITES
- To avoid dependency conflicts, install Anaconda and create a dedicated environment with the following commands.
conda create -n dillema
conda activate dillema
conda install python=3.12.9
- Run the Head Node: First, run the head node to start the Ray cluster.
python -m dillema.ray_cluster.head_node --port 6379
- Run the Client Node: Next, run the client node to connect the worker to the head node.
python -m dillema.ray_cluster.client_node --head-node-ip <head-node-ip> --port 6379
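Before starting a client node, it can help to check that the head node's port is reachable from the worker machine. The following is a small standard-library sketch; the helper name `head_node_reachable` is hypothetical. A successful TCP connection only shows that something is listening on the port, not that it is a Ray head node.

```python
import socket


def head_node_reachable(host: str, port: int = 6379, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


if __name__ == "__main__":
    # 127.0.0.1 is a placeholder; substitute the actual head node IP.
    print(head_node_reachable("127.0.0.1", 6379))
```

If the check fails, a firewall rule or a mismatched `--port` value is a likely cause.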
SERVE YOUR OWN LLM MODEL
- Run the API Server: Finally, run the API server to start serving the model and accept inference requests.
python -m dillema.cli serve --model "meta/llma-" --port 8000 --head-node-ip <head-node-ip>
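When the server is launched from a script rather than by hand, the same command can be assembled as an argument list. This sketch uses only the flags shown in the command above; the helper name `build_serve_command` and the example model name are hypothetical.

```python
import sys
from typing import List


def build_serve_command(model: str, head_node_ip: str, port: int = 8000) -> List[str]:
    """Assemble the argument list for the `serve` command shown above."""
    return [
        sys.executable, "-m", "dillema.cli", "serve",
        "--model", model,
        "--port", str(port),
        "--head-node-ip", head_node_ip,
    ]


if __name__ == "__main__":
    # "my-org/my-model" and the IP address are placeholders, not real values.
    print(" ".join(build_serve_command("my-org/my-model", "10.0.0.5")))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`. Using `sys.executable` keeps the launch inside the currently active conda environment.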
Download files
File details
Details for the file dillema-0.1.2.tar.gz.
File metadata
- Download URL: dillema-0.1.2.tar.gz
- Upload date:
- Size: 6.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 029116454ca87369dd8b015e86c30913dceb8bad9157d31d53f7b5f789f3b673 |
| MD5 | 6cd61fc3d160349441db666ddf24a22e |
| BLAKE2b-256 | 2d180f4f1e1ff5fb0137fa6872fabfb454b9a9914407744918abc5aeeafcbac7 |
File details
Details for the file dillema-0.1.2-py3-none-any.whl.
File metadata
- Download URL: dillema-0.1.2-py3-none-any.whl
- Upload date:
- Size: 7.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 875d886bbda4f14a414a4701ae9b6411e50bed72f6b488f17c94cd16d5adb0f7 |
| MD5 | 4601f6aee539dd704000819200421f6f |
| BLAKE2b-256 | 8f33cd7d6a9b9d32f819b20ba6dddc2da15abbcc568c0b7b059ca10c2f30a771 |