This repository contains a docker-compose file that can be used to run a Ray cluster on a single machine.

DiLLeMa

DiLLeMa is a distributed Large Language Model (LLM) serving system that can be used to generate text. It is built on top of the Ray framework and vLLM. The purpose of this project is to provide an easy-to-use interface for deploying and using LLMs in a distributed setting.

Architecture

Installation

pip install dillema
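
After installing, one way to confirm that the package is visible to the active interpreter is to query its distribution metadata. The sketch below uses only the standard library; the helper name `installed_version` is illustrative and is not part of DiLLeMa.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("dillema")
    if found is None:
        print("dillema is not installed in this environment")
    else:
        print(f"dillema version: {found}")
```

Running this inside the environment where `pip install dillema` was executed should report a version; running it elsewhere reports that the package is missing.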

Project Structure

/dillema
│
├── api_gateway/                # API layer (FastAPI)
│   ├── __init__.py
│   ├── main.py                 # Entry point for the API
│   ├── endpoints.py            # API endpoint definitions
│   └── utils.py                # Utility functions (e.g., request validation)
│
├── ray_cluster/                # Ray cluster manager & task scheduler
│   ├── __init__.py
│   ├── ray_manager.py          # Ray cluster manager
│   ├── task_scheduler.py       # Distributes tasks to workers
│   └── worker_manager.py       # Manages Ray workers
│
├── workers/                    # Worker nodes that run LLM inference
│   ├── __init__.py
│   ├── worker.py               # Code for each worker (Ray actor)
│   ├── preprocessing.py        # Preprocesses data before inference
│   ├── llm_inference.py        # Code that performs LLM inference
│   └── postprocessing.py       # Postprocesses inference results
│
├── models/                     # LLM models and storage
│   ├── __init__.py
│   ├── model_loader.py         # Manages model loading
│   ├── model_storage.py        # Manages access to model storage (e.g., S3)
│   └── model_config.py         # Configuration of the models in use
│
├── vllm/                       # vLLM implementation for optimization
│   ├── __init__.py
│   ├── vllm_batching.py        # Batching optimization using vLLM
│   └── vllm_inference.py       # vLLM integration for inference
│
├── tests/                      # Unit and integration tests
│   ├── __init__.py
│   ├── test_api.py             # API gateway tests
│   ├── test_ray.py             # Tests task distribution to workers
│   └── test_inference.py       # Tests LLM inference and vLLM optimization
│
├── requirements.txt            # Library dependencies (Ray, vLLM, FastAPI, etc.)
├── Dockerfile                  # Dockerfile for deployment
└── README.md                   # Project documentation

Flow Diagram

  +------------------------+
  |          User          |
  +------------------------+
            |
            v
  +------------------------+     +------------------------+
  |  API Server (FastAPI)  |<--->|   Ray Worker (Client)  |
  +------------------------+     +------------------------+
            |                         ^
            v                         |
    +--------------------+    +--------------------+
    |  Head Node Ray     |----|  Ray Cluster       |
    |  (Ray Management)  |    |  (Worker Nodes)    |
    +--------------------+    +--------------------+
            |
            v
  +------------------------+
  |  Model Loading         |
  |  (LLM Model)           |
  +------------------------+
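
The request path in the diagram (an API layer hands prompts to a scheduler, which fans them out to workers and collects the results) can be illustrated with a small stand-in. This is only a sketch of the pattern: it uses a standard-library thread pool in place of Ray, and `fake_llm_worker` and `handle_requests` are hypothetical names, not DiLLeMa functions.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def fake_llm_worker(prompt: str) -> str:
    """Stand-in for a worker node; a real worker would run model inference."""
    return f"echo: {prompt}"


def handle_requests(prompts: List[str], num_workers: int = 2) -> List[str]:
    """Fan prompts out to a pool of workers and return results in input order."""
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # map() preserves the order of the inputs even if workers finish out of order.
        return list(pool.map(fake_llm_worker, prompts))


if __name__ == "__main__":
    for line in handle_requests(["hello", "world"]):
        print(line)
```

In a Ray-based deployment the pool would be replaced by remote workers on the cluster, but the fan-out and ordered collection follow the same shape.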

Usage

PRE-REQUISITES

  1. Install Anaconda: To keep dependencies isolated, install Anaconda and create a dedicated environment with the following commands.
conda create -n dillema
conda activate dillema

conda install python=3.12.9
  2. Run the Head Node: The user first runs the head node to start the Ray cluster.
python -m dillema.ray_cluster.head_node
  3. Run the Client Node: After that, the user runs the client node to connect the worker to the head node.
python -m dillema.ray_cluster.client_node --head-node-ip <head-node-ip>
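
If the head and client nodes are started from a script rather than by hand, the two commands above can be assembled as argument lists. The following is a minimal sketch: the module paths and the `--head-node-ip` flag are taken from the commands above, while the helper names and the example address `192.0.2.10` are placeholders chosen for illustration.

```python
import shlex
import sys
from typing import List


def head_node_command() -> List[str]:
    """Argument list that starts the head node with the current interpreter."""
    return [sys.executable, "-m", "dillema.ray_cluster.head_node"]


def client_node_command(head_node_ip: str) -> List[str]:
    """Argument list that connects a client (worker) node to the head node."""
    return [
        sys.executable,
        "-m",
        "dillema.ray_cluster.client_node",
        "--head-node-ip",
        head_node_ip,
    ]


if __name__ == "__main__":
    # Print the commands instead of running them; 192.0.2.10 is a placeholder IP.
    print(shlex.join(head_node_command()))
    print(shlex.join(client_node_command("192.0.2.10")))
```

Each list can be passed to `subprocess.run(..., check=True)` on the appropriate machine. Using `sys.executable` keeps the launch inside the activated conda environment.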

SERVE YOUR OWN LLM MODEL

  1. Run the API Server: Finally, the user runs the API server to start model serving and receive inference requests.
python -m dillema.cli serve --model "meta/llma-" --port 8000 --head-node-ip <head-node-ip>
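
Once the server is listening on the chosen port, a client can send it HTTP requests. The README does not document the request format, so the sketch below rests on assumptions: the `/generate` path and the `{"prompt": ...}` JSON body are hypothetical and should be replaced with whatever the running API server actually exposes.

```python
import json
from urllib.request import Request


def build_inference_request(
    host: str, port: int, prompt: str, path: str = "/generate"
) -> Request:
    """Build (but do not send) a JSON POST request for the API server.

    The default path and the payload shape are assumptions, not DiLLeMa's
    documented API.
    """
    payload = json.dumps({"prompt": prompt}).encode("utf-8")
    return Request(
        url=f"http://{host}:{port}{path}",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_inference_request("localhost", 8000, "Hello")
    print(req.get_method(), req.full_url)
```

To send the request, pass it to `urllib.request.urlopen(req, timeout=30)` and read the response body; this only succeeds when the server is running and the path matches a real endpoint.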

Download files

Source Distribution

dillema-0.1.5.tar.gz (6.5 kB)

Uploaded Source

Built Distribution

dillema-0.1.5-py3-none-any.whl (8.5 kB)

Uploaded Python 3

File details

Details for the file dillema-0.1.5.tar.gz.

File metadata

  • Download URL: dillema-0.1.5.tar.gz
  • Size: 6.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for dillema-0.1.5.tar.gz
Algorithm Hash digest
SHA256 92536d7bbe4002b28664d5f6bd8aca50939e43f9b2c3a8146de400dcdf35730e
MD5 4bcd9147e4c0d19d7fc250f1a113ae42
BLAKE2b-256 2012277cfac9444fc95b888fe2d475f838b301e4a2ca85c9b9616c34d304b44e

File details

Details for the file dillema-0.1.5-py3-none-any.whl.

File metadata

  • Download URL: dillema-0.1.5-py3-none-any.whl
  • Size: 8.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for dillema-0.1.5-py3-none-any.whl
Algorithm Hash digest
SHA256 0c3f548c45243ea5dbac4b1137b3b460cb1fe62e80bb948358665c35a46b878e
MD5 eb01f8c47f8fa164fe72583a76c201dd
BLAKE2b-256 385acd38dc868ed0d5fc9085e3d31d44e8eff4dbd5e8be2b70e6cd2336efefdb
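
A downloaded archive can be checked against the SHA256 digests listed above before it is installed. The sketch below uses only `hashlib`; the helper names `sha256_of` and `matches` are illustrative, and the local file path is an assumption about where the download was saved.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path: Path, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with an expected hex string."""
    return sha256_of(path) == expected_hex.strip().lower()


if __name__ == "__main__":
    # Assumed download location; the digest is the SHA256 listed for the wheel.
    wheel = Path("dillema-0.1.5-py3-none-any.whl")
    expected = "0c3f548c45243ea5dbac4b1137b3b460cb1fe62e80bb948358665c35a46b878e"
    if wheel.exists():
        print("OK" if matches(wheel, expected) else "MISMATCH")
    else:
        print(f"{wheel} not found")
```

pip can perform the same check itself when hashes are pinned in a requirements file and `--require-hashes` is used.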
