Adaptive GPU Allocator (AGA)
In the following instructions, `$WORKDIR` denotes the directory in which the AGA files are installed.
Install
Install AGA
- Prepare a Python virtual environment of your choice and activate it.
- Run `cd $WORKDIR && git clone https://github.labs.fujitsu.com/shcpj/adaptive-gpu-allocator.git` to clone the AGA repository.
- Run `cd $WORKDIR/adaptive-gpu-allocator` to change the working directory.
- Run `pip install -e .` to install AGA.
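To confirm that the editable install is visible from the active virtual environment, one option is to ask Python's import system whether the module can be found. This is a minimal sketch that assumes the importable module is named `adaptive_gpu_allocator` (the package name used in this README); `is_importable` is a hypothetical helper.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if a top-level module can be found on the current sys.path."""
    # find_spec returns None when no module with this name can be located.
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Assumption: AGA's importable module is named "adaptive_gpu_allocator".
    name = "adaptive_gpu_allocator"
    print(f"{name}: {'found' if is_importable(name) else 'NOT found'}")
```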
Launch a GPU assigner
The GPU assigner is installed automatically with the adaptive_gpu_allocator package.
You must launch a GPU assigner server before running an application with AGA enabled.
Use the gpu_assigner CLI to start, stop, or check the status of the server.
```shell
gpu_assigner start # Start GPU assigner server
gpu_assigner status # Show current GPU assigner status
agarun python <your_program>.py # Run your program with AGA enabled.
...
gpu_assigner stop # Stop GPU assigner server
```
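The start/run/stop sequence above can also be scripted from Python. This is a minimal sketch that shells out to the `gpu_assigner` and `agarun` commands exactly as shown; it assumes both are on `PATH` and that `gpu_assigner start` returns once the server is up. The helper names (`gpu_assigner_server`, `agarun_command`, `run_checked`) are hypothetical and not part of AGA.

```python
import subprocess
from contextlib import contextmanager
from typing import Callable, Iterator, Sequence

# A runner executes one argv list. The default raises CalledProcessError on a
# non-zero exit status; tests can inject a fake runner instead.
Runner = Callable[[Sequence[str]], object]


def run_checked(argv: Sequence[str]) -> object:
    return subprocess.run(list(argv), check=True)


@contextmanager
def gpu_assigner_server(runner: Runner = run_checked) -> Iterator[None]:
    """Start the GPU assigner on entry and stop it on exit, even on error.

    Assumption: `gpu_assigner start` returns once the server is running.
    """
    runner(["gpu_assigner", "start"])
    try:
        yield
    finally:
        runner(["gpu_assigner", "stop"])


def agarun_command(script: str, *args: str) -> list[str]:
    """Build the argv that runs a Python script with AGA enabled."""
    return ["agarun", "python", script, *args]
```

For example, `with gpu_assigner_server(): run_checked(agarun_command("sample.py"))`, run from the `samples` directory, would mirror the single-run example in the Usage section. Using a context manager guarantees that `gpu_assigner stop` is issued even if the program fails.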
Usage
Single run
The following example shows how to run the sample program contained in this repository.
- Run `cd $WORKDIR/adaptive-gpu-allocator/samples`.
- Run `agarun python sample.py`.
The output looks like the following.
```
Epoch 0, Loss: 1.0316227674484253 on GPU-b9d18b6a-6648-3e3a-90af-d3d717ee8d8f
Epoch 1, Loss: 1.0306103229522705 on GPU-b9d18b6a-6648-3e3a-90af-d3d717ee8d8f
Epoch 2, Loss: 1.0296107530593872 on GPU-b9d18b6a-6648-3e3a-90af-d3d717ee8d8f
Epoch 3, Loss: 1.0286246538162231 on GPU-b9d18b6a-6648-3e3a-90af-d3d717ee8d8f
Epoch 4, Loss: 1.0276520252227783 on GPU-b9d18b6a-6648-3e3a-90af-d3d717ee8d8f
Epoch 5, Loss: 1.02669358253479 on GPU-b9d18b6a-6648-3e3a-90af-d3d717ee8d8f
Epoch 6, Loss: 1.0257498025894165 on GPU-b9d18b6a-6648-3e3a-90af-d3d717ee8d8f
Epoch 7, Loss: 1.0248205661773682 on GPU-b9d18b6a-6648-3e3a-90af-d3d717ee8d8f
Epoch 8, Loss: 1.0239065885543823 on GPU-b9d18b6a-6648-3e3a-90af-d3d717ee8d8f
Epoch 9, Loss: 1.023008108139038 on GPU-b9d18b6a-6648-3e3a-90af-d3d717ee8d8f
Epoch 10, Loss: 1.022125244140625 on GPU-b9d18b6a-6648-3e3a-90af-d3d717ee8d8f
Epoch 11, Loss: 1.0212583541870117 on GPU-b9d18b6a-6648-3e3a-90af-d3d717ee8d8f
Epoch 12, Loss: 1.020407795906067 on GPU-b9d18b6a-6648-3e3a-90af-d3d717ee8d8f
Epoch 13, Loss: 1.01957368850708 on GPU-b9d18b6a-6648-3e3a-90af-d3d717ee8d8f
Epoch 14, Loss: 1.0187562704086304 on GPU-b9d18b6a-6648-3e3a-90af-d3d717ee8d8f
Epoch 15, Loss: 1.0179557800292969 on GPU-b9d18b6a-6648-3e3a-90af-d3d717ee8d8f
Epoch 16, Loss: 1.0171724557876587 on GPU-b9d18b6a-6648-3e3a-90af-d3d717ee8d8f
Epoch 17, Loss: 1.0164062976837158 on GPU-b9d18b6a-6648-3e3a-90af-d3d717ee8d8f
Epoch 18, Loss: 1.015657663345337 on GPU-b9d18b6a-6648-3e3a-90af-d3d717ee8d8f
Epoch 19, Loss: 1.014926552772522 on GPU-b9d18b6a-6648-3e3a-90af-d3d717ee8d8f
```
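Each line of the sample output has the form `Epoch <n>, Loss: <value> on <device>`, where the device is either a GPU UUID or `CPU`. When comparing runs, it can help to parse these lines into records. This sketch assumes only that line format, nothing else about `sample.py`; `parse_log`, `devices_used`, and `EpochRecord` are hypothetical names.

```python
import re
from typing import NamedTuple

# Matches lines such as "Epoch 0, Loss: 1.03 on GPU-<uuid>" or "... on CPU".
LINE_RE = re.compile(r"^Epoch (\d+), Loss: (\S+) on (\S+)$")


class EpochRecord(NamedTuple):
    epoch: int
    loss: float
    device: str


def parse_log(text: str) -> list[EpochRecord]:
    """Parse sample.py-style output, ignoring lines that do not match."""
    records = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            epoch, loss, device = match.groups()
            records.append(EpochRecord(int(epoch), float(loss), device))
    return records


def devices_used(records: list[EpochRecord]) -> list[str]:
    """Return the distinct devices in order of first appearance."""
    # dict preserves insertion order, so this de-duplicates stably.
    return list(dict.fromkeys(r.device for r in records))
```

For the single-run output above, `devices_used(parse_log(output))` would contain only the one GPU UUID.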
Run multiple programs
To verify that AGA is working, run more concurrent instances of the sample program than there are GPUs.
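One way to start several instances at once is Python's `subprocess` module. The sketch below is generic: `run_concurrently` (a hypothetical helper) launches every command before waiting on any of them, so the processes overlap in time. Its demo uses stand-in commands; for the experiment described here, each command would be `["agarun", "python", "sample.py"]`, which assumes a GPU assigner is already running and that the working directory is `samples`.

```python
import subprocess
import sys
from typing import Sequence


def run_concurrently(commands: Sequence[Sequence[str]]) -> list[str]:
    """Launch all commands at once, wait for them, and return each stdout."""
    # Start every process before waiting on any, so they run concurrently.
    procs = [
        subprocess.Popen(list(cmd), stdout=subprocess.PIPE, text=True)
        for cmd in commands
    ]
    # communicate() waits for exit. A process that is not being read yet can
    # pause if its pipe buffer fills; short logs like sample.py's stay far
    # below that limit, but very verbose programs should write to files.
    return [p.communicate()[0] for p in procs]


if __name__ == "__main__":
    # Stand-in commands; swap in ["agarun", "python", "sample.py"] to
    # reproduce the three-process experiment.
    demo = [[sys.executable, "-c", f"print('process {i}')"] for i in range(3)]
    for out in run_concurrently(demo):
        print(out, end="")
```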
For example, with three processes sharing two GPUs, the output looks like the following.
1st process
```
Epoch 0, Loss: 1.0316227674484253 on GPU-b9d18b6a-6648-3e3a-90af-d3d717ee8d8f
Epoch 1, Loss: 1.0306103229522705 on GPU-b9d18b6a-6648-3e3a-90af-d3d717ee8d8f
Epoch 2, Loss: 1.0296107530593872 on GPU-b9d18b6a-6648-3e3a-90af-d3d717ee8d8f
Epoch 3, Loss: 1.0286246538162231 on GPU-b9d18b6a-6648-3e3a-90af-d3d717ee8d8f
Epoch 4, Loss: 1.0276520252227783 on GPU-b9d18b6a-6648-3e3a-90af-d3d717ee8d8f
Epoch 5, Loss: 1.02669358253479 on GPU-b9d18b6a-6648-3e3a-90af-d3d717ee8d8f
Epoch 6, Loss: 1.0257498025894165 on GPU-b9d18b6a-6648-3e3a-90af-d3d717ee8d8f
Epoch 7, Loss: 1.0248205661773682 on GPU-b9d18b6a-6648-3e3a-90af-d3d717ee8d8f
Epoch 8, Loss: 1.0239065885543823 on GPU-b9d18b6a-6648-3e3a-90af-d3d717ee8d8f
Epoch 9, Loss: 1.023008108139038 on GPU-b9d18b6a-6648-3e3a-90af-d3d717ee8d8f
Epoch 10, Loss: 1.022125244140625 on GPU-b9d18b6a-6648-3e3a-90af-d3d717ee8d8f
Epoch 11, Loss: 1.0212583541870117 on GPU-b9d18b6a-6648-3e3a-90af-d3d717ee8d8f
Epoch 12, Loss: 1.020407795906067 on GPU-b9d18b6a-6648-3e3a-90af-d3d717ee8d8f
Epoch 13, Loss: 1.01957368850708 on GPU-b9d18b6a-6648-3e3a-90af-d3d717ee8d8f
Epoch 14, Loss: 1.0187562704086304 on GPU-b9d18b6a-6648-3e3a-90af-d3d717ee8d8f
Epoch 15, Loss: 1.0179557800292969 on GPU-b9d18b6a-6648-3e3a-90af-d3d717ee8d8f
Epoch 16, Loss: 1.0171724557876587 on GPU-b9d18b6a-6648-3e3a-90af-d3d717ee8d8f
Epoch 17, Loss: 1.0164062976837158 on GPU-b9d18b6a-6648-3e3a-90af-d3d717ee8d8f
Epoch 18, Loss: 1.015657663345337 on GPU-b9d18b6a-6648-3e3a-90af-d3d717ee8d8f
Epoch 19, Loss: 1.014926552772522 on GPU-b9d18b6a-6648-3e3a-90af-d3d717ee8d8f
```
2nd process
```
Epoch 0, Loss: 1.0316227674484253 on GPU-173d3486-b338-629d-3b9d-8a3a4c31ea3f
Epoch 1, Loss: 1.0306103229522705 on GPU-173d3486-b338-629d-3b9d-8a3a4c31ea3f
Epoch 2, Loss: 1.0296107530593872 on GPU-173d3486-b338-629d-3b9d-8a3a4c31ea3f
Epoch 3, Loss: 1.0286246538162231 on GPU-173d3486-b338-629d-3b9d-8a3a4c31ea3f
Epoch 4, Loss: 1.0276520252227783 on GPU-173d3486-b338-629d-3b9d-8a3a4c31ea3f
Epoch 5, Loss: 1.02669358253479 on GPU-173d3486-b338-629d-3b9d-8a3a4c31ea3f
Epoch 6, Loss: 1.0257498025894165 on GPU-173d3486-b338-629d-3b9d-8a3a4c31ea3f
Epoch 7, Loss: 1.0248205661773682 on GPU-173d3486-b338-629d-3b9d-8a3a4c31ea3f
Epoch 8, Loss: 1.0239065885543823 on GPU-173d3486-b338-629d-3b9d-8a3a4c31ea3f
Epoch 9, Loss: 1.023008108139038 on GPU-173d3486-b338-629d-3b9d-8a3a4c31ea3f
Epoch 10, Loss: 1.022125244140625 on GPU-173d3486-b338-629d-3b9d-8a3a4c31ea3f
Epoch 11, Loss: 1.0212583541870117 on GPU-173d3486-b338-629d-3b9d-8a3a4c31ea3f
Epoch 12, Loss: 1.020407795906067 on GPU-173d3486-b338-629d-3b9d-8a3a4c31ea3f
Epoch 13, Loss: 1.01957368850708 on GPU-173d3486-b338-629d-3b9d-8a3a4c31ea3f
Epoch 14, Loss: 1.0187562704086304 on GPU-173d3486-b338-629d-3b9d-8a3a4c31ea3f
Epoch 15, Loss: 1.0179557800292969 on GPU-173d3486-b338-629d-3b9d-8a3a4c31ea3f
Epoch 16, Loss: 1.0171724557876587 on GPU-173d3486-b338-629d-3b9d-8a3a4c31ea3f
Epoch 17, Loss: 1.0164062976837158 on GPU-173d3486-b338-629d-3b9d-8a3a4c31ea3f
Epoch 18, Loss: 1.015657663345337 on GPU-173d3486-b338-629d-3b9d-8a3a4c31ea3f
Epoch 19, Loss: 1.014926552772522 on GPU-173d3486-b338-629d-3b9d-8a3a4c31ea3f
```
3rd process
```
Epoch 0, Loss: 1.0316228866577148 on CPU
Epoch 1, Loss: 1.0306103229522705 on CPU
Epoch 2, Loss: 1.0296107530593872 on CPU
Epoch 3, Loss: 1.0286246538162231 on GPU-b9d18b6a-6648-3e3a-90af-d3d717ee8d8f
Epoch 4, Loss: 1.0276520252227783 on GPU-b9d18b6a-6648-3e3a-90af-d3d717ee8d8f
Epoch 5, Loss: 1.02669358253479 on GPU-b9d18b6a-6648-3e3a-90af-d3d717ee8d8f
Epoch 6, Loss: 1.0257498025894165 on GPU-b9d18b6a-6648-3e3a-90af-d3d717ee8d8f
Epoch 7, Loss: 1.0248205661773682 on GPU-b9d18b6a-6648-3e3a-90af-d3d717ee8d8f
Epoch 8, Loss: 1.0239065885543823 on GPU-b9d18b6a-6648-3e3a-90af-d3d717ee8d8f
Epoch 9, Loss: 1.023008108139038 on GPU-b9d18b6a-6648-3e3a-90af-d3d717ee8d8f
Epoch 10, Loss: 1.022125244140625 on GPU-b9d18b6a-6648-3e3a-90af-d3d717ee8d8f
Epoch 11, Loss: 1.0212583541870117 on GPU-b9d18b6a-6648-3e3a-90af-d3d717ee8d8f
Epoch 12, Loss: 1.020407795906067 on GPU-b9d18b6a-6648-3e3a-90af-d3d717ee8d8f
Epoch 13, Loss: 1.01957368850708 on GPU-b9d18b6a-6648-3e3a-90af-d3d717ee8d8f
Epoch 14, Loss: 1.0187562704086304 on GPU-b9d18b6a-6648-3e3a-90af-d3d717ee8d8f
Epoch 15, Loss: 1.0179557800292969 on GPU-b9d18b6a-6648-3e3a-90af-d3d717ee8d8f
Epoch 16, Loss: 1.0171724557876587 on GPU-b9d18b6a-6648-3e3a-90af-d3d717ee8d8f
Epoch 17, Loss: 1.0164062976837158 on GPU-b9d18b6a-6648-3e3a-90af-d3d717ee8d8f
Epoch 18, Loss: 1.015657663345337 on GPU-b9d18b6a-6648-3e3a-90af-d3d717ee8d8f
Epoch 19, Loss: 1.014926552772522 on GPU-b9d18b6a-6648-3e3a-90af-d3d717ee8d8f
```
The first and second processes each run on one of the two GPUs, while the third process starts on the CPU because no GPU is free at the beginning.
Once the first or second process finishes, a GPU becomes available, and the third process moves to that GPU (from epoch 3 onward in its output).
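The move is visible in the logs as a change of device name between consecutive epochs. A minimal sketch that finds such changes, assuming the `Epoch <n>, Loss: <value> on <device>` line format shown above; `device_switches` is a hypothetical helper, not part of AGA.

```python
import re

# Assumes each output line looks like "Epoch <n>, Loss: <x> on <device>".
LINE_RE = re.compile(r"^Epoch (\d+), Loss: \S+ on (\S+)$")


def device_switches(text: str) -> list[tuple[int, str, str]]:
    """Return (epoch, old_device, new_device) for every change of device."""
    switches = []
    previous = None
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip headers or unrelated output
        epoch, device = int(match.group(1)), match.group(2)
        if previous is not None and device != previous:
            switches.append((epoch, previous, device))
        previous = device
    return switches
```

Applied to the third process's output above, it reports a single switch from `CPU` to `GPU-b9d18b6a-6648-3e3a-90af-d3d717ee8d8f` at epoch 3, while the first and second processes report none.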
Download files
Built Distributions
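The wheels listed below are tagged for CPython 3.10 to 3.13 on x86-64 Linux with glibc 2.17 or newer. Before downloading one by hand, a quick compatibility check can be scripted with the standard `platform` and `sys` modules. This sketch encodes only those tags and is a simplification, not how pip resolves wheel tags; `wheel_available` is a hypothetical helper.

```python
import platform
import sys

# Taken from the tags of the 1.0.0 wheels on this page:
# cp310-cp313, x86-64 Linux, glibc 2.17 or newer.
SUPPORTED_PYTHON = {(3, 10), (3, 11), (3, 12), (3, 13)}
MIN_GLIBC = (2, 17)


def wheel_available(py_version: tuple[int, int], machine: str,
                    libc: tuple[str, str]) -> bool:
    """Check whether one of the listed wheels matches the given system."""
    lib, version = libc
    if py_version not in SUPPORTED_PYTHON or machine != "x86_64" or lib != "glibc":
        return False
    # Compare only the major and minor glibc version numbers.
    glibc = tuple(int(part) for part in version.split(".")[:2])
    return glibc >= MIN_GLIBC


if __name__ == "__main__":
    # platform.libc_ver() returns e.g. ("glibc", "2.31"), or ("", "") if unknown.
    print(wheel_available(sys.version_info[:2], platform.machine(), platform.libc_ver()))
```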
File details
Details for the file ai_computing_broker-1.0.0-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_28_x86_64.whl.
File metadata
- Download URL: ai_computing_broker-1.0.0-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_28_x86_64.whl
- Upload date:
- Size: 12.8 MB
- Tags: CPython 3.13, manylinux: glibc 2.17+ x86-64, manylinux: glibc 2.28+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.16
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 636c1ba08a576eaae121022a8d055dcc9600f9452d0b8b2563e9e0a5896c5ba7 |
| MD5 | 303b93cfeb3f34f64cc9b4f71374e228 |
| BLAKE2b-256 | 492a13556e361e221dd7b4053d818b0b681353b24be86242eff6b4070ff4c085 |
File details
Details for the file ai_computing_broker-1.0.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_28_x86_64.whl.
File metadata
- Download URL: ai_computing_broker-1.0.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_28_x86_64.whl
- Upload date:
- Size: 12.8 MB
- Tags: CPython 3.12, manylinux: glibc 2.17+ x86-64, manylinux: glibc 2.28+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.16
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c5fe34df1221c5a42ee30210a56ffe3a21aac6f0acde5ae8a467f5bdb0753777 |
| MD5 | 3db16e049b09107c47fd5456508a2140 |
| BLAKE2b-256 | 874a54e0868e501fcae051d5baea1bf521ca50d4cb8249c89d1a8fe25d0abedc |
File details
Details for the file ai_computing_broker-1.0.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_28_x86_64.whl.
File metadata
- Download URL: ai_computing_broker-1.0.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_28_x86_64.whl
- Upload date:
- Size: 11.7 MB
- Tags: CPython 3.11, manylinux: glibc 2.17+ x86-64, manylinux: glibc 2.28+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.16
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 34d10f50b6a72b756929f12180b79cf29b176020f09c20788e74f3bdcdd04a74 |
| MD5 | ed5ef1131e4211b1cc841005ff7f8270 |
| BLAKE2b-256 | 9a29bdcfbaa3c4fa4cc2a308ae3e70f17f199d18da715b828cd2d4112c254666 |
File details
Details for the file ai_computing_broker-1.0.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_28_x86_64.whl.
File metadata
- Download URL: ai_computing_broker-1.0.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_28_x86_64.whl
- Upload date:
- Size: 11.1 MB
- Tags: CPython 3.10, manylinux: glibc 2.17+ x86-64, manylinux: glibc 2.28+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.16
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3272f0f5233d2339dda6204647863861a7200a279839b2282e9b1e00d6b20741 |
| MD5 | 082ac4d3587126286bc0ac0d76671f61 |
| BLAKE2b-256 | ca73fe3517d800601d2d01d33b4e3b7b9a742ea4521b9365bff653e8406d122f |
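To check that a downloaded wheel is intact, its SHA256 digest can be compared with the value listed in the tables above. A minimal sketch using only `hashlib`; `sha256_of` and `verify_wheel` are hypothetical helpers.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        # iter() with a sentinel keeps reading until f.read returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_wheel(path: Path, expected_sha256: str) -> bool:
    """Return True if the file's SHA256 digest matches the expected one."""
    return sha256_of(path) == expected_sha256.strip().lower()
```

For example, `verify_wheel(Path("ai_computing_broker-1.0.0-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_28_x86_64.whl"), "636c1ba08a576eaae121022a8d055dcc9600f9452d0b8b2563e9e0a5896c5ba7")` should return `True` for an intact download. pip can also enforce digests itself in hash-checking mode (`--require-hashes`, with `--hash=sha256:...` entries in a requirements file).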