Adaptive GPU Allocator

In the following instructions, $WORKDIR denotes the directory in which the AGA files are installed.

Install

Install AGA

  1. Create a Python virtual environment of your choice and activate it.
  2. Run cd $WORKDIR && git clone https://github.labs.fujitsu.com/shcpj/adaptive-gpu-allocator.git to clone the AGA repository.
  3. Run cd $WORKDIR/adaptive-gpu-allocator to change the working directory.
  4. Run pip install -e . to install AGA.
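
After the steps above, a quick sanity check can confirm that the installation is visible from the active virtual environment. The following is a minimal sketch, not part of AGA: the function name check_aga_install is hypothetical, and it assumes that the importable module is named adaptive_gpu_allocator (the package name used in this document) and that the gpu_assigner and agarun commands are placed on PATH.

```python
import importlib.util
import shutil


def check_aga_install(module="adaptive_gpu_allocator",
                      commands=("gpu_assigner", "agarun")):
    """Report whether the module and the CLI commands are visible here.

    The default names are taken from this document; the import name of
    the module is an assumption.
    """
    # find_spec returns None when a top-level module cannot be located.
    report = {module: importlib.util.find_spec(module) is not None}
    for command in commands:
        # shutil.which returns None when the command is not on PATH.
        report[command] = shutil.which(command) is not None
    return report


if __name__ == "__main__":
    for name, found in check_aga_install().items():
        print(f"{name}: {'found' if found else 'missing'}")
```

If any entry is reported as missing, the most likely cause is that the virtual environment from step 1 is not the active one.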

Launch a GPU assigner

The GPU assigner is installed automatically with the adaptive_gpu_allocator package. You must launch a GPU assigner server before running your application program with AGA enabled.

You can start or stop a GPU assigner server using the gpu_assigner CLI.

gpu_assigner start               # Start GPU assigner server
gpu_assigner status              # Show current GPU assigner status
agarun python <your_program>.py  # Run your program with AGA enabled.
...
gpu_assigner stop                # Stop GPU assigner
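
The same start, run, stop sequence can be driven from Python so that the server is stopped even when the program fails. This is a minimal sketch under assumptions: run_with_assigner is a hypothetical helper, not part of AGA, and it assumes that gpu_assigner start returns once the server has been launched, as the command sequence above suggests.

```python
import subprocess


def run_with_assigner(program_args, run=subprocess.run):
    """Start the GPU assigner, run one program under agarun, then stop it.

    `program_args` is the command to wrap, e.g. ["python", "sample.py"].
    `run` is injectable so the sequence can be exercised without AGA.
    """
    # Assumption: this call returns after the server has been launched.
    run(["gpu_assigner", "start"], check=True)
    try:
        run(["gpu_assigner", "status"], check=True)
        return run(["agarun", *program_args], check=False)
    finally:
        # Always attempt to stop the server, even if the program raised.
        run(["gpu_assigner", "stop"], check=False)


# Example call (requires AGA to be installed):
#   run_with_assigner(["python", "sample.py"])
```

The try/finally block is the point of the sketch: a server left running after a failed job is easy to overlook.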

Usage

Single run

The following example shows how to run the sample program contained in this repository.

  1. Run cd $WORKDIR/adaptive-gpu-allocator/samples
  2. Run agarun python sample.py

The output looks like the following.

Epoch 0, Loss: 1.0316227674484253 on GPU-b9d18b6a-6648-3e3a-90af-d3d717ee8d8f
Epoch 1, Loss: 1.0306103229522705 on GPU-b9d18b6a-6648-3e3a-90af-d3d717ee8d8f
Epoch 2, Loss: 1.0296107530593872 on GPU-b9d18b6a-6648-3e3a-90af-d3d717ee8d8f
Epoch 3, Loss: 1.0286246538162231 on GPU-b9d18b6a-6648-3e3a-90af-d3d717ee8d8f
Epoch 4, Loss: 1.0276520252227783 on GPU-b9d18b6a-6648-3e3a-90af-d3d717ee8d8f
Epoch 5, Loss: 1.02669358253479 on GPU-b9d18b6a-6648-3e3a-90af-d3d717ee8d8f
Epoch 6, Loss: 1.0257498025894165 on GPU-b9d18b6a-6648-3e3a-90af-d3d717ee8d8f
Epoch 7, Loss: 1.0248205661773682 on GPU-b9d18b6a-6648-3e3a-90af-d3d717ee8d8f
Epoch 8, Loss: 1.0239065885543823 on GPU-b9d18b6a-6648-3e3a-90af-d3d717ee8d8f
Epoch 9, Loss: 1.023008108139038 on GPU-b9d18b6a-6648-3e3a-90af-d3d717ee8d8f
Epoch 10, Loss: 1.022125244140625 on GPU-b9d18b6a-6648-3e3a-90af-d3d717ee8d8f
Epoch 11, Loss: 1.0212583541870117 on GPU-b9d18b6a-6648-3e3a-90af-d3d717ee8d8f
Epoch 12, Loss: 1.020407795906067 on GPU-b9d18b6a-6648-3e3a-90af-d3d717ee8d8f
Epoch 13, Loss: 1.01957368850708 on GPU-b9d18b6a-6648-3e3a-90af-d3d717ee8d8f
Epoch 14, Loss: 1.0187562704086304 on GPU-b9d18b6a-6648-3e3a-90af-d3d717ee8d8f
Epoch 15, Loss: 1.0179557800292969 on GPU-b9d18b6a-6648-3e3a-90af-d3d717ee8d8f
Epoch 16, Loss: 1.0171724557876587 on GPU-b9d18b6a-6648-3e3a-90af-d3d717ee8d8f
Epoch 17, Loss: 1.0164062976837158 on GPU-b9d18b6a-6648-3e3a-90af-d3d717ee8d8f
Epoch 18, Loss: 1.015657663345337 on GPU-b9d18b6a-6648-3e3a-90af-d3d717ee8d8f
Epoch 19, Loss: 1.014926552772522 on GPU-b9d18b6a-6648-3e3a-90af-d3d717ee8d8f
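
Each output line carries an epoch number, a loss value, and the device the epoch ran on. If you want to post-process such logs, a small parser is enough. The sketch below is illustrative only: the names EpochRecord and parse_line are hypothetical, and the line format is inferred from the sample output above rather than from any documented AGA interface.

```python
import re
from typing import NamedTuple, Optional

# Matches lines such as "Epoch 3, Loss: 1.0286 on GPU-<uuid>" or "... on CPU".
LINE = re.compile(r"^Epoch (\d+), Loss: ([0-9.eE+-]+) on (\S+)$")


class EpochRecord(NamedTuple):
    epoch: int
    loss: float
    device: str


def parse_line(line: str) -> Optional[EpochRecord]:
    """Parse one line of sample output; return None if it does not match."""
    match = LINE.match(line.strip())
    if match is None:
        return None
    return EpochRecord(int(match.group(1)), float(match.group(2)), match.group(3))
```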

Run multiple programs

To check that AGA is working, run the sample program with more concurrent processes than there are GPUs.
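
One way to start several copies at the same time is a short launcher script. This is a sketch, not part of AGA: run_concurrently is a hypothetical helper, and it assumes that the GPU assigner server is already running and that the script is started from the samples directory.

```python
import subprocess


def run_concurrently(command, count):
    """Start `count` copies of `command` at once and return each stdout."""
    # Launch every process before waiting on any of them, so they overlap.
    processes = [
        subprocess.Popen(command, stdout=subprocess.PIPE, text=True)
        for _ in range(count)
    ]
    # communicate() waits for the process to exit and returns (stdout, stderr).
    return [process.communicate()[0] for process in processes]


# Example call (requires AGA and a running GPU assigner):
#   outputs = run_concurrently(["agarun", "python", "sample.py"], 3)
#   for index, text in enumerate(outputs, start=1):
#       print(f"--- process {index} ---")
#       print(text, end="")
```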

When you run 3 processes with 2 GPUs, the output looks like the following.

1st process

Epoch 0, Loss: 1.0316227674484253 on GPU-b9d18b6a-6648-3e3a-90af-d3d717ee8d8f
Epoch 1, Loss: 1.0306103229522705 on GPU-b9d18b6a-6648-3e3a-90af-d3d717ee8d8f
Epoch 2, Loss: 1.0296107530593872 on GPU-b9d18b6a-6648-3e3a-90af-d3d717ee8d8f
Epoch 3, Loss: 1.0286246538162231 on GPU-b9d18b6a-6648-3e3a-90af-d3d717ee8d8f
Epoch 4, Loss: 1.0276520252227783 on GPU-b9d18b6a-6648-3e3a-90af-d3d717ee8d8f
Epoch 5, Loss: 1.02669358253479 on GPU-b9d18b6a-6648-3e3a-90af-d3d717ee8d8f
Epoch 6, Loss: 1.0257498025894165 on GPU-b9d18b6a-6648-3e3a-90af-d3d717ee8d8f
Epoch 7, Loss: 1.0248205661773682 on GPU-b9d18b6a-6648-3e3a-90af-d3d717ee8d8f
Epoch 8, Loss: 1.0239065885543823 on GPU-b9d18b6a-6648-3e3a-90af-d3d717ee8d8f
Epoch 9, Loss: 1.023008108139038 on GPU-b9d18b6a-6648-3e3a-90af-d3d717ee8d8f
Epoch 10, Loss: 1.022125244140625 on GPU-b9d18b6a-6648-3e3a-90af-d3d717ee8d8f
Epoch 11, Loss: 1.0212583541870117 on GPU-b9d18b6a-6648-3e3a-90af-d3d717ee8d8f
Epoch 12, Loss: 1.020407795906067 on GPU-b9d18b6a-6648-3e3a-90af-d3d717ee8d8f
Epoch 13, Loss: 1.01957368850708 on GPU-b9d18b6a-6648-3e3a-90af-d3d717ee8d8f
Epoch 14, Loss: 1.0187562704086304 on GPU-b9d18b6a-6648-3e3a-90af-d3d717ee8d8f
Epoch 15, Loss: 1.0179557800292969 on GPU-b9d18b6a-6648-3e3a-90af-d3d717ee8d8f
Epoch 16, Loss: 1.0171724557876587 on GPU-b9d18b6a-6648-3e3a-90af-d3d717ee8d8f
Epoch 17, Loss: 1.0164062976837158 on GPU-b9d18b6a-6648-3e3a-90af-d3d717ee8d8f
Epoch 18, Loss: 1.015657663345337 on GPU-b9d18b6a-6648-3e3a-90af-d3d717ee8d8f
Epoch 19, Loss: 1.014926552772522 on GPU-b9d18b6a-6648-3e3a-90af-d3d717ee8d8f

2nd process

Epoch 0, Loss: 1.0316227674484253 on GPU-173d3486-b338-629d-3b9d-8a3a4c31ea3f
Epoch 1, Loss: 1.0306103229522705 on GPU-173d3486-b338-629d-3b9d-8a3a4c31ea3f
Epoch 2, Loss: 1.0296107530593872 on GPU-173d3486-b338-629d-3b9d-8a3a4c31ea3f
Epoch 3, Loss: 1.0286246538162231 on GPU-173d3486-b338-629d-3b9d-8a3a4c31ea3f
Epoch 4, Loss: 1.0276520252227783 on GPU-173d3486-b338-629d-3b9d-8a3a4c31ea3f
Epoch 5, Loss: 1.02669358253479 on GPU-173d3486-b338-629d-3b9d-8a3a4c31ea3f
Epoch 6, Loss: 1.0257498025894165 on GPU-173d3486-b338-629d-3b9d-8a3a4c31ea3f
Epoch 7, Loss: 1.0248205661773682 on GPU-173d3486-b338-629d-3b9d-8a3a4c31ea3f
Epoch 8, Loss: 1.0239065885543823 on GPU-173d3486-b338-629d-3b9d-8a3a4c31ea3f
Epoch 9, Loss: 1.023008108139038 on GPU-173d3486-b338-629d-3b9d-8a3a4c31ea3f
Epoch 10, Loss: 1.022125244140625 on GPU-173d3486-b338-629d-3b9d-8a3a4c31ea3f
Epoch 11, Loss: 1.0212583541870117 on GPU-173d3486-b338-629d-3b9d-8a3a4c31ea3f
Epoch 12, Loss: 1.020407795906067 on GPU-173d3486-b338-629d-3b9d-8a3a4c31ea3f
Epoch 13, Loss: 1.01957368850708 on GPU-173d3486-b338-629d-3b9d-8a3a4c31ea3f
Epoch 14, Loss: 1.0187562704086304 on GPU-173d3486-b338-629d-3b9d-8a3a4c31ea3f
Epoch 15, Loss: 1.0179557800292969 on GPU-173d3486-b338-629d-3b9d-8a3a4c31ea3f
Epoch 16, Loss: 1.0171724557876587 on GPU-173d3486-b338-629d-3b9d-8a3a4c31ea3f
Epoch 17, Loss: 1.0164062976837158 on GPU-173d3486-b338-629d-3b9d-8a3a4c31ea3f
Epoch 18, Loss: 1.015657663345337 on GPU-173d3486-b338-629d-3b9d-8a3a4c31ea3f
Epoch 19, Loss: 1.014926552772522 on GPU-173d3486-b338-629d-3b9d-8a3a4c31ea3f

3rd process

Epoch 0, Loss: 1.0316228866577148 on CPU
Epoch 1, Loss: 1.0306103229522705 on CPU
Epoch 2, Loss: 1.0296107530593872 on CPU
Epoch 3, Loss: 1.0286246538162231 on GPU-b9d18b6a-6648-3e3a-90af-d3d717ee8d8f
Epoch 4, Loss: 1.0276520252227783 on GPU-b9d18b6a-6648-3e3a-90af-d3d717ee8d8f
Epoch 5, Loss: 1.02669358253479 on GPU-b9d18b6a-6648-3e3a-90af-d3d717ee8d8f
Epoch 6, Loss: 1.0257498025894165 on GPU-b9d18b6a-6648-3e3a-90af-d3d717ee8d8f
Epoch 7, Loss: 1.0248205661773682 on GPU-b9d18b6a-6648-3e3a-90af-d3d717ee8d8f
Epoch 8, Loss: 1.0239065885543823 on GPU-b9d18b6a-6648-3e3a-90af-d3d717ee8d8f
Epoch 9, Loss: 1.023008108139038 on GPU-b9d18b6a-6648-3e3a-90af-d3d717ee8d8f
Epoch 10, Loss: 1.022125244140625 on GPU-b9d18b6a-6648-3e3a-90af-d3d717ee8d8f
Epoch 11, Loss: 1.0212583541870117 on GPU-b9d18b6a-6648-3e3a-90af-d3d717ee8d8f
Epoch 12, Loss: 1.020407795906067 on GPU-b9d18b6a-6648-3e3a-90af-d3d717ee8d8f
Epoch 13, Loss: 1.01957368850708 on GPU-b9d18b6a-6648-3e3a-90af-d3d717ee8d8f
Epoch 14, Loss: 1.0187562704086304 on GPU-b9d18b6a-6648-3e3a-90af-d3d717ee8d8f
Epoch 15, Loss: 1.0179557800292969 on GPU-b9d18b6a-6648-3e3a-90af-d3d717ee8d8f
Epoch 16, Loss: 1.0171724557876587 on GPU-b9d18b6a-6648-3e3a-90af-d3d717ee8d8f
Epoch 17, Loss: 1.0164062976837158 on GPU-b9d18b6a-6648-3e3a-90af-d3d717ee8d8f
Epoch 18, Loss: 1.015657663345337 on GPU-b9d18b6a-6648-3e3a-90af-d3d717ee8d8f
Epoch 19, Loss: 1.014926552772522 on GPU-b9d18b6a-6648-3e3a-90af-d3d717ee8d8f

At the beginning, the first and second processes each run on one of the two GPUs, and the third process runs on the CPU because no GPU is free.

Once the first or second process finishes, a GPU becomes available and the third process moves to that GPU.
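
The move from CPU to GPU can be located in a captured log by comparing the device on consecutive lines. The sketch below is illustrative: device_switches is a hypothetical helper, and the line format is the one shown in the sample output of this document.

```python
import re

# Captures the epoch number and the device name at the end of each line.
LINE = re.compile(r"^Epoch (\d+), Loss: \S+ on (\S+)$")


def device_switches(lines):
    """Return (epoch, old_device, new_device) for every change of device."""
    switches = []
    previous = None
    for line in lines:
        match = LINE.match(line.strip())
        if match is None:
            continue  # Skip anything that is not an epoch line.
        epoch, device = int(match.group(1)), match.group(2)
        if previous is not None and device != previous:
            switches.append((epoch, previous, device))
        previous = device
    return switches
```

Applied to the third process's output above, this reports a single switch at epoch 3, from CPU to GPU-b9d18b6a-6648-3e3a-90af-d3d717ee8d8f.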

Download files

Source Distributions

No source distribution files are available for this release.

Built Distributions

ai_computing_broker-1.0.0-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_28_x86_64.whl (12.8 MB)

Uploaded: CPython 3.13; manylinux: glibc 2.17+ x86-64; manylinux: glibc 2.28+ x86-64

ai_computing_broker-1.0.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_28_x86_64.whl (12.8 MB)

Uploaded: CPython 3.12; manylinux: glibc 2.17+ x86-64; manylinux: glibc 2.28+ x86-64

ai_computing_broker-1.0.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_28_x86_64.whl (11.7 MB)

Uploaded: CPython 3.11; manylinux: glibc 2.17+ x86-64; manylinux: glibc 2.28+ x86-64

ai_computing_broker-1.0.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_28_x86_64.whl (11.1 MB)

Uploaded: CPython 3.10; manylinux: glibc 2.17+ x86-64; manylinux: glibc 2.28+ x86-64

File details

Details for the file ai_computing_broker-1.0.0-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_28_x86_64.whl.

File hashes

Hashes for ai_computing_broker-1.0.0-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 636c1ba08a576eaae121022a8d055dcc9600f9452d0b8b2563e9e0a5896c5ba7
MD5 303b93cfeb3f34f64cc9b4f71374e228
BLAKE2b-256 492a13556e361e221dd7b4053d818b0b681353b24be86242eff6b4070ff4c085
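
A downloaded wheel can be checked against the SHA256 digest listed above before installing it. The following is a minimal sketch using only the Python standard library: the helper names sha256_of and matches are hypothetical, and the path of the downloaded file is up to you.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large wheels are not loaded at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected_hex):
    """Compare the file's digest with an expected hex string."""
    return sha256_of(path) == expected_hex.strip().lower()


# Example call, with the path to your downloaded wheel and the SHA256
# value listed for that file:
#   matches("<path to wheel>", "<SHA256 from the list>")
```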

File details

Details for the file ai_computing_broker-1.0.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_28_x86_64.whl.

File hashes

Hashes for ai_computing_broker-1.0.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 c5fe34df1221c5a42ee30210a56ffe3a21aac6f0acde5ae8a467f5bdb0753777
MD5 3db16e049b09107c47fd5456508a2140
BLAKE2b-256 874a54e0868e501fcae051d5baea1bf521ca50d4cb8249c89d1a8fe25d0abedc

File details

Details for the file ai_computing_broker-1.0.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_28_x86_64.whl.

File hashes

Hashes for ai_computing_broker-1.0.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 34d10f50b6a72b756929f12180b79cf29b176020f09c20788e74f3bdcdd04a74
MD5 ed5ef1131e4211b1cc841005ff7f8270
BLAKE2b-256 9a29bdcfbaa3c4fa4cc2a308ae3e70f17f199d18da715b828cd2d4112c254666

File details

Details for the file ai_computing_broker-1.0.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_28_x86_64.whl.

File hashes

Hashes for ai_computing_broker-1.0.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 3272f0f5233d2339dda6204647863861a7200a279839b2282e9b1e00d6b20741
MD5 082ac4d3587126286bc0ac0d76671f61
BLAKE2b-256 ca73fe3517d800601d2d01d33b4e3b7b9a742ea4521b9365bff653e8406d122f
