Experiment toolkits

Introduction to cofutils

cofutils bundles several useful tools for experiments, such as cofrun, coflogger, and logspy.

Install

From PyPI

pip install cofutils

From source

git clone https://gitee.com/haiqwa/cofutils.git
cd cofutils
pip install .

Usage

Cof Memory Report

Print GPU memory statistics using the PyTorch CUDA API:

  • MA: memory currently allocated
  • MM: maximum memory allocated
  • MR: memory reserved by PyTorch

from cofutils import cofmem

cofmem("before xxx")
# ...
cofmem("after xxx")

(deepspeed) haiqwa@gpu9:~/documents/cofutils$ python ~/test.py
[2023-11-11 15:32:46.873]  [Cof INFO]: before xxx GPU Memory Report (GB): MA = 0.00 | MM = 0.00 | MR = 0.00
[2023-11-11 15:32:46.873]  [Cof INFO]: after xxx GPU Memory Report (GB): MA = 0.00 | MM = 0.00 | MR = 0.00
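For readers who want to see where numbers like these can come from, here is a minimal sketch (not the cofutils implementation) that maps the MA, MM, and MR labels above onto the PyTorch calls `torch.cuda.memory_allocated`, `torch.cuda.max_memory_allocated`, and `torch.cuda.memory_reserved`. The helper names `format_memory_report` and `cuda_memory_report` are hypothetical, and treating 1 GB as 1024**3 bytes is an assumption.

```python
def format_memory_report(tag, allocated, max_allocated, reserved):
    """Format three byte counts as a one-line report in GB.

    Assumption: 1 GB is taken as 1024**3 bytes; cofutils may use another unit.
    """
    gb = 1024 ** 3
    return (
        f"{tag} GPU Memory Report (GB): "
        f"MA = {allocated / gb:.2f} | "
        f"MM = {max_allocated / gb:.2f} | "
        f"MR = {reserved / gb:.2f}"
    )


def cuda_memory_report(tag):
    """Read the counters from PyTorch; return None if torch or a GPU is missing."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return format_memory_report(
        tag,
        torch.cuda.memory_allocated(),      # MA: bytes currently allocated
        torch.cuda.max_memory_allocated(),  # MM: peak bytes allocated
        torch.cuda.memory_reserved(),       # MR: bytes reserved by the caching allocator
    )


# The formatting step works without a GPU.
print(format_memory_report("before xxx", 0, 0, 0))
```

Keeping the formatting separate from the CUDA queries makes the report easy to check on a machine without a GPU.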

Cof Logger

Cof logger prints user messages according to the print level. In a .py file:

from cofutils import coflogger
coflogger.debug("this is debug")
coflogger.info("this is info")
coflogger.warn("this is warn")
coflogger.error("this is error")

The print level is set by the environment variable COF_DEBUG:

COF_DEBUG=WARN python main.py

The default print level is INFO.
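To illustrate the idea of an environment-driven print level, the following sketch uses only the standard `logging` module; it is not how coflogger is implemented. The names `level_from_env` and `make_logger` are hypothetical, and falling back to INFO for unrecognized values is an assumption.

```python
import logging
import os

# Map the level names used in this README to stdlib logging levels.
_LEVELS = {
    "DEBUG": logging.DEBUG,
    "INFO": logging.INFO,
    "WARN": logging.WARNING,
    "WARNING": logging.WARNING,
    "ERROR": logging.ERROR,
}


def level_from_env(var="COF_DEBUG"):
    """Return the logging level named by the environment variable.

    Assumption: a missing or unrecognized value falls back to INFO.
    """
    name = os.environ.get(var, "INFO").strip().upper()
    return _LEVELS.get(name, logging.INFO)


def make_logger(name="sketch"):
    """Build a stream logger whose threshold comes from the environment."""
    logger = logging.getLogger(name)
    logger.setLevel(level_from_env())
    if not logger.handlers:  # avoid duplicate handlers on repeated calls
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("[%(asctime)s] [%(levelname)s]: %(message)s")
        )
        logger.addHandler(handler)
    return logger


logger = make_logger()
logger.debug("this is debug")  # hidden unless COF_DEBUG=DEBUG
logger.info("this is info")
logger.warning("this is warn")
logger.error("this is error")
```

Running this with `COF_DEBUG=WARN` would suppress the debug and info lines in this sketch.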

Cof CSV

Save data to a CSV file and load it back:

from cofutils import cofcsv
data = [
        ['Name', 'Age', 'Gender'],
        ['Alice', 25, 'Female'],
        ['Bob', 30, 'Male'],
        ['Charlie', 35, 'Male']
    ]
cofcsv.save(data=data, path='result.csv')
data = []
data = cofcsv.load(path='result.csv')
print(data)
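For comparison, the same round trip can be sketched with Python's standard `csv` module. This is an illustration, not the cofcsv implementation; `save_rows` and `load_rows` are hypothetical names. With the standard module every loaded cell is a string, and whether `cofcsv.load` converts types is not stated here.

```python
import csv
import os
import tempfile


def save_rows(rows, path):
    """Write a list of rows to a CSV file."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)


def load_rows(path):
    """Read a CSV file back into a list of rows (all cells are strings)."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.reader(f))


rows = [
    ["Name", "Age", "Gender"],
    ["Alice", 25, "Female"],
]
path = os.path.join(tempfile.mkdtemp(), "result.csv")
save_rows(rows, path)
loaded = load_rows(path)
print(loaded)  # the integer 25 comes back as the string '25'
```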

Cof Timer

Cof timer is similar to the Timer in Megatron-LM.

It supports two log modes:

  • Format the results as a string and print it to STDOUT, which is easy for users to read (timedict=False)
  • Return the timing table directly (timedict=True)

from cofutils import coftimer
from cofutils import coflogger
import time
test_1 = coftimer('test1')
test_2 = coftimer('test2')

for _ in range(3):
    test_1.start()
    time.sleep(1)
    test_1.stop()

coftimer.log(normalizer=3, timedict=False)


for _ in range(3):
    test_2.start()
    time.sleep(1)
    test_2.stop()

time_dict = coftimer.log(normalizer=3, timedict=True)
coflogger.info(time_dict)

(deepspeed) haiqwa@gpu9:~/documents/cofutils$ python ~/test.py
[2023-11-11 16:15:43.942]  [Cof INFO]: time (ms) | test1: 1001.20 | test2: 0.00
NoneType: None
[2023-11-11 16:15:46.946]  [Cof INFO]: {'test1': 0.0, 'test2': 1001.2083053588867}
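The sample output above suggests that `log` divides each timer's accumulated time by `normalizer`, reports it in milliseconds, and then resets the timers. The sketch below reproduces that behavior under those assumptions; it is not the coftimer source, and the class names `SketchTimer` and `SketchTimers` are hypothetical.

```python
import time


class SketchTimer:
    """Accumulates wall-clock time across start/stop pairs."""

    def __init__(self, name):
        self.name = name
        self.elapsed = 0.0  # accumulated seconds since the last log
        self._started_at = None

    def start(self):
        self._started_at = time.perf_counter()

    def stop(self):
        self.elapsed += time.perf_counter() - self._started_at
        self._started_at = None


class SketchTimers:
    """Registry of named timers, loosely modeled on the usage shown above."""

    def __init__(self):
        self._timers = {}

    def __call__(self, name):
        # Return the existing timer for this name, or create one.
        if name not in self._timers:
            self._timers[name] = SketchTimer(name)
        return self._timers[name]

    def log(self, normalizer=1, timedict=False):
        # Average in milliseconds, then reset every timer (assumed behavior).
        table = {}
        for name, timer in self._timers.items():
            table[name] = timer.elapsed * 1000.0 / normalizer
            timer.elapsed = 0.0
        if timedict:
            return table
        print("time (ms) | " + " | ".join(f"{n}: {t:.2f}" for n, t in table.items()))
        return None


timers = SketchTimers()
step = timers("step")
for _ in range(3):
    step.start()
    time.sleep(0.01)
    step.stop()

averages = timers.log(normalizer=3, timedict=True)  # mean ms per iteration
```

Passing the iteration count as `normalizer` turns the accumulated total into a per-iteration average, which matches the roughly 1001 ms reported for three one-second sleeps.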

Cofrun is all you need!

cofrun makes it easy to launch distributed tasks. All you need to provide is a template bash file and a JSON configuration file.

See example/ for sample files.

(deepspeed) haiqwa@gpu9:~/documents/cofutils/example$ cofrun -h
usage: cofrun [-h] [--file FILE] [--input INPUT] [--template TEMPLATE] [--output OUTPUT] [--test] [--list] [--range RANGE]

optional arguments:
  -h, --help            show this help message and exit
  --file FILE, -f FILE  config file path, default is ./config-template.json
  --input INPUT, -i INPUT
                        run experiments in batch mode. all config files are placed in input directory
  --template TEMPLATE, -T TEMPLATE
                        provide the path of template .sh file
  --output OUTPUT, -o OUTPUT
                        write execution output to specific path
  --test, -t            use cof run in test mode -> just generate bash script
  --list, -l            list id of all input files, only available when input dir is provided
  --range RANGE, -r RANGE
                        support 3 formats: [int | int,int,int... | int-int], and int value must be > 0
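The three `--range` formats listed in the help text can be read as follows. This is a hypothetical parser written for illustration, not the one cofrun uses; the function name `parse_range` and the choice to treat `int-int` as inclusive on both ends are assumptions.

```python
def parse_range(spec):
    """Parse 'N', 'N,M,K', or 'N-M' into a list of positive integer ids.

    Assumption: 'N-M' includes both endpoints.
    """
    spec = spec.strip()
    if "-" in spec:
        low_text, high_text = spec.split("-", 1)
        low, high = int(low_text), int(high_text)
        if low > high:
            raise ValueError(f"range start {low} is greater than end {high}")
        ids = list(range(low, high + 1))
    else:
        ids = [int(part) for part in spec.split(",")]
    if any(i <= 0 for i in ids):
        raise ValueError("all ids must be > 0")
    return ids


print(parse_range("3"))      # single id
print(parse_range("1,4,7"))  # explicit list
print(parse_range("2-5"))    # span of ids
```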

Let's run the example:

cofrun -f demo_config.json -T demo_template.sh

The execution history of cofrun is written to history.cof.
