
I/O profiler for deep learning Python apps, specifically for dlio_benchmark.

Project description


DFTracer v1.0.4

A multi-level profiler for capturing application functions and low-level system I/O calls from deep learning workloads.

Requirements for profiler

  1. Python >= 3.7
  2. pybind11

Requirements for analyzer

  1. bokeh>=2.4.2
  2. pybind11
  3. zindex_py
  4. pandas>=2.0.3
  5. dask>=2023.5.0
  6. distributed
  7. numpy>=1.24.3
  8. pyarrow>=12.0.1
  9. rich>=13.6.0
  10. python-intervals>=1.10.0.post1
  11. matplotlib>=3.7.3

Installation

DFTracer can be installed with pip, the standard installer for Python packages. This works in both native Python environments and Conda environments.

From PyPI

pip install pydftracer

From Github

DFT_VERSION=develop
pip install git+https://github.com/hariharan-devarajan/dftracer.git@${DFT_VERSION}

From source

git clone git@github.com:hariharan-devarajan/dftracer.git
cd dftracer
# Skip the checkout below to install the develop branch.
# For the latest stable release, use the master branch.
git checkout tags/<Release> -b <Release>
pip install .

For more build instructions, see the DFTracer documentation.
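After any of the installation methods above, a quick way to confirm that the package is visible to your interpreter is to query its installed version. This is a minimal sketch that assumes the distribution is registered under the PyPI name `pydftracer`; the helper `installed_version` is a hypothetical name, and `importlib.metadata` requires Python 3.8+ (on 3.7 the `importlib_metadata` backport provides the same functions).

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name="pydftracer"):
    """Return the installed version of a distribution, or None if it is not installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Assumes the distribution name matches the PyPI project name "pydftracer".
    print(installed_version() or "pydftracer is not installed")
```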

Usage

import os
from multiprocessing import get_context
from time import sleep

import numpy as np
from dftracer.logger import dftracer, dft_fn

# Files are written under $PWD/data, which DFTRACER_DATA_DIR includes below.
cwd = os.getcwd()

# logfile and data_dir are None, so they come from DFTRACER_LOG_FILE and DFTRACER_DATA_DIR.
log_inst = dftracer.initialize_log(logfile=None, data_dir=None, process_id=-1)
# Function tracer whose events are tagged with the category "COMPUTE".
dft_compute = dft_fn("COMPUTE")

# Example of using function decorators
@dft_compute.log
def log_events(index):
    sleep(1)

def init():
    """Pool initializer; runs once in each spawned worker process."""
    print(f"Initializing process {os.getpid()}")

# Example of function spawning and implicit I/O calls
def posix_calls(val):
    index, is_spawn = val
    path = f"{cwd}/data/demofile{index}.txt"
    with open(path, "w+") as f:
        f.write("Now the file has more content!")
    if is_spawn:
        print(f"Calling spawn on {index} with pid {os.getpid()}")
        log_inst.finalize()  # Must be called so DFTracer finalizes correctly in the spawned process.
    else:
        print(f"Not calling spawn on {index} with pid {os.getpid()}")

# np.savez issues POSIX I/O calls internally.
def npz_calls(index):
    path = f"{cwd}/data/demofile{index}.npz"
    if os.path.exists(path):
        os.remove(path)
    records = np.random.randint(255, size=(8, 8, 1024), dtype=np.uint8)
    record_labels = [0] * 1024
    np.savez(path, x=records, y=record_labels)

def main():
    os.makedirs(f"{cwd}/data", exist_ok=True)  # the example writes its files here
    log_events(0)
    npz_calls(1)
    with get_context("spawn").Pool(1, initializer=init) as pool:
        pool.map(posix_calls, ((2, True),))
    log_inst.finalize()

if __name__ == "__main__":
    main()
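To illustrate what a function decorator like the `.log` decorator in the usage example conceptually records, here is a minimal pure-Python sketch. It is not DFTracer's implementation: the `traced` decorator, the in-memory `trace_events` list, and the event fields (modeled loosely on Chrome trace "complete" events: `name`, `cat`, `ph`, `ts`, `dur`, `pid`, `tid`) are assumptions chosen for illustration. DFTracer itself writes its events to the .pfw log file.

```python
import functools
import os
import threading
import time

# Hypothetical in-memory event buffer (DFTracer writes to a .pfw file instead).
trace_events = []


def traced(category):
    """Return a decorator that records one timed event per call (illustrative only)."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter_ns()
            try:
                return func(*args, **kwargs)
            finally:
                # Record the event even if the wrapped function raises.
                end = time.perf_counter_ns()
                trace_events.append({
                    "name": func.__name__,
                    "cat": category,
                    "ph": "X",                     # Chrome trace "complete" event
                    "ts": start // 1000,           # perf-counter start, in microseconds
                    "dur": (end - start) // 1000,  # duration in microseconds
                    "pid": os.getpid(),
                    "tid": threading.get_ident(),
                })
        return wrapper
    return decorator


@traced("COMPUTE")
def work(n):
    time.sleep(0.01)
    return n * 2


if __name__ == "__main__":
    work(21)
    print(trace_events[-1]["name"], trace_events[-1]["dur"])
```

Recording in a `finally` block keeps the trace complete even when the traced function fails, which matters when profiling crashes.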

Because the usage example does not pass logfile or data_dir to dftracer.initialize_log, the DFTRACER_LOG_FILE and DFTRACER_DATA_DIR environment variables must be set. By default, the DFTracer mode is FUNCTION. For example:

# The profiler appends the app name, process id, and the .pfw extension for each app and process.
# The resulting log file is ~/log_file-<APP_NAME>-<PID>.pfw
export DFTRACER_LOG_FILE=~/log_file
# Colon-separated list of paths to include in profiling
export DFTRACER_DATA_DIR=/dev/shm/:/p/gpfs1/$USER/dataset:$PWD/data
# Enable the profiler
export DFTRACER_ENABLE=1
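The log-file naming rule and the colon-separated DFTRACER_DATA_DIR described in the configuration above can be expressed as small helpers, for example when launching a profiled script from another Python program. This is a hypothetical sketch: `predicted_log_path`, `traced_dirs`, and `build_env` are illustrative names, not part of DFTracer, and the actual `<APP_NAME>` value is whatever DFTracer derives for your application.

```python
import os


def predicted_log_path(log_file, app_name, pid):
    """Build the name described above: <DFTRACER_LOG_FILE>-<APP_NAME>-<PID>.pfw."""
    return f"{os.path.expanduser(log_file)}-{app_name}-{pid}.pfw"


def traced_dirs(data_dir_value):
    """Split a colon-separated DFTRACER_DATA_DIR value into paths, skipping empty entries."""
    return [p for p in data_dir_value.split(":") if p]


def build_env(log_file, data_dirs):
    """Return a copy of the current environment with the DFTracer variables set."""
    env = dict(os.environ)
    # Expand "~" here, since no shell will do it for us.
    env["DFTRACER_LOG_FILE"] = os.path.expanduser(log_file)
    env["DFTRACER_DATA_DIR"] = ":".join(data_dirs)
    env["DFTRACER_ENABLE"] = "1"
    return env
```

The returned dictionary can be passed as `env=` to `subprocess.run([sys.executable, "app.py"], env=...)`, where `app.py` stands for your own script.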

For more examples, see Examples.

Citation and Reference

The original SC'24 paper describes the design and implementation of the DFTracer code. Please cite this paper and the code if you use DFTracer in your research.

@inproceedings{devarajan_dftracer_2024,
	address = {Atlanta, GA},
	title = {{DFTracer}: {An} {Analysis}-{Friendly} {Data} {Flow} {Tracer} for {AI}-{Driven} {Workflows}},
	shorttitle = {{DFTracer}},
	urldate = {2024-07-31},
	booktitle = {{SC24}: {International} {Conference} for {High} {Performance} {Computing}, {Networking}, {Storage} and {Analysis}},
	publisher = {IEEE},
	author = {Devarajan, Hariharan and Pottier, Loic and Velusamy, Kaushik and Zheng, Huihuo and Yildirim, Izzet and Kogiou, Olga and Yu, Weikuan and Kougkas, Anthony and Sun, Xian-He and Yeom, Jae Seung and Mohror, Kathryn},
	month = nov,
	year = {2024},
}

@misc{devarajan_dftracer_code_2024,
    type = {Github},
    title = {Github {DFTracer}},
    shorttitle = {{DFTracer}},
    url = {https://github.com/hariharan-devarajan/dftracer.git},
    urldate = {2024-07-31},
    journal = {DFTracer: A multi-level dataflow tracer for capturing I/O calls from workflows.},
    author = {Devarajan, Hariharan and Pottier, Loic and Velusamy, Kaushik and Zheng, Huihuo and Yildirim, Izzet and Kogiou, Olga and Yu, Weikuan and Kougkas, Anthony and Sun, Xian-He and Yeom, Jae Seung and Mohror, Kathryn},
    month = jun,
    year = {2024},
}

Acknowledgments

This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344; and under the auspices of the National Cancer Institute (NCI) by Frederick National Laboratory for Cancer Research (FNLCR) under Contract 75N91019D00024. This research used resources of the Argonne Leadership Computing Facility, a U.S. Department of Energy (DOE) Office of Science user facility at Argonne National Laboratory and is based on research supported by the U.S. DOE Office of Science-Advanced Scientific Computing Research Program, under Contract No. DE-AC02-06CH11357. Office of Advanced Scientific Computing Research under the DOE Early Career Research Program. Also, this material is based upon work partially supported by LLNL LDRD 23-ERD-045 and 24-SI-005. LLNL-CONF-857447.

License

MIT License (see LICENSE)



Download files


Source Distribution

pydftracer-1.0.6.tar.gz (76.7 kB)

Uploaded Source

Built Distributions

pydftracer-1.0.6-cp312-cp312-manylinux_2_38_x86_64.whl (2.6 MB)

Uploaded CPython 3.12 manylinux: glibc 2.38+ x86-64

pydftracer-1.0.6-cp312-cp312-manylinux_2_34_x86_64.whl (2.6 MB)

Uploaded CPython 3.12 manylinux: glibc 2.34+ x86-64

pydftracer-1.0.6-cp312-cp312-manylinux_2_31_x86_64.whl (2.5 MB)

Uploaded CPython 3.12 manylinux: glibc 2.31+ x86-64

pydftracer-1.0.6-cp311-cp311-manylinux_2_38_x86_64.whl (2.6 MB)

Uploaded CPython 3.11 manylinux: glibc 2.38+ x86-64

pydftracer-1.0.6-cp311-cp311-manylinux_2_34_x86_64.whl (2.6 MB)

Uploaded CPython 3.11 manylinux: glibc 2.34+ x86-64

pydftracer-1.0.6-cp311-cp311-manylinux_2_31_x86_64.whl (2.5 MB)

Uploaded CPython 3.11 manylinux: glibc 2.31+ x86-64

pydftracer-1.0.6-cp310-cp310-manylinux_2_38_x86_64.whl (2.6 MB)

Uploaded CPython 3.10 manylinux: glibc 2.38+ x86-64

pydftracer-1.0.6-cp310-cp310-manylinux_2_34_x86_64.whl (2.6 MB)

Uploaded CPython 3.10 manylinux: glibc 2.34+ x86-64

pydftracer-1.0.6-cp310-cp310-manylinux_2_31_x86_64.whl (2.5 MB)

Uploaded CPython 3.10 manylinux: glibc 2.31+ x86-64

pydftracer-1.0.6-cp39-cp39-manylinux_2_38_x86_64.whl (2.6 MB)

Uploaded CPython 3.9 manylinux: glibc 2.38+ x86-64

pydftracer-1.0.6-cp39-cp39-manylinux_2_34_x86_64.whl (2.6 MB)

Uploaded CPython 3.9 manylinux: glibc 2.34+ x86-64

pydftracer-1.0.6-cp39-cp39-manylinux_2_31_x86_64.whl (2.5 MB)

Uploaded CPython 3.9 manylinux: glibc 2.31+ x86-64

pydftracer-1.0.6-cp38-cp38-manylinux_2_38_x86_64.whl (2.6 MB)

Uploaded CPython 3.8 manylinux: glibc 2.38+ x86-64

pydftracer-1.0.6-cp38-cp38-manylinux_2_34_x86_64.whl (2.6 MB)

Uploaded CPython 3.8 manylinux: glibc 2.34+ x86-64

pydftracer-1.0.6-cp38-cp38-manylinux_2_31_x86_64.whl (2.5 MB)

Uploaded CPython 3.8 manylinux: glibc 2.31+ x86-64

pydftracer-1.0.6-cp37-cp37m-manylinux_2_34_x86_64.whl (2.6 MB)

Uploaded CPython 3.7m manylinux: glibc 2.34+ x86-64

pydftracer-1.0.6-cp37-cp37m-manylinux_2_31_x86_64.whl (2.5 MB)

Uploaded CPython 3.7m manylinux: glibc 2.31+ x86-64

File details

Details for the file pydftracer-1.0.6.tar.gz.

File metadata

  • Download URL: pydftracer-1.0.6.tar.gz
  • Size: 76.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.9.20

File hashes

Hashes for pydftracer-1.0.6.tar.gz
Algorithm Hash digest
SHA256 eb35fecef606ff141606d75f4e77282a0979c01ec087944d7df85270ef9d5d99
MD5 2a91917483f0304aee8768bf0d024000
BLAKE2b-256 48ac4e0ad4276a88dd2110f4afe31fe5cac6630e76103ec542bc19a3d4cc9c31

See more details on using hashes here.
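To check a downloaded file against the SHA256 digests listed on this page, you can hash it locally with Python's standard `hashlib` module. This is a minimal sketch; `sha256_of` and `verify` are illustrative helper names, not part of DFTracer or pip.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA256 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_sha256):
    """Return True if the file's SHA256 digest matches the published one (case-insensitive)."""
    return sha256_of(path) == expected_sha256.strip().lower()
```

For example, `verify("pydftracer-1.0.6.tar.gz", "eb35fecef606ff141606d75f4e77282a0979c01ec087944d7df85270ef9d5d99")` should return True for an intact copy of the source distribution. pip can also enforce digests at install time through its hash-checking mode (`--require-hashes` with `--hash=sha256:...` entries in a requirements file).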

File details

Details for the file pydftracer-1.0.6-cp312-cp312-manylinux_2_38_x86_64.whl.

File hashes

Hashes for pydftracer-1.0.6-cp312-cp312-manylinux_2_38_x86_64.whl
Algorithm Hash digest
SHA256 d5922168acce3a0c594e725544a6f104c58cf3013f0078dae2f8468883d70109
MD5 a7d7a7cfa2dd5d41ca20b7821d5c0c68
BLAKE2b-256 b5d77349f7378d8baece052f5ae17232ac152a25dff965d597259ef3479668e5

File details

Details for the file pydftracer-1.0.6-cp312-cp312-manylinux_2_34_x86_64.whl.

File hashes

Hashes for pydftracer-1.0.6-cp312-cp312-manylinux_2_34_x86_64.whl
Algorithm Hash digest
SHA256 ea83de7d708f9fd4b2bff2e1ba0de753a49f44b633247dc6c3d8947dd385143d
MD5 f9e3f4419ec7444ab8255529dbfd2ab8
BLAKE2b-256 b41dd837a3b8567e59437de653453a3fd729b673786cde28e47cc342e09e26c5

File details

Details for the file pydftracer-1.0.6-cp312-cp312-manylinux_2_31_x86_64.whl.

File hashes

Hashes for pydftracer-1.0.6-cp312-cp312-manylinux_2_31_x86_64.whl
Algorithm Hash digest
SHA256 0e75133b71d090f497b02833325fb81a2bbec28f4d1443bd569b987c2c085a54
MD5 51d213e1e042e11df71a628877558391
BLAKE2b-256 d871be168a07eb04fb5d289d9fe3bd09104c41ce648b7067a9cb65a2dabab904

File details

Details for the file pydftracer-1.0.6-cp311-cp311-manylinux_2_38_x86_64.whl.

File hashes

Hashes for pydftracer-1.0.6-cp311-cp311-manylinux_2_38_x86_64.whl
Algorithm Hash digest
SHA256 e7e5d74b6c3b137ea493e67d8cdc9b1ccba653b09a54df050f1a15044c2129d4
MD5 5bc0506d85bd80394d46a6f45ffd401b
BLAKE2b-256 030ae6ce0ba74ea26fe0a81ca83efb19b3b39cb51ae0cc8aa9ab891a58fecd6c

File details

Details for the file pydftracer-1.0.6-cp311-cp311-manylinux_2_34_x86_64.whl.

File hashes

Hashes for pydftracer-1.0.6-cp311-cp311-manylinux_2_34_x86_64.whl
Algorithm Hash digest
SHA256 faf016ee9e38742920f9a5c87abc81cbe0e4e1b61243e34f595b1842060cdfd8
MD5 359fbb5849c887a1a4d28ececf63021c
BLAKE2b-256 838bf28b299c166af8b8da0c95d6ab5828c5d300f51092ce67f96bebb5ad9a81

File details

Details for the file pydftracer-1.0.6-cp311-cp311-manylinux_2_31_x86_64.whl.

File hashes

Hashes for pydftracer-1.0.6-cp311-cp311-manylinux_2_31_x86_64.whl
Algorithm Hash digest
SHA256 79cdc4ba20b33d94e851f7f0138ce52ba9a02670c8216f48d97497b2423f5f61
MD5 99653521c2709555cd2b65c006d290d9
BLAKE2b-256 cd6f9f7d19eebb2b4d7bc7b8c4cc53eaaf4103e8bc303e10ee1890aa551b76c8

File details

Details for the file pydftracer-1.0.6-cp310-cp310-manylinux_2_38_x86_64.whl.

File hashes

Hashes for pydftracer-1.0.6-cp310-cp310-manylinux_2_38_x86_64.whl
Algorithm Hash digest
SHA256 e539a6dfdf840bb2d433cf87571b6ce9fd0cb42c22be4e02f89666aaead8182b
MD5 4f03c5865110eb4d32a3972657fa095a
BLAKE2b-256 18ec8a9a00002ce7b40132f36f3db3c15ed158d3de1446c96b89d9b6e4278372

File details

Details for the file pydftracer-1.0.6-cp310-cp310-manylinux_2_34_x86_64.whl.

File hashes

Hashes for pydftracer-1.0.6-cp310-cp310-manylinux_2_34_x86_64.whl
Algorithm Hash digest
SHA256 45a061966dcfc0271189bb2028c8160a626a41f2435f9d0e86ff662143cad817
MD5 9995cd5c49260bd1032aadfcbd49710e
BLAKE2b-256 01bc94b09c50a6fcb2a6456e0af9a85170f6bdf489350971f43132caf738a842

File details

Details for the file pydftracer-1.0.6-cp310-cp310-manylinux_2_31_x86_64.whl.

File hashes

Hashes for pydftracer-1.0.6-cp310-cp310-manylinux_2_31_x86_64.whl
Algorithm Hash digest
SHA256 c68634c3127b672cddeba9ba8fa35a85938360acb3d106017b3e87ba3151a4ff
MD5 36c34bc94ef8bf413ef874e1a617c06d
BLAKE2b-256 dba227367a39679cb8ad7d600b6ae6867da127392f6d94a7bf4ea046bbb607c0

File details

Details for the file pydftracer-1.0.6-cp39-cp39-manylinux_2_38_x86_64.whl.

File hashes

Hashes for pydftracer-1.0.6-cp39-cp39-manylinux_2_38_x86_64.whl
Algorithm Hash digest
SHA256 114d470591e8f56a6dd15651eaa1fe3f244ae375f9e2a72e7d6b8a1c5d1cfd88
MD5 e7454f3b4ad353de234f1b0ac3e91c19
BLAKE2b-256 e9d95460325c3979ee72db6d591a83326627eac32d742454590bfc5e5ba5081c

File details

Details for the file pydftracer-1.0.6-cp39-cp39-manylinux_2_34_x86_64.whl.

File hashes

Hashes for pydftracer-1.0.6-cp39-cp39-manylinux_2_34_x86_64.whl
Algorithm Hash digest
SHA256 dd1004cf6b34a23186885b6ceedd824bbb8327a0dc0e19afb90e81272869add4
MD5 e12a9ba41f70d3eb4e0871e2e3ab8967
BLAKE2b-256 593edb382613e8bb5800ad34442e97a5bba9593592b4d8a0e87995df07f87b8c

File details

Details for the file pydftracer-1.0.6-cp39-cp39-manylinux_2_31_x86_64.whl.

File hashes

Hashes for pydftracer-1.0.6-cp39-cp39-manylinux_2_31_x86_64.whl
Algorithm Hash digest
SHA256 3cc710fc35a279378badb5cb68727b894cc07f1fa38dffcfb3270162d40796a0
MD5 7ccdafab3413d5a163c33a15e3ab35cd
BLAKE2b-256 d459c4e935bd0f46ae533961948681f77b3a0f2529401a3c0f8ebfe68d0cc3eb

File details

Details for the file pydftracer-1.0.6-cp38-cp38-manylinux_2_38_x86_64.whl.

File hashes

Hashes for pydftracer-1.0.6-cp38-cp38-manylinux_2_38_x86_64.whl
Algorithm Hash digest
SHA256 2cc335d322aed9316cf569405d8b0374c6ad13d4e5f5496ef7aaada60c421070
MD5 23db1494fcf49d0cedbb2c678557d81e
BLAKE2b-256 0e27068072b55474e2b7f5df4b0f3abf2e33480adf0e5209d53e8d45bb297e90

File details

Details for the file pydftracer-1.0.6-cp38-cp38-manylinux_2_34_x86_64.whl.

File hashes

Hashes for pydftracer-1.0.6-cp38-cp38-manylinux_2_34_x86_64.whl
Algorithm Hash digest
SHA256 7621b54a43e7192649dab6de836698c88af1c79bcba0e171c62341bff05f57fe
MD5 212445936cb27766279376b3add8ebda
BLAKE2b-256 807a65f8a3926be3abe69f30a08f52c62cf33c61764eb1900f2cd6a324580cd9

File details

Details for the file pydftracer-1.0.6-cp38-cp38-manylinux_2_31_x86_64.whl.

File hashes

Hashes for pydftracer-1.0.6-cp38-cp38-manylinux_2_31_x86_64.whl
Algorithm Hash digest
SHA256 9cff84e452437ae6516ff4d0fe47165c505531d0cd6065bbd429abb2e34a394b
MD5 e119e8c934f2bef9b6fa34d713a81c1a
BLAKE2b-256 781c9bdc5a4a307cbb64c2f71477718c692d6cacf98d9474854e6fc132926531

File details

Details for the file pydftracer-1.0.6-cp37-cp37m-manylinux_2_34_x86_64.whl.

File hashes

Hashes for pydftracer-1.0.6-cp37-cp37m-manylinux_2_34_x86_64.whl
Algorithm Hash digest
SHA256 67d6db1d1ccd2552881a79643f023fbabad1ca50a9570954754926c2f9960c77
MD5 efbff59cc3c799b7712bf5ee5c098536
BLAKE2b-256 94d446672995f682043304e76c6f3aeb945d931972472a9f9d7324eea4aa89f1

File details

Details for the file pydftracer-1.0.6-cp37-cp37m-manylinux_2_31_x86_64.whl.

File hashes

Hashes for pydftracer-1.0.6-cp37-cp37m-manylinux_2_31_x86_64.whl
Algorithm Hash digest
SHA256 8d7f681e8ed25c9adb910aa68619ca1f8f0d43faf962758c307ce1825bd73d42
MD5 75b8b4ff971bb63868df61ebb9bfb055
BLAKE2b-256 257590d8b9992530491c5b56d4ad6152d68538f88d4d18542488e97d5fd5215e
