Runtime Tracing Library for TensorFlow
Project description
TensorFlow Runtime Tracer
This project is a web application that monitors and traces TensorFlow scripts at runtime, at the op level.
It starts a web server upon the execution of the script. The web interface keeps track of all the session runs and can trace the execution on demand.
The goal of this tool is to facilitate performance tuning with minimal code changes and insignificant runtime overhead. Both high-level (tf.estimator.Estimator) and low-level (tf.train.MonitoredTrainingSession and co) APIs are supported. It also supports Horovod and IBM Distributed Deep Learning (DDL). The tracing session can be saved, reloaded, and distributed effortlessly.
Some screenshots here.
Installation
Use pip to install:
pip install tensorflow-tracer
Quick Start
- Install tensorflow-tracer and run an example:
  $ pip3 install tensorflow-tracer
  $ git clone https://github.com/xldrx/tensorflow-tracer.git
  $ python3 ./tensorflow-tracer/examples/estimator-example.py
- Browse to:
http://0.0.0.0:9999
How to Use
- Add tftracer to your code:
  Estimator API:
    from tftracer import TracingServer
    ...
    tracing_server = TracingServer()
    estimator.train(input_fn, hooks=[tracing_server.hook])
  Low-Level API:
    from tftracer import TracingServer
    ...
    tracing_server = TracingServer()
    with tf.train.MonitoredTrainingSession(hooks=[tracing_server.hook]):
        ...
- Run your code and browse to:
  http://0.0.0.0:9999
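Because the web server only starts once the script is running, it can be handy to wait until the interface is reachable before opening a browser. The following is a minimal sketch using only the Python standard library. The helper name `wait_for_port` and its defaults are illustrative and not part of tftracer; it assumes the interface listens on port 9999, as in the URL above.

```python
import socket
import time


def wait_for_port(host="127.0.0.1", port=9999, timeout=10.0, interval=0.2):
    """Return True once a TCP connection to (host, port) succeeds.

    Hypothetical helper, not part of tftracer. It only checks that
    something accepts connections on the port; it does not verify
    that the listener is the tracing web interface.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means a server is listening.
            with socket.create_connection((host, port), timeout=interval + 1.0):
                return True
        except OSError:
            # Nothing is listening yet; retry after a short pause.
            time.sleep(interval)
    return False
```

Calling `wait_for_port()` from a second terminal or a launcher script returns `True` as soon as the port accepts connections, and `False` if nothing is listening when the timeout expires.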
How to Trace Existing Code
If you want to trace an existing script without any modification, use tftracer.hook_inject.
Please note that this is experimental and may cause unexpected errors:
- Add the following to the beginning of the main script:
  import tftracer
  tftracer.hook_inject()
  ...
- Run your code and browse to:
  http://0.0.0.0:9999
Command line
Tracing sessions can be stored either through the web interface or by calling tracing_server.save_session(filename).
To reload a session, run this in the terminal:
tftracer filename
Then browse to:
http://0.0.0.0:9999
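When reloading is scripted rather than typed by hand, it may help to check that the saved session file exists before launching the command. This is a small sketch under stated assumptions: the helper name `reload_command` is hypothetical, and the only thing taken from the text above is the `tftracer filename` command form.

```python
from pathlib import Path


def reload_command(session_file):
    """Build the argument list for `tftracer <filename>`.

    Hypothetical helper, not part of tftracer. Raises FileNotFoundError
    if the session file is missing, so the failure is reported before
    the command-line tool is started.
    """
    path = Path(session_file)
    if not path.is_file():
        raise FileNotFoundError(f"no saved tracing session at {path}")
    return ["tftracer", str(path)]
```

The returned list can be passed to `subprocess.run(...)`; passing a list rather than a shell string avoids quoting problems with file names that contain spaces.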
API
Full Documentation is here.
Known Bugs/Limitations
- Only Python3 is supported.
- The web interface loads JavaScript/CSS libraries remotely (e.g. vue.js, ui-kit, jquery, jquery-ui, Google Roboto, awesome-icons, ...). Therefore an active internet connection is needed to properly render the interface. The tracing server does not require any remote connection.
- All traces are kept in memory while the tracing server is running.
- Tracing uses tf.train.SessionRunHook and is unable to trace auxiliary runs such as init_op.
- The tracing capability is limited to what tf.RunMetadata offers. For example, CUPTI events are missing when tracing a distributed job.
- HTTPS is not supported.
Frequently Asked Questions
How to trace/visualize just one session run?
Use tftracer.Timeline. For example:
  from tftracer import Timeline
  ...
  with tf.train.MonitoredTrainingSession() as sess:
      with Timeline() as tl:
          sess.run(fetches, **tl.kwargs)
  ...
  tl.visualize(filename)
Comparison to TensorBoard?
This project is a short-lived, lightweight, interactive tracing interface for monitoring and tracing execution at the op level. In comparison, TensorBoard is a full-featured tool to inspect the application on many levels:
- tftracer does not make any assumption about the dataflow DAG. There is no need to add any additional op to the dataflow DAG (e.g. tf.summary) or to have a global step.
- tftracer runs as a thread that lives from the start of the execution until the end of it. TensorBoard runs as a separate process and can outlive the main script.
Cite this tool
@misc{hashemi-tftracer-2018,
author = {Sayed Hadi Hashemi},
title = {TensorFlow Runtime Tracer},
year = {2018},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/xldrx/tensorflow-tracer}},
}
File details
Details for the file tensorflow-tracer-1.1.0.tar.gz.
File metadata
- Download URL: tensorflow-tracer-1.1.0.tar.gz
- Upload date:
- Size: 21.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.12.1 pkginfo/1.4.2 requests/2.20.1 setuptools/40.6.2 requests-toolbelt/0.8.0 tqdm/4.28.1 CPython/3.6.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 5b1a16d5b247a3a707c9524ee3a3817b1b2adc9f44a4d86523287b8a8bf635c4 |
| MD5 | 2e50f2bb6b5115e3dce9f579695418a7 |
| BLAKE2b-256 | 7e961f1153de6fd1db30a249d6d2b26cecf41c4e4d926c9b6e88933366fbe54c |
File details
Details for the file tensorflow_tracer-1.1.0-py3-none-any.whl.
File metadata
- Download URL: tensorflow_tracer-1.1.0-py3-none-any.whl
- Upload date:
- Size: 29.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.12.1 pkginfo/1.4.2 requests/2.20.1 setuptools/40.6.2 requests-toolbelt/0.8.0 tqdm/4.28.1 CPython/3.6.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | daf8439d9c16c5630ad58fc36ffbf52eb333e2d16d53330aaf2091a785eabc82 |
| MD5 | 30c1b9f89eccdc582f19cc30a810d78a |
| BLAKE2b-256 | 3cc42d3207e970bba9e0d57baee278e8c6e0d083793fb1c0bba13b14eb0d7c79 |
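A downloaded file can be checked against the SHA256 digests listed above. The sketch below uses only the standard library `hashlib` module; the function names `sha256_hex` and `matches_digest` are illustrative and not part of tftracer or pip.

```python
import hashlib


def sha256_hex(path, chunk_size=1 << 20):
    """Return the SHA256 hex digest of the file at `path`.

    The file is read in chunks so large downloads do not have to fit
    in memory at once.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path, expected_hex):
    """Compare the file's SHA256 digest with an expected hex string."""
    return sha256_hex(path) == expected_hex.strip().lower()
```

For example, `matches_digest("tensorflow-tracer-1.1.0.tar.gz", "5b1a16d5b247a3a707c9524ee3a3817b1b2adc9f44a4d86523287b8a8bf635c4")` should return `True` for an intact copy of the source distribution, assuming the file was saved under that name in the current directory.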