Runtime Tracing Library for TensorFlow

TensorFlow Runtime Tracer

This project is a web application for monitoring and tracing TensorFlow scripts at runtime, at the op level.

It starts a web server when the script is executed. The web interface keeps track of all session runs and can trace the execution on demand.

The goal of this tool is to facilitate performance tuning with minimal code changes and insignificant runtime overhead. Both the high-level (tf.estimator.Estimator) and low-level (tf.train.MonitoredTrainingSession and co) APIs are supported, as are Horovod and IBM Distributed Deep Learning (DDL). A tracing session can be saved, reloaded, and distributed effortlessly.

Some screenshots here.

Installation

Use pip to install:

pip install tensorflow-tracer

Quick Start

  1. Install tensorflow-tracer and run an example:
    $ pip3 install tensorflow-tracer
    $ git clone https://github.com/xldrx/tensorflow-tracer.git
    $ python3 ./tensorflow-tracer/examples/estimator-example.py 
    
  2. Browse to: http://0.0.0.0:9999

How to Use

  1. Add tftracer to your code:

    Estimator API:

    from tftracer import TracingServer
    ...
    
    tracing_server = TracingServer()
    estimator.train(input_fn, hooks=[tracing_server.hook]) 
    

    Low-Level API:

    from tftracer import TracingServer
    ...
    tracing_server = TracingServer()
    with tf.train.MonitoredTrainingSession(hooks=[tracing_server.hook]):
       ...
    

    [More examples here]

  2. Run your code and browse to:

http://0.0.0.0:9999
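
Before opening the browser, it can help to confirm that something is actually listening on the port. The following is a minimal sketch that uses only the Python standard library; it is not part of tftracer. The helper name `wait_for_port` is hypothetical, and it assumes the server is reachable on the local machine at 127.0.0.1 on port 9999.

```python
import socket
import time


def wait_for_port(host="127.0.0.1", port=9999, timeout=10.0, interval=0.2):
    """Return True once a TCP connection to (host, port) succeeds, False on timeout.

    Hypothetical helper: host and port are assumptions based on the URL above.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means some server is accepting connections.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Nothing is listening yet (or the connect timed out); retry shortly.
            time.sleep(interval)
    return False
```

Calling `wait_for_port()` from a second terminal or a test script returns `True` as soon as the port accepts connections. It only checks TCP reachability, not that the responder is the tracing interface.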

How to Trace an Existing Code

If you want to trace an existing script without any modification, use tftracer.hook_inject. Note that this is experimental and may cause unexpected errors:

  1. Add the following to the beginning of the main script:

    import tftracer
    tftracer.hook_inject()
    ...
    
  2. Run your code and browse to http://0.0.0.0:9999

Command line

Tracing sessions can be stored either through the web interface or by calling tracing_server.save_session(filename).

To reload a session, run this in the terminal:

tftracer filename

Then browse to:

http://0.0.0.0:9999

API

Full Documentation is here.

Known Bugs/Limitations

  • Only Python3 is supported.
  • The web interface loads JavaScript/CSS libraries remotely (e.g. vue.js, ui-kit, jquery, jquery-ui, Google Roboto, awesome-icons, ...). Therefore, an active internet connection is needed to render the interface properly. The tracing server itself does not require any remote connection.
  • All traces are kept in memory while the tracing server is running.
  • Tracing uses tf.train.SessionRunHook and is unable to trace auxiliary runs such as init_op.
  • The tracing capability is limited to what tf.RunMetadata offers. For example, CUPTI events are missing when tracing a distributed job.
  • HTTPS is not supported.
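
The limitation around auxiliary runs follows from how hook-based wrapping works in general. Below is a TensorFlow-free toy model of the pattern, offered as a sketch rather than as tftracer's or TensorFlow's actual code. All class names (`RecordingHook`, `RawSession`, `HookedSession`) are hypothetical. Hooks are invoked only around `run()` calls that go through the wrapper, so work issued directly on the underlying session, such as initialization, is never seen by them.

```python
import time


class RecordingHook:
    """Hypothetical stand-in for a session-run hook: records wall time per run."""

    def __init__(self):
        self.records = []
        self._start = None

    def before_run(self, name):
        self._start = time.perf_counter()

    def after_run(self, name):
        self.records.append((name, time.perf_counter() - self._start))


class RawSession:
    """Hypothetical stand-in for a plain session: runs callables, no hooks."""

    def run(self, name, fn):
        return fn()


class HookedSession:
    """Wraps a RawSession so that every run() on the wrapper is bracketed by hooks."""

    def __init__(self, raw, hooks):
        self._raw = raw
        self._hooks = hooks
        # Initialization goes straight to the raw session, so hooks never see it.
        self._raw.run("init_op", lambda: None)

    def run(self, name, fn):
        for hook in self._hooks:
            hook.before_run(name)
        result = self._raw.run(name, fn)
        for hook in self._hooks:
            hook.after_run(name)
        return result


hook = RecordingHook()
sess = HookedSession(RawSession(), [hook])
sess.run("train_step", lambda: sum(range(1000)))
traced = [name for name, _ in hook.records]  # contains only "train_step"
```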

Frequently Asked Questions

How to trace/visualize just one session run?

Use tftracer.Timeline. For example:

    from tftracer import Timeline
    ...
    with tf.train.MonitoredTrainingSession() as sess:
        with Timeline() as tl:
            sess.run(fetches, **tl.kwargs)
    ...
    tl.visualize(filename)

Comparison with TensorBoard?

This project is a short-lived, lightweight, interactive interface for monitoring and tracing execution at the op level. In comparison, TensorBoard is a full-featured tool for inspecting the application on many levels:

  • tftracer makes no assumptions about the dataflow DAG. There is no need to add extra ops to the dataflow DAG (e.g. tf.summary) or to maintain a global step.

  • tftracer runs as a thread inside the main script and lives from the start of its execution until the end. TensorBoard runs as a separate process and can outlive the main script.
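
To illustrate what "runs as a thread" means in practice, here is a minimal standard-library sketch of an in-process web server started on a daemon thread. It is not tftracer's implementation. `StatusHandler` and `start_background_server` are hypothetical names, and the example binds to a free local port rather than 9999.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class StatusHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that answers every GET with a short status string."""

    def do_GET(self):
        body = b"running"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def start_background_server(port=0):
    """Start an HTTP server on a daemon thread; port 0 picks a free port."""
    server = ThreadingHTTPServer(("127.0.0.1", port), StatusHandler)
    # A daemon thread does not keep the process alive after the main script ends.
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server


server = start_background_server()

# The main script keeps running; here it simply queries its own server once.
conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1], timeout=5)
conn.request("GET", "/")
response = conn.getresponse()
status, body = response.status, response.read()
conn.close()

server.shutdown()
server.server_close()
```

Because the server thread is a daemon, it ends together with the main script. A separate process, by contrast, would have to be started and stopped independently.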

Cite this tool

@misc{hashemi-tftracer-2018,
  author = {Sayed Hadi Hashemi},
  title = {TensorFlow Runtime Tracer},
  year = {2018},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/xldrx/tensorflow-tracer}},
}
