
Distributed wrapper for Tensorboard

Project description

Tensorplex is a multiplexed extension of the popular TensorBoard visualization tool. When you run a cluster, you can collect the learning curves from multiple running nodes and display them side by side on a single TensorBoard web page.

Under the hood, Tensorplex makes extensive use of ZeroMQ, an efficient, robust, and lightweight distributed messaging library.

Loggerplex is a subcomponent of Tensorplex that does lightweight distributed logging. It collects real-time logs from multiple nodes and sends them to a single master node for persistent bookkeeping.
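The many-to-one pattern behind Loggerplex can be sketched with the standard library alone. This is not Loggerplex's implementation: `queue.Queue` stands in for the ZeroMQ transport, threads stand in for remote nodes, and the names `node` and `master` are hypothetical.

```python
import queue
import threading


def node(name, log_queue, n_messages):
    """Hypothetical worker node: pushes its log lines to the shared queue."""
    for i in range(n_messages):
        log_queue.put((name, f"step {i} finished"))
    log_queue.put((name, None))  # sentinel: this node has no more logs


def master(log_queue, n_nodes):
    """Hypothetical master: drains the queue until every node has finished."""
    records = []
    finished = 0
    while finished < n_nodes:
        name, line = log_queue.get()
        if line is None:
            finished += 1
        else:
            records.append(f"[{name}] {line}")
    return records


log_queue = queue.Queue()
workers = [threading.Thread(target=node, args=(f"agent/{k}", log_queue, 3))
           for k in range(4)]
for w in workers:
    w.start()
book = master(log_queue, n_nodes=len(workers))  # single point of bookkeeping
for w in workers:
    w.join()
print(len(book))  # 12
```

Because every node writes to one queue and only the master reads from it, the master can persist logs in one place no matter how many nodes are running.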

Tensorplex is not tied to TensorFlow and can be used with any machine learning framework that supports NumPy.

THIS DOC IS INCOMPLETE. MORE COMING SOON.

Installation

git clone https://github.com/StanfordVL/Tensorplex.git
pip install -e Tensorplex/

Demo

Go to Tensorplex/examples/ and change the TensorBoard log folder in the run_server.py script.

In one terminal window, run python run_server.py. Then, in another window, run python run_client.py. The server script should print out a list of "done" messages.

Use tensorboard --logdir ~/Temp/tensorplex/ --port 8009 to view the results.

Manual

Tensorplex requires one long-running server script. Client scripts can connect and disconnect (e.g., when a client crashes) without affecting the server.

Tensorplex server

There are 3 steps to create the server script.

First, initialize a Tensorplex object with the root logging folder. Different clients write to different sub-folders, which are created automatically. max_processes is the number of processes the server uses internally; setting it to 4 should be a sweet spot.

from tensorplex import Tensorplex

tplex = Tensorplex(
    '~/my/logging/folder',
    max_processes=4,
)

Second, register the client groups, which determine how curves are grouped into the same or different TensorBoard graph windows. The “client IDs” (explained later) in your client scripts must be consistent with the groups you register on the server.

There are 3 types of client groups:

  1. register_normal_group(name): each graph will have only one curve in a normal group.

  2. register_indexed_group(name, bin_size): each graph will have at most bin_size curves. Suppose you launch 42 agents with bin_size=10: the curves of agents 0-9 will be displayed in the same graph window; likewise, the curves of agents 10-19, 20-29, 30-39, and 40-41 will be grouped in their respective graphs.

  3. register_combined_group(name, group_criterion): TODO
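The binning arithmetic in the register_indexed_group example reduces to integer division. A sketch, assuming curves are assigned to graphs in index order as the 42-agent example suggests; `graph_index` and `group_into_graphs` are hypothetical helpers, not part of Tensorplex.

```python
def graph_index(agent_index, bin_size):
    # Agents 0..bin_size-1 share graph 0, the next bin_size agents share graph 1, ...
    return agent_index // bin_size


def group_into_graphs(n_agents, bin_size):
    """Map each graph window to the (first, last) agent indices it displays."""
    graphs = {}
    for i in range(n_agents):
        g = graph_index(i, bin_size)
        first, _ = graphs.get(g, (i, i))  # first agent seen for this graph
        graphs[g] = (first, i)            # extend the range to agent i
    return graphs


print(group_into_graphs(42, 10))
# {0: (0, 9), 1: (10, 19), 2: (20, 29), 3: (30, 39), 4: (40, 41)}
```

The last graph simply holds the remainder (42 mod 10 = 2 curves), which matches the 40-41 window described above.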

To register multiple groups, you can chain the commands:

(tplex
    .register_normal_group('learner')  # 1 curve per graph
    .register_indexed_group('agent', 8)  # 8 agent learning curves per graph
    .register_indexed_group('eval', 4)  # 4 eval curves per graph
    # get_eval_bin_name is a user-defined grouping criterion;
    # register_combined_group is still TODO (see above)
    .register_combined_group('eval', get_eval_bin_name)
)
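Chaining like this works only if each register_* call returns the server object itself. The toy class below shows that fluent-interface pattern under that assumption; `GroupRegistry` is hypothetical and leaves out register_combined_group, whose interface is still marked TODO.

```python
class GroupRegistry:
    """Toy stand-in for the group-registration part of a server (hypothetical)."""

    def __init__(self):
        self.groups = {}  # group name -> (group type, options)

    def register_normal_group(self, name):
        self.groups[name] = ('normal', {})
        return self  # returning self is what makes the calls chainable

    def register_indexed_group(self, name, bin_size):
        self.groups[name] = ('indexed', {'bin_size': bin_size})
        return self


registry = (
    GroupRegistry()
    .register_normal_group('learner')    # 1 curve per graph
    .register_indexed_group('agent', 8)  # 8 curves per graph
    .register_indexed_group('eval', 4)   # 4 curves per graph
)
print(sorted(registry.groups))  # ['agent', 'eval', 'learner']
```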

Third, specify a port and launch the server. The call is blocking:

tplex.start_server(8008)
# block main thread forever

Tensorplex client

Every TensorplexClient object must have a client ID that looks like <group_name>/<client_name>, i.e. two string names separated by /.

from tensorplex import TensorplexClient

client = TensorplexClient(
    client_id='agent/0',  # for indexed group
    # client_id='learner/system_stats',  # for normal group, here client_name is `system_stats`
    host='123.45.6.7',  # server address to connect
    port=8008,  # server port to connect
)
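The client ID rules above (exactly two names separated by /, with the group registered on the server) can be checked locally before connecting. A minimal sketch; `parse_client_id` is a hypothetical helper that mirrors the stated rules, not Tensorplex's own validation code.

```python
def parse_client_id(client_id, registered_groups):
    """Split '<group_name>/<client_name>' and check the group was registered.

    Hypothetical helper: it follows the rules stated in this README only.
    """
    parts = client_id.split('/')
    if len(parts) != 2 or not all(parts):
        raise ValueError(f"expected '<group_name>/<client_name>', got {client_id!r}")
    group_name, client_name = parts
    if group_name not in registered_groups:
        raise ValueError(f"group {group_name!r} was not registered on the server")
    return group_name, client_name


groups = {'learner', 'agent', 'eval'}
print(parse_client_id('agent/0', groups))               # ('agent', '0')
print(parse_client_id('learner/system_stats', groups))  # ('learner', 'system_stats')
```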

Then you can write statistics through the TensorplexClient:

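# placeholder tag names and step, for illustration only
tag, tag2, tag3 = 'loss', 'reward', 'episode_length'
integer_step = 100

# most fundamental method
client.add_scalar(tag, 3.1415, integer_step)
# add_scalars is equivalent to multiple add_scalar() calls in one line
client.add_scalars({tag: 3.1415, tag2: 2.71828, tag3: 42}, integer_step)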
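To make the stated equivalence concrete, here is a local stand-in that records calls instead of sending them to a server. `RecordingClient` is hypothetical; it assumes add_scalars simply loops over its dictionary and that values are plain numbers (NumPy scalars also convert with float()).

```python
class RecordingClient:
    """Hypothetical stand-in that records scalars locally instead of sending them."""

    def __init__(self):
        self.records = []  # (tag, value, step) tuples in call order

    def add_scalar(self, tag, value, step):
        self.records.append((tag, float(value), int(step)))

    def add_scalars(self, tag_value_dict, step):
        # Same effect as calling add_scalar once per dictionary entry.
        for tag, value in tag_value_dict.items():
            self.add_scalar(tag, value, step)


a = RecordingClient()
a.add_scalars({'loss': 3.1415, 'reward': 2.71828, 'episodes': 42}, 100)

b = RecordingClient()
b.add_scalar('loss', 3.1415, 100)
b.add_scalar('reward', 2.71828, 100)
b.add_scalar('episodes', 42, 100)

print(a.records == b.records)  # True
```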

There are

Note that tag in add_scalar behaves differently for different client group types.

For normal group,



Download files


Source Distribution

Tensorplex-0.9.post1.tar.gz (22.2 kB)

Built Distribution

Tensorplex-0.9.post1-py3-none-any.whl (27.1 kB, Python 3)

File details

Details for the file Tensorplex-0.9.post1.tar.gz.

File metadata

  • Download URL: Tensorplex-0.9.post1.tar.gz
  • Upload date:
  • Size: 22.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.11.0 pkginfo/1.4.2 requests/2.18.4 setuptools/39.1.0 requests-toolbelt/0.8.0 tqdm/4.23.0 CPython/3.6.4

File hashes

Hashes for Tensorplex-0.9.post1.tar.gz
Algorithm Hash digest
SHA256 37c44fa6618bb7f0fc5be236ef2dbfb891c81ddca379630198170709129a2170
MD5 437f5f151604aa585440c19d1c0b6c73
BLAKE2b-256 f0ebd94ee47b8b307294aa1439abe33c41ab8455c4b7eb150d5d84330cea3054


File details

Details for the file Tensorplex-0.9.post1-py3-none-any.whl.

File metadata

  • Download URL: Tensorplex-0.9.post1-py3-none-any.whl
  • Upload date:
  • Size: 27.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.11.0 pkginfo/1.4.2 requests/2.18.4 setuptools/39.1.0 requests-toolbelt/0.8.0 tqdm/4.23.0 CPython/3.6.4

File hashes

Hashes for Tensorplex-0.9.post1-py3-none-any.whl
Algorithm Hash digest
SHA256 754385c2c9f85e02243d12377b2bab041f5d25dde79f55b6d4268e4ab91fa8f7
MD5 d38bc309ca1bc70e35e91457387c8fa0
BLAKE2b-256 6c65c84034380e9c58fd8aabded39f7ee8bdc770b1cbb9ea21e03d107e5fd411

