A print and debugging utility that makes your error printouts look nice

Project description

A common pain point once you start launching ML training jobs on AWS is the lack of a good way to manage and visualize your data. A common practice so far is to upload your experiment data to AWS S3 or Google Cloud Storage buckets. One then quickly realizes that downloading data from S3 can be slow. S3 does not offer a diff-based sync like gsutil rsync in Google Cloud's command-line tools, which makes it hard to sync a large collection of data that is constantly being appended to.

An Example Log from ML-Logger

So far, the best way we have found to organize experimental data is to have a centralized instrumentation server. Compared with managing your data on S3, a centralized instrumentation server makes it much easier to move experiments around, run analysis that is co-located with your data, and host visualization dashboards on the same machine. To download data locally, you can use sshfs, samba, rsync or a variety of remote disks, all of which are faster than S3.

ML-Logger is the logging utility that allows you to do this. To make ML-Logger easy to use, we made it work with zero configuration, logging to your local hard drive by default. When the logging directory passed to logger.configure(log_directory=<your directory>) is an HTTP endpoint, the logger instantiates a fast, future-based logging client that issues HTTP requests in a separate thread. We optimized the client so that it won't slow down your training code.
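As a rough illustration of the future-based idea described above (this is not ML-Logger's actual implementation), the sketch below hands each log call to a background thread and returns a future immediately. The names AsyncLogClient and send_fn are hypothetical; in a real client, send_fn would perform the HTTP request.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Any, Callable, Dict, List


class AsyncLogClient:
    """Hypothetical non-blocking client: log() returns a Future right away."""

    def __init__(self, send_fn: Callable[[Dict[str, Any]], Any], max_workers: int = 1):
        # send_fn stands in for the function that performs the HTTP request.
        self._send_fn = send_fn
        # A single worker thread keeps records in submission order.
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._pending: List[Future] = []

    def log(self, **record: Any) -> Future:
        # submit() returns immediately; the send runs on the worker thread.
        future = self._pool.submit(self._send_fn, record)
        self._pending.append(future)
        return future

    def flush(self) -> None:
        # Block until every outstanding send has finished, re-raising errors.
        for future in self._pending:
            future.result()
        self._pending.clear()

    def close(self) -> None:
        self.flush()
        self._pool.shutdown(wait=True)


received: List[Dict[str, Any]] = []
client = AsyncLogClient(send_fn=received.append)  # stand-in for an HTTP POST
for step in range(3):
    client.log(step=step, loss=1.0 / (step + 1))
client.close()
```

With this design the training loop only pays for enqueueing the record; the cost of the network round trip is moved off the main thread until flush() or close() is called.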

API-wise, ML-Logger makes it easy to log textual printouts, simple scalars, numpy tensors, image tensors, and pyplot figures. Because you might also want to read data back from the instrumentation server, we also made it possible to load numpy, pickle, text and binary files remotely.

In the future, we will start building an integrated dashboard with fast search, live figure update and markdown-based reporting/dashboarding to go with ml-logger.

Now give this a try, and profit!

Usage

To install ml_logger, do:

pip install ml-logger

To kickstart a logging server, run

python -m ml_logger.server

In your project files, do:

from params_proto import cli_parse
from ml_logger import logger


@cli_parse
class Args:
    seed = 1
    D_lr = 5e-4
    G_lr = 1e-4
    Q_lr = 1e-4
    T_lr = 1e-4
    plot_interval = 10
    log_dir = "http://54.71.92.65:8081"
    log_prefix = "https://github.com/episodeyang/ml_logger/blob/master/runs"

logger.configure(log_directory="http://some.ip.address.com:2000", prefix="your-experiment-prefix!")
logger.log_params(Args=vars(Args))
logger.log_file(__file__)


for epoch in range(10):
    logger.log(step=epoch, D_loss=0.2, G_loss=0.1, mutual_information=0.01)
    logger.log_keyvalue(epoch, 'some string key', 0.0012)
    # when the step index updates, logger flushes all of the key-value pairs to file system/logging server

logger.flush()

# Images
# note: newer SciPy releases provide face() in scipy.datasets instead of scipy.misc
import numpy as np
import scipy.misc

face = scipy.misc.face()
face_bw = scipy.misc.face(gray=True)
logger.log_image(index=4, color_image=face, black_white=face_bw)
image_bw = np.zeros((64, 64, 1))
image_bw_2 = scipy.misc.face(gray=True)[::4, ::4]

logger.log_image(index=5, animation=[face] * 5)
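
The comment in the loop above says that key-value pairs are flushed when the step index changes. A minimal sketch of that buffering idea follows; StepBuffer and write_fn are hypothetical names used for illustration and are not part of ML-Logger.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple


class StepBuffer:
    """Hypothetical buffer that writes a row whenever the step index changes."""

    def __init__(self, write_fn: Callable[[int, Dict[str, Any]], None]):
        self._write_fn = write_fn  # stand-in for writing to disk or a server
        self._step: Optional[int] = None
        self._data: Dict[str, Any] = {}

    def log(self, step: int, **key_values: Any) -> None:
        # A new step index means the previous row is complete.
        if self._step is not None and step != self._step:
            self.flush()
        self._step = step
        self._data.update(key_values)

    def flush(self) -> None:
        # Write the pending row, if any, and reset the buffer.
        if self._step is not None and self._data:
            self._write_fn(self._step, dict(self._data))
        self._data.clear()
        self._step = None


rows: List[Tuple[int, Dict[str, Any]]] = []
buffer = StepBuffer(lambda step, data: rows.append((step, data)))
buffer.log(0, D_loss=0.2)
buffer.log(0, G_loss=0.1)   # same step: merged into the pending row
buffer.log(1, D_loss=0.15)  # new step: the row for step 0 is written
buffer.flush()              # writes the last pending row
```

This is also why the example ends with an explicit flush: the final step never sees a later step index that would trigger the write.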

This version of the logger also prints a tabular summary of the data you are logging to stdout. It:

  • can silence stdout per key (per logger.log call)
  • can print with color: logger.log(timestep, some_key=green(some_data))
  • can print with custom formatting: logger.log(timestep, some_key=green(some_data, percent)), where percent formats the value as a percentage
  • uses the correct unix table characters (please stop using | and +. Use ``│``, ``┼`` instead)
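
To make the point about box-drawing characters concrete, here is a small sketch that renders key-value pairs in a two-column table. The function format_table and its 20-character column width are assumptions chosen for illustration; this is not the code ML-Logger uses.

```python
from typing import Any, Dict


def format_table(row_data: Dict[str, Any], width: int = 20) -> str:
    """Render key/value pairs as a two-column table with box-drawing characters."""
    bar = "═" * width
    rule = "─" * width
    lines = [f"╒{bar}╤{bar}╕"]
    for i, (key, value) in enumerate(row_data.items()):
        if i > 0:
            # Thin separator between consecutive rows.
            lines.append(f"├{rule}┼{rule}┤")
        # Center both cells within the fixed column width.
        lines.append(f"│{str(key):^{width}}│{str(value):^{width}}│")
    lines.append(f"╘{bar}╧{bar}╛")
    return "\n".join(lines)


table = format_table({"timestep": 1999, "episode": 10})
print(table)
```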

A typical printout from this logger looks like the following:

from datetime import datetime
from ml_logger import ML_Logger

logger = ML_Logger(log_directory=f"/mnt/bucket/deep_Q_learning/{datetime.now():%Y%m%d-%H%M%S.%f}")

# G, RUN and Reporting are parameter classes defined elsewhere in your project
logger.log_params(G=vars(G), RUN=vars(RUN), Reporting=vars(Reporting))

outputs the following

═════════════════════════════════════════════════════
              G
───────────────────────────────┬─────────────────────
           env_name            │ MountainCar-v0
             seed              │ None
      stochastic_action        │ True
         conv_params           │ None
         value_params          │ (64,)
        use_layer_norm         │ True
         buffer_size           │ 50000
      replay_batch_size        │ 32
      prioritized_replay       │ True
            alpha              │ 0.6
          beta_start           │ 0.4
           beta_end            │ 1.0
    prioritized_replay_eps     │ 1e-06
      grad_norm_clipping       │ 10
           double_q            │ True
         use_dueling           │ False
     exploration_fraction      │ 0.1
          final_eps            │ 0.1
         n_timesteps           │ 100000
        learning_rate          │ 0.001
            gamma              │ 1.0
        learning_start         │ 1000
        learn_interval         │ 1
target_network_update_interval │ 500
═══════════════════════════════╧═════════════════════
             RUN
───────────────────────────────┬─────────────────────
        log_directory          │ /mnt/slab/krypton/machine_learning/ge_dqn/2017-11-20/162048.353909-MountainCar-v0-prioritized_replay(True)
          checkpoint           │ checkpoint.cp
           log_file            │ output.log
═══════════════════════════════╧═════════════════════
          Reporting
───────────────────────────────┬─────────────────────
     checkpoint_interval       │ 10000
        reward_average         │ 100
        print_interval         │ 10
═══════════════════════════════╧═════════════════════
╒════════════════════╤════════════════════╕
│      timestep      │        1999        │
├────────────────────┼────────────────────┤
│      episode       │         10         │
├────────────────────┼────────────────────┤
│    total reward    │       -200.0       │
├────────────────────┼────────────────────┤
│ total reward/mean  │       -200.0       │
├────────────────────┼────────────────────┤
│  total reward/max  │       -200.0       │
├────────────────────┼────────────────────┤
│time spent exploring│       82.0%        │
├────────────────────┼────────────────────┤
│    replay beta     │        0.41        │
╘════════════════════╧════════════════════╛

Values can also be colored and given a custom format:

# assumption: Color and percent are importable from the top-level ml_logger package
from ml_logger import ML_Logger, Color, percent

logger = ML_Logger('/mnt/slab/krypton/unitest')
logger.log(0, some=Color(0.1, 'yellow'))
logger.log(1, some=Color(0.28571, 'yellow', lambda v: f"{v * 100:.5f}%"))
logger.log(2, some=Color(0.85, 'yellow', percent))
logger.log(3, {"some_var/smooth": 10}, some=Color(0.85, 'yellow', percent))
logger.log(4, some=Color(10, 'yellow'))
logger.log_histogram(4, td_error_weights=[0, 1, 2, 3, 4, 2, 3, 4, 5])

Colored output (the values are printed in yellow):

╒════════════════════╤════════════════════╕
│        some        │        0.1         │
╘════════════════════╧════════════════════╛
╒════════════════════╤════════════════════╕
│        some        │     28.57100%      │
╘════════════════════╧════════════════════╛
╒════════════════════╤════════════════════╕
│        some        │       85.0%        │
╘════════════════════╧════════════════════╛
╒════════════════════╤════════════════════╕
│  some var/smooth   │         10         │
├────────────────────┼────────────────────┤
│        some        │       85.0%        │
╘════════════════════╧════════════════════╛
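
One way a colored, formatted value could be represented is sketched below. ColoredValue, ANSI_CODES and this percent helper are hypothetical stand-ins written for illustration; they mimic the behavior seen in the output above but are not ML-Logger's own Color or percent.

```python
from typing import Any, Callable, Optional

# ANSI escape codes for a few foreground colors.
ANSI_CODES = {"red": 31, "green": 32, "yellow": 33, "blue": 34}


def percent(value: float) -> str:
    """Format a fraction as a percentage with one decimal place."""
    return f"{value * 100:.1f}%"


class ColoredValue:
    """Hypothetical wrapper pairing a value with a color and a formatter."""

    def __init__(self, value: Any, color: str,
                 formatter: Optional[Callable[[Any], str]] = None):
        self.value = value
        self.color = color
        self.formatter = formatter or str

    def plain(self) -> str:
        # Text without escape codes, useful for computing column widths.
        return self.formatter(self.value)

    def __str__(self) -> str:
        # Wrap the formatted text in the color code, then reset the terminal.
        return f"\x1b[{ANSI_CODES[self.color]}m{self.plain()}\x1b[0m"


half = ColoredValue(0.5, "yellow", percent)
print(half)
```

Keeping the uncolored text available separately matters for table layout, because the escape codes add characters that should not count toward the cell width.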

TODO:

  • [ ] Integrate with visdom, directly plot locally.

    • (better to keep it separate, because visdom is shitty.)

    • ml_logger does NOT know the full data set. Therefore we should not expect it to do the data processing such as taking mean, reservoir sampling etc. Where should this happen though?

    • just log to visdom for now. Use the primitive plot.ly plotting interface.

    • data: keys/values
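
The TODO above leaves open where processing such as means and reservoir sampling should happen. One option, offered only as an illustration and not as the project's plan, is to compute such statistics incrementally on the caller's side, since neither needs the full data set. RunningMean and Reservoir are hypothetical helper names.

```python
import random
from typing import List, Optional


class RunningMean:
    """Incremental mean that never stores the individual values."""

    def __init__(self) -> None:
        self.count = 0
        self.mean = 0.0

    def update(self, value: float) -> float:
        self.count += 1
        self.mean += (value - self.mean) / self.count
        return self.mean


class Reservoir:
    """Uniform sample of up to k items from a stream (Algorithm R)."""

    def __init__(self, k: int, seed: Optional[int] = None) -> None:
        self.k = k
        self.seen = 0
        self.items: List[float] = []
        self._rng = random.Random(seed)

    def add(self, value: float) -> None:
        self.seen += 1
        if len(self.items) < self.k:
            # Fill the reservoir with the first k items.
            self.items.append(value)
        else:
            # Keep the new item with probability k / seen.
            j = self._rng.randrange(self.seen)
            if j < self.k:
                self.items[j] = value


mean = RunningMean()
sample = Reservoir(k=3, seed=0)
for v in range(1, 11):
    mean.update(float(v))
    sample.add(float(v))
```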

Built distribution: ml_logger-0.0.47-py3-none-any.whl (13.1 kB), uploaded for Python 3.
