A CLI for extracting number arrays from an unstructured log file and plotting results.

ExtractNum

https://pypi.org/project/extractnum/

A CLI for extracting arrays from an unstructured text file and plotting results.

For example, if you print some metrics into a log file, you can use this tool to extract them. The extracted array can be plotted as a diagram to show the trend, or saved into a structured file, e.g., JSON or CSV.

pip install extractnum

Quick start

Plot an array from a log file

If you have an unstructured plain text file like:

[[032m2022-09-10 21:43:03,770]Total epoch: 0. model loss: 0.42456936836242676.
[[032m2022-09-10 21:43:03,791] token 0 - 5551, 1097.58837890625,  targeting
 token 1 - 1058.235107421875, InstoreAndOnline
 token 2 - 0.10239370167255402,  A
 token 3 - 0.10239171236753464,  sentence
 token 4 - 0.10238830745220184,  :
 token 5 - 977.8533935546875,  predict
 token 6 - 1051.5157470703125, --+
[[032m2022-09-10 21:43:04,297]Total epoch: 1. model loss: 0.39936694502830505.
[[032m2022-09-10 21:43:04,316] token 0 - 5551, 1097.58837890625,  targeting
 token 1 - 1058.3414306640625, InstoreAndOnline
 token 2 - 0.2732486128807068,  A
 token 3 - 0.2605493366718292,  sentence
 token 4 - 0.28173941373825073,  :
 token 5 - 978.6373291015625,  predict
 token 6 - 1051.77685546875, --+
[[032m2022-09-10 21:43:04,840]Total epoch: 2. model loss: 0.40558159351348877.
...

And you may want to extract the model loss values of all epochs. You can run:

extractnum training.log --pattern "model loss: {loss}"

Here, model loss: is the text that precedes the numbers, and {loss} is the placeholder for the number to extract. loss becomes the label of the resulting array.

After running, all the loss values found in this file are plotted.

Smooth the array

ExtractNum supports smoothing the array, as TensorBoard does. Run the following command to smooth the loss, which shows the trend more clearly:

extractnum training.log --pattern "model loss: {loss}" --smooth 0.8
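
The help text describes --smooth as an exponential moving average. The snippet below is a minimal sketch of that idea, not ExtractNum's actual code: the function name smooth is hypothetical, and seeding the average with the first value (rather than applying a bias correction) is an assumption.

```python
def smooth(values, weight):
    """Exponential moving average; weight=0 returns the values unchanged."""
    smoothed = []
    last = None
    for value in values:
        # Assumption: the first value seeds the running average.
        if last is None:
            last = value
        else:
            last = weight * last + (1 - weight) * value
        smoothed.append(last)
    return smoothed


if __name__ == "__main__":
    losses = [0.42456936836242676, 0.39936694502830505, 0.40558159351348877]
    print(smooth(losses, 0.8))
```

A weight closer to 1 gives a flatter curve, because each new value contributes only a fraction (1 - weight) to the running average.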

Plot multiple arrays

You can also plot multiple arrays together. For example, plot token 2, token 3 and token 4 in one diagram:

extractnum training.log --pattern "token 2 - {token_2}" "token 3 - {token_3}" "token 4 - {token_4}"

Save results

If you want to use the data for further processing, you can save it to a CSV file:

extractnum training.log --pattern "token 2 - {token_2}" "token 3 - {token_3}" "token 4 - {token_4}" --output tokens.csv

The resulting tokens.csv:

token_2,token_3,token_4
0.10239370167255402,0.10239171236753464,0.10238830745220184
0.2732486128807068,0.2605493366718292,0.28173941373825073
0.43365949392318726,0.4471507668495178,0.4745367169380188
0.6074557304382324,0.6768703460693359,0.6920053362846375
0.8045746684074402,0.9262861013412476,0.9121480584144592
0.9546961784362793,1.186927080154419,1.1203949451446533
1.1149790287017822,1.4592962265014648,1.3308525085449219
...
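
For further processing, a saved CSV file can be read back with Python's standard csv module. This is a small sketch assuming the layout shown above (one header row of labels, one number per cell); the helper name read_columns is hypothetical, and skipping empty cells is an assumption about how unequal-length arrays might be written.

```python
import csv


def read_columns(path):
    """Read a CSV table into a dict that maps each label to a list of floats."""
    columns = {}
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            for label, text in row.items():
                # Assumption: empty or missing cells are skipped.
                if label is None or not text:
                    continue
                columns.setdefault(label, []).append(float(text))
    return columns


if __name__ == "__main__":
    import sys

    # Usage: python read_columns.py tokens.csv
    for label, values in read_columns(sys.argv[1]).items():
        print(label, len(values), "values")
```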

ExtractNum detects the output format automatically from the file extension of the output path. Currently, the following formats are supported:

  • Any image format that matplotlib supports: save as an image file.
  • *.csv: save as a csv table format.
  • *.json: save as a json format.
  • *.txt / stdout: print a table to a text file or the standard output.
  • otherwise, show a matplotlib image window.
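
The dispatch described in the list above could look roughly like the following. This is an illustrative sketch only: detect_output_kind and IMAGE_EXTENSIONS are hypothetical names, the image extension set is a hard-coded stand-in for whatever matplotlib supports, and treating the literal value stdout as the standard output is an assumption based on the list.

```python
import os

# Hypothetical subset; the list above says any matplotlib-supported format works.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".pdf", ".svg"}


def detect_output_kind(path):
    """Map an --output value to one of the output kinds listed above."""
    if not path:
        return "window"  # no output path: show a matplotlib window
    if path == "stdout":
        return "table"  # assumption: the literal value 'stdout' prints a table
    ext = os.path.splitext(path)[1].lower()
    if ext == ".csv":
        return "csv"
    if ext == ".json":
        return "json"
    if ext == ".txt":
        return "table"
    if ext in IMAGE_EXTENSIONS:
        return "image"
    return "window"  # unknown extension: fall back to the window


if __name__ == "__main__":
    for candidate in ["tokens.csv", "loss.png", "stdout", None]:
        print(candidate, "->", detect_output_kind(candidate))
```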

How does it work?

For each input pattern (e.g., model loss: {loss}), ExtractNum replaces the placeholder {loss} with a regex pattern. By default, the real-number regex [+|-]?\d*(\.\d*)? is used; you can change it with --placehold_pattern <regex>. ExtractNum then scans the log file line by line with the processed regex and tries to extract the numbers. The label loss serves as the group name in the processed pattern. You can also turn on --regex mode, which treats the input pattern as a regex without any further processing and uses the group name as the label.
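
The placeholder substitution can be sketched in a few lines of Python. This is an illustration under assumptions, not ExtractNum's source: the names compile_pattern and extract are hypothetical, and escaping the literal text around the placeholder with re.escape is an assumed detail. Only the default number regex is taken from the text above.

```python
import re

# Default number regex quoted in the description above.
NUMBER_REGEX = r"[+|-]?\d*(\.\d*)?"


def compile_pattern(pattern, number_regex=NUMBER_REGEX):
    """Turn 'model loss: {loss}' into a regex with a named group 'loss'."""
    # re.split with a capture group alternates: literal, label, literal, ...
    parts = re.split(r"\{(\w+)\}", pattern)
    pieces = []
    for index, part in enumerate(parts):
        if index % 2 == 0:
            pieces.append(re.escape(part))  # literal text (assumed escaped)
        else:
            pieces.append(f"(?P<{part}>{number_regex})")  # placeholder label
    return re.compile("".join(pieces))


def extract(lines, pattern):
    """Scan lines one by one and collect the numbers found for each label."""
    regex = compile_pattern(pattern)
    values = {}
    for line in lines:
        match = regex.search(line)
        if match is None:
            continue
        for label, text in match.groupdict().items():
            try:
                values.setdefault(label, []).append(float(text))
            except (TypeError, ValueError):
                pass  # the default regex can also match an empty string
    return values


if __name__ == "__main__":
    log = [
        "[[032m2022-09-10 21:43:03,770]Total epoch: 0. model loss: 0.42456936836242676.",
        " token 1 - 1058.235107421875, InstoreAndOnline",
        "[[032m2022-09-10 21:43:04,297]Total epoch: 1. model loss: 0.39936694502830505.",
    ]
    print(extract(log, "model loss: {loss}"))
```

Note that the default regex stops after the first decimal part, so the full stop that ends each log sentence is not included in the extracted number.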

Usage

usage: extractnum [-h] [--pattern [<number pattern> ...]] [--x <label>]
                  [--regex] [--placehold_pattern <regex>] [--output <path>]
                  [--smooth <weight>] [--offset <offset>] [--limit <limit>]
                  [--verbose]
                  log_file

positional arguments:
  log_file              Log file path to parse

optional arguments:
  -h, --help            show this help message and exit
  --pattern [<number pattern> ...], -p [<number pattern> ...]
                        Pattern for extracting real numbers from log. For
                        example, for a log line 'training acc: 3.14%', a
                        pattern 'acc: {accuracy}' will extract 3.14, and plot
                        it with a label 'accuracy'. Note that this pattern
                        could only handle simple case. For a more complicated
                        case, please turn on --regex mode.
  --x <label>           Specify a label as the X array for plotting. For
                        example, if there exists an array with a label
                        "iteration", you can use "--x iteration" to make this
                        array as the plotting X array. Not that the label
                        should be in one of the patterns. By default, a
                        sequence of natural numbers will be used.
  --regex               Regex mode. If enable, patterns will be interpreted as
                        regex patterns. For example, for a log line 'training
                        acc: 3.14%', a pattern
                        'acc:\s(?P<accuracy>[+|-]?\d*(\.\d*)?)' will extract
                        3.14, and plot it with a label 'accuracy'.
  --placehold_pattern <regex>
                        The regex to replace the placeholder label. By
                        default, a real number regex is used:
                        "[+|-]?\d*(\.\d*)?".
  --output <path>, -o <path>
                        Output path. It supports the following types: (1) Any
                        image format that matplotlib supports: save as an
                        image file. (2) *.csv: save as a csv table format. (3)
                        *.json: save as a json format. (4) *.txt / stdout:
                        print a table to a text file or the standard output.
                        (5) otherwise, show a matplotlib image window.
  --smooth <weight>     Perform exponential moving average to smooth values
                        when plotting. Default: 0
  --offset <offset>     The number of skipping lines before parsing. Default:
                        0
  --limit <limit>       Max numbers for each label in parsing. 0 indicates no
                        limits. Default: 0
  --verbose, -v         Verbose mode.
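
As a rough illustration of how --regex, --offset and --limit might combine, here is a sketch that applies the regex from the help text directly. The function extract_regex is hypothetical, and the exact semantics (skip the first offset lines, then keep at most limit values per label, with 0 meaning no limit) are an interpretation of the help text rather than confirmed behavior.

```python
import re
from itertools import islice


def extract_regex(lines, regex, offset=0, limit=0):
    """Collect floats from named groups, skipping `offset` lines first."""
    compiled = re.compile(regex)
    values = {}
    for line in islice(lines, offset, None):
        match = compiled.search(line)
        if match is None:
            continue
        for label, text in match.groupdict().items():
            bucket = values.setdefault(label, [])
            if limit and len(bucket) >= limit:
                continue  # assumption: a limit of 0 means unlimited
            try:
                bucket.append(float(text))
            except (TypeError, ValueError):
                pass  # group did not participate or matched a non-number
    return values


if __name__ == "__main__":
    log = [
        "training acc: 1.0%",
        "training acc: 2.5%",
        "training acc: 3.14%",
        "training acc: 4.0%",
    ]
    pattern = r"acc:\s(?P<accuracy>[+|-]?\d*(\.\d*)?)"
    print(extract_regex(log, pattern, offset=1, limit=2))
```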

Source Distribution

extractnum-1.0.2.tar.gz (7.1 kB)

File metadata

  • Download URL: extractnum-1.0.2.tar.gz
  • Size: 7.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.7.11

File hashes

Hashes for extractnum-1.0.2.tar.gz
Algorithm Hash digest
SHA256 521ef400462fe962f24acdf3459b6d4002681604cecb98240567d49f265b8f34
MD5 3408c8a5c6f82e5ed3915c867e6a6f52
BLAKE2b-256 4d0b06cf59be95f1a31e56c2f3f75045172fddaa3ef11bf0d6880c1e0c75bf40
