


jaeger_stats

Parse Jaeger JSON files in order to collect trace statistics

How to run an analysis

You can run the tool on a folder of Jaeger traces via the command:

trace_analysis  <data_folder>  -c <data_folder>/CallChain

Here <data_folder> can be an absolute or a relative path; note that expansion of '~' to the home folder is not supported. The path encoding needs to match the conventions of your system (Windows or Linux/Unix/Mac).

The tool will read all JSON files in the folder (assuming these are valid Jaeger-trace files), process them and compute statistics. Each JSON file can contain one or more traces. Output will be generated in the following locations:

  • <data_folder>/Traces: contains a single file for each trace. This file is named <trace_id>.txt and contains a fairly concise textual representation of the Jaeger trace.
  • <data_folder>/Stats: contains files with the statistics over traces. The most important one is 'Stats/cummulative_trace_stats.csv', which contains statistics over all traces. However, you will also see a number of other files, such as 'Stats/gateway_POST__services_orders_update.csv', which contains the statistics over the subset of traces originating from the end-point 'gateway/POST:/services/orders/update/'. Next to each of the .csv files a .json file with the same base name is saved that contains the full dataset (the csv files are a subset intended for reading in Excel; the full .json files are used for later post-processing, for example by the 'stitch' tool).
  • <data_folder>/CallChain: this folder contains text files such as 'gateway_POST__services_orders_update.cchain', which contains a list of all call-chains that originate at the API-gateway end-point 'gateway/POST:/services/orders/update/'. Each line in such a cchain-file represents a unique sequence of processes (microservices) that appears in the input traces. These .cchain files give an impression of the complexity of the processing, and they also serve a purpose in the correction of incomplete traces, which is the topic of a separate section.
  • report.txt: a structured log file showing a summary and detailed information on the analysis process.
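
As an illustration, after a run the output could look roughly as follows (file names are taken from the examples above; the actual names depend on your traces and end-points, and the placement of report.txt is an assumption here):

<data_folder>/
    report.txt
    Traces/
        <trace_id>.txt                                  one file per trace
    Stats/
        cummulative_trace_stats.csv                     statistics over all traces (Excel-friendly subset)
        cummulative_trace_stats.json                    full dataset for post-processing ('stitch')
        gateway_POST__services_orders_update.csv        statistics per end-point
        gateway_POST__services_orders_update.json
    CallChain/
        gateway_POST__services_orders_update.cchain     call-chains per end-point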

Traces are deduplicated before analysis based on their 'trace_id', so if the folder contains files whose traces overlap, this overlap is removed.
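
Conceptually this deduplication keeps the first occurrence of each trace_id; a minimal Rust sketch of this step (the Trace type and its field are illustrative assumptions, not the actual types of this crate):

use std::collections::HashSet;

// Minimal stand-in for a parsed Jaeger trace; only the field needed here.
struct Trace {
    trace_id: String,
    // ... spans, process information, etc.
}

/// Keep only the first occurrence of each trace_id over all input files.
fn deduplicate(traces: Vec<Trace>) -> Vec<Trace> {
    let mut seen: HashSet<String> = HashSet::new();
    traces
        .into_iter()
        .filter(|t| seen.insert(t.trace_id.clone()))
        .collect()
}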

When you run the command with the flag --help you see:

$ trace_analysis --help
Parsing and analyzing Jaeger traces

Usage: trace_analysis [OPTIONS] <INPUT>

Arguments:
  <INPUT>  

Options:
      --caching-process <CACHING_PROCESS>      
  -c, --call-chain-folder <CALL_CHAIN_FOLDER>  [default: /home/ceesvk/CallChain/]
  -t, --timezone-minutes <TIMEZONE_MINUTES>    [default: 120]
  -f, --comma-float                            
  -h, --help                                   Print help
  -V, --version                                Print version

The options are:

  • --caching-process: a comma-separated list of processes that apply caching of results. This information is relevant as the call-chains that contain these services are called less often because the downstream data might be cached. If you know the cache-hit rates you can correct the leaf nodes to compute the expected number of calls when the cache is turned off (or flushed). It is also possible to actually compute the cache-hit ratios by comparing the traffic on 'path/cached_service' with 'path/cached_service LEAF', where the version marked with 'LEAF' represents the calls that do not have any downstream processing (a small worked example follows below this list). This can happen, for example, when a cache-hit removes the need for downstream calls. However, be careful: this also occurs if the service does not make downstream calls for other reasons, such as incorrect or empty parameters.
  • --call-chain-folder: The folder containing files used to correct incomplete call-chains
  • --timezone-minutes: The offset in minutes of the current timezone relative to UTC. The default value is 120 minutes, which corresponds to the Amsterdam timezone (UTC+2).
  • --comma-float: In the CSV files floating-point values use a comma as decimal separator instead of the '.', which allows the files to be read directly in Excel. The default value is 'true'.
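
As a worked example with hypothetical numbers (where 'cached_service' is a placeholder for one of your own services): suppose the traces are analysed with

trace_analysis <data_folder> -c <data_folder>/CallChain --caching-process cached_service

and the statistics show that call-chains ending in '.../cached_service' occur 1000 times in total, of which 800 are the '.../cached_service LEAF' variant. Assuming all LEAF occurrences are cache hits (which, as noted above, is not guaranteed), the estimated cache-hit ratio is 800/1000 = 0.8, and with the cache turned off the downstream call-chains would be expected to run about 1000 times instead of 200.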

Contents of the files with statistics

The statistics files, such as 'Stats/cummulative_trace_stats.csv', use the ';' as the column separator. Such a file consists of four sections:

  1. Generic information, such as the list of trace_ids, the start_times of these traces and the average duration of these traces
  2. Process-information: Lists all processes (services) in the call-chain and shows the number of inbound and outbound calls of each service. However, it does not contain any details on the operation being called
  3. Process/operation: Lists statistics such as call frequency, average time, max time, etc. for each process/operation
  4. Call-chain: Lists statistics for the full call-chain and also shows whether a service is a leaf node or contains further downstream calls. Please note that the execution time of a service/operation includes the execution time of all downstream calls performed. However, if all heavy lifting is done in leaf nodes, the sum of the average times of the leaf nodes should come close to the average trace duration.
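
As a rough, hypothetical sanity check of that last point: if the leaf-node call-chains accumulate average times of 20 ms and 35 ms per trace, their sum of 55 ms should be close to the reported average trace duration; a much larger average trace duration suggests that significant time is spent in the intermediate services themselves.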

Correction of call-chains

Jaeger tracing spans are sent over UDP, a protocol that does not give strong delivery guarantees. So occasionally a span might be lost, which results in an incomplete trace, and thus broken call-chains in that trace. This is where the '-c' option from the earlier example comes in: trace_analysis <data_folder> -c <data_folder>/CallChain. Here the CallChain folder produced by the first run of the tool (containing only complete chains) is used in subsequent runs of the tool to correct incomplete call-chains for missing spans. However, the preferred option is to set up a separate folder to contain the call-chains and point '--call-chain-folder' (or '-c') to that folder.

The call-chain corrections are only applied:

  • to traces that are missing some of their spans.
  • to call-chains that do not exist in the call-chain-file for the end-point of the current trace
  • in case the call-chain can be matched exactly on the tail of exactly one other call-chain; if more than one match exists, the correction is not applied (a sketch of this matching rule is shown below).
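
A minimal sketch of that tail-matching rule (illustrative only: call-chains are represented here as plain lists of process names, which need not match the internal representation of this crate):

/// Try to repair an incomplete call-chain using the known complete chains of
/// the same end-point. A correction is only returned when the incomplete
/// chain matches the tail of exactly one known chain.
fn correct_call_chain<'a>(
    incomplete: &[String],
    known_chains: &'a [Vec<String>],
) -> Option<&'a Vec<String>> {
    let mut matches = known_chains
        .iter()
        .filter(|chain| chain.ends_with(incomplete));
    match (matches.next(), matches.next()) {
        (Some(only), None) => Some(only), // exactly one match: apply the correction
        _ => None,                        // zero or multiple matches: leave the chain as-is
    }
}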

Correction of operations (path parameters)

Path parameters can wreak havoc on the analysis, as they make each URL unique while we are looking for averages over a number of invocations. Therefore the system corrects the URLs by extracting the parameters, for example an order number, and replacing them with a symbolic value such as '{ORDER}'. However, these replacements are currently hardcoded and we need to take some steps to make this configurable.
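
A minimal sketch of such a replacement (the rule "a path segment consisting only of digits is an order number" is an illustrative assumption; the actual hard-coded replacements of the tool may differ):

/// Replace numeric path segments (for example an order number) with a
/// symbolic value so that invocations of the same operation are grouped.
fn normalize_path(path: &str) -> String {
    path.split('/')
        .map(|segment| {
            if !segment.is_empty() && segment.chars().all(|c| c.is_ascii_digit()) {
                "{ORDER}"
            } else {
                segment
            }
        })
        .collect::<Vec<_>>()
        .join("/")
}

// normalize_path("/services/orders/1234/update") == "/services/orders/{ORDER}/update"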

Computation of the rates (request/second)

If data is provided in large batches it is possible to compute the rate from the data. However, we do not want to assume that all files with traces fall in the same time period. Therefore we compute frequencies by taking the times between subsequent calls and dropping the num_files largest intervals, as these might correspond to gaps between files. Based on this time the rate is computed as a frequency by the formula f = 1/T, where T is the duration in seconds between subsequent calls.
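
A sketch of this computation (start times are assumed to be in seconds since the epoch; this mirrors the description above rather than the exact implementation):

/// Estimate the request rate (requests/second) from the trace start times.
/// The `num_files` largest intervals are dropped to compensate for possible
/// gaps between the time periods covered by the individual input files.
fn request_rate(mut start_times: Vec<f64>, num_files: usize) -> Option<f64> {
    if start_times.len() < 2 {
        return None;
    }
    start_times.sort_by(|a, b| a.partial_cmp(b).unwrap());

    // Durations between subsequent calls.
    let mut intervals: Vec<f64> = start_times.windows(2).map(|w| w[1] - w[0]).collect();

    // Drop the `num_files` largest intervals: these might be gaps between files.
    intervals.sort_by(|a, b| a.partial_cmp(b).unwrap());
    intervals.truncate(intervals.len().saturating_sub(num_files));
    if intervals.is_empty() {
        return None;
    }

    // Average time T between subsequent calls, and the rate f = 1/T.
    let t = intervals.iter().sum::<f64>() / intervals.len() as f64;
    Some(1.0 / t)
}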

Extracting Jaeger JSON data

In the Jaeger web-based front end it is possible to make a selection of traces. After these traces have been returned you have two methods to extract the JSON files:

  1. Click on a single trace and in the right-top of the page select Download as 'JSON'.
  2. Open the developer tools and navigate to the network tab. Now fire the request and then either:
    1. Navigate to the response page. It might take some time to download the data and to transform and pretty-print the JSON. Select the full response and copy-paste it to a file.
    2. Right-click on the request and select 'Copy as cURL' (for your system). Paste this command in a console and redirect the output to a file.

Using method 2.1 you can get approximately 1000 traces in a batch. The batch will be available as pretty-printed JSON in UTF-8.

Method 2.2 allows you to select 1000 traces or more. However, the output is a single line of raw JSON (not pretty-printed) and the file is encoded in UTF-16-LE with a BOM. The 'trace_analysis' tool can handle these files and will do an in-memory conversion to UTF-8 before processing. Beware that this is a non-streaming conversion, so the full file is held in memory twice.
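
The in-memory conversion can be pictured roughly as follows (a sketch using only the standard library; the actual implementation in this crate may differ):

use std::fs;

/// Read a JSON file that is either UTF-8 or UTF-16-LE with a BOM (as produced
/// by the curl method above) and return its contents as a UTF-8 String.
fn read_json_file(path: &str) -> std::io::Result<String> {
    let bytes = fs::read(path)?; // the whole file is read into memory

    // A UTF-16-LE BOM is the byte sequence FF FE.
    if bytes.len() >= 2 && bytes[0] == 0xFF && bytes[1] == 0xFE {
        let utf16: Vec<u16> = bytes[2..]
            .chunks_exact(2)
            .map(|pair| u16::from_le_bytes([pair[0], pair[1]]))
            .collect();
        // This second copy of the data is the non-streaming conversion mentioned above.
        return Ok(String::from_utf16_lossy(&utf16));
    }

    // Otherwise the file is assumed to be UTF-8 already.
    Ok(String::from_utf8_lossy(&bytes).into_owned())
}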

Using the stitch tool to merge results of different runs

$ stitch -h
Stitching results of different runs of trace_analysis into a single CSV for visualization in Excel

Usage: stitch [OPTIONS]

Options:
  -s, --stitch-list <STITCH_LIST>  [default: input.stitch]
  -o, --output <OUTPUT>            [default: stitched.csv]
  -c, --comma-float                
  -h, --help                       Print help
  -V, --version                    Print version

The options are:

  • --stitch-list: a file that lists the paths of all result .json files that need to be stitched together. All text after a '#' is considered a comment. Empty lines are ignored (including lines that contain only a comment), and lines that start with a '%' show up as an empty column in the analysis (used to temporarily exclude a missing file or a file containing outliers); text after the '%' is ignored. All relative paths in the stitch-list are resolved relative to the folder that contains the 'input.stitch' file, such that you can move the complete folder containing the 'input.stitch' to a different location.
  • --output: The output-file in CSV-format that contains the data stitched together. Each column in this file represents a single input-file from 'input.stitch'. Each statistic is a separate line and the second column represents the name of the statistic.
  • --comma-float: In the CSV file floating-point values use a comma as decimal separator instead of the '.', which allows the file to be read directly in Excel. The default value is 'true'.

An example of an input-file ('input.stitch') is:

#  comment line: this line is fully ignored
/home/ceesvk/jaeger/batch/Stats/cummulative_trace_stats.json       # an absolute path
../../jaeger/get_order/Stats/cummulative_trace_stats.json    # a relative path
% ../../jaeger/post_order/Stats/cummulative_trace_stats.json  # This line is showing up as an empty column due to the % in front

# yet another comment (empty line above is ignored)

Beware that ALL files in the 'input.stitch' should exist and should be valid input files; otherwise the 'stitch' program will terminate without producing output.

How to build trace_analysis

The tool is included in the examples folder and can be built via the command:

cargo build --example trace_analysis

The 'trace_analysis' executable can be found in 'target/debug/examples/trace_analysis'.

In case you need to process a large volume of traces you might aim for the more performant 'release' build (which also drops some run-time checks). To build a release version use:

cargo build --release --example trace_analysis

The 'trace_analysis' executable can be found in 'target/release/examples/trace_analysis'.

You can also install the tool via

cargo install --path . --example trace_analysis

On Linux this will deploy a release version of 'trace_analysis' in the folder '$HOME/.cargo/bin/', which is assumed to be included in your PATH.

