OSACA

Open Source Architecture Code Analyzer

For an innermost loop kernel in assembly, this tool automatically extracts the instructions and predicts the runtime, including throughput analysis and detection of the critical path and loop-carried dependencies.

Getting started

OSACA is a Python module with a command line interface.

OSACA is also integrated into the Compiler Explorer at godbolt.org, which allows using OSACA from a browser without any installation. To analyze an assembly snippet, go to https://godbolt.org, change the language to “Analysis”, insert AArch64 or x86 assembly code, and make sure OSACA is selected in the corresponding analysis panel, e.g., https://godbolt.org/z/shK4f8. When analyzing high-level language code, use the “Add tool…” menu in the compiler output panel to add the OSACA analysis, e.g., https://godbolt.org/z/hbMoPn. To change the microarchitecture model, add --arch and the µarch short name (e.g., SKX for Skylake, ZEN2, N1 for ARM Neoverse) to the “Compiler options…” (when using “Analysis” mode) or “Arguments” (when analyzing the compiler output of high-level code).

Installation

On most systems with Python, pip, and setuptools installed, just run:

pip install --user osaca

for the latest release.

To build OSACA from source, clone this repository using git clone https://github.com/RRZE-HPC/OSACA and run in the root directory:

python ./setup.py install

After installation, OSACA can be started with the command osaca in the CLI.

Dependencies:

Necessary requirements are:

Python3
Graphviz for dependency graph creation (minimal dependency is libgraphviz-dev on Ubuntu)
Python packages:

Optional requirements are:

Kerncraft >=v0.8.4 for marker insertion
ibench or asmbench for throughput/latency measurements
BeautifulSoup4 for scraping instruction form information for the x86 ISA (experimental)

Design

[Figure: schematic design of OSACA’s workflow]

Usage

The usage of OSACA can be listed as:

osaca [-h] [-V] [--arch ARCH] [--syntax SYNTAX] [--fixed]
      [--lines LINES] [--ignore-unknown] [--lcd-timeout SECONDS]
      [--db-check] [--import MICROBENCH] [--insert-marker]
      [--export-graph GRAPHNAME] [--consider-flag-deps]
      [--out OUT] [--yaml-out YAML_OUT] [--verbose]
      FILEPATH

-h, --help: prints out the help message.
-V, --version: shows the program’s version number.
--arch ARCH: needs to be replaced with the target architecture abbreviation. See the table of supported microarchitectures below for all possible options. If no micro-architecture is given, OSACA assumes a default architecture for x86/AArch64.
--syntax SYNTAX: Define the assembly syntax (ATT, Intel) for x86. If no syntax is given, OSACA tries to determine the syntax automatically.
--fixed: Run the throughput analysis with fixed port utilization for all suitable ports per instruction. Otherwise, OSACA will print out the optimal port utilization for the kernel.
--lines: Define the lines that should be included in the analysis. This option overrides any range defined by markers in the assembly. Add either single lines or ranges defined by “-” or “:”, each entry separated by commas, e.g.: --lines 1,2,8-18,20:24
--db-check: Run a sanity check on the database specified by “--arch”. The output depends on the verbosity level. Keep in mind that you have to provide an existing (dummy) filename anyway.
--import MICROBENCH: Import a given microbenchmark output file into the corresponding architecture instruction database. Define the type of microbenchmark either as “ibench” or “asmbench”.
--insert-marker: OSACA calls the Kerncraft module for the interactive insertion of IACA byte markers or OSACA AArch64 byte markers in suggested assembly blocks.
--export-graph EXPORT_PATH: Output path for .dot file export. If “.” is given, the file will be stored as “./osaca_dg.dot”. After the file has been created, you can convert it to a PDF file using dot.
--ignore-unknown: Force OSACA to apply a throughput and latency of 0.0 cy for all unknown instruction forms. If not specified, a warning will be printed instead if one or more instruction forms are unknown to OSACA.
--lcd-timeout SECONDS: Set timeout in seconds for LCD analysis. After timeout, OSACA will continue its analysis with the dependency paths found up to this point. Defaults to 10.
-f, --consider-flag-deps: Consider flag dependencies for the critical path and loop-carried dependency analysis. By default, those dependencies are ignored.
-v, --verbose: Increases verbosity level
-o OUT, --out OUT: Write analysis to this file (defaults to stdout)
--yaml-out YAML_OUT: Write analysis as YAML representation to this file
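
The --lines syntax described above can be illustrated with a short Python sketch. The helper name parse_lines_spec is hypothetical and not part of OSACA; the sketch also assumes that ranges include both endpoints, which the option description does not state explicitly.

```python
def parse_lines_spec(spec):
    """Expand a spec such as "1,2,8-18,20:24" into a sorted list of line numbers.

    Hypothetical helper for illustration only; assumes inclusive ranges.
    """
    selected = set()
    for entry in spec.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate stray commas
        # Both "-" and ":" are accepted as range separators.
        normalized = entry.replace(":", "-")
        if "-" in normalized:
            start, end = (int(part) for part in normalized.split("-", 1))
            if start > end:
                raise ValueError(f"invalid range: {entry!r}")
            selected.update(range(start, end + 1))
        else:
            selected.add(int(normalized))
    return sorted(selected)


print(parse_lines_spec("1,2,8-18,20:24"))
```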

FILEPATH is the path to the file to analyze and is always required; use “-” to read from stdin.
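
When OSACA is driven from a script, the command line can be assembled programmatically. The following is a minimal sketch that uses only flags listed above; the function names build_osaca_command and run_osaca and the file name kernel.s are hypothetical, and run_osaca assumes that the osaca executable is installed and on the PATH.

```python
import subprocess


def build_osaca_command(filepath, arch=None, fixed=False, ignore_unknown=False,
                        lines=None, out=None):
    """Assemble an argument list for the osaca CLI (illustrative helper)."""
    cmd = ["osaca"]
    if arch:
        cmd += ["--arch", arch]
    if fixed:
        cmd.append("--fixed")
    if ignore_unknown:
        cmd.append("--ignore-unknown")
    if lines:
        cmd += ["--lines", lines]
    if out:
        cmd += ["--out", out]
    cmd.append(filepath)  # FILEPATH is always required; "-" reads from stdin
    return cmd


def run_osaca(filepath, **options):
    """Run osaca and return its standard output (requires osaca on the PATH)."""
    result = subprocess.run(build_osaca_command(filepath, **options),
                            capture_output=True, text=True, check=True)
    return result.stdout


# kernel.s is a hypothetical file name.
print(build_osaca_command("kernel.s", arch="CSX", lines="1,2,8-18"))
```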

Supported microarchitectures

x86 CPUs

Designer	Model/microarch	OSACA flag
Intel	Sandy Bridge	SNB
Intel	Ivy Bridge	IVB
Intel	Haswell	HSW
Intel	Broadwell	BDW
Intel	Skylake-X	SKX
Intel	Cascadelake-X	CSX
Intel	Icelake client	ICL
Intel	Icelake server	ICX
Intel	Sapphire Rapids	SPR
AMD	Naples / Zen 1	ZEN1
AMD	Rome / Zen 2	ZEN2
AMD	Milan / Zen 3	ZEN3
AMD	Genoa / Zen 4	ZEN4

ARM AArch64 CPUs

Designer	Model/microarch	OSACA flag
ARM	Cortex-A72	A72
ARM	Neoverse N1	N1
ARM	Neoverse V2	V2
Marvell	ThunderX2	TX2
Fujitsu	FX700/A64FX	A64FX
HiSilicon	TaiShan v110	TSV110
Apple	M1-Firestorm	M1
NVIDIA	Neoverse V2/Grace	V2

The following sections describe OSACA’s functionality.

Throughput & Latency analysis

As its main functionality, OSACA analyzes a marked assembly file. Start the analysis by running the following command with one or more of the optional parameters:

osaca --arch ARCH [--fixed] [--ignore-unknown]
                  [--export-graph EXPORT_PATH]
      file

The file parameter specifies the target assembly file and is always mandatory.

The ARCH value of --arch must be replaced by the target architecture abbreviation.

OSACA assumes optimal scheduling for all instructions, i.e., that the processor is able to schedule instructions in a way that achieves a minimal reciprocal throughput. However, in older versions (<=v0.2.2) of OSACA, a fixed probability for port utilization was assumed. This means that instructions with N available ports for execution were scheduled with a probability of 1/N to each of the ports. This behavior can be enforced by using the --fixed flag.
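
The fixed 1/N distribution can be sketched in a few lines of Python. This illustrates the arithmetic only and is not OSACA's implementation; the function name fixed_port_pressure and the two-instruction kernel are hypothetical.

```python
from collections import defaultdict


def fixed_port_pressure(kernel):
    """Distribute each instruction's cycles evenly over its N candidate ports."""
    pressure = defaultdict(float)
    for cycles, ports in kernel:
        share = cycles / len(ports)  # probability 1/N for each port
        for port in ports:
            pressure[port] += share
    return dict(pressure)


# Hypothetical kernel: (cycles, candidate ports) per instruction.
kernel = [
    (1.0, ["0", "1"]),            # may execute on port 0 or 1
    (1.0, ["0", "1", "5", "6"]),  # may execute on any of four ports
]
pressure = fixed_port_pressure(kernel)
print(pressure)                # per-port pressure in cycles
print(max(pressure.values()))  # pressure on the most heavily used port
```

In this hypothetical kernel, the fixed distribution puts 0.75 cy on ports 0 and 1. An optimal schedule could instead place the second instruction entirely on ports 5 and 6, so that no port carries more than 0.5 cy.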

If one or more instruction forms are unknown to OSACA, it refuses to print an overall throughput, CP, and LCD analysis and marks all unknown instruction forms with X next to the mnemonic. This ensures that the user does not overlook an unrecognized instruction and assume an incorrect runtime prediction. To force OSACA to apply a throughput and latency of 0.0 cy for all unknown instruction forms, the flag --ignore-unknown can be specified.

To visualize the analyzed kernel and its dependency chains, OSACA can additionally produce a graph as a DOT file, which represents the kernel and all register dependencies inside of it. The tool highlights all LCDs and the CP. The graph is generated by running OSACA with the --export-graph EXPORT_PATH flag. OSACA stores the DOT file either at the filepath specified by EXPORT_PATH or under the default filename “osaca_dg.dot” in the current working directory. Subsequently, the appearance of the DOT graph can be adjusted, and it can be converted to various output formats such as PDF, SVG, or PNG using the dot command, e.g., dot -Tpdf osaca_dg.dot -o graph.pdf to generate a PDF document.
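
The conversion step can also be scripted. The sketch below wraps the dot invocation shown above; the function names dot_command and render_graph are hypothetical, and the format check is limited to the three formats named in the text, although dot supports more.

```python
import shutil
import subprocess


def dot_command(dot_file, output_file, fmt="pdf"):
    """Build the Graphviz command that renders a DOT file in the given format."""
    # Only the formats mentioned in the text are accepted in this sketch.
    if fmt not in {"pdf", "svg", "png"}:
        raise ValueError(f"unsupported format: {fmt}")
    return ["dot", f"-T{fmt}", dot_file, "-o", output_file]


def render_graph(dot_file, output_file, fmt="pdf"):
    """Run dot if it is available on the PATH; return True if it was run."""
    if shutil.which("dot") is None:
        return False
    subprocess.run(dot_command(dot_file, output_file, fmt), check=True)
    return True


print(" ".join(dot_command("osaca_dg.dot", "graph.pdf")))
```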

Marker insertion

To extract the right kernel, one can mark it beforehand. Currently, OSACA only supports the detection of markers in assembly code and therefore the analysis of assembly files. If OSACA cannot find any markers in the given input file, all lines will be evaluated.

Marking a kernel means inserting the byte markers into the assembly file before and after the loop. For this, the start marker has to be inserted right in front of the loop label and the end marker directly after the jump instruction. IACA requires byte markers since it operates on opcode level. To provide a trade-off between reusability with such tools and convenient usability, OSACA supports both byte markers and comment line markers. While the byte markers for x86 are equivalent to IACA byte markers, the comment keywords OSACA-BEGIN and OSACA-END are based on LLVM-MCA’s markers.

x86 markers

Byte markers

  movl    $111,%ebx       #IACA/OSACA START MARKER
  .byte   100,103,144     #IACA/OSACA START MARKER
.loop:
  # loop body
  jb      .loop
  movl    $222,%ebx       #IACA/OSACA END MARKER
  .byte   100,103,144     #IACA/OSACA END MARKER

Comment line markers

  # OSACA-BEGIN
.loop:
  # loop body
  jb      .loop
  # OSACA-END

AArch64 markers

Byte markers

  mov      x1, #111        // OSACA START
  .byte    213,3,32,31     // OSACA START
.loop:
  // loop body
  b.ne     .loop
  mov      x1, #222        // OSACA END
  .byte    213,3,32,31     // OSACA END

Comment line markers

  // OSACA-BEGIN
.loop:
  // loop body
  b.ne     .loop
  // OSACA-END
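
To make the effect of the comment line markers concrete, the following Python sketch selects the lines between OSACA-BEGIN and OSACA-END and falls back to all lines if no complete marker pair is present. It is an illustration under simplifying assumptions (plain substring matching, first marker pair only), not OSACA's actual parser; the function name extract_marked_kernel is hypothetical.

```python
def extract_marked_kernel(lines):
    """Return the lines between OSACA-BEGIN and OSACA-END comment markers.

    If no complete marker pair is found, all lines are returned.
    """
    begin = end = None
    for index, line in enumerate(lines):
        if "OSACA-BEGIN" in line and begin is None:
            begin = index
        elif "OSACA-END" in line and begin is not None:
            end = index
            break
    if begin is None or end is None:
        return list(lines)
    return lines[begin + 1:end]


aarch64_source = """\
  // OSACA-BEGIN
.loop:
  // loop body
  b.ne     .loop
  // OSACA-END
""".splitlines()

for line in extract_marked_kernel(aarch64_source):
    print(line)
```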

OSACA in combination with Kerncraft provides a functionality for the automatic detection of possible loop kernels and inserting markers. This can be done by using the --insert-marker flag together with the path to the target assembly file and the target architecture.

Benchmark import

OSACA supports the automatic integration of new instruction forms by parsing the output of the microbenchmark tools asmbench and ibench. This can be achieved by running OSACA with the command line option --import MICROBENCH:

osaca --arch ARCH --import MICROBENCH file

MICROBENCH specifies one of the currently supported benchmark tools, i.e., “asmbench” or “ibench”. ARCH defines the abbreviation of the target architecture for which the instructions will be added and file must be the path to the generated output file of the benchmark. The format of this file has to match either the basic command line output of ibench, e.g.,

[INSTRUCTION FORM]-TP:    0.500 (clock cycles)    [DEBUG - result: 1.000000]
[INSTRUCTION FORM]-LT:    4.000 (clock cycles)    [DEBUG - result: 1.000000]

or the command line output of asmbench including the name of the instruction form in a separate line at the beginning, e.g.:

[INSTRUCTION FORM]
Latency: 4.00 cycle
Throughput: 0.50 cycle

Note that there must be an empty line after each throughput measurement as part of the output so that one instruction form entry consists of four (4) lines.
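
The two input formats can be read with a few lines of Python. This is a sketch of the file layouts described above, not OSACA's import code; the function names are hypothetical, and the sample values only mirror the format examples rather than real measurements.

```python
import re

# "<instruction form>-TP:" or "<instruction form>-LT:" followed by a number.
IBENCH_LINE = re.compile(r"^(?P<form>.+)-(?P<kind>TP|LT):\s+(?P<value>\d+(?:\.\d+)?)")


def parse_ibench(text):
    """Collect throughput (TP) and latency (LT) per instruction form."""
    results = {}
    for line in text.splitlines():
        match = IBENCH_LINE.match(line.strip())
        if match:
            key = "throughput" if match["kind"] == "TP" else "latency"
            results.setdefault(match["form"], {})[key] = float(match["value"])
    return results


def parse_asmbench(text):
    """Parse entries of name, latency, and throughput separated by empty lines."""
    results = {}
    for block in text.strip().split("\n\n"):
        rows = [row.strip() for row in block.splitlines() if row.strip()]
        if len(rows) != 3:
            continue  # skip incomplete entries
        entry = {}
        for row in rows[1:]:
            label, value = row.split(":", 1)
            entry[label.strip().lower()] = float(value.split()[0])
        results[rows[0]] = entry
    return results


ibench_sample = """\
vaddpd-x_x_x-TP:    0.500 (clock cycles)    [DEBUG - result: 1.000000]
vaddpd-x_x_x-LT:    4.000 (clock cycles)    [DEBUG - result: 1.000000]
"""
asmbench_sample = """\
vaddpd-x_x_x
Latency: 4.00 cycle
Throughput: 0.50 cycle

"""
print(parse_ibench(ibench_sample))
print(parse_asmbench(asmbench_sample))
```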

To let OSACA import the instruction form with the correct operands, the naming conventions for the instruction form name must be followed:

The first part of the name is the mnemonic and ends with the character “-” (not part of the mnemonic in the DB).
The second part of the name consists of the operands. Each operand must be separated from another operand by the character “_”.
For each x86 operand, one of the following symbols must be used:
- “r” for general purpose registers (rax, edi, r9, …)
- “x”, “y”, or “z” for xmm, ymm, or zmm registers, respectively
- “i” for immediates
- “m” for a memory address. Add “b” if the memory address contains a base register, “o” if it contains an offset, “i” if it contains an index register, and “s” if the index register additionally has a scale factor of more than 1.
For each AArch64 operand, one of the following symbols must be used:
- “w”, “x”, “b”, “h”, “s”, “d”, or “q” for registers with the corresponding prefix.
- “v” followed by a single character (“b”, “h”, “s”, or “d”) for vector registers with the corresponding lane width of the second character. If no second character is given, OSACA assumes a lane width of 64 bit (d) as default.
- “i” for immediates
- “m” for a memory address. Add “b” if the memory address contains a base register, “o” if it contains an offset, “i” if it contains an index register, and “s” if the index register additionally has a scale factor of more than 1. Add “r” if the address format uses pre-indexing and “p” if it uses post-indexing.

Valid instruction form examples for x86 are vaddpd-x_x_x, mov-r_mboi, and vfmadd213pd-mbis_y_y.

Valid instruction form examples for AArch64 are fadd-vd_vd_v, ldp-d_d_mo, and fmov-s_i.
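
The naming convention can be checked with a small Python sketch that splits an instruction form name into mnemonic and operands and spells out the flags of a memory operand. The function names and the MEMORY_FLAGS table are hypothetical and only restate the rules listed above.

```python
# Meaning of the letters that may follow "m" in a memory operand.
MEMORY_FLAGS = {
    "b": "base register",
    "o": "offset",
    "i": "index register",
    "s": "scale factor > 1",
    "r": "pre-indexing (AArch64 only)",
    "p": "post-indexing (AArch64 only)",
}


def parse_instruction_form(name):
    """Split "mnemonic-op1_op2_..." into the mnemonic and a list of operands."""
    mnemonic, _, operand_part = name.partition("-")
    operands = operand_part.split("_") if operand_part else []
    return mnemonic, operands


def describe_memory_operand(operand):
    """List the address components encoded in a memory operand such as "mboi"."""
    if not operand.startswith("m"):
        raise ValueError(f"not a memory operand: {operand!r}")
    return [MEMORY_FLAGS[flag] for flag in operand[1:]]


print(parse_instruction_form("vfmadd213pd-mbis_y_y"))
print(describe_memory_operand("mboi"))
```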

Note that the options to define operands are limited; therefore, one might need to adjust the instruction forms in the architecture DB after importing. OSACA parses the output for an arbitrary number of instruction forms and adds them as entries to the architecture DB. The user must edit the ISA DB in case the instruction form shows irregular source and destination operands for its ISA syntax. OSACA applies the following rules by default:

If there is only one operand, it is considered a source operand.
In case of multiple operands, the target operand (depending on the ISA syntax, the last or first one) is considered the destination operand; all others are considered source operands.
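
These default rules can be expressed as a short Python sketch. The function name split_operands and the destination_last parameter are hypothetical; the sketch assumes that the destination is the last operand in x86 AT&T syntax and the first operand in AArch64 syntax.

```python
def split_operands(operands, destination_last):
    """Apply the default rules and return (sources, destinations).

    destination_last=True models a syntax whose target operand comes last
    (assumed here for x86 AT&T); False models one where it comes first
    (assumed here for AArch64).
    """
    if len(operands) <= 1:
        return list(operands), []  # a single operand counts as a source
    if destination_last:
        return operands[:-1], [operands[-1]]
    return operands[1:], [operands[0]]


# vfmadd213pd-mbis_y_y with the destination last
print(split_operands(["mbis", "y", "y"], destination_last=True))
# fadd-vd_vd_v with the destination first
print(split_operands(["vd", "vd", "v"], destination_last=False))
```

An instruction such as ldp, which writes two registers, does not fit this default rule and is an example of an instruction form whose ISA DB entry needs manual adjustment.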

Database check

Since a manual adjustment of the ISA DB is currently indispensable when adding new instruction forms, OSACA provides a database sanity check using the --db-check flag. It can be executed via:

osaca --arch ARCH --db-check [-v] file

ARCH defines the abbreviation of the target architecture of the database to check. The file argument needs to be specified as it is positional but may be any existing dummy path. When called, OSACA prints a summary of database information containing the number of missing throughput values, latency values, or μ-ops assignments for an instruction form. Furthermore, it shows the number of duplicate instruction forms in both the architecture DB and the ISA DB and checks how many instruction forms in the ISA DB do not exist in the architecture DB. Finally, it checks via simple heuristics how many of the instruction forms contained in the architecture DB might be missing an ISA DB entry. When the database check is run with the -v verbosity flag, OSACA additionally prints the specific names of the identified instruction forms so that the user can check the reported incidents.

Examples

To illustrate the functionality of OSACA, a sample kernel is analyzed for an Intel CSX core below:

double a[N], b[N];
double s;

// loop
for(int i = 0; i < N; ++i)
    a[i] = s * b[i];

The code shows a simple scalar multiplication of a vector b and a floating-point number s. The result is written to vector a. After inserting the OSACA byte markers into the assembly, one can start the analysis by typing

osaca --arch CSX PATH/TO/FILE

in the command line.

The output is:

Open Source Architecture Code Analyzer (OSACA) - v0.3
Analyzed file:      scale.s.csx.O3.s
Architecture:       csx
Timestamp:          2019-10-03 23:36:21

 P - Throughput of LOAD operation can be hidden behind a past or future STORE instruction
 * - Instruction micro-ops not bound to a port
 X - No throughput/latency information for this instruction in data file


    Combined Analysis Report
    -----------------------
                                         Port pressure in cycles
         |  0   - 0DV  |  1   |  2   -  2D  |  3   -  3D  |  4   |  5   |  6   |  7   ||  CP  | LCD  |
    -------------------------------------------------------------------------------------------------
     170 |             |      |             |             |      |      |      |      ||      |      |   .L22:
     171 | 0.50        | 0.50 | 0.50   0.50 | 0.50   0.50 |      |      |      |      ||  8.0 |      |   vmulpd    (%r12,%rax), %ymm1, %ymm0
     172 |             |      | 0.50        | 0.50        | 1.00 |      |      |      ||  5.0 |      |   vmovapd   %ymm0, 0(%r13,%rax)
     173 | 0.25        | 0.25 |             |             |      | 0.25 | 0.25 |      ||      |  1.0 |   addq      $32, %rax
     174 | 0.00        | 0.00 |             |             |      | 0.50 | 0.50 |      ||      |      |   cmpq      %rax, %r14
     175 |             |      |             |             |      |      |      |      ||      |      | * jne       .L22

           0.75          0.75   1.00   0.50   1.00   0.50   1.00   0.75   0.75           13.0   1.0


    Loop-Carried Dependencies Analysis Report
    -----------------------------------------
     173 |  1.0 | addq      $32, %rax                      | [173]

It shows the whole kernel together with the optimized port pressure of each instruction form and the overall port binding. Furthermore, the two columns on the right show the critical path (CP) and the longest loop-carried dependency (LCD) of the loop kernel. At the bottom, all loop-carried dependencies are shown, each with a list of the line numbers that are part of this dependency chain on the right.
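
The summary row of the report can be reproduced by summing the per-instruction port pressure. The Python sketch below copies the values from the report above, with blank cells omitted; the function name port_sums is hypothetical. Reading the largest port sum as the throughput bound of the kernel is an interpretation made for this sketch and is not stated by the report itself.

```python
from collections import defaultdict

# Port pressure in cycles per instruction, copied from the report above.
kernel = {
    "vmulpd":  {"0": 0.50, "1": 0.50, "2": 0.50, "2D": 0.50, "3": 0.50, "3D": 0.50},
    "vmovapd": {"2": 0.50, "3": 0.50, "4": 1.00},
    "addq":    {"0": 0.25, "1": 0.25, "5": 0.25, "6": 0.25},
    "cmpq":    {"0": 0.00, "1": 0.00, "5": 0.50, "6": 0.50},
}


def port_sums(kernel):
    """Sum the pressure of all instructions for each port."""
    totals = defaultdict(float)
    for pressures in kernel.values():
        for port, cycles in pressures.items():
            totals[port] += cycles
    return dict(totals)


totals = port_sums(kernel)
busiest = max(totals.values())  # pressure on the most heavily used port
print(totals)
print(busiest)
```

The sums match the bottom row of the report: ports 2, 3, and 4 each carry 1.00 cy, which equals the longest loop-carried dependency of 1.0 cy.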

You can find more (already marked) examples and sample outputs for various architectures in the examples directory.

Citations

If you use OSACA for scientific work, you can cite us; for the BibTeX entries, see the Wiki.

Credits

Implementation: Jan Laukemann, Julian Hammer

License

AGPL-3.0
