Aidge export for ARM CortexM systems

Install this plugin in your Aidge environment to create exports for ARM Cortex-M systems.

Description

This export generates standalone C/C++ code intended to run on STM32 targets.

Supported targets

  • STM32F746
  • STM32H743
  • STM32L4R5

Additional targets can be added upon request.

Backends

This export provides two backends:

  • arm_cortexm:
    Provides specific optimizations for Convolution and Fully Connected (FC) layers, and is largely based on the aidge_export_cpp backend.

  • CMSIS-NN:
    Relies on the optimized kernels provided by CMSIS-NN.

Supported layers

arm_cortexm backend

Layer                    Supported
Convolution              ✔️
Depthwise Convolution    ✔️
Fully Connected          ✔️

CMSIS-NN backend

Layer                    Supported
Add                      ✔️
Mul
Convolution              ✔️
Depthwise Convolution    ✔️
Fully Connected          ✔️
Average Pooling          ✔️
Max Pooling              ✔️
Global Average Pooling   ✔️
Concat
Reshape
Softmax

Note: If your model contains unsupported operators, the export will still work. In that case, the implementation from the aidge_export_cpp module will be used.
Note: The CMSIS-NN backend only supports quantized models (int8).

Examples

Several example scripts are available in aidge_export_arm_cortexm/examples.

Model              arm_cortexm   CMSIS-NN   Notes
LeNet              ✔️            ✔️
Deep Autoencoder   ✔️            🔶         Generation works, but slight output differences are observed on the last FC layer
DS-CNN             ✔️            ✔️
MobileNetV1 VWW    ✔️            ✔️
ResNet8            ✔️            🔶         Generation works, but slight output differences are observed on the third convolution

Running an example

Navigate to the desired example directory:

cd aidge_export_arm_cortexm/examples/export_LeNet

Run the Python script:

python lenet.py --board stm32h7 --dtype int8 --cmsis --aidge_cmp -v

Common options

  • --dtype <type>: Change the export data type (enables quantization)
  • --cmsis: Use the CMSIS-NN backend when possible
  • --mock_db: Use random inputs
  • --aidge_cmp: Compare layer outputs at runtime with reference results from aidge_backend_cpu
  • --no_cuda: Disable CUDA usage
  • --board <board>: Select the target board
  • --dev_mode: Generate symbolic links to the export module files instead of copying them
  • --mem_wrap: Enable memory wrapping (not compatible with Aidge_Arm and CMSIS backends)
  • -vvv: Enable verbose output (the number of v controls the verbosity level)

Run python <model>.py --help to see all available options.

Navigate to the generated export directory:

cd export_lenet_h7_int8

Compile the project:

make clean; make build AIDGE_CMP=true

The AIDGE_CMP argument is optional.

The generated binary can be found at bin/aidge_stm32.elf.
You can flash this binary onto your board using tools such as STM32CubeProgrammer. A serial console (e.g., PuTTY) can be used to view runtime logs.

Installation

From Source

To install the export manager from the GitLab repository, run these commands in the Python environment where Aidge is already installed.

git clone https://gitlab.eclipse.org/eclipse/aidge/aidge_export_arm_cortexm.git
cd aidge_export_arm_cortexm
pip install .

Benchmark Export_arm_cortexm - STM32H7

This project allows automatic benchmarking on an STM32H7xxx target, using exports generated with the aidge_export_arm_cortexm backend.


Installation

Project Requirements

The following packages are required and are declared in the pyproject.toml file:

  • pyocd >= 0.35.0
  • pyserial >= 3.5

Manual update of the STM32 pack is required

By default, pyOCD does not include all STM32 packs. The pack corresponding to NUCLEO-H743ZI (stm32h743zitx) must be installed manually:

pyocd pack install stm32h743zitx

This operation can take several minutes.

Verify that the board is correctly detected

If you are on Windows, make sure the ST-LINK USB driver, available from the ST website, is installed.

Then connect your board via USB and run:

pyocd list

Expected output example:

  #   Probe/Board     Unique ID                  Target 
------------------------------------------------------------------
  0   STM32 STLink    066DFF343339415043185830   ✔︎ stm32h743zitx 
      NUCLEO-H743ZI

If you see a green check ✔︎, the board is properly detected. If you see a red cross x, manually install the pack as described above.
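
The detection check can also be scripted. The following is a minimal sketch, assuming the column layout and the ✔ mark shown in the example output above (both may differ between pyOCD versions). `target_listed` is a hypothetical helper, not part of pyOCD; it only inspects text that you pass in.

```python
def target_listed(pyocd_list_output: str, target: str = "stm32h743zitx") -> bool:
    """Return True if a probe row names `target` and carries the check mark."""
    for line in pyocd_list_output.splitlines():
        fields = line.split()
        # Probe rows start with a numeric index; this skips the header,
        # the separator rule, and the board-name continuation line.
        if not fields or not fields[0].isdigit():
            continue
        # U+2714 is the check mark printed in the example output above.
        if target in fields and "\u2714" in line:
            return True
    return False


# Example output copied from the section above.
sample = """\
  #   Probe/Board     Unique ID                  Target
------------------------------------------------------------------
  0   STM32 STLink    066DFF343339415043185830   ✔︎ stm32h743zitx
      NUCLEO-H743ZI
"""

detected = target_listed(sample)
```

To use it on a real board, pass the captured standard output of `pyocd list` (for example obtained with `subprocess.run(["pyocd", "list"], capture_output=True, text=True).stdout`) instead of `sample`.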

Permissions issue with pyOCD: "No available debug probes are connected"

If running the following command results in an error:

pyocd list

No available debug probes are connected

but your STM32 device is visible via lsusb, this may be due to missing USB permissions.

Follow these steps to fix the issue:

  • Create a new udev rule:

    sudo nano /etc/udev/rules.d/50-st-link.rules

  • Paste this content:

    SUBSYSTEM=="usb", ATTR{idVendor}=="0483", ATTR{idProduct}=="374b", MODE="0666"

  • Reload the udev rules and trigger them:

    sudo udevadm control --reload-rules

    sudo udevadm trigger

  • Unplug and replug your STM32 device.

  • Try again:

    pyocd list

    You should now see your board listed.
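
The udev rule only matches if its vendor and product IDs are the ones your probe reports in lsusb. As a rough sketch, the helpers below extract ST devices (vendor ID 0483) from lsusb text and format a rule line in the same shape as the one above. Both function names are hypothetical, and the sample lsusb lines are illustrative, not captured from a real machine.

```python
import re

ST_VENDOR_ID = "0483"  # vendor ID used in the udev rule above


def find_st_devices(lsusb_output: str):
    """Return (vendor, product) ID pairs of ST devices found in lsusb text."""
    pattern = re.compile(r"ID\s+([0-9a-f]{4}):([0-9a-f]{4})", re.IGNORECASE)
    devices = []
    for line in lsusb_output.splitlines():
        match = pattern.search(line)
        if match and match.group(1).lower() == ST_VENDOR_ID:
            devices.append((match.group(1).lower(), match.group(2).lower()))
    return devices


def udev_rule(vendor: str, product: str, mode: str = "0666") -> str:
    """Format one udev rule line granting access to the given USB device."""
    return (
        f'SUBSYSTEM=="usb", ATTR{{idVendor}}=="{vendor}", '
        f'ATTR{{idProduct}}=="{product}", MODE="{mode}"'
    )


# Illustrative lsusb output (made up for this sketch).
sample_lsusb = """\
Bus 001 Device 002: ID 8087:0024 Intel Corp. Integrated Rate Matching Hub
Bus 001 Device 004: ID 0483:374b STMicroelectronics ST-LINK/V2.1
"""

devices = find_st_devices(sample_lsusb)
rules = [udev_rule(vendor, product) for vendor, product in devices]
```

If lsusb reports a product ID other than 374b for your probe, the rule needs that ID instead.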


Using the benchmark

From the aidge_core/benchmark directory, you can run benchmarks with the following commands:

Compare with ONNXRuntime (compute_output)

aidge_benchmark --config-file ./operator_config/relu_config.json --modules aidge_export_arm_cortexm --results-directory results

Inference time measurement (measure_inference_time)

aidge_benchmark --config-file ./operator_config/relu_config.json --modules aidge_export_arm_cortexm --results-directory results --nb-iteration 20 -t


Important notes

Serial Port: "Permission denied: '/dev/ttyACM0'"

If you see an error like this when trying to flash:

Error connecting to serial port: [Errno 13] could not open port /dev/ttyACM0: [Errno 13] Permission denied: '/dev/ttyACM0'

This usually means your user doesn't have the right permissions for serial access.

To fix this, add your user to the dialout group:

sudo usermod -a -G dialout $USER

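Then restart your terminal for the change to take effect. If the error persists, log out and back in, since group membership is applied when a login session starts.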
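
A quick way to confirm that the current session has picked up the group is to look at the groups of the running process. This is a small sketch using only the Python standard library on Linux; `user_in_group` is a hypothetical helper name.

```python
import grp
import os


def user_in_group(group: str = "dialout") -> bool:
    """Return True if the current process already belongs to `group`."""
    try:
        gid = grp.getgrnam(group).gr_gid
    except KeyError:
        # The group does not exist on this system.
        return False
    # os.getgroups() lists supplementary groups; os.getgid() is the primary one.
    return gid in os.getgroups() or gid == os.getgid()


has_serial_access = user_in_group("dialout")
```

If this returns False right after running usermod, the session predates the change and needs to be restarted.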

Capture timeout and longer UART output

  • When measuring inference time over multiple forward calls, the capture takes longer. To keep the capture from being cut off early, increase uart_capture_duration in board_config.json accordingly (e.g., from 30 s to 60 s or more).
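
The adjustment can be made by hand or with a few lines of Python. The sketch below assumes that uart_capture_duration is a top-level numeric key (in seconds) in board_config.json; the actual layout of the file may differ, and `raise_capture_duration` is a hypothetical helper. The demo runs against a temporary file rather than a real configuration.

```python
import json
import tempfile
from pathlib import Path


def raise_capture_duration(config_path, seconds):
    """Raise uart_capture_duration to at least `seconds`; never lower it."""
    path = Path(config_path)
    config = json.loads(path.read_text())
    # Assumption: the key sits at the top level of the JSON object.
    current = config.get("uart_capture_duration", 0)
    config["uart_capture_duration"] = max(current, seconds)
    path.write_text(json.dumps(config, indent=4) + "\n")
    return config["uart_capture_duration"]


# Demo on a throwaway file with made-up content.
with tempfile.TemporaryDirectory() as tmp:
    demo_config = Path(tmp) / "board_config.json"
    demo_config.write_text(json.dumps({"uart_capture_duration": 30}))
    new_value = raise_capture_duration(demo_config, 60)
    saved = json.loads(demo_config.read_text())
```

Using `max` means a configuration that already allows a longer capture is left unchanged.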

Retrying flash in case of UART failure

  • The flashing process now includes a retry mechanism:
    if the UART output file is missing or empty, the firmware is reflashed up to 5 times by default (this can be changed via the MAX_RETRIES constant in the code).

  • This improves robustness against rare flashing issues caused by the pyOCD library, where firmware may not start correctly despite successful flashing.

  • A special end keyword (default: "DEMO END") is now expected in the UART output to determine when inference is complete and to stop UART capture.

  • The file uart_output.txt is automatically generated during execution and placed in the export_folder.

  • An export_log.log file is generated at compilation to store the build logs.

  • The board_config.json file is essential for configuring board flashing. When testing dimensions like [16], increase uart_capture_duration to at least 60 s.
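
The retry and end-keyword behaviour described above can be pictured with the following sketch. It is not the project's implementation: `flash_and_capture` stands in for whatever flashes the board and yields UART lines, the constants reuse the defaults quoted in the text, and "up to 5 times" is interpreted here as five attempts in total.

```python
from typing import Callable, Iterable, List

MAX_RETRIES = 5           # default quoted in the text
END_KEYWORD = "DEMO END"  # default end marker quoted in the text


def capture_until_end(lines: Iterable[str], end_keyword: str = END_KEYWORD) -> List[str]:
    """Collect UART lines, stopping right after the line with the end keyword."""
    captured = []
    for line in lines:
        captured.append(line)
        if end_keyword in line:
            break
    return captured


def flash_with_retries(flash_and_capture: Callable[[], Iterable[str]],
                       max_retries: int = MAX_RETRIES) -> List[str]:
    """Reflash until some UART output is captured, or give up."""
    for _ in range(max_retries):
        output = capture_until_end(flash_and_capture())
        if output:  # empty output is treated as a failed start
            return output
    raise RuntimeError(f"no UART output after {max_retries} attempts")


# Fake board for the demo: silent on the first two attempts, then it talks.
attempts = {"count": 0}


def fake_flash_and_capture():
    attempts["count"] += 1
    if attempts["count"] < 3:
        return []
    return ["line 1", "DEMO END", "ignored after end"]


output = flash_with_retries(fake_flash_and_capture)
```

In this run the third attempt succeeds and the capture stops at the end keyword, so the trailing line is never collected.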


Limitations

1. Memory limitations (RAM / Flash)

  • Starting at dimensions such as [32, 32, 32, 32] (e.g., for ReLU), compilation errors or RAM/Flash overflows may occur.
  • It is recommended to keep each dimension at 16 or below, for example [1,1,1,1], [4,4,4,4], or [16,16,16,16].
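
A quick size estimate shows why the limit sits between 16 and 32 per dimension. The sketch below assumes 4-byte (float32) elements, which the text does not state, and counts a single tensor only. For reference, the STM32H743 offers roughly 1 MB of RAM and 2 MB of Flash.

```python
from math import prod


def tensor_bytes(dims, bytes_per_element=4):
    """Size in bytes of one dense tensor (float32 assumed by default)."""
    return prod(dims) * bytes_per_element


# Sizes for the dimensions mentioned above.
sizes = {
    tuple(dims): tensor_bytes(dims)
    for dims in ([1, 1, 1, 1], [4, 4, 4, 4], [16, 16, 16, 16], [32, 32, 32, 32])
}

for dims, size in sizes.items():
    print(list(dims), f"{size} bytes ({size / 1024:.2f} KiB)")
```

Under this assumption a [16,16,16,16] tensor takes 256 KiB, while a [32,32,32,32] tensor takes 4 MiB, which already exceeds the available RAM before the output tensor is counted.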

2. Instability during consecutive flashes

  • If the firmware does not start correctly or no UART output is captured, the system retries flashing.

  • You can modify the number of attempts by adjusting the MAX_RETRIES constant in the code.

  • A known issue exists when running the program on the STM32 board multiple times in sequence or with a large .elf file.

  • When running multiple benchmarks consecutively, the STM32 may not execute the firmware correctly, even if flashing appears successful.

  • The UART remains silent (uart_output.txt is empty), requiring several attempts until the UART finally outputs expected values.


Recommendations

  • Avoid running benchmarks consecutively in the same session.
  • Separate different tensor dimensions into different JSON config files.
  • Running benchmarks individually helps reduce flashing failures.
  • Increase uart_capture_duration when working with large output tensors.

License

Aidge is licensed under the Eclipse Public License 2.0, as found in the LICENSE file.
