Aidge export for ARM CortexM systems
This plugin allows Aidge to create exports for ARM CortexM systems.
It generates standalone C/C++ code intended to run on STM32 targets.
pip install aidge_export_arm_cortexm
Supported targets
- STM32F746
- STM32H743
- STM32L4R5
Additional targets can be added upon request.
Backends
This export provides two backends:
- arm_cortexm: Provides specific optimizations for Convolution and Fully Connected (FC) layers, and is largely based on the aidge_export_cpp backend.
- CMSIS-NN: Relies on the optimized kernels provided by CMSIS-NN.
Supported layers
arm_cortexm backend
| Layer | Supported |
|---|---|
| Convolution | ✔️ |
| Depthwise Convolution | ✔️ |
| Fully Connected | ✔️ |
CMSIS-NN backend
| Layer | Supported |
|---|---|
| Add | ✔️ |
| Mul | ❌ |
| Convolution | ✔️ |
| Depthwise Convolution | ✔️ |
| Fully Connected | ✔️ |
| Average Pooling | ✔️ |
| Max Pooling | ✔️ |
| Global Average Pooling | ✔️ |
| Concat | ❌ |
| Reshape | ❌ |
| Softmax | ❌ |
[!NOTE]
- If your model contains unsupported operators, the export will still work. In that case, the implementation from the aidge_export_cpp module will be used.
- The CMSIS-NN backend only supports quantized models (int8).
Examples
Several example scripts are available in aidge_export_arm_cortexm/examples.
| Model | arm_cortexm | CMSIS-NN | Notes |
|---|---|---|---|
| LeNet | ✔️ | ✔️ | |
| Deep Autoencoder | ✔️ | 🔶 | Generation works, but slight output differences are observed on the last FC layer |
| DS-CNN | ✔️ | ✔️ | |
| MobileNetV1 VWW | ✔️ | ✔️ | |
| ResNet8 | ✔️ | 🔶 | Generation works, but slight output differences are observed on the third convolution |
Running an example
Navigate to the desired example directory:
cd aidge_export_arm_cortexm/examples/export_LeNet
Run the Python script:
python lenet.py --board stm32h7 --dtype int8 --cmsis --aidge_cmp -v
Common options
- --dtype <type>: Change the export data type (enables quantization)
- --cmsis: Use the CMSIS-NN backend when possible
- --mock_db: Use random inputs
- --aidge_cmp: Compare layer outputs at runtime with reference results from aidge_backend_cpu
- --no_cuda: Disable CUDA usage
- --board <board>: Select the target board
- --dev_mode: Generate symbolic links to the export module files instead of copying them
- --mem_wrap: Enable memory wrapping (not compatible with Aidge_Arm and CMSIS backends)
- -vvv: Enable verbose output (the number of v controls the verbosity level)
Run python <model>.py --help to see all available options.
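As a rough illustration of how the options above combine, here is a minimal argparse sketch. It is not the parser used by the example scripts: the function name `build_parser`, the `verbose` destination, and all defaults are assumptions made for this example. Only the flag names come from the list above.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented flags; the real scripts may differ."""
    parser = argparse.ArgumentParser(description="Illustrative export example CLI")
    parser.add_argument("--dtype", help="export data type (enables quantization)")
    parser.add_argument("--board", help="target board, e.g. stm32h7")
    parser.add_argument("--cmsis", action="store_true",
                        help="use the CMSIS-NN backend when possible")
    parser.add_argument("--mock_db", action="store_true", help="use random inputs")
    parser.add_argument("--aidge_cmp", action="store_true",
                        help="compare layer outputs with aidge_backend_cpu references")
    parser.add_argument("--no_cuda", action="store_true", help="disable CUDA usage")
    parser.add_argument("--dev_mode", action="store_true",
                        help="symlink export module files instead of copying them")
    parser.add_argument("--mem_wrap", action="store_true", help="enable memory wrapping")
    # Each repeated "v" raises the verbosity level by one (-v, -vv, -vvv).
    parser.add_argument("-v", dest="verbose", action="count", default=0,
                        help="verbosity level")
    return parser


# Parse the same command line as the LeNet example above.
args = build_parser().parse_args(
    "--board stm32h7 --dtype int8 --cmsis --aidge_cmp -v".split()
)
print(args.board, args.dtype, args.cmsis, args.verbose)
```

For the authoritative list of options and their defaults, rely on the `--help` output of the script itself.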
Navigate to the generated export directory:
cd export_lenet_h7_int8
Compile the project:
make clean; make build AIDGE_CMP=true
The AIDGE_CMP argument is optional.
The generated binary can be found at bin/aidge_stm32.elf.
You can flash this binary onto your board using tools such as STM32CubeProgrammer. A serial console (e.g., PuTTY) can be used to view runtime logs.
Benchmark Export_arm_cortexm - STM32H7
This project allows automatic benchmarking on an STM32H7xxx target, using exports generated with the aidge_export_arm_cortexm backend.
Installation
Prerequisite:
- pyocd >= 0.35.0
- pyserial >= 3.5
pip install aidge_export_arm_cortexm
🛠 Build from Source
Prerequisites (in addition to the previous ones):
- Please review the global installation instructions before proceeding.
- If using a virtual environment, use the same one for all Aidge modules.
1. Python installation using setup scripts
| Environment | Python Development |
|---|---|
| Windows | .\setup.ps1 -Modules backend_cpu -Clean -Tests -Python |
| Unix | ./setup.sh -m backend_cpu --clean --tests --python |
[!TIP] Use Get-Help setup.ps1 (Win) or ./setup.sh -h (Unix) for full documentation.
2. Python Installation using pip
Run these commands from the aidge_export_arm_cortexm/ directory:
# Standard install
pip install . -v
# Install with testing dependencies
pip install .[test] -v && pytest
Manual update of the STM32 pack is required
By default, pyOCD does not include all STM32 packs. The pack corresponding to NUCLEO-H743ZI (stm32h743zitx) must be installed manually:
pyocd pack install stm32h743zitx
This operation can take several minutes.
Verify that the board is correctly detected
If you are on Windows, make sure you have installed the ST-LINK USB driver, which is available on the ST website.
Then connect your board via USB and run:
pyocd list
Expected output example:
# Probe/Board Unique ID Target
------------------------------------------------------------------
0 STM32 STLink 066DFF343339415043185830 ✔︎ stm32h743zitx
NUCLEO-H743ZI
If you see a green check ✔︎, the board is properly detected.
If you see a red cross x, manually install the pack as described above.
A permissions issue with PyOCD: "No available debug probes are connected"
If running the following command results in an error:
pyocd list
No available debug probes are connected
but your STM32 device is visible via lsusb, this may be due to missing USB permissions.
Follow these steps to fix the issue:
- Create a new udev rule:
sudo nano /etc/udev/rules.d/50-st-link.rules
- Paste this content:
SUBSYSTEM=="usb", ATTR{idVendor}=="0483", ATTR{idProduct}=="374b", MODE="0666"
- Reload udev and trigger:
sudo udevadm control --reload-rules
sudo udevadm trigger
- Unplug and replug your STM32 device.
- Try again:
pyocd list
You should now see your board listed.
Using the benchmark
From the aidge_core/benchmark directory, you can run benchmarks with the following commands:
Compare with ONNXRuntime (compute_output)
aidge_benchmark --config-file ./operator_config/relu_config.json --modules aidge_export_arm_cortexm --results-directory results
Inference time measurement (measure_inference_time)
aidge_benchmark --config-file ./operator_config/relu_config.json --modules aidge_export_arm_cortexm --results-directory results --nb-iteration 20 -t
Important notes
Serial Port: "Permission denied: '/dev/ttyACM0'"
If you see an error like this when trying to flash:
Error connecting to serial port: [Errno 13] could not open port /dev/ttyACM0: [Errno 13] Permission denied: '/dev/ttyACM0'
This usually means your user doesn't have the right permissions for serial access.
To fix this, add your user to the dialout group:
sudo usermod -a -G dialout $USER
Then restart your terminal for the change to take effect. If the error persists, log out and log back in so that the new group membership is applied.
Capture timeout and longer UART output
- When measuring inference time using multiple forward calls, capture times may increase. To avoid premature interruption of the capture process, increase uart_capture_duration in the board_config.json accordingly (e.g., from 30s to 60 or more).
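The adjustment described above can be scripted. The following sketch assumes that `uart_capture_duration` is a top-level key of `board_config.json` and that its value is a number of seconds. The actual layout of the file is not described here, so check it before reusing this. The helper name `set_uart_capture_duration` is hypothetical.

```python
import json
from pathlib import Path


def set_uart_capture_duration(config_path, seconds):
    """Rewrite the capture duration in a board config file and return the new config.

    Assumption: "uart_capture_duration" is a top-level key of the JSON object.
    """
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    config["uart_capture_duration"] = seconds
    # Write the file back with stable indentation so diffs stay readable.
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

For example, `set_uart_capture_duration("board_config.json", 60)` would raise the capture window to 60 while leaving every other key untouched.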
Retrying flash in case of UART failure
- The flashing process now includes a retry mechanism: if the UART output file is missing or empty, the firmware is reflashed up to 5 times by default (this can be changed via the MAX_RETRIES constant in the code).
- This improves robustness against rare flashing issues caused by the pyOCD library, where firmware may not start correctly despite successful flashing.
- A special end keyword (default: "DEMO END") is now expected in the UART output to determine when inference is complete and to stop UART capture.
- The file uart_output.txt is automatically generated during execution and placed in the export_folder.
- An export_log.log file is generated at compilation to store the build logs.
- The board_config.json file is essential for configuring board flashing. When testing dimensions like [16], you must increase uart_capture_duration to 60 or more.
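The retry and end-keyword behaviour described above can be sketched as follows. This is an illustration of the idea, not the benchmark's actual code: `flash_with_retries`, `read_until_end`, and the `flash_and_capture` callback are hypothetical names, and treating MAX_RETRIES as the total number of attempts is an assumption.

```python
from pathlib import Path
from typing import Callable, Iterable, List

MAX_RETRIES = 5           # default taken from the description above
END_KEYWORD = "DEMO END"  # default end keyword from the description above


def read_until_end(lines: Iterable[str], end_keyword: str = END_KEYWORD) -> List[str]:
    """Collect UART lines and stop as soon as the end keyword is seen."""
    captured = []
    for line in lines:
        captured.append(line)
        if end_keyword in line:
            break
    return captured


def flash_with_retries(flash_and_capture: Callable[[], None], uart_file: Path,
                       max_retries: int = MAX_RETRIES) -> int:
    """Call flash_and_capture until uart_file exists and is non-empty.

    Returns the attempt number that succeeded. flash_and_capture is a
    hypothetical callback standing in for flashing plus UART capture.
    """
    for attempt in range(1, max_retries + 1):
        flash_and_capture()
        # Retry condition from the text: output file missing or empty.
        if uart_file.exists() and uart_file.read_text(errors="replace").strip():
            return attempt
    raise RuntimeError(f"no UART output after {max_retries} attempts")
```

Keeping the flashing step behind a callback makes the retry logic testable without a board attached.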
Limitations
1. Memory limitations (RAM / Flash)
- Starting at dimensions such as [32, 32, 32, 32] (e.g., for ReLU), compilation errors or RAM/Flash overflows may occur.
- It is recommended to keep each dimension at 16 or below, such as [1,1,1,1], [4,4,4,4], or [16,16,16,16].
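A quick size estimate shows why the larger shape is a problem. The sketch below assumes float32 data (4 bytes per element) unless int8 is requested, and it only counts a single tensor. A real export also needs room for the other buffers and the code itself. The helper name `tensor_bytes` is hypothetical.

```python
from math import prod

# Bytes per element for the two data types considered in this sketch.
BYTES_PER_ELEMENT = {"float32": 4, "int8": 1}


def tensor_bytes(dims, dtype="float32"):
    """Return the storage size in bytes of one dense tensor with the given dims."""
    return prod(dims) * BYTES_PER_ELEMENT[dtype]


for dims in ([1, 1, 1, 1], [4, 4, 4, 4], [16, 16, 16, 16], [32, 32, 32, 32]):
    print(dims, tensor_bytes(dims) / 1024, "KiB")
```

Under these assumptions a [16,16,16,16] tensor takes 256 KiB, while a [32,32,32,32] tensor takes 4 MiB, which is more than the roughly 1 MB of on-chip RAM of an STM32H743.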
2. Instability during consecutive flashes
- If the firmware does not start correctly or no UART output is captured, the system retries flashing.
- You can modify the number of attempts by adjusting the MAX_RETRIES constant in the code.

A known issue exists when running the program on the STM32 board multiple times in sequence or with a large .elf file.
- When running multiple benchmarks consecutively, the STM32 may not execute the firmware correctly, even if flashing appears successful.
- The UART remains silent (uart_output.txt is empty), requiring several attempts until the UART finally outputs expected values.
Recommendations
- Avoid running benchmarks consecutively in the same session.
- Separate different tensor dimensions into different JSON config files.
- Running benchmarks individually helps reduce flashing failures.
- Increase uart_capture_duration when working with large output tensors.
License
Aidge is licensed under the Eclipse Public License 2.0, as found in the LICENSE file.