
How to implement a multi-class neural network with STM32F103?

June 10, 2025
Ampheo


Implementing a multi-class neural network on an STM32F103 (Cortex-M3, 72 MHz, no FPU, limited RAM/Flash) is challenging due to its limited resources, but it is possible for small networks and low-dimensional inputs (e.g., 2D/3D sensor inputs).


Here's how you can do it step-by-step:


 1. Design a Small Neural Network on a PC

Use Python with TensorFlow/Keras to design and train a model:

python
 

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.utils import to_categorical
import numpy as np

# Example dataset
X = np.array([[0.1, 0.2], [0.9, 0.8], [0.2, 0.9], [0.8, 0.1]])
y = to_categorical([0, 1, 2, 0], num_classes=3)

# Small NN
model = Sequential([
    Dense(6, input_shape=(2,), activation='relu'),
    Dense(3, activation='softmax')  # 3 classes
])

model.compile(optimizer='adam', loss='categorical_crossentropy')
model.fit(X, y, epochs=200)
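# Save the trained model for import into STM32Cube.AI (step 3A below)
model.save("model.h5")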


 2. Convert Model to C (Quantized if Needed)

Options:

  • STM32Cube.AI (official ST tool)

    • Converts Keras/TensorFlow models to C code optimized for STM32

    • Integrates with STM32CubeMX

  • uTensor, CMSIS-NN, or TensorFlow Lite for Microcontrollers (TFLM)

    • Lightweight inference engines for ARM Cortex-M cores

 On STM32F103, use integer quantization (8-bit) to reduce memory and increase speed:

python
 
import tensorflow as tf

# Convert the trained Keras model to TFLite; Optimize.DEFAULT alone
# applies dynamic-range quantization of the weights
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()
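
The snippet above quantizes the weights only. For full 8-bit integer quantization of both weights and activations, TFLite also needs a representative dataset to calibrate activation ranges; a minimal sketch, reusing the training inputs X from step 1:

python

import numpy as np
import tensorflow as tf

# A few representative samples let the converter calibrate activation ranges
def representative_data():
    for sample in X.astype(np.float32):
        yield [sample.reshape(1, 2)]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8
tflite_model = converter.convert()

# Write the model out so it can be embedded as a C array for TFLM
with open("model_int8.tflite", "wb") as f:
    f.write(tflite_model)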

 3. Integrate with STM32 Project

A. Using STM32Cube.AI:

  1. Install STM32CubeMX + STM32Cube.AI plugin

  2. Import .h5 model

  3. Enable X-CUBE-AI middleware

  4. Generate code

  5. Call the generated ai_network_run() function (the exact name depends on your model's name in CubeMX) to pass input data and read the class scores; see the sketch below
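
As a rough sketch of that calling pattern, assuming the model is named network (the exact functions and macros come from the network.h/network_data.h files X-CUBE-AI generates, and vary by tool version):

c

#include "network.h"          /* generated by X-CUBE-AI */
#include "network_data.h"

/* Created and initialized at startup via ai_network_create()/ai_network_init(),
   following the example code X-CUBE-AI generates */
extern ai_handle network;

static ai_float in_data[AI_NETWORK_IN_1_SIZE];
static ai_float out_data[AI_NETWORK_OUT_1_SIZE];

void nn_infer(void) {
    /* Bind our buffers to the generated I/O descriptors */
    ai_buffer inputs[AI_NETWORK_IN_NUM] = AI_NETWORK_IN;
    ai_buffer outputs[AI_NETWORK_OUT_NUM] = AI_NETWORK_OUT;
    inputs[0].data = AI_HANDLE_PTR(in_data);
    outputs[0].data = AI_HANDLE_PTR(out_data);

    ai_network_run(network, inputs, outputs);
    /* out_data now holds one score per class */
}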

B. Manual (for tiny models):

  • Extract the trained weights and biases (a sketch for exporting them as a C header follows the code below)

  • Implement the forward pass in C:

c
 

#include <math.h>

/* Weights and biases exported from the trained Keras model */
extern const float w1[6][2], b1[6];   /* hidden layer */
extern const float w2[3][6], b2[3];   /* output layer */

float relu(float x) {
    return x > 0.0f ? x : 0.0f;
}

void softmax(float* x, int n) {
    /* Subtract the max first for numerical stability */
    float m = x[0], sum = 0.0f;
    for (int i = 1; i < n; ++i) if (x[i] > m) m = x[i];
    for (int i = 0; i < n; ++i) { x[i] = expf(x[i] - m); sum += x[i]; }
    for (int i = 0; i < n; ++i) x[i] /= sum;
}

void forward(const float* input, float* output) {
    // Layer 1 (2 inputs → 6 neurons)
    float hidden[6];
    for (int i = 0; i < 6; ++i) {
        hidden[i] = relu(input[0]*w1[i][0] + input[1]*w1[i][1] + b1[i]);
    }

    // Output layer (6 → 3)
    for (int i = 0; i < 3; ++i) {
        output[i] = b2[i];
        for (int j = 0; j < 6; ++j) output[i] += hidden[j] * w2[i][j];
    }
    softmax(output, 3);
}
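
To produce the w1/b1/w2/b2 arrays referenced above, the trained Keras weights can be dumped into a C header. A minimal sketch; note that Keras stores a Dense kernel as (inputs, units), so it is transposed here to match the [neuron][input] indexing used in the C code:

python

def c_array(name, arr):
    # Emit a C initializer such as: const float w1[6][2] = { ... };
    dims = "".join(f"[{d}]" for d in arr.shape)
    vals = ", ".join(f"{v:.8f}f" for v in arr.flatten())
    return f"const float {name}{dims} = {{ {vals} }};\n"

k1, b1 = model.layers[0].get_weights()   # kernel (2, 6), bias (6,)
k2, b2 = model.layers[1].get_weights()   # kernel (6, 3), bias (3,)

with open("weights.h", "w") as f:
    f.write(c_array("w1", k1.T))   # -> [6][2]
    f.write(c_array("b1", b1))
    f.write(c_array("w2", k2.T))   # -> [3][6]
    f.write(c_array("b2", b2))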


 4. Run and Test

  • Flash to STM32 using STM32CubeIDE

  • Send input via UART or use ADC/sensors

  • Print the predicted class
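
A minimal sketch of that test loop, assuming a CubeMX-configured UART on a handle named huart1 (a hypothetical name) and the forward() routine from the manual section above:

c

#include <stdio.h>
#include "stm32f1xx_hal.h"

extern UART_HandleTypeDef huart1;                 /* assumed: set up by CubeMX */
void forward(const float *input, float *output);  /* from the manual sketch */

void classify_and_report(const float input[2]) {
    float probs[3];
    forward(input, probs);

    /* Argmax over the 3 softmax outputs */
    int best = 0;
    for (int i = 1; i < 3; ++i)
        if (probs[i] > probs[best]) best = i;

    char msg[24];
    int n = snprintf(msg, sizeof msg, "class=%d\r\n", best);
    HAL_UART_Transmit(&huart1, (uint8_t *)msg, (uint16_t)n, HAL_MAX_DELAY);
}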


 Optimization Tips

  • Use fixed-point arithmetic (Q7/Q15 types); see the Q15 sketch after this list

  • Use CMSIS-DSP for matrix operations

  • Use CMSIS-NN if available — optimized for Cortex-M

  • Keep model size tiny: e.g., 2–1–3 or 2–4–3 architecture
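
As an illustration of the fixed-point tip, here is a plain-C Q15 neuron that needs no FPU; CMSIS-DSP and CMSIS-NN provide optimized versions of the same idea:

c

#include <stdint.h>

typedef int16_t q15_t;

/* One neuron in Q15 fixed point: inputs, weights, and bias are all Q15.
   The dot product is accumulated in 32 bits (Q30) to avoid overflow,
   shifted back to Q15, saturated, and passed through ReLU. */
static q15_t neuron_q15_relu(const q15_t *x, const q15_t *w, q15_t bias, int n) {
    int32_t acc = (int32_t)bias << 15;            /* bias to Q30 */
    for (int i = 0; i < n; ++i)
        acc += (int32_t)x[i] * (int32_t)w[i];     /* Q15 * Q15 = Q30 */
    acc >>= 15;                                   /* back to Q15 */
    if (acc > 32767)  acc = 32767;                /* saturate */
    if (acc < -32768) acc = -32768;
    return acc > 0 ? (q15_t)acc : 0;              /* ReLU */
}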

