Edge AI & Ultra Low Power: The Hardware Architecture of On-Device Intelligence

Edge AI moves AI inference out of distributed cloud infrastructure and onto the devices where the data originates: microcontrollers, sensors, and on-site machines. This cuts processing latency and the volume of data that has to be transmitted. At the same time, it raises the demands on hardware architecture, the memory hierarchy, and energy management within the devices. NPU microcontrollers, RISC-V coprocessors, and model compression provide the technical foundation to meet these requirements.

Why AI inference is moving from the cloud to the edge

Edge AI runs AI inference directly on on-site devices such as sensors, microcontroller units (MCUs), or machines, a concept known as “on-device intelligence.” Latency drops because the data no longer has to travel to distant cloud infrastructure. Less raw data is transmitted overall, because the device pre-filters it on-site instead of sending everything to the cloud first.

Furthermore, decisions can be made even with unstable connectivity. For example, an anomaly can be detected directly in the device and the appropriate action triggered. Additionally, sensitive data can be processed locally, which simplifies compliance with data protection requirements and reduces the attack surface during data transmission.
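The local pre-filtering and anomaly detection described above can be sketched in a few lines. This is a minimal illustration, not a production detector: the class `EwmaAnomalyDetector`, the function `filter_for_uplink`, the parameter values, and the simulated vibration readings are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class EwmaAnomalyDetector:
    """Flags samples that deviate strongly from a running mean (hypothetical helper)."""
    alpha: float = 0.1        # smoothing factor for running mean and variance
    threshold: float = 4.0    # deviation limit, in standard deviations
    warmup: int = 10          # samples to observe before flagging anything
    mean: float = 0.0
    var: float = 0.0
    seen: int = 0

    def update(self, x: float) -> bool:
        """Return True if x is anomalous; normal samples are folded into the statistics."""
        self.seen += 1
        if self.seen == 1:
            self.mean = x
            return False
        diff = x - self.mean
        std = self.var ** 0.5
        is_anomaly = (
            self.seen > self.warmup and std > 0 and abs(diff) > self.threshold * std
        )
        if not is_anomaly:
            # Only normal samples update the baseline, so a spike does not distort it.
            self.mean += self.alpha * diff
            self.var = (1 - self.alpha) * (self.var + self.alpha * diff * diff)
        return is_anomaly


def filter_for_uplink(samples):
    """Return (index, value) pairs worth transmitting; all other samples stay on the device."""
    detector = EwmaAnomalyDetector()
    return [(i, x) for i, x in enumerate(samples) if detector.update(x)]


# Simulated sensor readings around 1.0 with a single spike at index 30.
readings = [1.0 + 0.01 * ((i * 7) % 5 - 2) for i in range(50)]
readings[30] = 5.0
events = filter_for_uplink(readings)
print(f"{len(events)} of {len(readings)} samples would be transmitted: {events}")
```

In this sketch the device evaluates every sample locally and only the flagged event would need a radio transmission, which is the effect on data volume and connectivity dependence that the text describes.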

How Specialized NPU MCUs Increase Performance Density

A key driver of edge AI hardware are NPU-MCUs: microcontroller units that combine traditional control functions with dedicated neural processing units (NPUs), thereby increasing performance density. Arm, for example, explicitly positions the Ethos-U55 NPU for machine learning (ML) inference in area- and power-constrained embedded and IoT devices, with the aim of cost- and performance-efficient AI applications. Combined with Cortex-M55 cores, Arm claims up to a 480-fold increase in ML performance over previous Cortex-M systems.

With the RA8P1 family, Renesas shows how Arm architectures translate into concrete MCU products. The devices combine a 1 GHz Cortex-M85 core and a 250 MHz Cortex-M33 core with an Ethos-U55 NPU rated at 256 giga operations per second (GOPS) at 500 MHz, several MB of Flash/SRAM memory, and camera, audio, and video interfaces. This addresses vision and voice AI applications directly at the microcontroller level.
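To put the 256 GOPS at 500 MHz figure into perspective, the short calculation below converts it into operations per clock cycle and into a best-case inference time. It assumes the common convention that one multiply-accumulate (MAC) counts as two operations; the 50-million-MAC model is a hypothetical workload, and the function names are made up for this sketch.

```python
def ops_per_cycle(gops: float, clock_mhz: float) -> float:
    """Operations per clock cycle implied by a peak GOPS rating."""
    return (gops * 1e9) / (clock_mhz * 1e6)


def ideal_latency_ms(model_macs: int, gops: float, ops_per_mac: int = 2) -> float:
    """Lower bound on inference time if the NPU ran at peak throughput throughout."""
    return model_macs * ops_per_mac / (gops * 1e9) * 1e3


peak = ops_per_cycle(gops=256, clock_mhz=500)   # 512 operations per cycle
macs_per_cycle = peak / 2                       # 256 MACs per cycle (assumed: 1 MAC = 2 ops)

# Hypothetical vision model with 50 million MACs per inference.
bound_ms = ideal_latency_ms(model_macs=50_000_000, gops=256)
print(f"{peak:.0f} ops/cycle, {macs_per_cycle:.0f} MACs/cycle, >= {bound_ms:.3f} ms per inference")
```

The result is only a lower bound. Real latency also depends on memory traffic, operator coverage, and NPU utilization, so a peak rating cannot replace a measurement on the target device.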

The advantage of an NPU-MCU lies not only in performing more computations per second. Equally important is how efficiently data is processed. This is because in small embedded systems, it often costs more energy to move data between memory, the processor, and the accelerator than to perform the actual AI computation. For this reason, the model, memory, and NPU must work together as closely as possible. Key factors here include:

Short data paths between memory and accelerator
Direct memory access to transfer data efficiently
Tiling, i.e., processing in small data blocks
Compact number formats such as INT8
Caching frequently used model weights locally to minimize latency and power consumption
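Two of the factors listed above, tiling and INT8 arithmetic, can be illustrated with a small matrix multiplication. The sketch below is plain Python for readability, not NPU code: the function name `matmul_int8_tiled`, the tile size, and the test matrices are assumptions, and Python integers stand in for the 32-bit accumulators an accelerator would typically use.

```python
def matmul_int8_tiled(a, b, tile=4):
    """Multiply INT8 matrices a (M x K) and b (K x N) tile by tile.

    Each tile of a and b stands in for a block that is copied once into fast
    local memory (for example by DMA) and then reused for many
    multiply-accumulate operations before the next block is fetched.
    """
    m, k, n = len(a), len(b), len(b[0])
    out = [[0] * n for _ in range(m)]   # wide accumulators, one per output element
    tile_loads = 0
    for i0 in range(0, m, tile):
        for j0 in range(0, n, tile):
            for k0 in range(0, k, tile):
                tile_loads += 2         # one tile of a and one tile of b per step
                for i in range(i0, min(i0 + tile, m)):
                    for j in range(j0, min(j0 + tile, n)):
                        acc = out[i][j]
                        for kk in range(k0, min(k0 + tile, k)):
                            acc += a[i][kk] * b[kk][j]
                        out[i][j] = acc
    return out, tile_loads


# Deterministic test data in the INT8 range [-128, 127].
a = [[((i * 31 + j * 17) % 256) - 128 for j in range(8)] for i in range(6)]
b = [[((i * 13 + j * 7) % 256) - 128 for j in range(5)] for i in range(8)]

result, loads = matmul_int8_tiled(a, b, tile=4)
reference = [[sum(a[i][x] * b[x][j] for x in range(8)) for j in range(5)] for i in range(6)]
print("matches untiled result:", result == reference, "| tile loads:", loads)
```

The tiled version produces the same result as the untiled reference. What changes is the access pattern: data arrives in a small number of block transfers and is reused while it sits close to the compute units.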

What are RISC-V coprocessors, and what are their advantages?

Alongside NPU-MCUs, RISC-V coprocessors are gaining importance because they offer a high degree of architectural freedom in embedded AI designs. While standard MCUs impose fixed instruction sets and peripheral blocks, RISC-V allows for a combination of a base core, vector extensions, and domain-specific AI accelerators.

Matrix multiplication is not the only thing that matters for AI workloads. Activations, normalizations, reductions, data reshaping, and fallback operators also affect throughput. A recent paper on embedded RISC-V SoCs with vector support therefore emphasizes that the RISC-V Vector Extension and suitable auto-vectorization tools are essential for integrating such cores into deep learning (DL) deployments.

However, the open architecture is no guarantee of success. For developers and system architects, the complexity shifts to the toolchain, compiler, runtime, and verification. A RISC-V coprocessor can be more application-specific than a generic AI accelerator, but it must integrate cleanly with memory access, interrupt models, power domains, and software abstractions. Developers should therefore not rely solely on tera operations per second (TOPS) or GOPS figures, but also benchmark latency, power consumption, and accuracy.
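One way to act on this advice is to compare candidates by measured energy per inference under latency and accuracy constraints rather than by peak throughput. The sketch below assumes three fictitious accelerators; all names and numbers are invented for illustration and do not describe real products.

```python
from dataclasses import dataclass


@dataclass
class Benchmark:
    name: str
    peak_gops: float     # datasheet peak, listed only for contrast
    latency_ms: float    # measured time per inference
    power_mw: float      # average power during inference
    accuracy: float      # task accuracy of the deployed model

    @property
    def energy_mj(self) -> float:
        """Energy per inference in millijoules (mW * ms gives microjoules)."""
        return self.power_mw * self.latency_ms / 1000.0


def pick(candidates, max_latency_ms, min_accuracy):
    """Return the lowest-energy candidate that meets the latency and accuracy limits."""
    ok = [c for c in candidates
          if c.latency_ms <= max_latency_ms and c.accuracy >= min_accuracy]
    return min(ok, key=lambda c: c.energy_mj) if ok else None


# Fictitious measurements for three hypothetical accelerators.
candidates = [
    Benchmark("A", peak_gops=1000, latency_ms=12, power_mw=400, accuracy=0.91),
    Benchmark("B", peak_gops=256, latency_ms=20, power_mw=90, accuracy=0.90),
    Benchmark("C", peak_gops=100, latency_ms=45, power_mw=30, accuracy=0.90),
]
best = pick(candidates, max_latency_ms=30, min_accuracy=0.89)
print(best.name, f"{best.energy_mj:.2f} mJ per inference")
```

In this made-up data set, candidate A has the highest peak rating but the highest energy per inference, and candidate C is the most frugal but misses the latency limit, so the selection falls on B.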

How Model Compression Enables TinyML

Model compression is a key technique in TinyML for making large DL models runnable on resource-constrained MCUs and edge devices: models are shrunk until they fit within limited memory and power budgets.

Quantization reduces the numerical precision of model parameters, typically from 32-bit floating point to smaller integer formats, which shrinks the model and lowers computational overhead. Pruning removes weights that contribute little, leaving sparse weight tensors. The TensorFlow Model Optimization Toolkit, for example, specifically addresses this deployment path and supports both quantization and pruning.
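The two techniques can be shown on a handful of weights. The following is a plain-Python sketch of symmetric INT8 quantization and magnitude-based pruning; it does not use or reproduce the TensorFlow Model Optimization Toolkit API, and the function names and weight values are assumptions chosen for illustration.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float values from the integers and the shared scale."""
    return [v * scale for v in q]


def prune_by_magnitude(weights, sparsity):
    """Zero out the given fraction of weights with the smallest magnitude."""
    n_prune = int(len(weights) * sparsity)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


# Hypothetical float32 weights of a small layer.
weights = [0.82, -0.05, 0.31, -0.67, 0.02, 0.44, -0.91, 0.08]

sparse = prune_by_magnitude(weights, sparsity=0.5)   # half of the weights become zero
q, scale = quantize_int8(sparse)                     # one byte per weight instead of four
restored = dequantize(q, scale)
max_error = max(abs(a - b) for a, b in zip(sparse, restored))

print("pruned:   ", sparse)
print("quantized:", q)
print(f"scale = {scale:.6f}, max round-trip error = {max_error:.6f}")
```

Storing each weight in one byte instead of four reduces the weight memory by a factor of four, and the zeros introduced by pruning can be exploited further by sparse storage formats. The cost is a bounded rounding error per weight, which is why accuracy has to be checked again after compression.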

How Sleep Modes Improve Energy Efficiency

Another lever lies in power management. Ultra-low-power edge devices do not operate continuously but are event-driven: sensor front-ends or always-on domains remain active, while the main CPU core and NPU remain in sleep mode. Only when a sensor detects a relevant event—such as motion, sound, vibration, or a threshold value—are the processor, NPU, or other computing units activated. To achieve this, the hardware requires:

Short wake-up times
Memory areas that retain data in sleep mode
The ability to disable unused circuit components
Always-on domains for simple sensor monitoring
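The effect of these requirements on battery life can be estimated with a simple duty-cycle calculation. The sketch below assumes a device that wakes once per period, runs an inference, and sleeps again; every current, time, and capacity value is a hypothetical placeholder, and the model ignores effects such as battery self-discharge and regulator losses.

```python
def average_current_ma(active_ma, sleep_ua, active_ms, period_s, wake_ms=0.0, wake_ma=0.0):
    """Time-weighted average current over one wake/sleep period, in mA."""
    period_ms = period_s * 1000.0
    sleep_ms = period_ms - active_ms - wake_ms
    if sleep_ms < 0:
        raise ValueError("active and wake-up time exceed the period")
    charge = active_ma * active_ms + wake_ma * wake_ms + (sleep_ua / 1000.0) * sleep_ms
    return charge / period_ms


def battery_life_days(capacity_mah, avg_ma):
    """Idealized runtime in days for a battery of the given capacity."""
    return capacity_mah / avg_ma / 24.0


# Hypothetical device: 50 ms of inference at 40 mA every 10 s, 2 ms wake-up at 20 mA,
# 10 uA in sleep mode, powered from a 1000 mAh battery.
avg = average_current_ma(active_ma=40, sleep_ua=10, active_ms=50, period_s=10,
                         wake_ms=2, wake_ma=20)
event_driven_days = battery_life_days(1000, avg)
always_on_days = battery_life_days(1000, 40)

print(f"average current: {avg:.3f} mA")
print(f"event-driven: {event_driven_days:.0f} days, always on: {always_on_days:.1f} days")
```

With these placeholder values, the sleep current and the wake-up overhead already make up a visible share of the average current. That is why short wake-up times and low-leakage retention memory appear in the list above.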

Espressif explains, for example, that in Light-sleep mode on ESP32 systems, the CPUs, most of the RAM, and the digital peripherals are clock-gated and their supply voltage is reduced; after wake-up, these components resume operation with their state preserved. In Deep-sleep mode, the CPUs, most of the RAM, and the digital peripherals are powered off; only selected low-power blocks, such as the RTC controller and the ULP coprocessor, remain active.

What decisions need to be made now to ensure that Edge AI doesn’t exceed the energy budget?

Edge AI brings AI functions directly to sensors, microcontrollers, and embedded systems. It is not enough for the models to work; they must also run with limited memory, limited computing power, and very little energy. This makes energy efficiency a central prerequisite for successful on-device intelligence.

Those who plan for Edge AI early on with the right hardware design, compressed models, and intelligent sleep modes can extend battery life and reduce maintenance costs. Developers who address these issues too late risk a shorter product lifespan and costly redesigns.

Experience Edge AI live at electronica

Leading companies in the industry will demonstrate at electronica 2026 how Edge AI and ultra-low-power architectures can be implemented in practice and which hardware, software, and system solutions are already available for this purpose. From NPU microcontrollers and RISC-V coprocessors to energy-efficient sensor and embedded platforms, visitors to the trade show can see which approaches are suitable for real-world applications and what decisions developers need to make now.

Those who wish to delve deeper into topics such as RISC-V, NPU MCUs, or TinyML can gain comprehensive insights at electronica through technical presentations, panel discussions, and expert forums—such as the Embedded Developer Forum or the IIoT Forum—and exchange ideas with leading industry experts.

In the special edge lab LIVE area, visitors can discuss with exhibitors and experts from the embedded industry which architectural decisions are relevant for edge AI, local data processing, energy management, and lifecycle extension in specific applications.

Sources

https://www.arm.com/products/silicon-ip-cpu/ethos/ethos-u55

https://www.renesas.com/en/products/ra8p1

https://arxiv.org/html/2507.17771v1

https://github.com/tensorflow/model-optimization

https://docs.espressif.com/projects/esp-idf/en/stable/esp32/api-reference/system/sleep_modes.html