A survey on versatile embedded Machine Learning hardware acceleration
25 Jun 2025
Keywords: Hardware acceleration; Machine Learning; Multi-application; Embedded devices
Abstract
This survey investigates recent developments in versatile embedded Machine Learning (ML) hardware acceleration. Various architectural approaches for the efficient implementation of ML algorithms on resource-constrained devices are analyzed, focusing on three key aspects: performance optimization, embedded system considerations (throughput, latency, energy efficiency), and multi-application support. The survey does not, however, cover attacks on and defenses of the ML architectures themselves. It then explores different hardware acceleration strategies, from custom RISC-V instructions to specialized Processing Elements (PEs), Processing-in-Memory (PiM) architectures, and co-design approaches. Notable innovations include flexible bit-precision support, reconfigurable PEs, and memory management techniques that reduce the overhead of moving weights and (hyper)parameters. These architectures are then evaluated against the aforementioned key aspects. Our analysis shows that relevant and robust embedded ML acceleration requires careful consideration of the trade-offs between computational capability, power consumption, and architectural flexibility, depending on the target application.
Citation
Pierre Garreau, Pascal Cotret, Julien Francq, Jean-Christophe Cexus, Loïc Lagadec, A survey on versatile embedded Machine Learning hardware acceleration, Journal of Systems Architecture, Volume 167, 2025, 103501, ISSN 1383-7621, https://doi.org/10.1016/j.sysarc.2025.103501.