Researchers make advances in field of embodied intelligent hardware
School of Microelectronics | 04/29/2025

In recent years, embodied intelligence has attracted widespread attention. Deploying multimodal large models locally requires energy-efficient edge-side inference chips that integrate perception and computation.

Professor Hao Yu’s research team from the School of Microelectronics (SME) at the Southern University of Science and Technology (SUSTech) has made advances in embodied intelligence for large language and vision models. Their work on embodied intelligent chips, embodied intelligent accelerator cards, and embodied intelligent systems provides core AI chip technologies to advance China’s transition from information intelligence to physical and biological intelligence. The team has published three papers on this work in international journals: the IEEE Journal of Solid-State Circuits, IEEE Transactions on Circuits and Systems I: Regular Papers, and IEEE Integrated Circuits and Systems.

Embodied intelligent chips

As the era of large models unfolds, the computational demands of deep neural networks are growing exponentially. Traditional chip architectures, however, face dual bottlenecks in energy efficiency and performance. To overcome these challenges, the researchers explored a cube-systolic architecture optimized for embodied deep-learning networks. With networks optimized through neural architecture search (NAS), the architecture achieved a state-of-the-art energy efficiency of 29.12 TOPS/W and an area efficiency of 7.94 TOPS/mm². These results were made possible by addressing the “energy-area-flexibility” trilemma in traditional AI chip design through three key innovations: dynamic precision adjustment, structured sparse encoding, and a vectorized systolic array.
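To put the two headline figures in more familiar units, the short sketch below converts them into energy per operation and an implied power density. It uses only the numbers quoted above; the function names are illustrative, and the power-density value assumes that both peak figures were measured at the same operating point, which the article does not state.

```python
# Figures quoted in the article for the cube-systolic chip.
ENERGY_EFFICIENCY_TOPS_PER_W = 29.12
AREA_EFFICIENCY_TOPS_PER_MM2 = 7.94


def energy_per_op_femtojoules(tops_per_watt: float) -> float:
    """Convert TOPS/W into femtojoules per operation.

    1 TOPS/W equals 1e12 operations per joule, so the energy of a single
    operation is 1 / (tops_per_watt * 1e12) joules.
    """
    joules_per_op = 1.0 / (tops_per_watt * 1e12)
    return joules_per_op * 1e15  # joules -> femtojoules


def power_density_w_per_mm2(tops_per_mm2: float, tops_per_watt: float) -> float:
    """Implied power density, ASSUMING both metrics share one operating point."""
    return tops_per_mm2 / tops_per_watt


if __name__ == "__main__":
    fj = energy_per_op_femtojoules(ENERGY_EFFICIENCY_TOPS_PER_W)
    density = power_density_w_per_mm2(
        AREA_EFFICIENCY_TOPS_PER_MM2, ENERGY_EFFICIENCY_TOPS_PER_W
    )
    print(f"Energy per operation: {fj:.2f} fJ")
    print(f"Implied power density: {density:.3f} W/mm^2")
```

Under these assumptions, 29.12 TOPS/W corresponds to roughly 34 fJ per operation.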

Dynamic precision adjustment balances energy efficiency and accuracy by dynamically optimizing computational precision, while structured sparse encoding improves compression rates by 30% without sacrificing model accuracy. The vectorized systolic array boosts memory bandwidth utilization to 92%, sharply reducing the energy spent on data movement.
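The article does not describe the team's encoding scheme, so the sketch below illustrates the general idea of structured sparsity with a common stand-in: N:M pruning, where only the N largest-magnitude weights in every group of M are kept and stored together with their positions. The function names, the 2:4 default, and the sample weights are all assumptions for illustration, and the sketch does not reproduce the 30% figure reported above.

```python
from typing import List, Tuple


def encode_n_of_m(
    weights: List[float], n: int = 2, m: int = 4
) -> Tuple[List[float], List[int]]:
    """Keep the n largest-magnitude weights in each group of m.

    Returns the kept values and their positions inside each group.
    """
    if len(weights) % m != 0:
        raise ValueError("weight count must be a multiple of the group size m")
    values: List[float] = []
    indices: List[int] = []
    for start in range(0, len(weights), m):
        group = weights[start:start + m]
        # Rank positions by magnitude, keep the top n, restore positional order.
        ranked = sorted(range(m), key=lambda i: abs(group[i]), reverse=True)
        for i in sorted(ranked[:n]):
            values.append(group[i])
            indices.append(i)
    return values, indices


def decode_n_of_m(
    values: List[float], indices: List[int], n: int = 2, m: int = 4
) -> List[float]:
    """Rebuild the dense vector, filling pruned positions with zeros."""
    dense = [0.0] * (len(values) // n * m)
    for k, (value, index) in enumerate(zip(values, indices)):
        dense[(k // n) * m + index] = value
    return dense


if __name__ == "__main__":
    sample = [0.1, -0.9, 0.05, 0.4, 0.0, 0.3, -0.2, 0.01]  # made-up weights
    kept_values, kept_indices = encode_n_of_m(sample)
    print(kept_values, kept_indices)
    print(decode_n_of_m(kept_values, kept_indices))
```

Because every group keeps the same number of weights, hardware can fetch a fixed-size block per group, which is what makes this kind of sparsity easier to accelerate than unstructured pruning.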

Their paper, entitled “A 29.12 TOPS/W Vector Systolic Accelerator with NAS-optimized DNNs in 28-nm CMOS”, has been accepted by the top-tier journal IEEE Journal of Solid-State Circuits.

Figure 1. Hybrid-precision accelerator chip and hybrid-sparsity accelerator chip

Figure 2. Performance comparison of the cube-systolic architecture chip with hybrid precision and sparsity

Ph.D. student Kai Li from SME at SUSTech is the first author of this paper. Professor Hao Yu is the corresponding author, and SUSTech is the first affiliation.


Embodied intelligent accelerator cards 

The team designed an edge-side large model inference card based on the cube-systolic architecture. The card enables efficient deployment of large language models (LLMs) at the edge with 75% bandwidth utilization (75 tps), overcoming challenges in deploying LLMs on resource-constrained edge devices. Multiple 7B-parameter LLMs and multimodal models were successfully deployed on the system. Compared to GPUs, the system achieves 1.91 times higher throughput and 7.55 times better energy efficiency. It also outperforms the state-of-the-art FPGA accelerator, FlightLLM, by 10–24% in overall performance.
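Bandwidth utilization matters for edge LLMs because token-by-token generation is usually limited by how fast the weights can be read from memory. The sketch below is a back-of-the-envelope model of that relationship, not the team's method: it assumes every weight is read once per generated token and reads "tps" as tokens per second. The 400 GB/s bandwidth and 4-bit weights in the example are hypothetical values chosen for illustration and are not taken from the paper.

```python
def decode_tokens_per_second(
    peak_bandwidth_gb_s: float,
    utilization: float,
    params_billion: float,
    bits_per_weight: float,
) -> float:
    """Rough upper bound on decode speed for a memory-bound LLM.

    Assumes all weights are streamed from memory once per generated token
    and ignores activations, KV-cache traffic, and compute time.
    """
    weight_gigabytes = params_billion * bits_per_weight / 8
    effective_bandwidth = peak_bandwidth_gb_s * utilization
    return effective_bandwidth / weight_gigabytes


if __name__ == "__main__":
    # Hypothetical card: 400 GB/s peak bandwidth, 4-bit weights.
    # The 75% utilization and 7B model size come from the article.
    estimate = decode_tokens_per_second(
        peak_bandwidth_gb_s=400.0,
        utilization=0.75,
        params_billion=7.0,
        bits_per_weight=4.0,
    )
    print(f"Estimated decode speed: {estimate:.1f} tokens/s")
```

In this simple model, throughput scales linearly with utilization, so every point of bandwidth utilization gained translates directly into faster generation.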

Their paper, entitled “EdgeLLM: A Highly Efficient CPU-FPGA Heterogeneous Edge Accelerator for Large Language Models”, has been published in IEEE Transactions on Circuits and Systems I: Regular Papers.

Figure 3. Embodied intelligent accelerator cards

Figure 4. Comparison of embodied intelligent accelerator cards

Master’s student Ao Shen from SME at SUSTech and Researcher Mingqiang Huang from the Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences (SIAT, CAS) are the co-first authors of this paper. Professor Hao Yu is the corresponding author, and SUSTech is the first affiliation.


Embodied intelligent system

In collaboration with Genesense Technology Inc., the researchers developed the emGene LLM-embodied NGS sequencer, which deploys optimized LLMs on edge-side accelerator cards. This enables real-time, on-site DNA analysis, revolutionizing intelligent genetic diagnosis in healthcare. 

Their paper, entitled “emGene: An Embodied LLM NGS Sequencer for Real-time Precision Diagnostics”, has been published in IEEE Integrated Circuits and Systems.

Figure 5. emGene LLM-embodied edge NGS sequencer

Professor Hao Yu is the corresponding author of this paper, and SUSTech is the first affiliation.

 

Paper links (in order of appearance above):

IEEE Journal of Solid-State Circuits: https://ieeexplore.ieee.org/document/10972309

IEEE Transactions on Circuits and Systems I: Regular Papers: https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=10916480

IEEE Integrated Circuits and Systems: https://ieeexplore.ieee.org/document/10930726

 

Proofread by Adrian Cremin, Yuwen ZENG

Photo by Yan QIU
