Artificial intelligence (AI) systems require energy-efficient computation with rapid response times. However, deploying neural networks (NNs) on traditional von Neumann architectures creates energy and latency bottlenecks due to data movement between the processing and memory units, a problem known as the memory wall.
To address this challenge, compute-in-memory (CIM) has emerged as a promising architectural paradigm that conducts computation directly within memory arrays, significantly reducing data transfer overhead. The concept has gained widespread attention as a pathway toward next-generation AI hardware. Among various CIM strategies, non-volatile compute-in-memory (nvCIM) stands out for its ability to retain data without power. However, most existing nvCIM implementations rely on analogue computation, which, despite its energy efficiency, poses challenges in terms of precision, scalability, and robustness.

Assistant Professor Longyang Lin’s research team from the School of Microelectronics at the Southern University of Science and Technology (SUSTech), in collaboration with Xi’an Jiaotong University, has made significant progress in the development of nvCIM chips. The team proposed and fabricated the first lossless and fully parallel non-volatile digital compute-in-memory (nvDCIM) chip, systematically addressing the key limitations of current analogue nvCIM architectures in computational accuracy, scalability, and robustness.
Their study, titled “A lossless and fully parallel spintronic compute-in-memory macro for artificial intelligence chips,” was published in the prestigious journal Nature Electronics.
Mainstream analogue nvCIM architectures suffer from limited precision, high sensitivity to process-voltage-temperature (PVT) variations, and poor scalability, which makes them unable to meet the demands of high-precision applications such as physics-informed neural networks (PINNs). Moreover, the analogue-to-digital and digital-to-analogue converters (ADCs/DACs) they depend on exhibit degraded precision and increased area and power overhead at advanced process nodes. Together, these limitations hinder the adoption of analogue nvCIM in AI-for-science applications that demand both high precision and reliability.

Figure 1. Motivation and overview of the nvDCIM macro
To overcome these bottlenecks, the team developed a 64-kb nvDCIM chip based on 40-nm CMOS and foundry spin-transfer-torque magnetoresistive RAM (STT-MRAM) technologies (Figure 1), introducing a series of innovations at the bitcell, macro, and algorithm levels.

Figure 2. Overview of the IBMD bitcell
At the bitcell level, the researchers proposed and implemented a novel In-Bitcell Multiplication and Digitization (IBMD) design (Figure 2), which produces a digital output equivalent to the logical AND of the digital input and the analogue non-volatile weight. Because the multiplication result is digitized inside the bitcell, the design eliminates analogue components such as ADCs and DACs, significantly improving the system’s robustness and scalability.
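The following minimal Python sketch illustrates this logic equivalence at the behavioural level. It is not the circuit itself; the function names and the 4-bit decomposition are illustrative assumptions.

```python
# Behavioural sketch of the IBMD idea (illustrative, not the netlist):
# a 1-bit multiply is the logical AND of the input bit and the stored
# weight bit, so the bitcell's output is already digital.

def ibmd_bitcell(input_bit: int, weight_bit: int) -> int:
    """Digital output of one bitcell: AND(input bit, stored weight bit)."""
    return input_bit & weight_bit

def bitwise_partial_products(x: int, w_bit: int, n_bits: int = 4):
    """Per-bit partial products of an n-bit input against a 1-bit weight;
    these are later weighted and summed digitally by the adder tree."""
    return [((x >> i) & 1) & w_bit for i in range(n_bits)]

assert ibmd_bitcell(1, 1) == 1 and ibmd_bitcell(1, 0) == 0
print(bitwise_partial_products(0b1011, 1))  # [1, 1, 0, 1] (LSB first)
```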

Figure 3. Overall architecture of the nvDCIM macro
At the macro level, the chip integrates a fully parallel adder tree and a precision-reconfigurable accumulator (Figure 3), supporting precision configurations of 4, 8, 12, or 16 bits. This design achieves fully parallel and lossless matrix-vector multiplication (MVM), delivering high throughput without compromising accuracy.
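To see why this scheme is lossless, consider the following Python sketch, in which the adder tree is modelled as an exact integer sum and the accumulator as a shift-and-add loop. The 4-bit activations, binary weights, and dataflow shown here are assumptions for illustration, not the macro’s exact pipeline.

```python
# Hedged sketch of lossless bit-serial MVM: per bit-plane AND partial
# products are summed exactly by a (modelled) adder tree, then
# shift-accumulated, so the result equals the exact integer dot product.

def lossless_mvm_column(inputs, weight_bits, in_bits=4):
    """Dot product of n-bit unsigned inputs with binary weights."""
    acc = 0
    for b in range(in_bits):                       # one bit-plane per cycle
        bit_plane = [(x >> b) & 1 for x in inputs]
        partial = sum(i & w for i, w in zip(bit_plane, weight_bits))  # adder tree
        acc += partial << b                        # shift-and-accumulate
    return acc

inputs = [3, 7, 2, 5]      # 4-bit activations (illustrative values)
weights = [1, 0, 1, 1]     # binary weights stored in the MRAM cells
assert lossless_mvm_column(inputs, weights) == sum(
    x * w for x, w in zip(inputs, weights)
)  # exact result: no ADC quantization loss
```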

Figure 4. Toggle-rate-aware training scheme
At the algorithm level, they proposed and implemented a toggle-rate-aware training scheme (Figure 4), which incorporates the bit toggle rate of the input signals into the neural network’s loss function as a regularization term. This software-hardware co-optimization effectively reduces dynamic power consumption during inference while maintaining task accuracy.
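A rough sketch of such a regularization term is shown below. The exact loss formulation and weighting used in the paper may differ; the toggle-rate definition and the lambda coefficient here are illustrative assumptions, and a differentiable surrogate (for example, a straight-through estimator) would be needed for gradient-based training.

```python
# Hedged sketch of a toggle-rate regulariser: penalise bit flips between
# consecutive bit-serial input cycles so the trained network prefers
# low-switching-activity inputs (a proxy for dynamic power).

def toggle_rate(x_int, n_bits=4):
    """Fraction of adjacent bit positions that toggle, averaged over inputs."""
    flips, total = 0, 0
    for x in x_int:
        bits = [(x >> i) & 1 for i in range(n_bits)]
        flips += sum(b0 != b1 for b0, b1 in zip(bits, bits[1:]))
        total += n_bits - 1
    return flips / total

def regularised_loss(task_loss, activations_int, lam=0.01, n_bits=4):
    """Task loss plus a toggle-rate penalty (illustrative lambda)."""
    return task_loss + lam * toggle_rate(activations_int, n_bits)

print(regularised_loss(0.42, [3, 7, 2, 5]))  # task loss + small penalty
```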
The study demonstrates the comprehensive potential of the nvDCIM architecture for high-throughput, energy-efficient, and lossless digital computing. The IBMD design not only enables high-speed STT-MRAM read operations but also shows potential for extension to other non-volatile memory technologies, broadening the application scope of digital compute-in-memory architectures.
In future work, the research team aims to further optimize circuit design and chip architecture, scale up to larger memory capacities, and explore integration into complex AI systems. Through software-hardware co-optimization strategies, this technology is expected to help overcome the “memory wall” and lay a solid foundation for the efficient deployment of intelligent systems at the edge and in the cloud.
Ph.D. student Humiao Li from Longyang Lin’s team is the first author of the paper. Assistant Professor Longyang Lin and Professor Tai Min from Xi’an Jiaotong University are the corresponding authors.
Paper link: https://www.nature.com/articles/s41928-025-01479-y
To read all stories about SUSTech science, subscribe to the monthly SUSTech Newsletter.
Proofread by Adrian Cremin, Yifei REN
Photo by School of Microelectronics