Data Centric Computing in Emerging Technologies: A PCM-CMOS Hardware Accelerator
Speaker: Jing Li, IBM T. J. Watson Research Center
Department: Electrical Engineering
Location: Engineering Quadrangle B205
Date/Time: Wednesday, October 16, 2013, 12:30 p.m. - 1:30 p.m.
The confluence of disruptive beyond-CMOS technologies and "Big Data" workloads calls for a fundamental paradigm shift: from homogeneous compute-centric systems, designed to handle structured data, to heterogeneous data-centric systems that can effectively store and process large sets of semi-structured or unstructured data for better innovation, competition, and productivity. In a heterogeneous system, silicon CMOS (e.g., multi-core CPUs) will continue to play a major role in primary computing and essential bookkeeping, while tasks that are difficult, expensive, or even unachievable with standard CMOS within a fixed power/cost budget can be offloaded to hardware engines built with other technologies. By harnessing the potential of new technologies, we can enable efficient data-centric computing by building a cost-effective heterogeneous hardware substrate with significantly enhanced energy efficiency, performance, throughput, and scalability.
With the objective of rethinking data-centric system design from the ground up, I will present a PCM-CMOS hardware accelerator based on the concept of ternary content addressable memory (TCAM), built with an emerging memory technology, phase change memory (PCM). In particular, a fully functional heterogeneous chip was designed and fabricated for the first time, achieving a >10x cell area reduction compared to a homogeneous CMOS-based design at the same technology node. The accelerator distributes compute units within storage elements in a cost-effective way, providing fine-grained control and high bandwidth close to the data sources to avoid communication cost. It is particularly efficient at performing search operations, with a high and deterministic lookup rate. It can also be used either as a monolithic compute unit that performs direct data-flow computation or as a monolithic storage medium serving as storage class memory. Thus, it is an attractive solution for a wide range of data-intensive applications, e.g., genome matching in bioinformatics and intrusion detection in cloud computing. It is particularly useful for applications with real-time response demands (e.g., real-time pattern recognition for national security) that pure software-based approaches cannot meet.
Despite its tremendous advantages in performance, cost, and energy, design with heterogeneous PCM/CMOS technologies poses new challenges during hardware implementation because the technology itself inherently and severely degrades the operating margin. To address these challenges, I will present two enabling techniques in the talk: 1) a clocked self-referenced sensing scheme and 2) a two-bit encoding. With these techniques, the fabricated chip can operate reliably at a very low voltage (750 mV). Finally, I will briefly present two critical techniques for moving toward an even more cost-effective design based on variable-bit storage.
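For readers unfamiliar with TCAM, the search semantics mentioned in the abstract can be illustrated with a small software model. This is only a sketch of generic ternary matching, where each stored bit is 0, 1, or "don't care" (X). It is not the chip's circuit design, and it does not represent the two-bit encoding or the sensing scheme from the talk. The names `parse_pattern` and `TernaryCAM` are hypothetical and chosen for illustration. A hardware TCAM compares the key against all entries in parallel, whereas this model loops over them.

```python
from typing import List, Optional, Tuple


def parse_pattern(pattern: str) -> Tuple[int, int]:
    """Convert a ternary string such as '10X1' into (value, care_mask).

    A 1 bit in care_mask means the position must match exactly;
    a 0 bit means "don't care" (X). Hypothetical helper for illustration.
    """
    value = 0
    care = 0
    for ch in pattern:
        value <<= 1
        care <<= 1
        if ch == "1":
            value |= 1
            care |= 1
        elif ch == "0":
            care |= 1
        elif ch not in "xX":
            raise ValueError(f"invalid ternary symbol: {ch!r}")
    return value, care


class TernaryCAM:
    """Software model of ternary content-addressable search (illustrative only)."""

    def __init__(self, width: int) -> None:
        self.width = width
        self.entries: List[Tuple[int, int]] = []

    def write(self, pattern: str) -> int:
        """Store a ternary pattern and return its entry index."""
        if len(pattern) != self.width:
            raise ValueError("pattern length must equal the TCAM width")
        self.entries.append(parse_pattern(pattern))
        return len(self.entries) - 1

    def search(self, key: int) -> List[int]:
        """Return the indices of all entries that match the key.

        An entry matches when the key agrees with the stored value on
        every bit position that the entry cares about.
        """
        return [
            index
            for index, (value, care) in enumerate(self.entries)
            if (key ^ value) & care == 0
        ]

    def first_match(self, key: int) -> Optional[int]:
        """Return the lowest matching index, or None if nothing matches."""
        matches = self.search(key)
        return matches[0] if matches else None


if __name__ == "__main__":
    tcam = TernaryCAM(width=4)
    tcam.write("10X1")
    tcam.write("0XXX")
    tcam.write("1111")
    print(tcam.search(0b1011))
    print(tcam.first_match(0b1100))
```

Because every entry is checked against the key independently, a hardware implementation can evaluate all of them at once, which is what gives a TCAM its deterministic lookup rate regardless of how many patterns are stored.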