Please check out my research group's webpage and news page for the most up-to-date information about my research group. For my latest publications, please see my group's publications page. If you are interested in learning about Computer Architecture, please enroll in my Massively Open Online Course in Computer Architecture hosted on Coursera.
We have tested and characterized our data center and Infrastructure as a Service (IaaS) cloud-inspired manycore microprocessor, the 25-core Princeton Piton Processor, which we have open sourced as the OpenPiton Project. In addition, we have released our 4000+ core manycore simulator, the PriME Manycore Simulator.
I am looking for Graduate and Undergraduate students interested in Computer Architecture, Operating Systems, Cloud Computing Systems, Sustainable Computing, and Circuits to join my group.
Short Biography: David Wentzlaff is an Associate Professor at Princeton University in the Electrical Engineering Department. Before joining Princeton, he completed his PhD and MS at MIT and was Lead Architect and Founder of Tilera Corporation, a multicore chip manufacturer now owned by Mellanox. Before Tilera, he was one of the architects of the Raw Processor at MIT and designed the Raw on-chip networks. David founded the MIT Factored Operating System (fos) project which focused on designing scalable operating systems for thousand core multicores and cloud computers. His work has been awarded the NSF CAREER award, the DARPA Young Faculty Award, the AFOSR Young Investigator Prize, and the Princeton E. Lawrence Keyes Faculty Advancement Award. David teaches the world's first Massively Open Online Course (MOOC) in Computer Architecture offered through Coursera. David's current research interests include how to create manycore microprocessors customized specifically for future data centers and Cloud computing environments and how to reduce the impact of computing on the environment by optimizing computer architecture for fully biodegradable substrates. He enjoys hiking and mountaineering when not designing multicore processors.
|( Curriculum Vitae )|
My research group has two main thrusts. First, how can microprocessors be optimized for future data centers and cloud computing? Second, how can we create computing systems that are sustainable from a materials perspective and do not create any electronic waste (e-waste)?
My group has led a rethinking of processor design, demonstrating that general purpose microprocessors should be architected differently when used in an Infrastructure as a Service (IaaS) cloud setting or used in a large data center. In contrast to current server processors, my group’s processor designs are built to create more efficient economic markets, thereby enabling the cloud provider to multiplicatively increase revenue while simultaneously reducing cloud customer expenditures. Our designs leverage the massive amounts of unexploited commonality available in data centers and our work breaks down artificial boundaries in data center systems. We have demonstrated many of our ideas in the 460 Million transistor, 25-core, Princeton Piton Processor, one of the largest academic microprocessors ever built. We have open sourced Piton as the OpenPiton project, the world’s first open source, general-purpose, multithreaded manycore processor. OpenPiton demonstrates our commitment to and leadership in the hot field of open source hardware. We have also open sourced our 4000+ core manycore parallelized simulator (PriME).
Beyond data center processors, my group has been working to create fully biodegradable (organic semiconductor) microprocessors to stop the growing pollution caused by discarded electronics (e-waste). We wrote the first paper [MICRO 2017] to investigate optimizing computer architectures for organic semiconductors. In this work, we designed and fabricated a complete logic standard cell library for organic thin film transistors and have fabricated many organic transistor designs at Princeton. We are expanding this work to study how other emerging semiconductor technologies that provide qualitatively new capabilities affect computer architecture design.
Princeton Piton Processor
|A critical component of my research is evaluating my research ideas in novel prototype systems. My group has designed, built, and successfully tested one of the world’s largest academic microprocessors, the 25-core Princeton Piton Processor. Piton contains over 460 Million transistors in IBM’s 32nm SOI process and runs full stack Debian Linux. It contains custom-built distributed cache coherence which extends off-chip, enabling many Piton chips to be connected into data center-scale systems, and it is built as a best-of-breed, 64-bit, modern manycore system that utilizes three Networks on Chip (NoCs) along with many other features. Piton was built as a platform to evaluate my research group’s data center and IaaS cloud architecture ideas at scale and at speed. Piton is a prototype platform for Execution Drafting, Coherence Domain Restriction, Memory Inter-arrival Time Traffic Shaping, and other research ideas. We announced Piton at [HotChips] and have further discussed it in [IEEE Micro].||
(Piton project page)
OpenPiton Configurable Manycore
|Beyond prototyping for our own research goals, I feel that it is important to have external impact and enable other researchers. Using Piton as a base, we extended Piton and parameterized it as the OpenPiton project. OpenPiton is the world’s first open source, general-purpose, multithreaded manycore processor and framework. OpenPiton is configurable from 1 to 500 million cores, is easily implemented in modern FPGAs, provides a complete verification infrastructure of over 8000 tests, is supported by mature software tools, runs full-stack multiuser Debian Linux, and is written in industry standard Verilog. OpenPiton allows other researchers to go beyond simulation by including FPGA synthesis scripts as well as ASIC backend scripts. We published OpenPiton at [ASPLOS]. We actively support OpenPiton. If you are interested, please join our next tutorial. Tutorials have been held at ISCA 2016 and HPCA 2017, with another upcoming at MICRO 2017.||
(OpenPiton project page)
Architectures for Novel Economic Models
|Contrary to the conventional wisdom that IaaS cloud computing is simply a payment model which should be layered on top of existing multicore server chips, my work since joining Princeton has demonstrated that chip-level architecture and micro-architecture changes can yield substantial improvements in making markets more efficient (economic efficiency), thereby simultaneously boosting profits for cloud vendors and decreasing costs incurred by cloud customers. We have done this by designing the Sharing Architecture, which debundles a core into individually rentable sub-components, published in [ASPLOS]. Together with Prof. Henry Hoffmann, we coupled the Sharing Architecture with a best-of-breed auto-tuner to make it easier for the cloud user to use a debundled architecture by only specifying high level goals, published at [ISCA]. We created Memory Inter-arrival Time Traffic Shaping (MITTS), a mechanism to provision and price scarce off-chip bandwidth, published in [ISCA] and implemented in Piton. We extended MITTS into a security mechanism to create Camouflage, published in [HPCA]. We have been active in investigating how cloud markets should change for the future. In particular, we have created the notion of an Availability Knob, published in [SoCC], as well as created incentives for self-capping, published in [SoCC].||
(slides from ASPLOS)
Data Center Commonality Exploiting Architectures
|Data centers have rampant unexploited commonality. Whether it be applications working on the same or similar data, or similar applications executing on different data, the consolidation of computation into a data center provides opportunities for exploiting commonality. My group created Execution Drafting to align program execution between two or more programs such that common dynamic instruction sequences can flow down a multithreaded processor pipeline interleaved (one instruction of the first program followed by one instruction of the second program). By careful hardware thread scheduling (interleaving of programs), the programs can “draft” down a multithreaded pipeline and save significant energy, published at [MICRO] and implemented in Piton. Building on Execution Drafting’s success, we created the Manycore ORiented Compressed Cache (MORC) which utilizes a log-based cache organization to compress cache lines together that have been filled into the cache close in time. By looking for commonality between cores, compressing tag information, and compressing large logs of cache data, MORC is able to best previous cache compression schemes for computational throughput, a key metric in the data center, published at [MICRO].||
(Execution Drafting paper)
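The interleaving idea behind Execution Drafting can be illustrated with a small software toy. This is only a sketch under stated assumptions, not the hardware mechanism implemented in Piton: the function name `draft_schedule`, the string encoding of instructions, and the strict lockstep policy are all hypothetical choices made for illustration. It issues one instruction from the first program followed by one from the second, and counts how often the follower is identical to the leader it trails, which is the commonality a drafting pipeline would try to exploit.

```python
from itertools import zip_longest


def draft_schedule(prog_a, prog_b):
    """Interleave two instruction streams one instruction at a time.

    Returns (issue_order, drafted). issue_order is a list of
    (thread, instruction) pairs. drafted counts follower instructions
    that are identical to the leader instruction issued just before
    them (a toy stand-in for a "drafted" instruction).
    """
    issue_order = []
    drafted = 0
    # zip_longest pads the shorter stream with None so the longer
    # program keeps issuing alone once its partner has finished.
    for a, b in zip_longest(prog_a, prog_b):
        if a is not None:
            issue_order.append(("A", a))
        if b is not None:
            issue_order.append(("B", b))
        if a is not None and a == b:
            drafted += 1
    return issue_order, drafted


# Hypothetical instruction streams that differ in one instruction.
prog_a = ["ld r1", "add r2", "st r3"]
prog_b = ["ld r1", "sub r2", "st r3"]
order, drafted = draft_schedule(prog_a, prog_b)
print(order)
print("drafted followers:", drafted)
```

In this toy, the two programs stay aligned only because they have the same length and diverge in a single instruction; keeping real programs aligned across control-flow divergence is the harder problem that the hardware thread scheduling described above addresses.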
Breaking Down Data Center Boundaries: Cores, Chips, Servers, Racks
|My group has been performing a ground-up rethink of microprocessor design to optimize a manycore microprocessor such that it can take advantage of being co-located with tens-of-thousands of the exact same chip. This requires breaking down unnecessary boundaries that exist in current data center computing. Breaking down boundaries between chips, we have explored how best to extend shared memory and I/O across the data center. Our work on Coherence Domain Restriction (CDR) investigates how to side-step the scalability challenge of creating large-scale cache coherence. Instead of having to support global cache coherence across a future 1000+ core chip or across the data center, Coherence Domain Restriction enables the creation of flexible cache coherence domains which are coherent within a domain, but not between domains. CDR has been published in [MICRO] and implemented in Piton.||
(Coherence Domain Restriction paper)
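The property that domains are coherent internally but not with each other can be made concrete with a toy directory model. This is a minimal sketch under stated assumptions and not the CDR hardware design: the class `CoherenceDomains`, its methods, and the invalidate-on-write policy are hypothetical and chosen only to show that a write needs to track and invalidate sharers inside the writer's own domain.

```python
class CoherenceDomains:
    """Toy sharer directory where coherence is tracked per domain."""

    def __init__(self, domain_of_core):
        # Maps core id -> domain name (hypothetical static assignment).
        self.domain_of_core = dict(domain_of_core)
        # Maps (domain, address) -> set of cores holding a copy.
        self.sharers = {}

    def read(self, core, addr):
        """Record that `core` now holds a copy of `addr` in its domain."""
        key = (self.domain_of_core[core], addr)
        self.sharers.setdefault(key, set()).add(core)

    def write(self, core, addr):
        """Make `core` the sole holder within its domain.

        Returns the set of cores whose copies were invalidated. Cores
        in other domains are never consulted or invalidated.
        """
        key = (self.domain_of_core[core], addr)
        invalidated = self.sharers.get(key, set()) - {core}
        self.sharers[key] = {core}
        return invalidated


# Cores 0 and 1 share domain "x"; core 2 is alone in domain "y".
cdr = CoherenceDomains({0: "x", 1: "x", 2: "y"})
for core in (0, 1, 2):
    cdr.read(core, 0x40)
print(cdr.write(0, 0x40))          # only core 1 is invalidated
print(cdr.sharers[("y", 0x40)])    # core 2's copy is untouched
```

Because the sharer set is keyed by domain, the amount of tracking state per address is bounded by the domain size rather than by the total number of cores, which is the scalability benefit the paragraph above describes.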
|Together with Prof. Barry Rand, we have been studying how computer architecture changes when it is applied to organic thin film transistors (OTFTs). This work is focused on how to create biodegradable microprocessors and has the potential to eliminate the environmental impact of e-waste, including that of short-lifespan electronics such as cell phones. In this work, we fabricated countless OTFTs in our labs at Princeton, built models of the transistors, built and optimized a complete standard cell library, and completed an architectural design space exploration by synthesizing different processor architectures to our OTFT cell library. This is the first work to investigate optimizing computer architectures for organic semiconductors, to be published at [MICRO].||
(organic transistor paper)
At Princeton, I have had the pleasure of teaching and mentoring an elite set of students. In addition, I have had the opportunity to extend my teaching to the world through running the world's first Massively Open Online Course (MOOC) in Computer Architecture through the Coursera/Princeton pilot program. Beyond teaching, I am a freshman advisor in Princeton's School of Engineering and Applied Sciences.
Princeton ELE 585 (Old 580A/575) – Parallel Computation
|Parallel Computation teaches graduate students the skills needed to be successful in conducting research in parallel programming and parallel computer architecture. This course covers different parallel programming languages and parallel programming models as well as discusses in depth different parallel computer architectures and design trade-offs while creating parallel computing machines. In addition to the educational content goals, this class also introduces students to critically reading and reviewing research papers and provides students an opportunity to write a research paper with new results which is critiqued by other students. I created the curriculum for this course at Princeton University.|
Princeton ELE/COS 475 – Computer Architecture
|ELE/COS 475 teaches Juniors, Seniors, and first-year graduate students how to build complex microprocessors. This class builds on a computer organization class (ELE 375) along with a digital logic class (ELE 206). Topics include the design of superscalar processors, out-of-order microprocessors, VLIW processors, Vector processors, multicore processors, memory coherence protocols, and on-chip networks. I revitalized this course at Princeton University.|
Princeton ELE 301 – Designing Real Systems
|Team-taught Princeton’s Junior-Level course on system design. I have redesigned my portion of this course to include a focus on digital systems where the students learn all of the levels of abstraction (software to hardware) involved with a modern day tablet or smartphone. This is taught with the help of newly designed labs where students write Android programs for a tablet and interface the tablet with a custom hardware device.|
Coursera/Princeton – Computer Architecture
|I adapted my ELE/COS 475 course to teach it online as part of Princeton University’s experiment in Massively Open Online Courses (MOOC). Over 200,000 students have enrolled in this course and it is the first full Computer Architecture class on the Internet.|
Massachusetts Institute of Technology, Cambridge, MA
Ph.D. in Electrical Engineering and Computer Science, February 2012
- Thesis: "dPool: A Distributed Data Structure for Factored Operating Systems"
Massachusetts Institute of Technology, Cambridge, MA
M.S. in Electrical Engineering and Computer Science, September 2002
University of Illinois at Urbana-Champaign, Urbana, IL
B.S. in Electrical Engineering, May 2000
- Minor: Computer Science
- National Science Foundation
- Defense Advanced Research Projects Agency
- Air Force Office of Scientific Research
- Intel (Equipment Donation)
- Xilinx (Equipment Donation)
- NVIDIA (Equipment Donation)
- Advanced Micro Devices (Equipment Donation)
- Altera (Equipment Donation)
Open Source Software/Hardware
I enjoy building open source designs in order to increase the impact of my work. My group has parameterized and released our Princeton Piton Processor as the OpenPiton project. In addition, we have open sourced our 4000+ core manycore simulator, the Princeton Manycore Executor (PriME).
- OpenPiton: OpenPiton is the world’s first open source, general-purpose, multithreaded manycore processor and framework. OpenPiton is a highly configurable version of the 25-core Princeton Piton Processor. http://www.openpiton.org
- PriME: The Princeton Manycore Executor is a distributed, parallel, manycore simulator capable of simulating 4000+ core configurations using hundreds of host cores across a cluster. http://primesim.princeton.edu
A passion of mine is climbing mountains. I enjoy hiking, skiing, rock climbing, camping, and most other outdoor endeavors. I am slowly learning how to be a competent mountaineer. Following are some selected adventures.
Catskill 3500 Footers
|In July 2010, I climbed the Grand Teton. We took the Owen-Spalding route with the Wittich Crack (5.6) variation. My climbing partner for this trip was Patrick Lam. On our first summit attempt day, a thunderstorm rolled in while we were on the upper mountain. We successfully retreated, but others were not as lucky. We successfully summited two days later.||
|In the first week of March 2010, four other intrepid souls and I decided we should climb Mt. Katahdin in Maine. Unfortunately, we decided to do this during the end of a Nor'easter (the winter version of a tropical storm). The approach to Mt. Katahdin in the winter is very long and required us to cross-country ski in while pulling sleds (pulks). Due to the fresh 30 inches of snow on top of the 100 inches of pre-existing snow, the avalanche danger was quite high, so we ended up sticking to ridge routes, which was not our original plan. We successfully summited Hamlin and Baxter Peaks (Baxter is the high-point) of the Katahdin Massif. We also enjoyed some downhill skiing, ice climbing, and backcountry winter camping.||
|In July 2009, Aaron Yahr, Patrick Lam, and I successfully summited Mt. Rainier via the Emmons Glacier route. This was the second attempt for me after an unsuccessful summit attempt due to weather in the summer of 2008. Mt. Rainier is a great mountaineering adventure that I highly recommend. It is heavily glaciated and the Emmons Glacier route requires some good route finding skills in order to not get lost. One lesson we learned from this trip was not to try to climb too high too fast. We tried to summit the day after we arrived at high-camp. This was a bad idea as we were not acclimatized. We turned back to high-camp and summited the next day.||
New Hampshire 4000 Footers
|A common goal for avid hikers in New England is to hike all forty-eight mountains in New Hampshire which are taller than 4000 ft. above sea level. The AMC even has a club devoted to this. I completed hiking all of the 4000 footers in July 2007. I think such clubs or games are a good idea because they encourage avid hikers to hike mountains besides the "beautiful" ones and even out trail wear. I am now working on hiking all of the NH 4000 footers in the winter, a much harder goal.||