Please check out my research group's webpage.

Prior Research

( Curriculum Vitae )

Multicore and Cloud Operating Systems

Cloud computers and multicore processors are two emerging classes of computational hardware with the potential to provide unprecedented compute capacity to the average user. For users to harness this computational power effectively, these new hardware platforms need operating systems (OSes) designed for them. Existing multicore operating systems do not scale to large numbers of cores and do not support clouds. Consequently, current-day cloud systems push much of the complexity onto the user, who must manage individual Virtual Machines (VMs) and deal with many system-level concerns. I am researching how to construct operating systems for future multicore processors and current-day Infrastructure as a Service (IaaS) cloud systems. I founded the Factored Operating System (fos) project at MIT. fos tackles OS scalability challenges by factoring the OS into its component system services. Each system service is further factored into a collection of Internet-inspired servers that communicate via messaging. We call a set of system servers that collaborate to provide a single system service a fleet. Fleets can grow and shrink in response to demand. fos takes the spatial placement of user and system services into account to optimize performance, and it provides a single system image across multiple physical and virtual machines. I designed fos's microkernel, messaging API, and remote memory access API, implemented our first inter-machine proxy server, and deployed fos on Amazon's EC2. I am currently designing our distributed object model to facilitate the construction of operating system server fleets. I published the foundational ideas of fos in [ACM Operating Systems Review], and we elaborated on the cloud aspects of fos at [SoCC]. The fos team is a very talented group of nine graduate students, postdocs, and undergraduates.
Figure: fos Overview
(fos project page)
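
To make the fleet idea concrete, here is a minimal single-process sketch of a set of servers that receive messages through mailboxes and grow or shrink with demand. It is an illustration only and not fos code: the `Server` and `Fleet` classes, the `max_backlog` threshold, and the least-loaded routing rule are all assumptions made up for this example.

```python
from collections import deque


class Server:
    """One member of a fleet; it receives work only through its mailbox."""

    def __init__(self, name):
        self.name = name
        self.mailbox = deque()
        self.handled = 0

    def drain(self):
        # Handle every queued message in arrival order.
        while self.mailbox:
            self.mailbox.popleft()
            self.handled += 1


class Fleet:
    """A set of servers that together provide one system service (hypothetical)."""

    def __init__(self, service, size=1, max_backlog=2):
        self.service = service
        self.max_backlog = max_backlog  # assumed per-server backlog threshold
        self.servers = [Server(f"{service}-{i}") for i in range(size)]
        self._next_id = size

    def send(self, message):
        # Assumed routing policy: deliver to the least-loaded server.
        target = min(self.servers, key=lambda s: len(s.mailbox))
        target.mailbox.append(message)
        return target.name

    def drain(self):
        for server in self.servers:
            server.drain()

    def rebalance(self):
        backlog = sum(len(s.mailbox) for s in self.servers)
        if backlog > self.max_backlog * len(self.servers):
            # Demand exceeds capacity: grow the fleet by one server.
            self.servers.append(Server(f"{self.service}-{self._next_id}"))
            self._next_id += 1
        elif backlog == 0 and len(self.servers) > 1:
            # Fleet is idle: shrink it by retiring the newest server.
            self.servers.pop()
```

In this sketch, sending five messages to a one-server fleet and calling `rebalance()` adds a second server; once the mailboxes are drained, the next `rebalance()` removes it again. A real fleet would also need naming, placement, and state migration, none of which is modeled here.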

Tilera Multicore Processor

In December 2004, I co-founded Tilera Corporation. While at Tilera, I was Lead Architect and designed the architecture of the TILE64 and TILEPro64 processor families. I also designed our instruction set architecture, exception architecture, and various other subsystems, and I had the opportunity to implement our main processor pipeline. Tilera is full of extremely talented engineers. Although Tilera was a small company, much of our work broke new ground, since no one had built a multicore processor at the scale that we had. We explored ideas in designing scalable memory systems, implementing I/O hardware in software, and multicore virtualization. We announced the first TILE processor at [HotChips], I published our work in [IEEE Micro], and our chip implementation was presented at [ISSCC].
Figure: Packaged Tilera TILE64 Processor
(more Tilera photos)

Parallel and Parallelizing Dynamic Binary Translator

Using the Raw Microprocessor as a parallel fabric, I built a parallel dynamic binary translator that takes in x86 Linux binaries and executes them on Raw. The Raw Microprocessor has a MIPS-like ISA, so cross-architecture dynamic translation was needed. I used the x86 parser from Valgrind as the basis for this translator. The translator performed the translation step in parallel and speculatively, so code that had not yet been executed could be pre-translated, avoiding the first-time translation cost. The translator was also able to take advantage of the parallelism of the Raw Microprocessor. I also used the Raw Microprocessor spatially, as a fabric on which to build different virtual microprocessor topologies. For example, the translator can dynamically trade cores used as data cache for additional translation resources. Finally, the translator used sandboxing to provide virtual memory on an architecture without any virtual memory. I really enjoyed this project: I learned how to build a best-of-breed dynamic binary translator and learned about the backend compiler optimizations that I needed to implement to enable fast execution. I published this work in [CGO].
Figure: Block Diagram of Dynamic Binary Translator from x86 to Raw
(slides from CGO)
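
The idea of speculative, parallel pre-translation can be sketched in a few lines. This is a toy model and not the translator described above: the `PROGRAM` table, the `translate` stand-in, and the `SpeculativeTranslator` class are hypothetical, and a thread pool stands in for the cores that Raw dedicated to translation.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical guest program: block address -> (payload, successor addresses).
PROGRAM = {
    0: ("a", [1, 2]),
    1: ("b", [3]),
    2: ("c", [3]),
    3: ("d", []),
}


def translate(addr):
    """Stand-in for the expensive guest-to-host translation of one block."""
    payload, successors = PROGRAM[addr]
    return payload.upper(), successors


class SpeculativeTranslator:
    def __init__(self, workers=4):
        self.pool = ThreadPoolExecutor(max_workers=workers)
        self.cache = {}  # block address -> Future holding the translated block

    def request(self, addr):
        # Called only from the execution thread, so the cache needs no lock.
        if addr not in self.cache:
            self.cache[addr] = self.pool.submit(translate, addr)
        return self.cache[addr]

    def run(self, entry, choose):
        """Execute from `entry`; `choose` picks the next block from the successors."""
        trace = []
        addr = entry
        while addr is not None:
            code, successors = self.request(addr).result()
            for succ in successors:
                self.request(succ)  # speculate: translate before it is needed
            trace.append(code)
            addr = choose(successors)
        return trace

    def close(self):
        self.pool.shutdown()
```

Running from block 0 and always taking the first successor executes blocks 0, 1, and 3, yet block 2 also ends up in the cache because it was translated speculatively. That wasted work is the price paid for hiding first-time translation latency on the path that is actually taken.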

Master's Thesis - Comparison of Multicores, ASICs, and FPGAs

I have always been interested in why different computational fabrics are better or worse for particular applications. I have also wondered just how much better an ASIC is than an FPGA, and how much better an FPGA is than a microprocessor, for implementing a particular algorithm. I have heard many rules of thumb but have never seen hard numbers. To that end, for my master's thesis I did a quantitative study of the differences in performance and area when implementing bit-level communication algorithms on a microprocessor, tiled multicore, FPGA, and ASIC. The results were a little surprising. I found that ASICs provided a 2-3x absolute performance improvement over an FPGA, and FPGAs provided a 2-3x absolute performance improvement over a microprocessor. ASICs and FPGAs really shone when it came to silicon area. I found that ASIC designs used 5-6 orders of magnitude (base 10) less area than software on a microprocessor, and FPGAs used 2-3 orders of magnitude less area than software on a microprocessor. My [Master's Thesis] has more detailed results, and this work appeared in shortened form at [FCCM].
Figure: Mapping of 802.11a convolutional encoder onto Raw
(Master's Thesis)
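
For readers unfamiliar with bit-level communication algorithms, the 802.11a convolutional encoder is a representative example. The sketch below is a plain software version, not the thesis implementation: it assumes the standard rate-1/2, constraint-length-7 code with generator polynomials 133 and 171 (octal) from IEEE 802.11a, and the function names are my own.

```python
# Generator polynomials of the IEEE 802.11a rate-1/2 convolutional code,
# written in octal. The most significant bit corresponds to the current input.
G0 = 0o133
G1 = 0o171
K = 7  # constraint length: current bit plus six bits of history


def parity(x):
    """Return the XOR of all bits in x."""
    return bin(x).count("1") & 1


def conv_encode(bits):
    """Encode a sequence of 0/1 values; returns outputs interleaved as A0, B0, A1, B1, ..."""
    reg = 0  # K-bit shift register, newest bit in the most significant position
    out = []
    for bit in bits:
        reg = (reg >> 1) | ((bit & 1) << (K - 1))
        out.append(parity(reg & G0))  # output A
        out.append(parity(reg & G1))  # output B
    return out
```

Each input bit costs a shift, two masks, and two parity computations in software, whereas in hardware the same step is a 6-bit shift register and a handful of XOR gates. That gap gives some intuition for why dedicated logic can use so much less area for this class of algorithm.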

Raw Multicore Microprocessor

The Raw Multicore Microprocessor is a 16-core homogeneous multicore processor designed at MIT. The Raw Microprocessor explored many ideas: how to design large-scale multicore processors, different parallel programming paradigms, different parallel compilation techniques, and how best to have multiple cores communicate. I designed the Raw Microprocessor's dynamic networks, many of the testing structures, and parts of the ALU, including a novel population count unit. I contributed to chip verification and, after the chip was fabricated, to chip bringup. I also designed much of the FPGA support logic around the chip, including our FPGA synthesis methodology, our FPGA network interface logic, and the initial test harness, and I contributed to board design. I really enjoyed the Raw design experience, as I learned what it takes to fabricate real chips in academia and what it is like to work in a group with a large goal. We evaluated the Raw Microprocessor in [ISCA, IEEE Micro, ISSCC], presented Raw at [HotChips], and wrote many more publications.
Figure: Die Micrograph of the Raw Multicore Microprocessor
(more Raw photos)

Other Projects

In addition to my primary research, I have had the opportunity to conduct research as part of my graduate coursework. Several of my graduate courses included semester-long research projects:
  • MIT 6.836 - Kickbot: A Spherical Autonomous Robot [paper]
  • MIT 6.892 (6.899) - Keyword Search for Freenet [paper]
Figure: Kickbot Spherical Robot
(kickbot paper)