What is it that I do at Princeton?
There are two versions of the answer to this question: take your pick.
NOTE: The latter version is aimed at people who have had an inkling of the unrest in the computing world and are not afraid to swallow the red pill.
VERSION 1:
My research focuses on extracting parallelism from software at multiple granularities and on orchestrating the parallelism exploited at these granularities so that it best matches the target platform and the application's latency-throughput demands.
Extraction of parallelism: Speculative parallelism, which runs code in parallel before it is known to be free of dependences and recovers whenever a dependence turns out to be violated, appears to be the key to extracting performance from multicore hardware for general-purpose applications. To speed up the execution of software on multicore hardware, I have built a software system that efficiently supports various speculative parallelization schemes.
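To make "speculative parallelization" concrete, here is a minimal Python sketch of one classic scheme: run loop iterations in parallel as if they were independent, then commit them in program order and re-execute any iteration that read a location an earlier iteration wrote. This is not the system described above; the function names (`execute`, `speculative_loop`), the loop-body interface, and the example loop are all hypothetical, and the thread pool merely stands in for cores (under CPython's GIL it illustrates the control flow, not a speedup).

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Set, Tuple

Memory = Dict[str, int]
ReadFn = Callable[[str], int]
# Hypothetical loop-body interface: given the iteration index and a read
# function, return the locations the iteration writes and their new values.
Body = Callable[[int, ReadFn], Dict[str, int]]


def execute(body: Body, i: int, memory: Memory) -> Tuple[Set[str], Dict[str, int]]:
    """Run iteration i against `memory`, recording every location it reads."""
    reads: Set[str] = set()

    def read(loc: str) -> int:
        reads.add(loc)
        return memory.get(loc, 0)

    writes = body(i, read)
    return reads, writes


def speculative_loop(body: Body, n: int, memory: Memory, workers: int = 4) -> int:
    """Run iterations 0..n-1 as if independent, then commit them in program
    order, re-executing any iteration that read a location written by an
    earlier iteration. Updates `memory` in place and returns the number of
    misspeculated (re-executed) iterations."""
    snapshot = dict(memory)  # every speculative iteration sees the pre-loop state
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda i: execute(body, i, snapshot), range(n)))

    written: Set[str] = set()  # locations committed by earlier iterations
    reexecuted = 0
    for i, (reads, writes) in enumerate(results):
        if reads & written:
            # Dependence violated: the speculative run saw a stale value.
            # All earlier iterations have committed, so re-running against
            # `memory` now yields the sequential result.
            _, writes = execute(body, i, memory)
            reexecuted += 1
        memory.update(writes)
        written.update(writes)
    return reexecuted


def body(i: int, read: ReadFn) -> Dict[str, int]:
    """Hypothetical example loop: every fourth iteration depends on its predecessor."""
    if i % 4 == 3:
        return {f"x{i}": read(f"x{i - 1}") + 1}
    return {f"x{i}": i * i}


mem: Memory = {}
misspeculated = speculative_loop(body, 8, mem)
print(misspeculated, mem)
```

Iterations 3 and 7 each read the value written by their predecessor, so the sketch detects two misspeculations and re-executes those iterations; the final `mem` matches what a plain sequential loop would produce. Committing in program order is what keeps the result identical to sequential execution even when speculation fails.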
Orchestration of parallelism: Applications exhibit parallelism at multiple granularities, and systems such as the one mentioned in the previous paragraph can now extract that parallelism efficiently. The right mix of parallelism to exploit depends entirely on the target hardware and on the application's latency-throughput demands. To orchestrate parallelism near-optimally, I am building a runtime system; details will be posted in the near future.
VERSION 2:
Gains in computing performance have played a pivotal role in raising our standard of living by enabling many of the technologies we take for granted. These gains have been made possible primarily by advances in semiconductor technology. An observation by Intel co-founder Gordon Moore about the state of the universe vis-à-vis transistor-based devices such as microprocessors, now known as Moore's Law, captures the essence of the innovation in this field.
As the graph above shows, the number of transistors that can be put "inexpensively" on an IC (processor chip) has doubled approximately every two years. Think about that for a second: the number of additional transistors that will be put on a chip in the next couple of years equals the number of transistors that have been inexpensively put on a chip since the beginning of time. Here, "inexpensively" means that the cost of mass-producing the chip is low enough for chip manufacturers to turn a profit. More transistors make for more raw compute power. Semiconductor advances have also allowed the clock rate of these transistors to increase, and until recently, processor manufacturers were racing each other down this highway, coming out with 1.6 GHz processors, 2.8 GHz processors, and so on. For end users, these advances boiled down to this: one could UPGRADE one's computer hardware every couple of years and have ALL THE SOFTWARE running on the computer perform better, be more responsive, and, in general, do more things or new things in the same amount of time. TODO: Insert pre-2004 image here.
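Assuming an idealized, exact doubling every two years (real scaling is only approximate), the "think about that" claim follows from one line of arithmetic. Let N(t) be the number of transistors on a chip t years after some reference point, with N_0 the count at that point; the transistors added over the next two years then equal all the transistors on the chip today.

```latex
% Idealized Moore's Law: the transistor count N doubles every two years (t in years)
N(t) = N_0 \cdot 2^{t/2}
\quad\Longrightarrow\quad
N(t+2) - N(t) = 2N(t) - N(t) = N(t)
```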
But these many transistors running at GHz frequencies consume a lot of power and produce an incredible amount of heat. Furthermore, spending the extra transistors on larger on-chip caches and similar structures started yielding diminishing returns. This has caused processor manufacturers to take a step back and look for ways to get performance while remaining power-efficient.
At this point, the term "performance" needs to be clearly defined. Consider a group of tasks to be performed. One option is to complete each individual task sooner (this is called improving the latency); another is to finish more tasks in a given interval of time (this is called improving the throughput). These goals are often in conflict with each other. Processors up to 2004 can be called latency processors: they focused on reducing the latency of completing a single task (such as a single run of a video encoder). For the reasons mentioned above, latency can no longer be improved using pre-2004 techniques. Around this time, mainstream multicore processors were introduced: the additional transistors that Moore's Law scaling provides are turned into more processing cores on the same chip, each running at a moderate frequency so that total power consumption and heat stay within limits. TODO: Insert CMP image here.
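The latency-throughput tension can be illustrated with a small Python calculation using purely hypothetical numbers: one fast core that finishes a task in 10 s versus four slower cores that each need 14 s per task, working through 8 independent tasks. The function name and the timing model (each core runs one task at a time, with no sharing or communication overheads) are assumptions made for illustration only.

```python
import math


def latency_and_throughput(cores: int, seconds_per_task: float, num_tasks: int):
    """Return (latency in s, throughput in tasks/s) for `num_tasks` independent
    tasks, each taking `seconds_per_task` on one core; each core runs one task
    at a time (a hypothetical, overhead-free model)."""
    rounds = math.ceil(num_tasks / cores)  # batches of tasks run back to back
    makespan = rounds * seconds_per_task   # time until the last task finishes
    return seconds_per_task, num_tasks / makespan


# Hypothetical numbers: one fast core vs. four slower cores on 8 tasks.
single = latency_and_throughput(cores=1, seconds_per_task=10.0, num_tasks=8)
quad = latency_and_throughput(cores=4, seconds_per_task=14.0, num_tasks=8)
print(f"1 core : latency {single[0]:.1f} s, throughput {single[1]:.3f} tasks/s")
print(f"4 cores: latency {quad[0]:.1f} s, throughput {quad[1]:.3f} tasks/s")
```

Under these assumptions the four-core chip has worse latency (14 s versus 10 s per task) but nearly triple the throughput (about 0.286 versus 0.100 tasks per second).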
How exactly does this improve matters?
TUNE IN A BIT LATER...EXCITEMENT GUARANTEED.