In this project, you will work with a partner to reproduce and extend a published research article. The primary goal of this project is to provide the raw materials for a publishable paper, which will be developed and written in the empirical seminar next year. The secondary goals of this project are to 1) give you practice learning new statistical material and computational techniques; 2) give you practice using good software engineering techniques; 3) give you practice doing collaborative research, something you will do throughout your career; 4) show you, in a way that no homework assignment can, the complexity and joy of doing real data analysis. This project will not be without pain (see "Nightmare after nightmare: students trying to replicate work"), but it will be worth it.
Here are the six steps to the project:
You should pick a paper in an area that is interesting to you. Some people recommend that the paper be published recently and in a top journal in the field. Also, the paper should use methods related to those that we are going to learn in the course.
Since this is the first time I’ve tried this project, I'm going to require that the paper you select has data already available. I don't want you to spend your time emailing with authors. That is not the kind of learning experience I’m hoping to create with this project. Also, I strongly recommend that you pick a paper where the code is already available. To be clear, you should not use this code. But, having this code available means that you should be able to ultimately reproduce the results in the published paper.
If you are looking for papers to replicate, here are some strategies that you could use:
When picking a paper it may be hard for you to tell if the methods will be far beyond what will be covered in class. When in doubt, post a question to Piazza.
Before getting too far into your project you must get your plan approved. Your project proposal must be written in a structured format and must include each of the following elements:
If you do not provide us with all of those elements, it will be impossible for us to give you good advice.
You should be able to re-create every table and graph in the paper. At this point, however, you should not spend too much time on layout. So, if the figure has a legend on top and your legend is on the side, that is fine, as long as the results clearly match.
Begin your replication by creating a document with images of each table and each figure. Next, add the parts of the text where the authors describe how the results were generated. Finally, create code that reproduces the results. This document will be highly structured. For example,
There does not need to be any theory or literature review in this document. It is just about reproducing the results.
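One way to check that your reproduced numbers "clearly match" a published table is to compare them at the precision the paper reports. The following is a minimal sketch, not a required part of the project: the function name `find_mismatches`, the variable names, and all of the numbers are hypothetical and are not taken from any real paper. It assumes the published table reports values rounded to two decimal places.

```python
def find_mismatches(published, reproduced, decimals=2):
    """Return names of entries whose reproduced value does not match the
    published value at the reported precision (or is missing entirely)."""
    # A reproduced value matches if it would round to the published value.
    # The tiny epsilon guards against floating-point noise at the boundary.
    tolerance = 0.5 * 10 ** (-decimals) + 1e-12
    mismatches = []
    for name, published_value in published.items():
        reproduced_value = reproduced.get(name)
        if reproduced_value is None:
            mismatches.append(name)
        elif abs(reproduced_value - published_value) > tolerance:
            mismatches.append(name)
    return mismatches


# Hypothetical coefficients as printed in a paper's table (two decimals).
published = {"treatment": 0.42, "age": -0.03, "income": 1.10}

# Hypothetical coefficients produced by your own code (full precision).
reproduced = {"treatment": 0.4187, "age": -0.0312, "income": 1.2301}

mismatches = find_mismatches(published, reproduced)
print(mismatches)  # prints ['income']
```

A check like this only flags numerical differences. Differences in layout, such as legend placement, are not a concern at this stage.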
Someone in the class will clone your repo and reproduce your reproduction. Also, they will send you some helpful feedback.
You should somehow extend or improve the paper you are reproducing. Here are some examples of how you could do that:
Again, someone in the class will clone your repo and reproduce your work. Also, they will send you some helpful feedback.
Twice in the course you will reproduce a peer's work and provide feedback. Your feedback should be constructive and specific. Here are examples of the kinds of things that could be included in the feedback:
At the end of the semester you will present your results to the class. The goal of your presentation is to teach the class about your replication project and explain what you have learned.
Each group will do a 5-minute presentation; here’s a rough outline (but you can tailor it to your specific project):
You are welcome to use slides.
You are welcome to use code written by other people, including snippets of code you find online and code written by people who are helping you. However, if you use code from someone else, you must acknowledge it. You would not use someone else's sentences in a paper without attribution, and you should not use their code in your code without attribution either. Sometimes there may be a gray area; if you are in doubt, attribute.
This project was inspired by similar projects in other classes. I would like to thank Gary King, Brandon Stewart, Cristobal Young, Kosuke Imai, and Nicole Janz for their help in shaping this project.