Replication and extension project

In this project, you will work with a partner to reproduce and extend a published research article. The primary goal of this project is to provide the raw materials for a publishable paper, which will be developed and written in the empirical seminar next year. The secondary goals of this project are to 1) give you practice learning new statistical material and computational techniques; 2) give you practice using good software engineering techniques; 3) give you practice doing collaborative research, something you will do throughout your career; 4) show you, in a way that no homework assignment can, the complexity and joy of doing real data analysis. This project will not be without pain (see "Nightmare after nightmare: students trying to replicate work"), but it will be worth it.

For the project, you must follow the schedule and meet certain technical requirements.

Here are the six steps to the project:

  1. Pick a paper
  2. Get your plan approved
  3. Reproduce the results exactly
  4. Have your work reproduced by a peer and get feedback
  5. Do something new
  6. Have your work reproduced by a peer and get feedback

1) Pick a paper

You should pick a paper in an area that is interesting to you. Some people recommend that the paper be published recently and in a top journal in the field. Also, the paper should use methods related to those that we are going to learn in the course.

Since this is the first time I’ve tried this project, I'm going to require that the paper you select has data already available. I don't want you to spend your time emailing authors. That is not the kind of learning experience I’m hoping to create with this project. Also, I strongly recommend that you pick a paper where the code is already available. To be clear, you should not use this code. But having this code available means that you should ultimately be able to reproduce the results in the published paper.

If you are looking for papers to replicate, here are some strategies that you could use:

When picking a paper, it may be hard for you to tell whether the methods will be far beyond what will be covered in class. When in doubt, post a question to Piazza.

2) Get your plan approved

Before getting too far into your project you must get your plan approved. Your project proposal must be written in a structured format and must include each of the following elements:

  1. Citation information for the paper you want to reproduce
  2. Summary of main results of the paper
  3. List of main statistical methods used
  4. List of datasets used
  5. Whether these datasets were collected via complex sample designs (this will influence the difficulty of the replication)
  6. Summary of data availability and data access plan
  7. Summary of code availability (code availability is not required)
  8. List of ideas for how the paper could be improved or extended (see "Do something new" below for ideas about what you can do that is new)
  9. Short explanation for why you and your partner picked this paper
  10. PDF copy of the paper

If you do not provide us with all of those elements, it will be impossible for us to give you good advice.

3) Reproduce the results exactly

You should be able to re-create every table and graph in the paper. At this point, however, you should not spend too much time on layout. So, if the figure has a legend on top and your legend is on the side, that is fine, as long as the results clearly match.

Begin your replication by creating a document with images of each table and each figure. Next, add the parts of the text where the authors describe how the results were generated. Finally, create code that reproduces the results. This document will be highly structured. For example,

There does not need to be any theory or literature review in this document. It is just about reproducing the results.
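
One way to check that your numbers "clearly match" is to compare each reproduced value against the published value at the precision printed in the paper. The following is a minimal sketch in Python, not a required workflow; the function names, the table layout, and all of the numbers are hypothetical and chosen only for illustration.

```python
def matches_published(reproduced, published, decimals=2):
    """Return True if `reproduced` agrees with `published` once rounded
    to the number of decimals printed in the paper."""
    # Half a unit in the last printed digit, plus a tiny slack for float error.
    tolerance = 0.5 * 10 ** (-decimals) + 1e-12
    return abs(reproduced - published) <= tolerance


def compare_table(reproduced, published, decimals=2):
    """Return a list of (label, reproduced, published) for entries that differ
    or are missing from the reproduced results."""
    mismatches = []
    for label, pub_value in published.items():
        rep_value = reproduced.get(label)
        if rep_value is None or not matches_published(rep_value, pub_value, decimals):
            mismatches.append((label, rep_value, pub_value))
    return mismatches


# Hypothetical values for illustration only; not taken from any real paper.
published_table = {"intercept": 1.23, "treatment": -0.45, "age": 0.07}
reproduced_table = {"intercept": 1.2284, "treatment": -0.4531, "age": 0.0812}

for label, rep, pub in compare_table(reproduced_table, published_table):
    print(f"MISMATCH {label}: reproduced {rep}, published {pub}")
```

A check like this can sit next to each table in your replication document so that a peer can see at a glance which entries still differ.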

4) Have your work reproduced by a peer and receive feedback

Someone in the class will clone your repo and reproduce your reproduction. Also, they will send you some helpful feedback.

5) Do something new

You should somehow extend or improve the paper you are reproducing. Here are some examples of how you could do that:

6) Have your work reproduced by a peer and receive feedback

Again, someone in the class will clone your repo and reproduce your work. Also, they will send you some helpful feedback.



Additional information

Schedule

Tips for providing feedback

Twice in the course you will reproduce a peer's work and provide feedback. Your feedback should be constructive and specific. Here are examples of the kinds of things that could be included in the feedback:

Technical requirements

Presentation

At the end of the semester you will present your results to the class. The goal of your presentation is to teach the class about your replication project and explain what you have learned.

Each group will do a 5-minute presentation; here’s a rough outline (but you can tailor it to your specific project):

You are welcome to use slides.

Using code from others

You are welcome to use code written by other people, including snippets of code you find online and code written by people who are helping you. However, if you use code from someone else, you must acknowledge it. You would not use someone else's sentences in a paper without attribution, and you should not use their code in your code without attribution either. Sometimes there may be a gray area, and if you are in doubt you should attribute.
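
As an illustration of what attribution in code might look like, here is a small Python sketch. The comment format is only a suggestion, not a course requirement, and the function and the angle-bracket placeholders are hypothetical; replace them with the real source details.

```python
# Adapted from: <URL or citation of the original source>
# Original author: <name>; accessed <date>
# Changes from the original: renamed variables, added input checks.
def standardize(values):
    """Rescale values to mean 0 and standard deviation 1 (population SD)."""
    n = len(values)
    if n == 0:
        raise ValueError("values must not be empty")
    mean = sum(values) / n
    sd = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    if sd == 0:
        raise ValueError("values must not all be identical")
    return [(v - mean) / sd for v in values]
```

Putting the attribution directly above the borrowed code, rather than only in a README, keeps the credit attached to the code even if the file is copied elsewhere.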

Inspirations

This project was inspired by similar projects in other classes. I would like to thank Gary King, Brandon Stewart, Cristobal Young, Kosuke Imai, and Nicole Janz for their help in shaping this project.

Suggested further reading




This work is licensed under a Creative Commons Attribution 4.0 International License.