Sociology 596: Computational Social Science, Fall 2016

Computational Social Science:
Social Research in the Digital Age

Sociology 596

Princeton University

Fall 2016

Tuesday 2pm-5pm (first half of semester)

165 Wallace Hall

Course materials: class github repository

Instructor: Matthew Salganik

Overview

Changes in technology---specifically the transition from the analog age to the digital age---mean that we can now collect and analyze social data in new ways. This six-week mini-class is about doing social research in these new ways. Unlike some other courses on computational social science, this course will emphasize "social science" and de-emphasize "computation." We will focus on how traditional concepts of research design in the social sciences can inform our understanding of new data sources, and how these new data sources might require us to update our thinking on research design. The course should be helpful for social scientists who want to do more data science and for data scientists who want to do more social science.

Course goals and learning objectives

Students will describe the opportunities and challenges that the digital age creates for social research.
Students will evaluate modern social research from the perspectives of both social science and data science.
Students will create modern research proposals that blend ideas from social science and data science.
Students will practice the techniques needed to actually conduct their proposed research (optional).

Course activities

Collaborative annotation and feedback (x5): During this course we will read a draft of my manuscript Bit by Bit: Social Research in the Digital Age. Whenever you read a chapter from this manuscript, there will be two assignments. First, you will participate in a collaborative annotation of the document using hypothes.is. This process is described on the annotation page. Second, you should post ideas to Piazza about how each chapter could be improved. This feedback should be completed by Monday at midnight for material that we will discuss on Tuesday.
Shorter research proposal (x2): Between weeks 2 and 5, you will write two short research proposals. These proposals are described in more detail in the research proposal guidelines. These proposals can be done in teams.
Proposal review (x4): For weeks 2, 3, 4, and 5, you will write a review of the proposals of your peers. These proposal reviews are described in more detail in the research proposal review guidelines. The goal of your proposal review is to be as helpful as possible.
Extended research proposal (x1): Building on your first two proposals, you will write an extended research proposal, which could be an improved version of one of your earlier research proposals. The biggest difference between the shorter proposals and the extended proposal is that in the extended proposal you should take one concrete step toward actually implementing the research. More details are described in the extended proposal guidelines. After you pitch your proposal in class on Tuesday, October 25, a final version of the proposal will be due Friday, October 28 at 5pm.
Pitch (x1): During the class you will have a chance to pitch your shorter proposals, and you will be expected to do a longer, more formal pitch of your extended research proposal. You will have 5 minutes to pitch your proposal in person on our last day of class. Then, as a class, we will spend 10 minutes discussing the proposal and how it can be improved.
Overall manuscript feedback (x1): After reading the entire manuscript and completing the class, you should send me a two-page review of the entire book manuscript. This will be due Friday, October 28.
Lab (x5) [Optional]: For students who want to practice the techniques needed to conduct this research, we will have drop-in labs 5 times during the course of the semester. At the lab, students will work together on a single problem. This lab is optional, and is described in more detail on the labs page.

Meeting structure

Each class meeting will be split into four main parts:

Overview: I’ll begin each class by describing the main themes of the week. This should be shorter than is typical in a graduate seminar because I’ve already described these themes in my manuscript.
Case studies: We will discuss a few case studies in detail that serve to illustrate some important themes of the week.
Discussion of proposals: Students will take turns presenting their proposals, and there will be a student-led discussion of the proposals.
Preview: At the end of class, I’ll preview some themes that you’ll encounter in the reading for next class.

In general, the class will be a mix of professor-led discussion and student-led discussion. As the semester progresses, I will expect the students to take an increasingly active role in the course.

Logistics

See the logistics page for more information about time and location, prerequisites, collaboration policy, Piazza, grading, and open access.

Introduction and Ethics (September 20, 2016)

In this first class we will cover a broad overview of computational social science, focusing on blending ideas from social science and data science. Ethics is a theme that runs throughout the course, so we will cover it in the first week.

Anderson. 2008. The end of theory: The data deluge makes the scientific method obsolete. Wired.
Donoho. 2015. 50 years of data science. Working paper.
Provost and Fawcett. 2013. Data Science and its relationship to big data and data-driven decision making. Big Data.

Slides

Big data (September 27, 2016)

Human behavior in the digital age often leaves behind traces, and these traces are being aggregated by companies and governments on a massive scale. This week we will discuss the strengths and weaknesses of using these big data sources for social research. Then, I'll describe three approaches that can help you learn from these big data sources: counting things, forecasting, and approximating experiments.

Bit by Bit: Social Research in the Digital Age, Chapter 2: Observing behavior.

Farber. 2015. Why You Can't Find a Taxi in the Rain and Other Labor Supply Lessons from Cab Drivers. Quarterly Journal of Economics. [open version]

Helft. 2008. Google Uses Searches to Track Flu's Spread. New York Times.
Ginsberg et al. 2008. Detecting influenza epidemics using search engine query data. Nature.
Goel et al. 2010. Predicting consumer behavior with Web search. PNAS.
Lazer et al. 2014. The Parable of Google Flu: Traps in Big Data Analysis. Science.
Yang et al. 2015. Accurate estimation of influenza epidemics using Google search data via ARGO. PNAS. [open version]
Google Correlate: The Comic Book.
Spend 2 minutes playing with Google Trends.
Spend 2 minutes playing with Google Correlate.

Mas and Moretti. 2009. Peers at Work. American Economic Review.

Slides

Surveys (October 4, 2016)

This week I'll begin by explaining that big data sources will not replace surveys. In fact, the abundance of big data sources increases---not decreases---the value of surveys. Given that motivation, I’ll summarize the total survey error framework that was developed during the first two eras of survey research. This framework enables us to understand new approaches to representation (e.g., non-probability samples) and new approaches to measurement (e.g., new ways of asking respondents questions). Finally, I’ll describe two research templates for linking survey data to big data sources.

Bit by Bit: Social Research in the Digital Age, Chapter 3: Asking questions.

Wang et al. 2015. Forecasting elections with non-representative polls. International Journal of Forecasting.

Salganik and Levy. 2015. Wiki Surveys: Open and Quantifiable Social Data Collection. PLoS ONE. [replication data and code]

Blumenstock. 2014. Calling for Better Measurement: Estimating an Individual's Wealth and Well-Being from Mobile Phone Transaction Records. ACM Conference on Knowledge Discovery and Data Mining (KDD), Workshop on Data Science for Social Good.
Blumenstock et al. 2015. Predicting Poverty and Wealth from Mobile Phone Metadata. Science.
Khan et al. 2015. Behavioral Modeling for Churn Prediction: Early Indicators and Accurate Predictors of Custom Defection and Loyalty. 2015 IEEE International Congress on Big Data. (SKIM)
Blumenstock. 2016. Fighting poverty with data. Science.
LeCun et al. 2015. Deep learning. Nature. (SKIM)
Jean et al. 2016. Combining satellite imagery and machine learning to predict poverty. Science.

Slides

Running experiments (October 11, 2016)

Randomized controlled experiments have proven to be a powerful way to learn about the social world, and this week we will see how you can use them in your research. We will describe the difference between lab experiments and field experiments and the differences between analog experiments and digital experiments. Further, I’ll argue that digital field experiments can offer the best features of analog lab experiments (tight control) and analog field experiments (realism), all at a scale that was not possible previously. Next, I’ll describe three concepts---validity, heterogeneity of treatment effects, and mechanisms---that are critical for designing rich experiments. With that background, I’ll describe the trade-offs involved in the two main strategies for conducting digital experiments: doing it yourself or partnering with the powerful. Finally, I’ll conclude with some design advice about how you can take advantage of the real power of digital experiments and describe some of the responsibility that comes with that power.

Bit by Bit: Social Research in the Digital Age, Chapter 4: Running experiments.

van de Rijt et al. 2014. Field Experiments of Success-Breeds-Success Dynamics. PNAS.
Doleac and Stein. 2013. The Visible Hand: Race and Online Market Outcomes. The Economic Journal. [replication data and code]

Salganik et al. 2006. Experimental study of inequality and unpredictability in an artificial cultural market. Science. [replication data]

Harper and Konstan. 2016. The MovieLens Datasets: History and Context. ACM Transactions on Interactive Intelligent Systems (TiiS). [open version]

Kramer et al. 2014. Experimental evidence of massive-scale emotional contagion through social networks. PNAS.

Slides

Mass collaborations (October 18, 2016)

Wikipedia is amazing. A mass collaboration of volunteers created a fantastic encyclopedia that is available to everyone. The key to Wikipedia’s success was not new knowledge; rather, it was a new form of collaboration. The digital age, fortunately, enables many new forms of collaboration. Thus, we should now ask: what massive scientific problems---problems that we could not solve individually---can we now tackle together? Mass collaboration has a long, rich history in fields such as astronomy and ecology, but it is not yet common in social research. However, by describing successful projects from other fields and providing a few key organizing principles, I hope to convince you of two things. First, mass collaboration can be harnessed for social research. And, second, researchers who use mass collaboration will be able to solve problems that had previously seemed impossible. Although mass collaboration is often promoted as a way to save money, it is much more than that. As I will show, mass collaboration doesn’t just allow us to do research more cheaply, it allows us to do research better.

Bit by Bit: Social Research in the Digital Age, Chapter 5: Mass collaboration.

Benoit et al. 2015. Crowd-Sourced Text Analysis: Reproducible and Agile Production of Political Data. American Political Science Review.

Bell et al. 2010. All Together Now: A Perspective on the Netflix Prize. Chance.
Glaeser et al. 2016. Crowdsourcing City Government: Using Tournaments to Improve Inspection Accuracy. American Economic Review.

Tuite et al. 2011. PhotoCity: Training Experts at Large-scale Image Acquisition Through a Competitive Game. CHI.

Slides

Student-selected topic and pitch day (October 25, 2016)

~~For the final week of class, students will select the topic. I'll update the syllabus once the choice is complete. In this final class, the students will also pitch their final projects.~~

For the final week of class, students will present and discuss their final research proposal. Then, we will collectively generate a set of ideas about possible next steps for continuing your training in computational social science.

Slides

Acknowledgements

This class was shaped by conversations with Brandon Stewart, especially his class on Text as Data from Spring 2016.

This work is licensed under a Creative Commons Attribution 4.0 International License.
