Computational Social Science (Fall 2014)

Computational Social Science:
Social Research in the Digital Age

Sociology 596

Princeton University

Fall 2014

Tuesday 2pm-5pm (second half of semester)

190 Wallace Hall

Instructor: Matthew Salganik

Overview

In the last decade we have witnessed the birth and rapid growth of Wikipedia, Google, Facebook, iPhones, Wi-Fi, YouTube, Twitter, and numerous other marvels of the digital age. In addition to changing the way we live, these tools---and the technological revolution they are a part of---have fundamentally changed the way that we can learn about the social world. We can now collect data about human behavior on a scale never before possible and with tremendous granularity and precision. The ability to collect and process "big data" enables researchers to address core questions in the social sciences in new ways and opens up new areas of inquiry.

This course on computational social science will emphasize social science rather than computation. We will focus on how traditional concepts of research design in the social sciences can inform our understanding of new data sources, and how these new data sources might require us to update our thinking on research design.

Prerequisites

There are no official prerequisites for the course, and students from all departments are welcome. Undergraduates interested in taking the course should contact the instructor for permission.

Course structure

Each three-hour class will be a combination of lecture, discussion, and in-class activities. In order to participate fully in class, you must do the readings. There will be no exam.

Grading

Your grade will be based on the following components:

Response papers	75%	Each student will write a short response paper (2-3 pages) every week, except the first week. Students should view the papers as a chance to play with the ideas in the readings: look for contradictions, establish connections to your own research, develop empirical tests, etc. The response papers should not be simple summaries of the readings. Your response paper should be double-spaced with 1.5 inch margins on all sides (in order to leave room for my feedback), and it should be emailed to Judie Miller by midnight on the Monday preceding the class.
Class participation	25%	I intend to make this class an active experience for students. When we have discussions and in-class activities, you should participate.

Auditors

If you are interested in auditing the course, you should contact the instructor for permission. All auditors will be expected to complete all of the reading and assignments.

Open access

The prohibitive cost of academic journals means that many of the readings for this course are not available to everyone. I have marked these closed access articles with a closed access icon. Fortunately, some of the more recent scholarship in this area is freely available to everyone in the world. I have marked these open access articles with an open access icon. It is my hope that eventually I will be able to construct this syllabus using exclusively open access scholarship. In the meantime, if you do not have access to a university library, copies of many of the closed access articles can be found through Google Scholar.

Introduction and Ethics (November 4, 2014)

In this first class we will cover a broad overview of computational social science, focusing on both strengths and weaknesses. The promise of social research in the digital age, however, also comes with a dark side. Our increased ability to collect, store, and analyze data increases our chances of inadvertently putting our research participants at risk. The procedures used to protect research subjects have evolved over many years, but the capabilities of Internet-era researchers are changing very quickly and new norms have not yet been established. There are no response papers due this week.

Lazer, D. et al. 2009. Computational social science. Science.
Anderson, C. 2008. The End of Theory: The Data Deluge Makes the Scientific Method Obsolete. Wired.
Einav, L. and Levin, J. 2014. The Data Revolution and Economic Analysis. in Innovation Policy and the Economy edited by Josh Lerner and Scott Stern.
King, G. 2011. Ensuring the Data-Rich Future of the Social Sciences. Science.

Wikipedia page on The Belmont Report.
The Belmont Report.

Kramer, A. et al. 2014. Experimental evidence of massive-scale emotional contagion through social networks. PNAS.
Adam Kramer's explanation.
PNAS editorial expression of concern
Watts, D. 2014. Stop complaining about the Facebook study. It's a golden age for research. Guardian.
Crawford, K. 2014. The Test We Can—and Should—Run on Facebook. The Atlantic.
Rosen, J. 2014. Facebook’s controversial study is business as usual for tech companies but corrosive for universities. Washington Post.

Ohm. 2010. Broken Promises of Privacy: Responding to the Surprising Failure of Anonymization. UCLA Law Review. Only read pages 1701-1731

Rosenbloom. 2007. On Facebook, Scholars Link Up With Data. New York Times.
Parry. 2011. Harvard Researchers Accused of Breaching Students' Privacy. Chronicle of Higher Education.
Zimmer. 2010. "But the data is already public": on the ethics of research in Facebook. Ethics and Information Technology.

Optional additional reading

Watts. 2007. A twenty-first century science. Nature.
Madrigal. 2012. The Philosopher Whose Fingerprints Are All Over the FTC's New Approach to Privacy. The Atlantic.
boyd and Crawford. 2012. Critical questions for big data: Provocations for a cultural, technological, and scholarly phenomenon. Information, Communication & Society.
A collection of other writing about the Facebook emotional contagion experiment compiled by James Grimmelmann.
Meyer et al. 2014. Misjudgements will drive social trials underground. Nature.

Clickstreams and digital traces (November 11, 2014)

Human behavior in the digital age often leaves behind traces, and these traces are being aggregated on a scale that is difficult to comprehend. In this meeting we will discuss the strengths and weaknesses of using these traces for social research.

Helft. 2008. Google Uses Searches to Track Flu's Spread. New York Times.
Ginsberg et al. 2008. Detecting influenza epidemics using search engine query data. Nature.
Goel et al. 2010. Predicting consumer behavior with Web search. PNAS.
Lazer et al. 2014. The Parable of Google Flu: Traps in Big Data Analysis. Science.
Google Correlate: The Comic Book.
Spend 2 minutes playing with Google Trends
Spend 2 minutes playing with Google Correlate

Kossinets and Watts. 2006. Empirical Analysis of an Evolving Social Network. Science.

Two blog posts: natural experiments, discovered not created and natural experiments created by online and offline processes.
Einav et al. 2011. Taking advantage of the vast amount of data generated on the internet. Vox.
Mas and Moretti. 2009. Peers at Work. American Economic Review.

Optional additional reading

Ugander et al. 2012. Structural diversity in social contagion. PNAS.
Golder and Macy. 2014. Digital Footprints: Opportunities and Challenges for Online Social Research. Annual Review of Sociology.
Onnela et al. 2008. Structure and tie strengths in mobile communications networks. PNAS.
Preis et al. 2013. Quantifying the Digital Traces of Hurricane Sandy on Flickr. Scientific Reports.
Polgreen et al. 2008. Using Internet Searches for Influenza Surveillance. Clinical Infectious Diseases.
Cook et al. 2011. Assessing Google Flu Trends Performance in the United States during the 2009 Influenza Virus A (H1N1) Pandemic. PLoS ONE.
Hickmann et al. 2014. Forecasting the 2013-2014 Influenza Season using Wikipedia. arXiv.
Stephens-Davidowitz. 2014. The cost of racial animus on a black candidate: Evidence using Google search data. Journal of Public Economics.
Golder and Macy. 2011. Diurnal and Seasonal Mood Vary with Work, Sleep and Daylength Across Diverse Cultures. Science.

Experiments (November 18, 2014)

The web offers numerous advantages over the traditional laboratory for the conduct of social science experiments. First, the web allows researchers to conduct experiments on a completely different scale; lab experiments are limited to hundreds of participants, but web-based experiments involving tens of thousands of participants have already been conducted and larger experiments are becoming increasingly practical. The web also gives researchers access to a much broader pool of participants and allows them to study decision making in a more natural environment. However, conducting experiments on the web also has some drawbacks, including unknown participant pools and limited control over participants. In this class we will discuss four types of web-based experiments: experiments overlaid on existing sites, experiments embedded in existing websites, experiments using micro-payment platforms (e.g. Amazon's Mechanical Turk), and group experiments. The strengths and weaknesses of the various approaches will be compared.

van de Rijt et al. 2014. Field Experiments of Success-Breeds-Success Dynamics. PNAS.
Blog post about The visible hand: Race and online market outcomes.
Doleac and Stein. The Visible Hand: Race and Online Market Outcomes. The Economic Journal. [replication data and code]

Kramer et al. 2014. Experimental evidence of massive-scale emotional contagion through social networks. PNAS.
Bakshy et al. 2012. Social Influence in Social Advertising: Evidence from Field Experiments. EC.

Berinsky et al. 2012. Evaluating Online Labor Markets for Experimental Research: Amazon.com's Mechanical Turk. Political Analysis. [replication data and code]
Goldstein et al. 2014. The Economic and Cognitive Costs of Annoying Display Advertisements. Journal of Marketing Research (in press).

Hedstrom. 2006. Experimental macro sociology: Predicting the next best seller. Science.
Salganik et al. 2006. Experimental study of inequality and unpredictability in an artificial cultural market. Science. [replication data]
van der Leij. 2011. Experimenting with Buddies. Science.
Centola. 2011. An Experimental Study of Homophily in the Adoption of Health Behavior. Science.

Optional additional reading

Kohavi et al. 2012. Trustworthy Online Controlled Experiments: Five Puzzling Outcomes Explained. KDD.
Bakshy et al. 2012. Social Influence in Social Advertising: Evidence from Field Experiments. EC.
Bakshy 2014. Big experiments: Big data’s friend for making decisions. Facebook Data Science Blog.
Bakshy et al. 2014. Designing and Deploying Online Field Experiments. WWW.
Taylor et al. 2013. Selection Effects in Online Sharing: Consequences for Peer Adoption. EC.
Weinberg et al. 2014. Comparing Data Characteristics and Results of an Online Factorial Survey between a Population-Based and a Crowdsource-Recruited Sample. Sociological Science.
Suri and Watts. 2011. Cooperation and Contagion in Web-Based, Networked Public Goods Experiments. PLOS One.
Horton et al. 2011. The online laboratory: conducting experiments in a real labor market. Experimental Economics.
Mason and Suri. 2012. A Guide to Behavioral Experiments on Mechanical Turk. Behavior Research Methods.

Mobile phones and wearable sensors (November 25, 2014)

There are approximately four billion mobile phones in the world. While these devices are often thought of as "phones," the newest wave of "smart phones" that are increasingly dominant in developed countries are actually sophisticated mobile computers that offer amazing opportunities for researchers. In this class we will discuss the two main forms of research using mobile phones and wearable sensors: research that uses data collected from individual devices and research that uses aggregate data collected by mobile phone companies. Within the category of research that uses individual devices, we will distinguish between research that uses phones and research that uses custom-built devices. We will also distinguish between active and passive data collection.

Miller. 2012. The Smartphone Psychology Manifesto. Perspectives on Psychological Science.
Kaplan and Stone. 2012. Bringing the Laboratory and Clinic to the Community: Mobile Technologies for Health Promotion and Disease Prevention. Annual Review of Psychology.

Gething and Tatem. 2011. Can Mobile Phone Data Improve Emergency Response to Natural Disasters? PLoS Medicine.
Bengtsson et al. 2011. Improved Response to Disasters and Outbreaks by Tracking Population Movements with Mobile Phone Network Data: A Post-Earthquake Geospatial Study in Haiti. PLoS Medicine.
Ebola and big data: On hold. The Economist.
Onnela et al. 2008. Structure and tie strengths in mobile communications networks. PNAS.

A paper from Naomi Sugie's dissertation (TBA)

Salathe et al. 2010. A high-resolution human contact network for infectious disease transmission. PNAS. [replication data]

Optional additional reading

Bagrow et al. 2011. Collective Response of Human Populations to Large-Scale Emergencies. PLoS ONE.
Palmer et al. 2013. New Approaches to Human Mobility: Using Mobile Phones for Demographic Research. Demography.
Jensen 2007. The Digital Provide: Information (Technology), Market Performance, and Welfare in the South Indian Fisheries Sector. Quarterly Journal of Economics.
Deville et al. 2014. Dynamic population mapping using mobile phone data. PNAS.
Raento et al. 2009. Smartphones: An Emerging Tool for Social Scientists. Sociological Methods and Research.
Wesolowski et al. 2012. Quantifying the impact of human mobility on malaria. Science.
Wesolowski et al. 2013. The impact of biases in mobile phone ownership on estimates of human mobility. Journal of the Royal Society Interface.
Wesolowski et al. 2014. Quantifying travel behavior for infectious disease research: a comparison of data from surveys and mobile phones. Scientific Reports.
Lajous et al. 2010. Mobile Messaging as Surveillance Tool during Pandemic (H1N1) 2009, Mexico. Emerging Infectious Diseases.
Wesolowski et al. 2014. Commentary: Containing the Ebola Outbreak – the Potential and Challenge of Mobile Network Data. PLOS Currents Outbreaks.
Ebola and big data: Call for help. The Economist.
Blumenstock. 2014. Calling for Better Measurement: Estimating an Individual’s Wealth and Well-Being from Mobile Phone Transaction Records. KDD.

Text as data (December 2, 2014)

The quantitative study of collections of documents—often called "content analysis"—has a long history in the social sciences. Recently, however, two technological changes have fundamentally expanded content analysis. First, rather than having to manually input text stored on paper, researchers now have access to huge, electronic corpora of texts ranging from hundreds of thousands of political speeches to millions of books to billions of tweets. Second, researchers now have increased computational power enabling them to analyze all of this text in new ways. This class will show how these two changes offer social scientists and computational scientists new opportunities to learn from text.

Grimmer and Stewart. 2012. Text as Data: The Promise and Pitfalls of Automatic Content Analysis Methods for Political Texts. Political Analysis.

Blei 2012. Probabilistic Topic Models. Communications of the ACM.
DiMaggio et al. 2013. Exploiting affinities between topic modeling and the sociological perspective on culture: Application to newspaper coverage of U.S. government arts funding. Poetics.
Roberts et al. 2014. Structural topic models for open-ended survey responses. American Journal of Political Science. [replication data and code]

Hopkins and King. 2010. A Method of Automated Nonparametric Content Analysis for Social Science. American Journal of Political Science. [replication data and code]
Spirling. 2011. US Treaty-making with American Indians: Institutional Change and Relative Power, 1784–1911. American Journal of Political Science. [replication data and code]

Optional additional reading

Mohr and Bogdanov. 2013. Topic models: What they are and why they matter. Poetics.
Jamal et al. 2014. Anti-Americanism and Anti-Interventionism in Arabic Twitter Discourse. Working paper.
Roberts et al. 2014. Navigating the Local Modes of Big Data: The Case of Topic Models. In Data Analytics in Social Science, Government, and Industry.
Lucas et al. 2014. Computer assisted text analysis for comparative politics. Political Analysis.
King et al. 2014. Computer-Assisted Keyword and Document Set Discovery from Unstructured Text. Working paper.
Bail. 2012. The Fringe Effect: Civil Society Organizations and the Evolution of Media Discourse about Islam since the September 11th Attacks. American Sociological Review.
Quinn et al. 2010. How to Analyze Political Attention with Minimal Assumptions and Costs. American Journal of Political Science.

Crowdsourcing, citizen science, and conclusions (December 9, 2014)

Anyone who has used Wikipedia understands the power of large-scale social collaboration. In the first half of this class we will try to figure out how we can harness this collective power for other intellectual challenges. In the second half, we will wrap up the course with a special activity.

Draft book chapter (coming soon)

Markoff. 2010. In a Video Game, Tackling the Complexities of Protein Folding. New York Times.
Cooper et al. 2010. Predicting protein structures with a multiplayer game. Nature.

Tuite et al. 2011. PhotoCity: Training Experts at Large-scale Image Acquisition Through a Competitive Game. CHI.

von Ahn et al. 2008. reCAPTCHA: Human-based character recognition via web security measures. Science.

Optional additional reading

Khatib et al. 2011. Algorithm discovery by protein folding game players. PNAS.

This course syllabus is licensed under a Creative Commons Attribution 3.0 Unported License.
