Computational Social Science:
Social Research in the Digital Age

Sociology 596
Princeton University
Fall 2014

Tuesday 2pm-5pm (second half of semester)
190 Wallace Hall
Instructor: Matthew Salganik


In the last decade we have witnessed the birth and rapid growth of Wikipedia, Google, Facebook, iPhones, Wi-Fi, YouTube, Twitter, and numerous other marvels of the digital age. In addition to changing the way we live, these tools---and the technological revolution they are a part of---have fundamentally changed the way that we can learn about the social world. We can now collect data about human behavior on a scale never before possible and with tremendous granularity and precision. The ability to collect and process "big data" enables researchers to address core questions in the social sciences in new ways and opens up new areas of inquiry.

This course on computational social science will emphasize social science rather than computation. We will focus on how traditional concepts of research design in the social sciences can inform our understanding of new data sources, and how these new data sources might require us to update our thinking on research design.


There are no official prerequisites for the course, and students from all departments are welcome. Undergraduates interested in taking the course should contact the instructor for permission.

Course structure

Each three-hour class will be a combination of lecture, discussion, and in-class activities. In order to participate fully in class, you must do the readings. There will be no exam.


Your grade will be based on the following components:

Response papers (75%): Each student will write a short response paper (2-3 pages) every week except the first. Students should view the papers as a chance to play with the ideas in the readings: look for contradictions, establish connections to your own research, develop empirical tests, etc. The response papers should not be simple summaries of the readings. Your response paper should be double-spaced with 1.5-inch margins on all sides (to leave room for my feedback), and it should be emailed to Judie Miller by midnight on Monday, the day before class.
Class participation (25%): I intend to make this class an active experience for students. When we have discussions and in-class activities, you should participate.


If you are interested in auditing the course, you should contact the instructor for permission. All auditors will be expected to complete all of the readings and assignments.

Open access

The prohibitive cost of academic journals means that many of the readings for this course are not available to everyone. I have marked these closed access articles with a . Fortunately, some of the more recent scholarship in this area is freely available to everyone in the world. I have marked these open access articles with a . It is my hope that eventually I will be able to construct this syllabus using exclusively open access scholarship. In the meantime, if you do not have access to a university library, copies of many of the closed access articles can be found through Google Scholar.

Introduction and Ethics (November 4, 2014)

In this first class we will present a broad overview of computational social science, focusing on both its strengths and its weaknesses. The promise of social research in the digital age, however, also comes with a dark side: our increased ability to collect, store, and analyze data increases our chances of inadvertently putting our research participants at risk. The procedures used to protect research subjects have evolved over many years, but the capabilities of Internet-era researchers are changing very quickly, and new norms have not yet been established. There are no response papers due this week.

Optional additional reading

Clickstreams and digital traces (November 11, 2014)

Human behavior in the digital age often leaves behind traces, and these traces are being aggregated on a scale that is difficult to comprehend. In this meeting we will discuss the strengths and weaknesses of using these traces for social research.

Optional additional reading

Experiments (November 18, 2014)

The web offers numerous advantages over the traditional laboratory for conducting social science experiments. First, the web allows researchers to conduct experiments on a completely different scale: lab experiments are typically limited to hundreds of participants, but web-based experiments involving tens of thousands of participants have already been conducted, and larger experiments are becoming increasingly practical. Second, the web gives researchers access to a much broader pool of participants and allows them to study decision making in a more natural environment. But conducting experiments on the web also has drawbacks, including unknown participant pools and limited control over participants. In this class we will discuss four types of web-based experiments: experiments overlaid on existing sites, experiments embedded in existing websites, experiments using micro-payment platforms (e.g., Amazon's Mechanical Turk), and group experiments. We will compare the strengths and weaknesses of these approaches.

Optional additional reading

Mobile phones and wearable sensors (November 25, 2014)

There are approximately four billion mobile phones in the world. While these devices are often thought of as "phones," the newest wave of "smart phones," which are increasingly dominant in developed countries, are actually sophisticated mobile computers that offer amazing opportunities for researchers. In this class we will discuss the two main forms of research using mobile phones and wearable sensors: research that uses data collected from individual devices and research that uses aggregate data collected by mobile phone companies. Within the category of research that uses individual devices, we will distinguish between research that uses phones and research that uses custom-built devices. We will also distinguish between active and passive data collection.

Optional additional reading

Text as data (December 2, 2014)

The quantitative study of collections of documents—often called "content analysis"—has a long history in the social sciences. Recently, however, two technological changes have fundamentally expanded content analysis. First, rather than having to manually input text stored on paper, researchers now have access to huge, electronic corpora of texts ranging from hundreds of thousands of political speeches to millions of books to billions of tweets. Second, researchers now have increased computational power enabling them to analyze all of this text in new ways. This class will show how these two changes offer social scientists and computational scientists new opportunities to learn from text.

Optional additional reading

Crowdsourcing, citizen science, and conclusions (December 9, 2014)

Anyone who has used Wikipedia understands the power of large-scale social collaboration. In the first half of this class we will explore how we can harness this collective power for other intellectual challenges. In the second half, we will wrap up the course with a special activity.

Optional additional reading

Creative Commons License
This course syllabus is licensed under a Creative Commons Attribution 3.0 Unported License.