My research spans computational social science, social movements, and quantitative methods.
My dissertation identifies, describes, and explains political protests in China using social media data. Datasets on protest events in authoritarian regimes have been scant. In the first chapter of my dissertation, I use state-of-the-art deep learning methods with both image and text data to identify 200,000 offline collective action events in China from 2011 to 2017 in 9.5 million Weibo (Chinese Twitter) posts. With this unique dataset, the second chapter characterizes the prevalence, trends, geographical distribution, issues, and tactics of protests in China, as well as their interactions with the police.
The third chapter then explains why some grievances in China develop into protests while others do not, by comparing matched cases of grievances, some of which became protests while others remained grievances.
Data from Twitter have been employed in prior research to study the impacts of events. Conventionally, researchers use keyword-based samples of tweets to create a panel of Twitter users who mention event-related keywords during and after an event. However, keyword-based sampling is limited on the objectivity dimension of data and information quality. First, the technique suffers from selection bias, because users who discuss an event were already more likely to discuss event-related topics beforehand. Second, there are no viable control groups for comparison with a keyword-based sample of Twitter users. We propose an alternative sampling approach that constructs panels of users defined by their geolocation. Geolocated panels are exogenous to the keywords in users’ tweets, resulting in less selection bias than the keyword panel method. Geolocated panels allow us to follow within-person changes over time and enable the creation of comparison groups. We compare different panels in two real-world settings: responses to mass shootings and TV advertising. We first show the strength of the selection biases of keyword panels. Then, we empirically illustrate how geolocated panels reduce selection biases and allow meaningful comparison groups regarding the impact of the studied events. We are the first to provide a clear, empirical example of how a better panel-selection design, based on an exogenous variable such as geography, both reduces selection bias compared to the current state of the art and increases the value of Twitter research for studying events. While we advocate for the use of geolocated panels, we also discuss their weaknesses and the scenarios in which they apply. This paper also calls attention to how selection bias affects the objectivity of social media data.
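The contrast between the two panel designs can be made concrete with a small sketch. It is illustrative only and is not the paper's pipeline: the `Tweet` fields, the region labels, and the toy tweets are all hypothetical. The point it shows is that keyword-panel membership depends on what users wrote, while geolocated-panel membership depends only on where they are, which also yields a natural comparison group.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tweet:
    user: str
    home_region: str  # hypothetical field: region inferred from the user's geolocation
    text: str


def keyword_panel(tweets, keywords):
    """Users with at least one tweet containing a keyword (selected on content)."""
    lowered = [k.lower() for k in keywords]
    return {t.user for t in tweets if any(k in t.text.lower() for k in lowered)}


def geolocated_panel(tweets, region):
    """Users located in the region, whatever they tweeted (selected on geography)."""
    return {t.user for t in tweets if t.home_region == region}


# Toy data, invented for illustration only.
tweets = [
    Tweet("ana", "affected", "thoughts on the shooting news"),
    Tweet("ben", "affected", "great lunch today"),
    Tweet("cy", "elsewhere", "the shooting coverage is everywhere"),
    Tweet("dee", "elsewhere", "watching football"),
]

keyword_users = keyword_panel(tweets, ["shooting"])   # only users who mention the event
treated = geolocated_panel(tweets, "affected")        # everyone in the affected area
comparison = geolocated_panel(tweets, "elsewhere")    # a comparison group by geography
```

In this toy example the keyword panel omits "ben", who lives in the affected area but never mentions the event, and it offers no comparison group; the geolocated design includes him and provides one.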
Collective action is one of the most effective forms of political participation available to the public. The study of collective action has benefited greatly from protest event analysis that draws on data from traditional media reporting. However, in some settings where measures of collective action would be especially valuable, such as authoritarian regimes, the government suppresses traditional media coverage of collective action. In this paper, we create CASM (Collective Action from Social Media), a machine-assisted system that identifies collective action events occurring in the real world from social media data by using deep learning, images as data, and two-stage classification. We discuss the advantages and limitations of this new data source, including the effects of social media censorship. We consider how validation can help make computer science methods more usable for social science research.
We discuss the ethical implications of our system and the data, which we plan to make public. We apply our system to China and demonstrate strong internal performance as well as external validity when compared with existing newspaper-based and hand-curated event datasets. We identify 197,734 unique collective action events from 2010 to 2017, creating one of the largest datasets of collective action events in any authoritarian regime.
Mayer N. Zald Distinguished Contribution to Scholarship Student Paper Award from the Section on Collective Behavior and Social Movements of ASA.
How do political protests affect the political behaviors and attitudes of citizens of another regime?
Existing theories imply a positive effect, but conventional data collection methods are unable to answer this question because of the difficulty of identifying witnesses, constructing counterfactuals, and obtaining measures of pre-protest political behavior. In this paper, I estimate the causal effect of eight protests in Hong Kong from 2012 to 2014 on political discussions and attitudes toward democracy among visitors from mainland China. I treat the eight protests as natural experiments, which create exogenous variation between treatment groups, who were in Hong Kong when the protests occurred, and control groups, who left Hong Kong just before the protests. Using difference-in-differences estimators, I find that physical presence at the scene of the protests caused a dramatic increase in witnesses' frequency of political discussions and their support for democracy, but this increase is limited to those who already discussed politics frequently and showed support for democracy. By contrast, merely being present in Hong Kong when the protests occurred led to a decrease in the frequency of political discussions and in support for democracy among Chinese visitors.
The results hold in a replication test based on protests in Taiwan and in three placebo tests.
This paper has implications for the use of social media data to study the unintended consequences of social movements.
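The difference-in-differences logic behind the Hong Kong estimates can be illustrated with the textbook two-group, two-period version. This is a minimal sketch under stated assumptions, not the estimator used in the paper: the function name `did_estimate`, the group and period labels, and the toy outcome values (for example, weekly counts of political posts) are all hypothetical. The estimate is the pre-to-post change in the treated group minus the pre-to-post change in the control group.

```python
from statistics import mean


def did_estimate(rows):
    """Two-group, two-period difference-in-differences.

    rows: iterable of (group, period, outcome) with group in
    {"treated", "control"} and period in {"pre", "post"}.
    """
    cells = {}
    for group, period, outcome in rows:
        cells.setdefault((group, period), []).append(outcome)

    required = [(g, p) for g in ("treated", "control") for p in ("pre", "post")]
    missing = [cell for cell in required if cell not in cells]
    if missing:
        raise ValueError(f"no observations for cells: {missing}")

    m = {cell: mean(values) for cell, values in cells.items()}
    treated_change = m[("treated", "post")] - m[("treated", "pre")]
    control_change = m[("control", "post")] - m[("control", "pre")]
    return treated_change - control_change


# Toy outcomes, invented for illustration only.
rows = [
    ("treated", "pre", 2), ("treated", "pre", 4),
    ("treated", "post", 5), ("treated", "post", 7),
    ("control", "pre", 1), ("control", "pre", 3),
    ("control", "post", 2), ("control", "post", 4),
]
effect = did_estimate(rows)  # (6 - 3) - (3 - 2) = 2
```

Subtracting the control group's change removes any shift common to both groups over the same period, which is why the design relies on the two groups following parallel trends in the absence of the protests.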
Manuscripts In Preparation
"The Landscape of Collective Action in China"
"What Distinguishes Protests From Potential Protests?"