My research spans computational social science, social movements, and quantitative methods.
My dissertation identifies, describes, and explains political protests in China using social media data. Datasets on protest events in authoritarian regimes have been scarce. In the first chapter of my dissertation, I use state-of-the-art deep learning methods with both image and text data to identify 200,000 offline collective action events in China from 2011 to 2017 from 9.5 million Weibo (Chinese Twitter) posts. With this unique dataset, the second chapter of my dissertation characterizes the prevalence, trends, geographical distribution, issues, and tactics of protests in China, as well as their interaction with the police.
The third chapter of my dissertation then explains why some grievances in China turned into protests while others did not, by comparing matched cases of grievances, some of which became protests and others of which remained grievances.
Data from Twitter have been employed in prior research to study the impacts of events. Conventionally, researchers use keyword-based samples of tweets to create a panel of Twitter users who mention event-related keywords during and after an event. However, keyword-based sampling is limited with respect to objectivity, one dimension of data and information quality. First, the technique suffers from selection bias, since users who discuss an event are already more likely to have discussed event-related topics beforehand. Second, there is no viable control group against which to compare a keyword-based sample of Twitter users. We propose an alternative sampling approach that constructs panels of users defined by their geolocation. Geolocated panels are exogenous to the keywords in users' tweets, resulting in less selection bias than the keyword panel method. Geolocated panels allow us to follow within-person changes over time and enable the creation of comparison groups. We compare different panels in two real-world settings: responses to mass shootings and to TV advertising. We first show the strength of the selection biases of keyword panels. Then, we empirically illustrate how geolocated panels reduce selection biases and allow meaningful comparison groups regarding the impact of the studied events. We are the first to provide a clear, empirical example of how a better panel-selection design, based on an exogenous variable such as geography, both reduces selection bias compared to the current state of the art and increases the value of Twitter research for studying events. While we advocate for the use of geolocated panels, we also discuss their weaknesses and the scenarios in which they are applicable. This paper also calls attention to the role of selection bias in undermining the objectivity of social media data.
Protest event analysis is an important method for the study of collective action and social movements, which typically draws on traditional media reports as the data source. We introduce Collective Action from Social Media (CASM), a system that uses convolutional neural networks on image data and recurrent neural networks with long short-term memory on text data in a two-stage classifier to identify collective action events occurring offline. We implement CASM on Chinese social media data and identify 142,427 collective action events from 2010 to 2017 (CASM-China). We extensively evaluate the performance of CASM through cross-validation, out-of-sample validation, and comparisons with other Chinese protest datasets. We assess the impact of online censorship, and find that it does not substantially limit our identification of events. Compared to other datasets of protests, CASM-China identifies relatively more rural, land-related protests, but identifies few collective action events related to ethnic and religious conflict.
Mayer N. Zald Distinguished Contribution to Scholarship Student Paper Award from the Section on Collective Behavior and Social Movements of ASA.
How do political protests affect the political behaviors and attitudes of citizens of another regime?
Existing theories imply a positive effect, but conventional data collection methods are unable to answer this question because of the difficulty of identifying witnesses, constructing counterfactuals, and obtaining pre-protest political behaviors. In this paper, I estimate the causal effect of 8 protests in Hong Kong from 2012 to 2014 on political discussions and attitudes toward democracy among visitors from mainland China. I treat the 8 protests as natural experiments, which create exogenous variation between treatment groups who were in Hong Kong when the protests occurred and control groups who left Hong Kong just before the protests. Using difference-in-differences estimators, I find that physical presence at the scene of the protests caused a dramatic increase in witnesses' frequency of political discussions and their support for democracy, but this increase is limited to those who already discussed politics frequently and showed support for democracy. On the other hand, merely being present in Hong Kong when the protests occurred led to a decrease in the frequency of political discussions and in support for democracy among Chinese visitors.
The results are robust to a replication test based on protests in Taiwan and to three placebo tests.
This paper has implications for the use of social media data to study the unintended consequences of social movements.
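As a rough illustration of the difference-in-differences logic described above, the following minimal sketch computes a two-group, two-period estimate from group means. It is not the estimation code used in the paper: the function name `did_estimate`, the record layout, and all numbers are hypothetical and chosen only to show the arithmetic.

```python
from statistics import mean


def did_estimate(records):
    """Two-group, two-period difference-in-differences on cell means.

    Each record is a tuple (treated, post, outcome), where `treated` and
    `post` are booleans and `outcome` is a number. This layout is an
    assumption made for illustration only.
    """
    def cell_mean(treated, post):
        values = [y for t, p, y in records if t == treated and p == post]
        if not values:
            raise ValueError("every group-period cell needs at least one record")
        return mean(values)

    # Change over time within each group.
    treated_change = cell_mean(True, True) - cell_mean(True, False)
    control_change = cell_mean(False, True) - cell_mean(False, False)

    # The estimate is the treated group's change net of the control group's change.
    return treated_change - control_change


# Made-up outcomes (e.g. weekly counts of political posts); not real data.
records = [
    (True, False, 2.0), (True, False, 4.0),    # treated group, before
    (True, True, 5.0), (True, True, 7.0),      # treated group, after
    (False, False, 1.0), (False, False, 3.0),  # control group, before
    (False, True, 2.0), (False, True, 4.0),    # control group, after
]

effect = did_estimate(records)
print(effect)
```

With these made-up numbers the treated group's mean rises by 3 and the control group's by 1, so the estimate is 2. The estimate can be read causally only if the two groups would have followed parallel trends in the absence of the protests.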
Manuscripts in Preparation
"The Landscape of Collective Action in China"
"What Distinguishes Protests From Potential Protests?"