"How many people do you know?: Efficiently estimating personal network size."
Tyler H. McCormick, Matthew J. Salganik, and Tian Zheng.
Journal of the American Statistical Association, in press.
Abstract: In this paper we develop a method to estimate both individual social network size (i.e., degree) and the distribution of network sizes in a population by asking respondents how many people they know in specific subpopulations (e.g., people named Michael). Building on the scale-up method of Killworth et al. (1998) and other previous attempts to estimate individual network size, we propose a latent non-random mixing model which resolves three known problems with previous approaches. As a byproduct, our method also provides estimates of the rate of social mixing between population groups. We demonstrate the model using a sample of 1,370 adults originally collected by McCarty et al. (2001). Based on insights developed during the statistical modeling, we conclude by offering practical guidelines for the design of future surveys to estimate social network size. Most importantly, we show that if the first names to be asked about are chosen properly, the simple scale-up degree estimates can enjoy the same bias reduction as that from our more complex latent non-random mixing model.
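For orientation, the simple scale-up degree estimate mentioned in the abstract can be sketched as follows. This is a minimal illustration of the basic scale-up idea only (a respondent's total reported acquaintances in subpopulations of known size, scaled by the share of the population those subpopulations cover); it is not the latent non-random mixing model proposed in the paper. The function name and all numbers are hypothetical.

```python
def scale_up_degree(known_counts, subpop_sizes, population_size):
    """Basic scale-up estimate of one respondent's network size (degree).

    known_counts    -- how many people the respondent reports knowing in
                       each subpopulation (e.g., people named Michael)
    subpop_sizes    -- the known size of each of those subpopulations
    population_size -- the size of the whole population

    Assumes ties form at random with respect to subpopulation membership,
    which is the assumption the more complex model relaxes.
    """
    total_subpop = sum(subpop_sizes)
    if total_subpop <= 0:
        raise ValueError("subpopulation sizes must sum to a positive number")
    return population_size * sum(known_counts) / total_subpop


# Hypothetical respondent: knows 2, 1, and 1 people in three subpopulations
# that together contain 2 million of a population of 300 million.
estimate = scale_up_degree(
    known_counts=[2, 1, 1],
    subpop_sizes=[1_000_000, 500_000, 500_000],
    population_size=300_000_000,
)
print(estimate)
```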
"Web-based experiments for the study of collective social dynamics in cultural markets."
Matthew J. Salganik and Duncan J. Watts.
Topics in Cognitive Science, 1:439-468. 2009.
Abstract: Social science is often interested in understanding the dynamics of social systems based on the behavior of the individuals that make up the system. However, this process is hindered by the difficulty of experimentally studying how individual behavioral tendencies lead to collective social dynamics in large groups of people interacting over time. In this paper we investigate the role of social influence, a process well studied at the individual level, on the puzzling nature of success for cultural products such as books, movies, and music. Using a "multiple-worlds" experimental design we are able to isolate the causal effect of an individual level mechanism on collective social outcomes. We employ this design in a web-based experiment in which 2,930 participants listened to, rated, and downloaded 48 songs by up-and-coming bands. Surprisingly, despite relatively large differences in the demographics, behavior, and preferences of participants, the experimental results at both the individual and collective level were similar to those found in Salganik, Dodds, and Watts (2006). Further, by comparing results from two distinct pools of participants we are able to gain new insights into the role of individual behavior on collective outcomes. We conclude with a discussion of the strengths and weaknesses of web-based experiments to address questions of collective social dynamics.
"Respondent-driven sampling as Markov chain Monte Carlo."
Sharad Goel and Matthew J. Salganik.
Statistics in Medicine, 28:2202-2229, 2009.
Abstract: Respondent-driven sampling (RDS) is a recently introduced, and now widely used, technique for estimating disease prevalence in hidden populations. RDS data are collected through a snowball mechanism, in which current sample members recruit future sample members. In this paper we present respondent-driven sampling as Markov chain Monte Carlo (MCMC) importance sampling, and we examine the effects of community structure and the recruitment procedure on the variance of RDS estimates. Past work has assumed that the variance of RDS estimates is primarily affected by segregation between healthy and infected individuals. We examine an illustrative model to show that this is not necessarily the case, and that bottlenecks anywhere in the networks can substantially affect estimates. We also show that variance is inflated by a common design feature in which sample members are encouraged to recruit multiple future sample members. The paper concludes with suggestions for implementing and evaluating respondent-driven sampling studies.
"Leading the herd astray: An experimental study of self-fulfilling prophecies in an artificial cultural market."
Matthew J. Salganik and Duncan J. Watts.
Social Psychology Quarterly, 71:338-355, 2008.
Abstract: Individuals influence each other's decisions about cultural products such as songs, books, and movies; but to what extent can the perception of success become a "self-fulfilling prophecy"? We have explored this question experimentally by artificially inverting the true popularity of songs in an online "music market," in which 12,207 participants listened to and downloaded songs by unknown bands. We found that most songs experienced self-fulfilling prophecies, in which perceived (but initially false) popularity became real over time. We also found, however, that the inversion was not self-fulfilling for the market as a whole, in part because the very best songs recovered their popularity in the long run. Moreover, the distortion of market information reduced the correlation between appeal and popularity, and led to fewer overall downloads. These results, although partial and speculative, suggest a new approach to the study of cultural markets, and indicate the potential of web-based experiments to explore the social psychological origin of other macrosociological phenomena.
"Experimental study of inequality and unpredictability in an artificial cultural market."
Matthew J. Salganik, Peter S. Dodds, and Duncan J. Watts.
Science, 311:854-856, 2006.
Abstract: Hit songs, books, and movies are many times more successful than average, suggesting that "the best" alternatives are qualitatively different from "the rest"; yet experts routinely fail to predict which products will succeed. We investigated this paradox experimentally, by creating an artificial "music market" in which 14,341 participants downloaded previously unknown songs either with or without knowledge of previous participants' choices. Increasing the strength of social influence increased both inequality and unpredictability of success. Success was also only partly determined by quality: The best songs rarely did poorly, and the worst rarely did well, but any other result was possible.
"How many people do you know in prison?: Using overdispersion in count data to estimate social structure in networks."
Tian Zheng, Matthew J. Salganik, and Andrew Gelman.
Journal of the American Statistical Association, 101:409-423. 2006.
Abstract: Networks -- sets of objects connected by relationships -- are important in a number of fields. The study of networks has long been central to sociology, where researchers have attempted to understand the causes and consequences of the structure of relationships in large groups of people. Using insight from previous network research, Killworth et al. and McCarty et al. have developed and evaluated a method for estimating the sizes of hard-to-count populations using network data collected from a simple random sample of Americans. In this article we show how, using a multilevel overdispersed Poisson regression model, these data also can be used to estimate aspects of social structure in the population. Our work goes beyond most previous research on networks by using variation, as well as average responses, as a source of information. We apply our method to the data of McCarty et al. and find that Americans vary greatly in their number of acquaintances. Further, Americans show great variation in propensity to form ties to people in some groups (e.g., males in prison, the homeless, and American Indians), but little variation for other groups (e.g., twins, people named Michael or Nicole). We also explore other features of these data and consider ways in which survey data can be used to estimate network structure.
"Variance estimation, design effects, and sample size calculations for respondent-driven sampling."
Matthew J. Salganik.
Journal of Urban Health, 83:98-111. 2006.
Abstract: Hidden populations, such as injection drug users and sex workers, are central to a number of public health problems. However, because of the nature of these groups, it is difficult to collect accurate information about them, and this difficulty complicates disease prevention efforts. A recently developed statistical approach called respondent-driven sampling improves our ability to study hidden populations by allowing researchers to make unbiased estimates of the prevalence of certain traits in these populations. Yet, not enough is known about the sample-to-sample variability of these prevalence estimates. In this paper, we present a bootstrap method for constructing confidence intervals around respondent-driven sampling estimates and demonstrate in simulations that it outperforms the naive method currently in use. We also use simulations and real data to estimate the design effects for respondent-driven sampling in a number of situations. We conclude with practical advice about the power calculations that are needed to determine the appropriate sample size for a study using respondent-driven sampling. In general, we recommend a sample size twice as large as would be needed under simple random sampling.
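The closing recommendation (a sample roughly twice as large as under simple random sampling) amounts to applying a design effect of 2. A minimal sketch of that arithmetic follows, assuming the standard normal-approximation sample size formula for a proportion; the formula choice, function names, and example numbers are illustrative assumptions, not taken from the paper.

```python
import math


def srs_sample_size(p, margin, z=1.96):
    """Sample size under simple random sampling for estimating a proportion.

    p      -- anticipated prevalence of the trait (between 0 and 1)
    margin -- desired half-width of the confidence interval
    z      -- normal critical value (1.96 for about 95% confidence)
    """
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)


def rds_sample_size(p, margin, design_effect=2.0, z=1.96):
    """Inflate the simple-random-sampling size by an assumed design effect.

    design_effect=2.0 mirrors the abstract's general recommendation of
    a sample twice as large as simple random sampling would need.
    """
    return math.ceil(design_effect * z ** 2 * p * (1 - p) / margin ** 2)


# Hypothetical study: anticipated prevalence 30%, margin of error 5 points.
print(srs_sample_size(0.30, 0.05))
print(rds_sample_size(0.30, 0.05))
```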
"Sampling and estimation in hidden populations using respondent-driven sampling."
Matthew J. Salganik and Douglas D. Heckathorn.
Sociological Methodology, 34:193-239. 2004.
Abstract: Standard statistical methods often provide no way to make accurate estimates about the characteristics of hidden populations such as injection drug users, the homeless, and artists. In this paper, we further develop a sampling and estimation technique called respondent-driven sampling, which allows researchers to make asymptotically unbiased estimates about these hidden populations. The sample is selected with a snowball-type design that can be done more cheaply, quickly, and easily than other methods currently in use. Further, we can show that under certain specified (and quite general) conditions, our estimates for the percentage of the population with a specific trait are asymptotically unbiased. We further show that these estimates are asymptotically unbiased no matter how the seeds are selected. We conclude with a comparison of respondent-driven samples of jazz musicians in New York and San Francisco, with corresponding institutional samples of jazz musicians from these cities. The results show that some standard methods for studying hidden populations can produce misleading results.