Documentation for degree_09282006.csv
For questions email: Matthew Salganik and Tian Zheng
Revised version released September 28, 2006

The datafile degree_09282006.csv contains information on the estimated size of the social networks (also known as degree) of a sample of Americans, as well as associated demographic information on sample members. Some of the data were originally presented in Killworth et al. (1998a,b) and McCarty et al. (2001). Later, they shared the data with Zheng, Salganik, and Gelman, who published a re-analysis (Zheng et al., 2006). Here we are providing some of the results from the 2006 re-analysis along with some of the original data.

The second through the fifth columns of the datafile present the new information calculated by Zheng et al. The information that you are probably looking for is the fourth column of the datafile, which is the estimated social network size of each sample member. In addition to these 4 new columns, the next 19 columns are the demographic information that was collected in the original dataset. Note that this datafile does not include the original responses to the "how many X's do you know" questions.

The first column of the file records the respondent number and ranges from 1 to 1375. However, this datafile includes 1370 cases (i.e., it excludes respondents 608, 729, 865, 1064, 1303). The reason for the missing cases is that in the original McCarty data we identified the following inputs as unreliable (using no variability as a standard).

     n1a n2a n3a n4a n5a n6a n7a n8a n9a n10a n11a
119    0   0   0   0   0   0   0   0   0    0    0
136    0   0   0   0   0   0   0   0   0    0    0
281    7   7   7   7   7   7   7   7   7    7    7
344    0   0   0   0   0   0   0   0   0    0    0
608   NA  NA  NA  NA  NA  NA  NA  NA  NA   NA   NA
729   NA  NA  NA  NA  NA  NA  NA  NA  NA   NA   NA
865   NA  NA  NA  NA  NA  NA  NA  NA  NA   NA   NA
1064  NA  NA  NA  NA  NA  NA  NA  NA  NA   NA   NA
1303  NA  NA  NA  NA  NA  NA  NA  NA  NA   NA   NA

     n12a tr1 tr2 tr3 tr4 tr5 tr6 tr7 tr8 tr9 tr10 tr11 tr12
119     0   0   0   0   0   0   0   0   0   0    0    0    0
136     0   0   0   0   0   0   0   0   0   0    0    0    0
281     7   7   7   7   7   7   7   7   7   7    7    7    7
344     0   0   0   0   0   0   0   0   0   0    0    0    0
608    NA  NA  NA  NA  NA  NA  NA  NA  NA  NA   NA   NA   NA
729    NA  NA  NA  NA  NA  NA  NA  NA  NA  NA   NA   NA   NA
865    NA  NA  NA  NA  NA  NA  NA  NA  NA  NA   NA   NA   NA
1064   NA  NA  NA  NA  NA  NA  NA  NA  NA  NA   NA   NA   NA
1303   NA  NA  NA  NA  NA  NA  NA  NA  NA  NA   NA   NA   NA

     tr13 tr14 tr15 tr16 tr17 tr18 tr19 tr20
119     0    0    0    0    0    0    0    0
136     0    0    0    0    0    0    0    0
281     7    7    7    7    7    7    7    7
344     0    0    0    0    0    0    0    0
608    NA   NA   NA   NA   NA   NA   NA   NA
729    NA   NA   NA   NA   NA   NA   NA   NA
865    NA   NA   NA   NA   NA   NA   NA   NA
1064   NA   NA   NA   NA   NA   NA   NA   NA
1303   NA   NA   NA   NA   NA   NA   NA   NA

The 5 records with all NA's (608, 729, 865, 1064, 1303) are completely missing records and were excluded from the datafile, resulting in a datafile of 1370 respondents. For the 4 records with no variability (all 0's or all 7's) we set the values to NA since these actual responses may not be reliable. These records were regarded as part of the observed data. Since "we followed the usual practice with this sort of unbalanced data of assuming an ignorable model (i.e., constructing the likelihood using the observed data)", these records do not contribute to our estimates.

There are two reasons that some of the results from this datafile differ from those presented in Zheng et al. First, some of the estimated degrees presented in the paper (section 4.1 and figure 2) come from fitting the model separately for men and women. The results in this datafile are for fitting the model to the pooled data.
The second reason that some of the results from this datafile might be slightly different from the results in the paper is that the results in the paper are based on 1375 cases (including cases 608, 729, 865, 1064, 1303). Including these cases added some noise to the mean and median estimates presented in the paper. In this datafile we exclude those non-informative estimates. Also, all "don't know" and "not available" responses were collapsed into one missing data code, "NA".

For more information about the original data collection and our analysis please read:

Killworth, P.D., Johnsen, E.C., McCarty, C., Shelley, G.A., and Bernard, H.R. (1998a). A social network approach to estimating seroprevalence in the United States. Social Networks, 20, 23-50.

Killworth, P.D., McCarty, C., Bernard, H.R., Shelley, G.A., and Johnsen, E.C. (1998b). Estimation of seroprevalence, rape, and homelessness in the U.S. using a social network approach. Evaluation Review, 22, 289-308.

McCarty, C., Killworth, P.D., Bernard, H.R., Johnsen, E.C., and Shelley, G.A. (2001). Comparing two methods for estimating network size. Human Organization, 60, 28-39.

Zheng, T., Salganik, M.J., and Gelman, A. (2006). How many people do you know in prison?: Using overdispersion in count data to estimate social structure in networks. Journal of the American Statistical Association.

Below is the codebook for the dataset.

----------------------------------------------
Variables (24) included

rspn:  respondent number
amean: alpha mean (alpha being log(degree))
astd:  std err of estimated alpha
dmean: exp(amean), degree
dstd:  exp(astd)

Covariates (19): isex, age, mar, educ, rac1, span, pid, emp, occ, occ2, trv1, trv2, nump, livl, rel, org, sta, inc, hea5

======== Details ================

Q: ISEX
1 Male
2 Female

Q: AGE
T: 8 10
And what is your age?
T: 12 10
(18-110)
-9 Not Available

Q: mar
T: 2
Now I have some questions about you that will help us compare your answers with those of others.
Are you currently married, separated, divorced, widowed or have you never been married?
1 Now married
2 Now widowed
3 Never married
4 Divorced or separated
-9 Not available

Q: educ
T: 2
What is the highest grade of school or year in college you yourself completed?
0 None.........0         10 High School 10
1 Elementary 01          11 High School 11
2 Elementary 02          12 High School 12
3 Elementary 03          13 College.....13
4 Elementary 04          14 College.....14
5 Elementary 05          15 College.....15
6 Elementary 06          16 College.....16
7 Elementary 07          17 Some Graduate School 17
8 Elementary 08          18 Graduate/Prof. Degree 18
9 High School 09         -8 Don't know
                         -9 Not available

Q: rac1
T: 2
What race do you consider yourself?
T: 3
1 White
2 Black
3 Asian or Pacific Islander
4 American Indian
5 Other
6 Multi-racial or mixed race
-8 Don't know
-9 Not available

Q: span
T: 2
Are you of Spanish or Hispanic origin?
1 Yes
2 No
-8 Don't know
-9 Not available

Q: pid
T: 2
Are you registered as a Republican, a Democrat, an Independent, some other party, or not registered to vote at all?
T: 6
1 Republican
2 Democrat
3 Independent
4 Other party
5 Not registered to vote
6 No party preference
-8 Don't know
-9 Not available

Q: emp
T: 2
Are you currently employed outside the home?
1 Yes
2 No
-9 Not available

Q: occ
T: 2
What kind of place do you work for?
INTERVIEWER, THIS QUESTION DESCRIBES THE TYPE OF INDUSTRY THE RESPONDENT IS INVOLVED IN.
1 Agricultural
2 Forestry or Fishing
3 Mining
4 Construction
5 Manufacturing
6 Transportation, Communication, Electric, Gas or Sanitary Services (airline, telephone or electric utility, etc.)
7 Wholesale Trade
8 Retail Trade (department store, hardware store, etc.)
9 Finance, Insurance or Real Estate
10 Services (doctors' and lawyers' offices, hair salons, veterinarians' offices, etc.)
11 Public Administration (schools, government agencies, etc.)
12 Nonclassifiable Establishments
-9 Not Available

Q: occ2
T: 2
And what kind of work do you do?
INTERVIEWER, THIS DESCRIBES THE TYPE OF WORK THE RESPONDENT PERFORMS AT THEIR PLACE OF WORK.
1 CLERICAL (agent/cashier/clerk/secretary/typist)
2 CRAFTSMEN/FOREMEN (carpenter/electrician/fireman/police)
3 FARMERS/FARM MANAGERS (owners & tenants - includes fishing)
4 LABORERS (unskilled)
5 MANAGER/OFFICIAL/PROPRIETOR (non-farm only & self employed or not)
6 MISCELLANEOUS (military/retired/housewife/student)
7 OPERATIVES (equipment/machinery/truck)
8 PROFESSIONAL/TECHNICAL (engineer/doctor/lawyer/teacher)
9 SALES (agent/broker/insurance/retail)
10 SERVICE (attendant/barber/cook/janitor/waitress)
-9 Not Available

Q: trv1
T: 2
As a part of your work, how many trips a year would you say you must make to places outside of the city where you live?
<0-365>
-8 Don't know
-9 Not available

Q: trv2
T: 2
Excluding (these) work-related trips, about how many trips a year would you say you make to places outside of the city where you live?

Q: nump
T: 2
Including yourself, how many people live in your household?
<1-20>
-8 Don't know
-9 Not available

Q: livl
T: 2
How long have you lived at your current location (same city) in years?
INTERVIEWER, ENTER 0 FOR LESS THAN A YEAR.
<0-99>
-8 Don't know
-9 Not available

Q: rel
T: 2
Do you belong to any religious organization such as a church or synagogue where you regularly attend, and if so what denomination?
T: 6
1 Does not belong to religious organization
2 Protestant
3 Catholic
4 Jewish
5 Muslim
6 Other [specify]
-8 Don't know
-9 Not available

Q: org
T: 2
Excluding religious affiliations, how many organizations do you belong to where you attend meetings, functions or gatherings at least once a year?
<0-99>
-8 Don't know
-9 Not available

Q: sta
T: 2
What state do you live in?
 1 Alabama       15 Indiana         29 Nevada           43 Tennessee
 2 Alaska        16 Iowa            30 New Hampshire    44 Texas
 3 Arizona       17 Kansas          31 New Jersey       45 Utah
 4 Arkansas      18 Kentucky        32 New Mexico       46 Vermont
 5 California    19 Louisiana       33 New York         47 Virginia
 6 Colorado      20 Maine           34 North Carolina   48 Washington
 7 Connecticut   21 Maryland        35 North Dakota     49 W. Virginia
 8 Delaware      22 Massachusetts   36 Ohio             50 Wisconsin
 9 DC            23 Michigan        37 Oklahoma         51 Wyoming
10 Florida       24 Minnesota       38 Oregon           59 Other
11 Georgia       25 Mississippi     39 Pennsylvania     -9 Not Available
12 Hawaii        26 Missouri        40 Rhode Island
13 Idaho         27 Montana         41 South Carolina
14 Illinois      28 Nebraska        42 South Dakota

Q: INC
T: 2
Now consider your family's household income from all sources. As I read a list, please stop me when I get to the income level that best describes your household income in 1999
1 Less than $10,000
2 $10,000 to $19,999
3 $20,000 to $29,999
4 $30,000 to $39,999
5 $40,000 to $49,999
6 $50,000 to $59,999
7 $60,000 to $79,999
8 $80,000 to $99,999
9 $100,000 to $150,000
10 Over $150,000
-8 Don't Know
-9 Not Available

Q: HEA5
T: 2
In general, would you say your health is excellent, very good, good, fair, or poor?
1 Poor
2 Fair
3 Good
4 Very good
5 Excellent
-8 Don't know
-9 Not available
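
As an illustration of how this codebook might be applied, the following Python sketch reads a file laid out as described here using only the standard library. It rests on several assumptions that have not been checked against the released file: that the CSV has a header row whose names match the codebook (rspn, amean, astd, dmean, dstd, isex, ...), that every cell is numeric or a missing-data code, and that treating the raw codes -8 and -9 as missing alongside "NA" is harmless. The function names and the sample rows are made up for illustration and are not real data.

```python
import csv
import io

# Assumption: "NA" is the missing-data code in the file; the raw codebook
# codes -8 (don't know) and -9 (not available) are also treated as missing
# in case any survive in the covariate columns.
MISSING_CODES = {"NA", "", "-8", "-9"}


def parse_value(text):
    """Return a float, or None when the cell holds a missing-data code."""
    text = text.strip()
    if text in MISSING_CODES:
        return None
    return float(text)


def read_degree_file(handle):
    """Read rows into dicts keyed by column name, mapping missing codes to None."""
    reader = csv.DictReader(handle)
    return [
        {name: parse_value(cell) for name, cell in row.items()}
        for row in reader
    ]


def mean_degree(rows):
    """Average the dmean column over respondents that have an estimate."""
    values = [row["dmean"] for row in rows if row["dmean"] is not None]
    return sum(values) / len(values)


# Made-up rows in the assumed layout (only the first 7 of 24 columns shown).
sample = io.StringIO(
    "rspn,amean,astd,dmean,dstd,isex,age\n"
    "1,6.0,0.2,403.4,1.22,1,34\n"
    "2,5.5,0.3,244.7,1.35,2,-9\n"
    "3,NA,NA,NA,NA,2,51\n"
)
rows = read_degree_file(sample)
print(len(rows), "rows read; mean degree over non-missing rows:", mean_degree(rows))
```

To use this with the actual datafile, one would pass an open file handle, for example `open("degree_09282006.csv", newline="")`, in place of the in-memory sample, after confirming that the header names match.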