* Getting example data
use http://dss.princeton.edu/training/Panel101.dta, clear

* To see the original data
browse

* Collapse by country
collapse (mean) y (sum) y_bin (max) x1 (min) x2 (median) x3, by(country)

* To see the collapsed data
browse

/*
Collapsed data (one row per country). Each column holds the statistic
requested for it: y = mean, y_bin = sum, x1 = max, x2 = min, x3 = median.

country   y           y_bin   x1          x2           x3
A         1.728e+09   8       0.499805    -1.107956     0.9879055
B         2.139e+08   5       0.8801672    1.423646    -0.0129679
C         1.345e+09   7       1.446412    -1.344218     0.4669301
D         3.641e+09   9       0.4863668    1.541221     0.0801717
E         7.231e+08   9       1.39114      1.189055    -0.0894756
F         3.559e+09   9       1.396892    -0.7556524    0.2844574
G         1.706e+09   9       1.257879    -1.621761     2.533169
*/

* Categorical variables need to be split by category, that is, each category
* becomes a dummy variable (0/1). Then you can either sum the dummies (giving
* the number of observations in that category) or take their mean (giving the
* proportion of observations in that category; multiply by 100 for a
* percentage).

* To separate a categorical variable into different dummies, type:
tab race, gen(race)

* See an example here, page 57:
* http://dss.princeton.edu/training/StataTutorial.pdf#page=57

* Let's say 'race' has 3 categories: your data will then have three variables,
* race1, race2 and race3, each corresponding to one category. In the collapse
* you will add them as shown below. Run this collapse instead of the first one,
* on the original (uncollapsed) data, because collapse replaces the data in
* memory. A single (mean) applies to every variable listed after it.
collapse (mean) y (sum) y_bin (max) x1 (min) x2 (median) x3 ///
    (mean) race1 race2 race3, by(country)

* The '///' means the command continues on the next line.

* Make sure to save the data under a new name after collapsing, so you do not
* overwrite the original file.

* NOTE:
* If you want to collapse by more than one variable, you can type:
* collapse .... , by(country state)

* For more details, type: help collapse
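
* A possible extension (a sketch, not part of the original steps): keep the
* original data in memory with preserve/restore, give the collapsed variables
* new names with the newvar=oldvar syntax so that one variable can feed several
* statistics, and save the result under a new name. The target names (y_mean,
* y_sum, y_bin_share, y_bin_sum, n_obs) and the file name
* panel101_by_country.dta are hypothetical choices for illustration.
use http://dss.princeton.edu/training/Panel101.dta, clear
preserve

* newvar=oldvar: y and y_bin each feed more than one statistic
collapse (mean) y_mean=y y_bin_share=y_bin ///
    (sum) y_sum=y y_bin_sum=y_bin ///
    (count) n_obs=y_bin, by(country)

* y_bin_share is the proportion of 1s in y_bin; y_bin_sum is the number of 1s;
* n_obs is the number of nonmissing y_bin values in each country
list, clean

* Save under a new name so the original file is not overwritten
save panel101_by_country.dta, replace

* Bring back the original, uncollapsed data
restore

* Because restore brings back the uncollapsed data, you can run another
* collapse with different statistics without reloading the file.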