DATA ANALYSIS NOTES: LINKS AND GENERAL GUIDELINES
DSS Data Consultant
Finding the question is often more important than finding the answer
I do not understand the output of my regression!!!
Is my model OK?
How to create dummies
Making publication-style tables in STATA
How can I create variables containing percent summaries?
Topics in Statistics
Online Training Section at DSS
BOOK: Stock, James H. and Mark Watson, Introduction to Econometrics, Addison Wesley, 2003.
A very general guideline…
Once you have defined the question and, hopefully, have a clear idea of what you want to know, you can choose a statistical technique suitable for your data.
First, you need to answer two questions:
1. What is your dependent variable?
2. What are your independent variables?
There is no straightforward answer as to which technique to use for your data. Three factors play a role:
1. Your theory
2. Your data
3. Your knowledge of the topic
For practical purposes, the statistical technique you choose will depend mostly on the type of your dependent variable. For the kinds of analysis suited to different types of dependent variables, see http://www.ats.ucla.edu/stat/mult_pkg/whatstat/default.htm.
In general, if your dependent variable is:
1. Dichotomous (0/1, male/female): use logit or probit. Logit is the more common choice.
2. Ordered (1, 2, 3, 4 / bad, not so bad, not so good, good), with categories running from low to high or from negative to positive: use ordered logit (or ordered probit).
3. Unordered categories (1, 2, 3 / Democrat, Independent, Republican): use multinomial logit.
4. Continuous (1, 1.01, 1.02, …): use linear regression (simple or multiple).
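The rule of thumb above can be written as a small decision function. This is a minimal Python sketch (standard library only) of a hypothetical helper, `suggest_model`; the cutoff of 10 distinct integer codes for "categorical" is an assumption for illustration, not a statistical rule. Whether categories are ordered cannot be read off numeric codes, so that has to come from your theory and your knowledge of the variable. In Stata, the corresponding estimation commands are logit/probit, ologit/oprobit, mlogit and regress.

```python
from typing import Sequence


def suggest_model(values: Sequence[float], ordered: bool = False) -> str:
    """Suggest an estimator from the observed values of a dependent variable.

    Hypothetical helper: the variable type is guessed from the data, except
    for ordering, which the caller must state via `ordered`.
    """
    distinct = set(values)
    if len(distinct) < 2:
        raise ValueError("dependent variable is constant; nothing to model")
    if len(distinct) == 2:
        return "logit or probit"  # dichotomous, e.g. 0/1 or male/female
    all_integer = all(float(v).is_integer() for v in distinct)
    # Assumed cutoff: a handful of integer codes is treated as categories.
    if all_integer and len(distinct) <= 10:
        return "ordered logit/probit" if ordered else "multinomial logit"
    return "linear regression"  # otherwise treat the variable as continuous


print(suggest_model([0, 1, 1, 0]))                    # dichotomous
print(suggest_model([1, 2, 3, 4, 2], ordered=True))   # ordered scale
print(suggest_model([1, 2, 3, 1]))                    # unordered categories
print(suggest_model([1.0, 1.01, 1.02, 2.5]))          # continuous
```

A count or a Likert-type scale with many points would be misclassified by a threshold like this, which is why the choice should rest on what the variable measures rather than on how it happens to be coded.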
Other things to consider:
1. Are your data organized by groups or entities (cross-sectional or panel data)?
2. Do your data have a time dimension (years, quarters, months, days, etc.)?
If your data have one or both of these features, you may need to control for variables that vary over time but not across entities (such as public policies) or variables that vary across entities but not over time (such as cultural factors).
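One common way to control for factors that vary across entities but not over time is the within (fixed-effects) transformation: subtract each entity's own mean from its observations before running the regression. The Python sketch below (standard library only) shows the transformation on a tiny made-up panel; `demean_by_group` and the "culture" variable are hypothetical illustrations. Grouping by time period instead of by entity removes factors common to all entities in a period, such as a nationwide policy, in a one-way model. In Stata, xtset followed by xtreg with the fe option fits the entity fixed-effects model.

```python
from collections import defaultdict


def demean_by_group(groups, values):
    """Within transformation: subtract each group's mean from its values.

    Any variable that is constant within a group (e.g. a hypothetical
    entity-level 'culture' score) becomes exactly zero, so it can no
    longer confound the estimated effects of the other variables.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for g, v in zip(groups, values):
        totals[g] += v
        counts[g] += 1
    return [v - totals[g] / counts[g] for g, v in zip(groups, values)]


# Made-up panel: two entities (A, B) observed in two years.
entity = ["A", "A", "B", "B"]
culture = [3.0, 3.0, 7.0, 7.0]  # varies across entities, not over time
y = [1.0, 3.0, 10.0, 14.0]

print(demean_by_group(entity, culture))  # [0.0, 0.0, 0.0, 0.0]
print(demean_by_group(entity, y))        # [-1.0, 1.0, -2.0, 2.0]
```

Because the entity-level variable is wiped out by the transformation, its effect cannot be estimated in a fixed-effects model; the method controls for such factors rather than measuring them.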
Once you have defined your dependent and independent variables, you can start exploring the relationships between them. To do this, you can:
1. Create a correlation matrix for all variables. This gives you an idea of the nature of the relationships not only between the dependent and independent variables but also among the independent variables themselves (in Stata, type spearman [list of variables], star(0.05) or pwcorr [list of variables], sig; type help spearman or help pwcorr for more details).
2. Create a scatter plot of the dependent variable against each of the independent variables (in Stata, type scatter [dep. var] [indep. var]; type help scatter for more options, or see the DSS training pages at http://dss.princeton.edu/training/ or the general DSS help pages for examples).
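To see what pwcorr (Pearson) and spearman (rank) correlations measure, the following minimal Python sketch (standard library only) computes a pairwise correlation matrix both ways. The function names and sample data are illustrative assumptions, and the sketch does not compute the significance levels that the sig and star() options report in Stata.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def ranks(x):
    """Ranks starting at 1; tied values share their average rank."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    r = [0.0] * len(x)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and x[order[j + 1]] == x[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i..j
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def spearman(x, y):
    """Spearman rank correlation: the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def corr_matrix(data, method=pearson):
    """Pairwise correlation matrix for a dict {variable name: values}."""
    names = list(data)
    return {a: {b: method(data[a], data[b]) for b in names} for a in names}


# Illustrative data: z rises with x, but not in a straight line.
data = {"x": [1, 2, 3, 4, 5], "y": [2, 4, 6, 8, 10], "z": [1, 4, 9, 16, 25]}
print(corr_matrix(data)["x"]["z"])            # below 1: not linear
print(corr_matrix(data, spearman)["x"]["z"])  # 1.0: perfectly monotonic
```

The x-z pair illustrates why it helps to look at both measures and at a scatter plot: a relationship can be perfectly monotonic (Spearman of 1) while still not being linear (Pearson below 1).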