DATA ANALYSIS NOTES: LINKS AND GENERAL GUIDELINES

 

Oscar Torres-Reyna

DSS Data Consultant

 

Finding the question is often more important than finding the answer

John Tukey

 

 

I do not understand the output of my regression!!!

 

Data Analysis: Annotated Output

 

Exploring data

http://dss.princeton.edu/training/StataTutorial.pdf

 

Linear regression

http://dss.princeton.edu/training/Regression101.pdf

 

Logit regression, ordered logit regression

http://dss.princeton.edu/training/Logit.pdf

 

Factor analysis

http://dss.princeton.edu/training/Factor.pdf

 

Panel data, fixed effects, random effects

http://dss.princeton.edu/training/Panel101.pdf

 

Multilevel analysis

http://dss.princeton.edu/training/Multilevel101.pdf

 

Time Series

http://dss.princeton.edu/training/TS101.pdf

 

Descriptive Statistics

http://www.princeton.edu/~otorres/Excel

 

Data Analysis: Annotated Output

http://www.ats.ucla.edu/stat/AnnotatedOutput/default.htm

 

 

Data Analysis Examples

http://www.ats.ucla.edu/stat/dae/

 

 

Regression with Stata

http://www.ats.ucla.edu/STAT/stata/webbooks/reg/default.htm

 

 

Regression

http://www.ats.ucla.edu/stat/stata/topics/regression.htm

 

 

How to interpret dummy variables in a regression

http://www.ats.ucla.edu/stat/Stata/webbooks/reg/chapter3/statareg3.htm

 

 

Logit output: what are the odds ratios?

http://www.ats.ucla.edu/stat/stata/library/odds_ratio_logistic.htm

 

 

Is my model OK?

 

Regression diagnostics: A checklist

http://www.ats.ucla.edu/stat/stata/webbooks/reg/chapter2/statareg2.htm

 

 

 

Logistic regression diagnostics: A checklist

http://www.ats.ucla.edu/stat/stata/webbooks/logistic/chapter3/statalog3.htm

 

 

 

Time series diagnostics: A checklist (pdf)

http://homepages.nyu.edu/~mrg217/timeseries.pdf

 

 

 

Time series: Dickey-Fuller (dfuller) test for unit roots (for R and Stata)

http://www.econ.uiuc.edu/~econ472/tutorial9.html

 

http://dss.princeton.edu/training/TS101.pdf (Stata)

 

 

Granger Causality

 

http://dss.princeton.edu/training/TS101.pdf

 

http://www.econ.uiuc.edu/~econ472/tutorial8.html

 

http://martinsewell.com/causality/Zorn01.pdf

 

 

Panel data tests: heteroskedasticity and autocorrelation

 

http://dss.princeton.edu/training/Panel101.pdf

 

http://www.stata.com/support/faqs/stat/panel.html

 

http://www.stata.com/support/faqs/stat/xtreg.html

 

http://www.stata.com/support/faqs/stat/xt.html

 

http://dss.princeton.edu/online_help/analysis/panel.htm

 

 

Generating confidence intervals

http://fhss.byu.edu/polsci/Goodliffe/504/stataci.pdf

 

 

 

Confidence intervals in logistic regression

http://www.stata.com/support/faqs/stat/prep.html

 

 

Chow Test

 

http://dss.princeton.edu/training/TS101.pdf#page=23

 

http://www.stata.com/support/faqs/stat/awreg.html

 

 

Marginal effects

 

http://www.stata.com/help.cgi?margins

 

 

http://www.stata.com/support/faqs/stat/mfx_ologit.html

 

http://www.stata.com/support/faqs/stat/mfx_size.html

 

 

Outliers, influential observations and leverage (using SPSS)

 

http://faculty.chass.ncsu.edu/garson/PA765/regress.htm#outlier2

 

Quandt likelihood ratio (QLR test) or sup-Wald statistic

http://dss.princeton.edu/training/TS101.pdf#page=24

 

 

 

 

How to create dummies

http://www.stata.com/support/faqs/data/dummy.html

 

http://www.ats.ucla.edu/stat/stata/faq/dummy.htm

 

 

Making publication-style tables in Stata

http://www.fiu.edu/~tardanic/make.pdf

 

 

How can I create variables containing percent summaries?

http://www.stata.com/support/faqs/data/percentvars.html

 

 

 

Topics in Statistics

 

What statistical analysis should I use?

http://www.ats.ucla.edu/stat/mult_pkg/whatstat/default.htm

 

 

 

Statnotes: Topics in Multivariate Analysis, by G. David Garson

http://www2.chass.ncsu.edu/garson/pa765/statnote.htm

 

 

 

Elementary Concepts in Statistics

http://www.statsoft.com/textbook/stathome.html

 

 

 

Introductory Statistics: Concepts, Models, and Applications

http://www.psychstat.missouristate.edu/introbook/sbk00.htm

 

 

 

Statistical Data Analysis

http://math.nicholls.edu/badie/statdataanalysis.html

 

 

 

 

 

Stata Library: Graph Examples (some may not work with Stata 10)

http://www.ats.ucla.edu/STAT/stata/library/GraphExamples/default.htm

 

 

 

 

Comparing Group Means: The T-test and One-way ANOVA Using STATA, SAS, and SPSS

http://www.indiana.edu/~statmath/stat/all/ttest/

 

 

 

Online Training Section at DSS

http://dss.princeton.edu/training/

 

 

BOOK: Stock, James H. and Mark Watson, Introduction to Econometrics, Addison-Wesley, 2003.

 

 

A very general guideline…

 

Once you have defined the question and, hopefully, have a clear idea of what you want to know, you can proceed to apply the statistical technique suitable for your data.

 

First, you need to answer two questions:

 

1.    What is your dependent variable?

2.    What is (are) your independent variable(s)?

 

There is no straightforward answer as to which technique you should use for your data. Three factors play a role:

 

1.    Your theory

2.    Your data

3.    Your knowledge of the topic

 

For practical purposes, the statistical technique you choose will depend mostly on the type of your dependent variable. See the following site for the types of analysis suited to different types of dependent variables: http://www.ats.ucla.edu/stat/mult_pkg/whatstat/default.htm

 

In general, if your dependent variable is:

 

1.    Dichotomous (0/1, male/female): use logit or probit. Logit is the more common of the two.

2.    Ordered (1, 2, 3, 4 / bad, not so bad, not so good, good), i.e., categories that run from low to high or from negative to positive: use ordered logit (or ordered probit).

3.    Unordered categories (1, 2, 3 / Democrat, Independent, Republican): use multinomial logit.

4.    Continuous (1, 1.01, 1.02, …): use linear regression (simple or multiple).
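The rule of thumb above can be sketched as a small helper. Treat it as a rough heuristic, not a substitute for knowing your data: the function name suggest_model, the cutoff of 10 categories and the ordered flag are assumptions made for illustration only. Whether the categories are ranked has to come from you, since it cannot be read off the values.

```python
def suggest_model(values, ordered=False, max_categories=10):
    """Hypothetical helper: map a dependent variable to a model family.

    Heuristic only (assumption): two distinct values -> logit (or probit);
    a few whole-number or text categories -> ordered or multinomial logit;
    anything else -> linear regression.
    """
    distinct = set(values)
    if len(distinct) == 2:
        # Dichotomous outcome (0/1, male/female)
        return "logit"

    def looks_categorical(v):
        # Text labels or whole numbers are treated as category codes
        return isinstance(v, str) or float(v).is_integer()

    if len(distinct) <= max_categories and all(looks_categorical(v) for v in distinct):
        # Whether categories are ranked must be supplied by the analyst
        return "ordered logit" if ordered else "multinomial logit"

    # Continuous outcome (1, 1.01, 1.02, ...)
    return "linear regression"


print(suggest_model([0, 1, 1, 0]))                                # logit
print(suggest_model([1, 2, 3, 4, 2], ordered=True))               # ordered logit
print(suggest_model(["democrat", "independent", "republican"]))   # multinomial logit
print(suggest_model([1, 1.01, 1.02, 1.5]))                        # linear regression
```

Note that an integer-coded variable with many distinct values (for example, years of schooling) falls through to linear regression here; that is a judgment call you may want to make differently.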

 

Other things to consider:

 

1.    Is your data organized by groups or entities (panel data, cross-sectional data)?

2.    Does time play a role (years, quarters, months, days, etc.)?

 

If either or both of these apply, you may need to control for variables that vary across time but not across entities (like public policies) or for variables that vary across entities but not across time (like cultural factors).
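One common way to control for factors that are constant within an entity is to subtract each entity's mean from every variable (the "within" transformation behind fixed-effects estimators such as Stata's xtreg, fe). The sketch below, using a hypothetical helper demean_by and made-up numbers, shows why this works: a variable that never changes within an entity, like a cultural factor, becomes zero after demeaning, so it drops out of the model. Demeaning by time period instead removes variables that are the same for all entities in a given period, like a nationwide policy.

```python
from collections import defaultdict


def demean_by(groups, values):
    """Hypothetical helper: subtract each group's mean from its values."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for g, v in zip(groups, values):
        sums[g] += v
        counts[g] += 1
    return [v - sums[g] / counts[g] for g, v in zip(groups, values)]


# Made-up panel: two entities (A, B) observed in two years
entity = ["A", "A", "B", "B"]
year = [2000, 2001, 2000, 2001]
y = [1.0, 3.0, 10.0, 14.0]        # outcome
culture = [5.0, 5.0, 2.0, 2.0]    # varies across entities, not time
policy = [0.0, 1.0, 0.0, 1.0]     # varies across time, not entities

print(demean_by(entity, y))        # [-1.0, 1.0, -2.0, 2.0]
print(demean_by(entity, culture))  # [0.0, 0.0, 0.0, 0.0]
print(demean_by(year, policy))     # [0.0, 0.0, 0.0, 0.0]
```

This is the one-way case only; with unbalanced panels, controlling for both entity and time effects at once is not just two rounds of simple demeaning, so rely on your software's panel commands (see the Panel101.pdf link above) for real analyses.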

 

Once you have defined your dependent and independent variables, you can start exploring the relationships between them. For this, you can do the following:

 

1.    Create a correlation matrix for all variables. This will give you an idea of the nature of the relationships not only between the dependent and independent variables but also among the latter (in Stata, type spearman [list of variables], star(0.05) or pwcorr [list of variables], sig; type help spearman or help pwcorr for more details).

2.    Create a scatter plot of the dependent variable against each of the independent variables (in Stata, type scatter [dep. var] [indep. var]; type help scatter for more options, or visit the DSS training pages at http://dss.princeton.edu/training/ or the general DSS help pages for examples).
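If you want to see what a Spearman correlation matrix actually computes, here is a minimal sketch outside Stata: rank each variable (tied values get the average of their ranks) and take the ordinary Pearson correlation of the ranks. The helper names and the example variables (income, educ, age) are hypothetical and the data are made up. Unlike Stata's star(0.05) or sig options, this sketch does not compute significance levels; a library routine such as scipy.stats.spearmanr also reports p-values.

```python
from math import sqrt


def average_ranks(xs):
    """Rank values from 1 to n; tied values share the average of their ranks."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation (fails on a constant variable: zero variance)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman(x, y):
    """Spearman rank correlation = Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def corr_matrix(data, method=spearman):
    """Correlations for every pair of variables in a dict of columns."""
    names = list(data)
    return {(a, b): method(data[a], data[b]) for a in names for b in names}


# Made-up data for illustration
data = {
    "income": [20, 35, 50, 65, 80],
    "educ": [10, 12, 16, 16, 18],
    "age": [60, 45, 40, 30, 25],
}
matrix = corr_matrix(data)
for (a, b), r in matrix.items():
    if a < b:  # print each pair once
        print(f"{a:>6} {b:>6} {r: .3f}")
```

Passing method=pearson instead gives the ordinary (linear) correlation matrix, roughly what pwcorr reports without its significance levels.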