Finding the question is often more important than finding the answer

John Tukey

I do not understand the output of my regression!!!

Data Analysis: Annotated Output

Exploring data

Linear regression

Logit regression, ordered logit regression

Factor analysis

Panel data, fixed effects, random effects

Multilevel analysis

Time Series

Descriptive Statistics

### Data Analysis Examples: http://www.ats.ucla.edu/stat/dae/

Regression with Stata

Regression

How to interpret dummy variables in a regression

Logit output: what are the odds ratios?

Is my model OK?

Regression diagnostics: A checklist

Logistic regression diagnostics: A checklist

Time series diagnostics: A checklist (pdf)

Time series: Dickey-Fuller (dfuller) test for unit roots (for R and Stata)

### http://dss.princeton.edu/training/Panel101.pdf

Generating confidence intervals

Confidence intervals in logistic regression

Chow Test

Marginal effects

Outliers, influential and leverage (using SPSS)

Quandt likelihood ratio (QLR test) or sup-Wald statistic

How to create dummies

Making publication-style tables in STATA

How can I create variables containing percent summaries?

Topics in Statistics

### What statistical analysis should I use?

Statnotes: Topics in Multivariate Analysis, by G. David Garson

Elementary Concepts in Statistics

Introductory Statistics: Concepts, Models, and Applications

Statistical Data Analysis

# Comparing Group Means: The T-test and One-way ANOVA Using STATA, SAS, and SPSS

Online Training Section at DSS

BOOK: Stock, James H. and Mark Watson, Introduction to Econometrics, Addison Wesley, 2003.

A very general guideline…

Once you have defined the question and, hopefully, have a clear idea of what you want to know, you can apply the statistical technique that suits your data.

First, you need to answer two questions:

1.    What is your dependent variable?

2.    What is (are) your independent variable(s)?

There is no straight answer as to which technique you need to use for your data. Two factors play a role:

3.    Your knowledge of the topic

For practical purposes, the statistical technique you choose will depend mostly on the type of your dependent variable. For the types of analysis that suit different types of dependent variables, see http://www.ats.ucla.edu/stat/mult_pkg/whatstat/default.htm.

In general, if your dependent variable is:

1.    Dichotomous (0, 1 / male, female): use logit or probit. Logit is the more common choice.

2.    Ordered (1, 2, 3, 4 / bad, not so bad, not so good, good), going from low to high or from negative to positive: use ordered logit (or ordered probit).

3.    Different categories with no natural order (1, 2, 3 / democrat, independent, republican): use multinomial logit.

4.    Continuous (1, 1.01, 1.02, …): use linear regression (simple or multiple).
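As a rough illustration of the list above, the following Stata sketch fits one model of each type. The example datasets (auto, which ships with Stata, and sysdsn1, which webuse downloads from the Stata Press site) and the choice of variables are assumptions made only to show the syntax; substitute your own dependent and independent variables.

```stata
* Assumption: example datasets and variables are illustrative only
sysuse auto, clear

* 1. Dichotomous dependent variable (foreign is coded 0/1)
logit foreign mpg weight

* 2. Ordered dependent variable (rep78 is a 1-5 repair rating)
ologit rep78 mpg weight

* 4. Continuous dependent variable (price in dollars)
regress price mpg weight

* 3. Unordered categories (insure: type of health insurance plan)
* webuse needs an internet connection
webuse sysdsn1, clear
mlogit insure age male nonwhite
```

Type help logit, help ologit, help mlogit or help regress for the available options.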

Other things to consider:

1.    Are your data organized by groups or entities (panel data, cross-sectional)?

2.    What about time (years, months, days, quarters, etc.)?

If one or both of these apply, you may need to control for variables that vary across time but not across entities (like public policies) or variables that vary across entities but not over time (like cultural factors).
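One common way to handle both kinds of variables is a fixed-effects panel regression. The sketch below is a minimal example under stated assumptions: it uses the nlswork example dataset (downloaded by webuse, so it needs an internet connection), with idcode as the entity identifier and year as the time variable; the variable names come from that dataset, not from your data.

```stata
* Assumption: nlswork and its variables stand in for your own panel data
webuse nlswork, clear

* Declare the panel structure: entity identifier, then time variable
xtset idcode year

* Entity fixed effects absorb anything that is constant within an
* entity over time (for example, cultural factors)
xtreg ln_wage age tenure, fe

* Adding year dummies also controls for shocks common to all
* entities in a given year (for example, a national policy change)
xtreg ln_wage age tenure i.year, fe
```

Whether fixed effects, random effects or another specification is appropriate depends on your data and your question; type help xtreg for the alternatives.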

Once you have defined your dependent and independent variables, you can start exploring the relationships between them. For this you can do the following:

1.    Create a correlation matrix for all variables. This gives you an idea of the nature of the relationship not only between the dependent and independent variables but also among the latter (in Stata type spearman [list of variables], star(0.05) or pwcorr [list of variables], sig. Type help spearman or help pwcorr for more details).

2.    Create a scatter plot between the dependent variable and each of the independent variables (in Stata type scatter [dep. var] [indep. var]; type help scatter for more options, or visit the DSS training pages for examples: http://dss.princeton.edu/training/ or the general DSS help pages).
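The two exploratory steps above could look like this in practice. This is only a sketch: the auto dataset and the variables price, mpg and weight are assumptions chosen for illustration, with price standing in for the dependent variable.

```stata
* Assumption: auto dataset and variable names are illustrative only
sysuse auto, clear

* Spearman rank correlations; a star marks p < 0.05
spearman price mpg weight, star(0.05)

* Pairwise Pearson correlations with significance levels
pwcorr price mpg weight, sig

* Scatter plots of the dependent variable against each
* independent variable
scatter price mpg
scatter price weight

* All pairwise scatter plots in a single graph
graph matrix price mpg weight
```

Strong correlations among the independent variables in these tables are worth noting before you fit a regression, since they can signal multicollinearity.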