Statistical model

A statistical model is a formalization of relationships between variables in the form of mathematical equations. It describes how one or more random variables are related to one or more other variables. The model is statistical because the variables are related stochastically rather than deterministically. In mathematical terms, a statistical model is frequently thought of as a pair (Y, P), where Y is the set of possible observations and P is a set of possible probability distributions on Y. It is assumed that there is a particular element of P that generates the observed data. Statistical inference enables us to make statements about which element(s) of this set are likely to be the true one.
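As a small worked illustration of the pair (Y, P), consider n tosses of a possibly biased coin. The symbols n, θ, and y_i below are notation introduced only for this sketch and do not come from the text above.

```latex
% Observations: all sequences of n outcomes (0 = tails, 1 = heads)
Y = \{0, 1\}^n
% Candidate distributions: independent tosses with unknown heads probability \theta
P = \Bigl\{\, p_\theta : \theta \in [0, 1] \,\Bigr\},
\qquad
p_\theta(y_1, \dots, y_n) = \prod_{i=1}^{n} \theta^{y_i} (1 - \theta)^{1 - y_i}
```

Under this model, the assumption that one element of P generated the data amounts to assuming there is a true value of θ, and inference about "which element" is inference about θ.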

Most statistical tests can be described in the form of a statistical model. For example, Student's t-test for comparing the means of two groups can be formulated as testing whether an estimated parameter in the model differs from 0. Tests and models are also alike in that both rest on assumptions. In most models, for instance, the error is assumed to be normally distributed.[1]
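The link between the t-test and a model parameter can be checked numerically. The sketch below assumes the pooled-variance (equal variance) form of Student's t-test and a regression of the outcome on a 0/1 group indicator. The function names and the data are hypothetical, chosen only for illustration.

```python
import math

def pooled_t(a, b):
    """Two-sample Student's t statistic with pooled variance."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    ssa = sum((v - ma) ** 2 for v in a)
    ssb = sum((v - mb) ** 2 for v in b)
    pooled_var = (ssa + ssb) / (na + nb - 2)
    return (mb - ma) / math.sqrt(pooled_var * (1 / na + 1 / nb))

def slope_t(x, y):
    """t statistic for b1 in y = b0 + b1*x + error, fitted by least squares."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = my - b1 * mx
    rss = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    standard_error = math.sqrt(rss / (n - 2) / sxx)
    return b1 / standard_error

# Made-up measurements for two groups (illustration only).
group_a = [5.1, 4.9, 5.6, 5.8, 5.0]
group_b = [6.2, 5.9, 6.8, 6.1, 6.5]

# Code group membership as a dichotomous predictor: 0 for a, 1 for b.
x = [0] * len(group_a) + [1] * len(group_b)
y = group_a + group_b

t_test = pooled_t(group_a, group_b)
t_reg = slope_t(x, y)
print(t_test, t_reg)  # the two statistics agree up to rounding error
```

With a 0/1 predictor, the fitted slope b1 is the difference between the two group means, so testing whether b1 differs from 0 is the same as testing whether the means differ.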

Model comparison

Models can be compared to each other, either in an exploratory data analysis or in a confirmatory data analysis. In an exploratory analysis, you formulate all the models you can think of and see which describes your data best. In a confirmatory analysis, you test which of the models specified before the data were collected fits the data best, or test whether your only model fits the data. In linear regression analysis, you can compare the proportion of variance explained by the independent variables, $R^2$, across the different models. In general, nested models can be compared using a likelihood-ratio test. Nested models are models that can be obtained by restricting a parameter in a more complex model to be zero.
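A likelihood-ratio comparison of two nested models might be sketched as follows. The sketch assumes a linear model with normally distributed errors, for which the likelihood-ratio statistic reduces to n·ln(RSS_restricted / RSS_full), and it relies on the usual large-sample chi-square approximation with one degree of freedom (one restricted parameter). The function names and the data are hypothetical.

```python
import math

def rss_intercept_only(y):
    """Residual sum of squares for the restricted model y = b0 + error."""
    mean_y = sum(y) / len(y)
    return sum((yi - mean_y) ** 2 for yi in y)

def rss_simple_regression(x, y):
    """Residual sum of squares for the fuller model y = b0 + b1*x + error."""
    n = len(y)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))

def likelihood_ratio_test(rss_restricted, rss_full, n):
    """Likelihood-ratio statistic and approximate p-value for ONE restricted
    parameter, assuming Gaussian errors (an assumption of this sketch)."""
    stat = n * math.log(rss_restricted / rss_full)
    # Chi-square survival function with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

# Made-up data (illustration only).
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]

rss0 = rss_intercept_only(y)        # model with b1 restricted to zero
rss1 = rss_simple_regression(x, y)  # model with b1 estimated
lr_stat, p_value = likelihood_ratio_test(rss0, rss1, len(y))
r_squared = 1.0 - rss1 / rss0       # variance explained by the fuller model
print(lr_stat, p_value, r_squared)
```

A small p-value suggests that the restriction (b1 = 0) fits the data noticeably worse than the fuller model. With so few observations the chi-square approximation is rough, so the numbers should be read as illustrative only.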

An example

Length and age are probabilistically distributed over humans. They are stochastically related: knowing that a person is 7 years old influences the chance of this person being 6 feet tall. You could formalize this relationship in a linear regression model of the following form: $\text{length}_i = b_0 + b_1\,\text{age}_i + \varepsilon_i$, where $b_0$ is the intercept, $b_1$ is the parameter that age is multiplied by to get a prediction of length, $\varepsilon_i$ is the error term, and $i$ indexes the subject. This means that length starts at some value (there is a minimum length when someone is born) and is predicted by age to some extent. The prediction is not perfect, which is why an error term is included in the model. This error contains variance that stems from sex and other variables. When sex is included in the model, the error term will become smaller, as you will have a better idea of the chance that a particular 16-year-old is 6 feet tall when you know this 16-year-old is a girl. The model would become $\text{length}_i = b_0 + b_1\,\text{age}_i + b_2\,\text{sex}_i + \varepsilon_i$, where the variable sex is dichotomous. This model would presumably have a higher $R^2$. The first model is nested in the second model: the first model is obtained from the second when $b_2$ is restricted to zero.
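The two nested models in this example can be fitted by ordinary least squares to see how $R^2$ changes when sex is added. The sketch below uses only the Python standard library. The helper names (`solve`, `fit_ols`) and the data are hypothetical: the lengths (in centimetres) are made-up numbers, not real measurements.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_ols(columns, y):
    """Least-squares fit of y on an intercept plus the given predictor columns.
    Returns (coefficients, R^2); coefficients[0] is the intercept b0."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    coef = solve(xtx, xty)  # normal equations (X'X) b = X'y
    pred = [sum(c * v for c, v in zip(coef, r)) for r in rows]
    mean_y = sum(y) / len(y)
    rss = sum((yi - p) ** 2 for yi, p in zip(y, pred))
    tss = sum((yi - mean_y) ** 2 for yi in y)
    return coef, 1.0 - rss / tss

# Made-up data (illustration only): sex is dichotomous, coded 0 or 1.
age = [4, 6, 8, 10, 12, 14, 16, 18] * 2
sex = [0] * 8 + [1] * 8
length = [103, 116, 128, 139, 150, 164, 174, 177,
          102, 115, 127, 139, 152, 160, 163, 164]

coef_age, r2_age = fit_ols([age], length)         # length ~ age
coef_full, r2_full = fit_ols([age, sex], length)  # length ~ age + sex
print(r2_age, r2_full)
```

Because the first model is the second with $b_2$ restricted to zero, the least-squares fit of the second model can never have a lower $R^2$ on the same data. Whether the gain is large enough to matter is a separate question that a model comparison would need to address.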
